Creating Inference Checkpoints
Overviews how to convert your inference checkpoint for various model servers
Overviews how to convert your inference checkpoint for various model servers
This tutorial guides you through fine-tuning the Gemma 3-1B-it language model on Google Kubernetes Engine (GKE) using L4 GPU, leveraging Parameter Efficient Fine Tuning (PEFT) and LoRA. It covers setting up a GKE cluster, containerizing the fine-tuning code, running the fine-tuning job, and uploading the resulting model to Hugging Face. Finally, it demonstrates how to deploy and interact with the fine-tuned model using vLLM on GKE.