Efficient GPU Resource Management for ML Workloads using SkyPilot and Kueue on GKE
This tutorial expands on the SkyPilot Tutorial by leveraging Dynamic Workload Scheduler (DWS) through an open-source project called Kueue. Unlike the SkyPilot tutorial, this guide shows how to use SkyPilot with Kueue on GKE to efficiently manage ML workloads with dynamic GPU provisioning.
Overview
This tutorial is designed for ML Platform engineers who plan to use SkyPilot to train or serve LLM models on Google Kubernetes Engine (GKE) while utilizing Dynamic Workload Scheduler (DWS) to acquire GPU resources as they become available. It covers installing Kueue and SkyPilot, creating a GKE cluster with queued-provisioning-enabled GPU node pools, and deploying and running an LLM model. This setup enhances resource efficiency and reduces cost for ML workloads through dynamic GPU provisioning.
Before you begin
- Ensure you have a Google Cloud project with billing enabled and the GKE API activated.
Learn how to enable billing and activate the GKE API.
You can use the gcloud CLI to activate the GKE API:
gcloud services enable container.googleapis.com
- Ensure you have the following tools installed on your workstation: Terraform, the gcloud CLI, kubectl, Python 3 with pip, yq, curl, and jq (all of these are used in the steps below).
Setting up your GKE cluster with Terraform
We’ll use Terraform to provision the GKE cluster and its supporting resources.
- Create your environment configuration(.tfvar) file and edit based on example_environment.tfvars.
project_id = "skypilot-project" cluster_name = "skypilot-tutorial" autopilot_cluster = true # Set to false for Standard cluster
- (Optional) For Standard clusters: Configure GPU node pools in example_environment.tfvars by uncommenting and adjusting the gpu_pools block as needed.
gpu_pools = [
  {
    name                = "gpu-pool"
    queued_provisioning = true
    machine_type        = "g2-standard-24"
    disk_type           = "pd-balanced"
    autoscaling         = true
    min_count           = 0
    max_count           = 3
    initial_node_count  = 0
  }
]
Deployment
- Initialize the modules
terraform init
- Apply while referencing the .tfvar file we created:
terraform apply -var-file=your_environment.tfvar
And you should see your resources created:
Apply complete! Resources: 24 added, 0 changed, 0 destroyed.

Outputs:

gke_cluster_location = "us-central1"
gke_cluster_name = "skypilot-tutorial"
kubernetes_namespace = "ai-on-gke"
project_id = "skypilot-project"
service_account = "tf-gke-skypilot-tutorial@skypilot-project.iam.gserviceaccount.com"
- Get Kubernetes access:
gcloud container clusters get-credentials $(terraform output -raw gke_cluster_name) --region $(terraform output -raw gke_cluster_location) --project $(terraform output -raw project_id)
- To verify your GKE cluster’s version, run:
kubectl version
Make sure you meet the minimum version requirements (1.30.3-gke.1451000 or later for Autopilot, 1.28.3-gke.1098000 or later for Standard):
Server Version: v1.30.6-gke.1596000
If not, you can change the version in Terraform with the kubectl_version variable.
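For example, in your .tfvars file (the value format here is an assumption; check the module’s variable definition for the accepted versions):
```
kubectl_version = "1.30"
```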
Install and configure Kueue
- Install Kueue from the official manifest.
Note the --server-side flag. Without it, the client cannot apply the CRDs because of annotation size limitations. For more configuration options, visit Kueue’s installation guide.
VERSION=v0.10.2
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
- Configure Kueue for pod provisioning by patching the Kueue configmap.
# Extract and patch the config.
# This is required because SkyPilot creates and manages workloads as pods.
kubectl -n kueue-system get cm kueue-manager-config -o jsonpath='{.data.controller_manager_config\.yaml}' | yq '.integrations.frameworks += ["pod"]' > /tmp/kueueconfig.yaml
- Apply the changes
kubectl -n kueue-system create cm kueue-manager-config --from-file=controller_manager_config.yaml=/tmp/kueueconfig.yaml --dry-run=client -o yaml | kubectl -n kueue-system apply -f -
- Restart the kueue-controller-manager pod with the following command
kubectl -n kueue-system rollout restart deployment kueue-controller-manager
# Wait for the restart to complete
kubectl -n kueue-system rollout status deployment kueue-controller-manager
- Install the Kueue resources using the provided kueue_resources.yaml:
kubectl apply -f kueue_resources.yaml
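For reference, a kueue_resources.yaml wired for DWS typically defines a ResourceFlavor, a ProvisioningRequestConfig pointing at GKE’s queued provisioning class, an AdmissionCheck, a ClusterQueue, and the dws-local-queue used later in this tutorial. A sketch of that shape (the resource quotas here are illustrative):
```
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: dws-cluster-queue
spec:
  namespaceSelector: {} # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 10000
      - name: "memory"
        nominalQuota: 10000Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 10000
  admissionChecks:
  - dws-prov
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: dws-local-queue
spec:
  clusterQueue: dws-cluster-queue
```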
Kueue should be up and running now.
Install SkyPilot
- Create a python virtual environment.
cd ~
python -m venv skypilot-test
cd skypilot-test
source bin/activate
- Install SkyPilot
pip install -U "skypilot[kubernetes]"
# Verify the installation
sky -v
- Find the context names
kubectl config get-contexts
# Find the context name, for example:
# gke_${PROJECT_NAME}_us-central1-c_demo-us-central1
- Create the SkyPilot configuration. Add autoscaler: gke to enable SkyPilot to work with GKE’s cluster autoscaling capabilities, allowing you to run workloads without pre-provisioned GPU nodes.
# Create and edit ~/.sky/config.yaml
# Change PROJECT_NAME, LOCATION and CLUSTER_NAME
allowed_clouds:
- kubernetes
kubernetes:
# Use the context's name
allowed_contexts:
- gke_${PROJECT_NAME}_${LOCATION}_${CLUSTER_NAME}
autoscaler: gke
And verify again:
sky check
And you should see the following output:
```
Kubernetes: enabled

To enable a cloud, follow the hints above and rerun: sky check
If any problems remain, refer to detailed docs at: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html

Note: The following clouds were disabled because they were not included in allowed_clouds in ~/.sky/config.yaml: GCP, AWS, Azure, Cudo, Fluidstack, IBM, Lambda, OCI, Paperspace, RunPod, SCP, vSphere, Cloudflare (for R2 object store)

🎉 Enabled clouds 🎉
  ✔ Kubernetes
```
Configure and Run SkyPilot Job
For SkyPilot to create pods with the necessary pod config, we need to add the following config to train_dws.yaml:
experimental:
  config_overrides:
    kubernetes:
      pod_config:
        metadata:
          annotations:
            provreq.kueue.x-k8s.io/maxRunDurationSeconds: "3600"
      provision_timeout: 900
And add the labels config to the resources section:
labels:
  kueue.x-k8s.io/queue-name: dws-local-queue
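Put together, a minimal train_dws.yaml has roughly the following shape (the accelerator count and the run command are illustrative placeholders; keep your own training entrypoint):
```
resources:
  accelerators: L4:1
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue

experimental:
  config_overrides:
    kubernetes:
      pod_config:
        metadata:
          annotations:
            provreq.kueue.x-k8s.io/maxRunDurationSeconds: "3600"
      provision_timeout: 900

run: |
  # Placeholder workload; replace with your training script
  echo "Training starts"
  python train.py
```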
Launch the workload
sky launch -c skypilot-dws train_dws.yaml
SkyPilot will wait in Launching state until the node is provisioned.
⚙️ Launching on Kubernetes.
In another terminal, you can run kubectl get pods; the pod will be in the SchedulingGated state:
NAME READY STATUS RESTARTS AGE
skypilot-dws-00b5-head 0/1 SchedulingGated 0 44s
If you run kubectl describe provisioningrequests, you can see what is happening with the request under Conditions:
Conditions:
Last Transition Time: 2024-12-20T11:40:46Z
Message: Provisioning Request was successfully queued.
Observed Generation: 1
Reason: SuccessfullyQueued
Status: True
Type: Accepted
Last Transition Time: 2024-12-20T11:40:47Z
Message: Waiting for resources. Currently there are not enough resources available to fulfill the request.
Observed Generation: 1
Reason: ResourcePoolExhausted
Status: False
Type: Provisioned
When the requested resource is available, the provisioningrequest will reflect that in the Conditions:
Last Transition Time: 2024-12-20T11:42:55Z
Message: Provisioning Request was successfully provisioned.
Observed Generation: 1
Reason: Provisioned
Status: True
Type: Provisioned
Now the workload will be running
NAME READY STATUS RESTARTS AGE
skypilot-dws-00b5-head 1/1 Running 0 4m49s
And later finished
✓ Job finished (status: SUCCEEDED).

📋 Useful Commands
Job ID: 1
├── To cancel the job: sky cancel skypilot-dws 1
├── To stream job logs: sky logs skypilot-dws 1
└── To view job queue: sky queue skypilot-dws

Cluster name: skypilot-dws
├── To log into the head VM: ssh skypilot-dws
├── To submit a job: sky exec skypilot-dws yaml_file
├── To stop the cluster: sky stop skypilot-dws
└── To teardown the cluster: sky down skypilot-dws
You can now ssh into the pod, run different workloads, and experiment.
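For example, using the commands from the list above:
```
# Log into the head pod
ssh skypilot-dws

# Or run a one-off command on the cluster without logging in
sky exec skypilot-dws 'nvidia-smi'
```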
Fine-tune and Serve Gemma 2B on GKE
This section details how to fine-tune Gemma 2B for SQL generation on GKE Autopilot using SkyPilot. Model artifacts are stored in a Google Cloud Storage (GCS) bucket and shared across pods using gcsfuse. The workflow separates training and serving into distinct pods, managed through finetune.yaml and serve.yaml. We’ll use two SkyPilot commands for this workflow:
- sky launch: for running the fine-tuning job
- sky serve: for deploying the model as a persistent service
Prerequisites
- A GKE cluster configured with SkyPilot
- HuggingFace account with access to Gemma model
Fine-tuning Implementation
The finetune.py script uses QLoRA with 4-bit quantization to fine-tune Gemma 2B on SQL generation tasks.
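The full finetune.yaml ships with the tutorial; its SkyPilot task has roughly the following shape (the accelerator type and script invocation below are illustrative assumptions, not the exact file):
```
envs:
  HF_TOKEN: "" # injected at launch time with --env HF_TOKEN=$HF_TOKEN

resources:
  accelerators: L4:1 # assumption; match the GPUs in your node pool

run: |
  # finetune.py is the QLoRA script described above; it writes the
  # resulting model artifacts to the gcsfuse-mounted bucket path
  python finetune.py
```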
Configure GCS Storage Access
The infrastructure Terraform configuration in main.tf includes Workload Identity and GCS bucket setup:
module "skypilot-workload-identity" {
source = "terraform-google-modules/kubernetes-engine/google//modules/ workload-identity"
name = "skypilot-service-account"
namespace = "default"
project_id = var.project_id
roles = ["roles/storage.admin", "roles/compute.admin"]
cluster_name = module.infra[0].cluster_name
location = var.cluster_location
use_existing_gcp_sa = true
gcp_sa_name = data.google_service_account.gke_service_account.email
use_existing_k8s_sa = true
annotate_k8s_sa = false
}
- Get the project and service account details:
terraform output project_id
terraform output service_account
- Configure Workload Identity
Run the following command to bind the Kubernetes service account to the Google Cloud service account created with Terraform (with Workload Identity Federation enabled) so that it can use gcsfuse. This creates a policy binding that allows the Kubernetes service account to impersonate the Google service account:
gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[default/skypilot-service-account]"
Note that [default/skypilot-service-account] is the Kubernetes namespace and service account name that SkyPilot deploys by default. Change it if you changed the SkyPilot configuration or used another namespace.
- Annotate the Kubernetes service account:
kubectl annotate serviceaccount skypilot-service-account --namespace default iam.gke.io/gcp-service-account=SERVICE_ACCOUNT
- Get the bucket name
terraform output model_bucket_name
- Update the gcsfuse configuration in finetune.yaml and serve.yaml by replacing BUCKET_NAME with your bucket name.
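For example, with GNU sed (assuming both YAML files are in the current directory):
```
BUCKET=$(terraform output -raw model_bucket_name)
sed -i "s/BUCKET_NAME/${BUCKET}/g" finetune.yaml serve.yaml
```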
Fine-tune the Model
-
Set up HuggingFace access: the fine-tuning script needs a HuggingFace token, and you must sign the license consent agreement.
Follow the instructions at the following link: Get access to the model
export HF_TOKEN=tokenvalue
-
Launch a fine-tuning job:
sky launch -c finetune finetune.yaml --retry-until-up --env HF_TOKEN=$HF_TOKEN
After fine-tuning finishes, you should see the following output:
(gemma-finetune, pid=1837) 100%|██████████| 5000/5000 [12:49<00:00, 6.50it/s]
(gemma-finetune, pid=1837) /home/sky/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
(gemma-finetune, pid=1837)   warnings.warn(
(gemma-finetune, pid=1837)
(gemma-finetune, pid=1837) Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.93s/it]
✓ Job finished (status: SUCCEEDED).
Serve the Model
Next, run the fine-tuned model with serve.yaml and the sky serve CLI:
sky serve up serve.yaml
When the serve pods are provisioned, you should see the following output:
```
⚙️ Launching serve controller on Kubernetes.
└── Pod is up.
✓ Cluster launched: sky-serve-controller-00b550a3. View logs at: ~/sky_logs/sky-2025-01-08-19-25-40-242969/provision.log
⚙️ Mounting files.
Syncing (to 1 node): /tmp/service-task-sky-service-7e46-jyxqvkgh -> ~/.sky/serve/sky_service_7e46/task.yaml.tmp
Syncing (to 1 node): /tmp/tmpd_sj9qpw -> ~/.sky/serve/sky_service_7e46/config.yaml
✓ Files synced. View logs at: ~/sky_logs/sky-2025-01-08-19-25-40-242969/file_mounts.log
⚙️ Running setup on serve controller.
Check & install cloud dependencies on controller: done.
✓ Setup completed. View logs at: ~/sky_logs/sky-2025-01-08-19-25-40-242969/setup-*.log
⚙️ Service registered.

Service name: sky-service-7e46
Endpoint URL: 35.226.190.154:30002
📋 Useful Commands
├── To check service status: sky serve status sky-service-7e46 [--endpoint]
├── To teardown the service: sky serve down sky-service-7e46
├── To see replica logs: sky serve logs sky-service-7e46 [REPLICA_ID]
├── To see load balancer logs: sky serve logs --load-balancer sky-service-7e46
├── To see controller logs: sky serve logs --controller sky-service-7e46
├── To monitor the status: watch -n10 sky serve status sky-service-7e46
└── To send a test request: curl 35.226.190.154:30002

✓ Service is spinning up and replicas will be ready shortly.
```
Check if the serving API is ready by running:
sky serve status
And wait for the PROVISIONING status to become READY.
Services
NAME VERSION UPTIME STATUS REPLICAS ENDPOINT
sky-service-7e46 - - NO_REPLICA 0/1 35.226.190.154:30002
Service Replicas
SERVICE_NAME ID VERSION ENDPOINT LAUNCHED RESOURCES STATUS REGION
sky-service-7e46 1 1 - 1 min ago 1x Kubernetes({'A100': 1}) PROVISIONING gke_skypilot_project_us-central1_-skypilot-test
After that, take the URL from the ENDPOINT column and use curl to prompt the served model:
curl -X POST http://SKYPILOT_ADDRESS/generate \
-H "Content-Type: application/json" \
-d '{ "prompt": "Question: What is the total number of attendees with age over 30 at kubecon eu? Context: CREATE TABLE attendees (name VARCHAR, age INTEGER, kubecon VARCHAR) Answer:","top_p": 1.0, "temperature": 0 , "max_tokens":128 }' \
| jq
And you should see the reply
Answer: SELECT COUNT(name) FROM attendees WHERE age > 30 AND kubecon = \"kubecon eu\"\
Cleanup
- Remove the SkyPilot clusters and serve endpoints:
sky down skypilot-dws
sky down finetune
sky serve down --all
- Finally, destroy the provisioned infrastructure:
terraform destroy -var-file=your_environment.tfvar
Troubleshooting
-
If the Kueue install gives the error:
the CustomResourceDefinition "workloads.kueue.x-k8s.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
Make sure you include the --server-side argument in the kubectl apply command when installing Kueue. If you are repeating the step, delete the previous installation first. -
If you get an error with the kueue-webhook-service:
Error from server (InternalError): error when creating "kueue_resources.yaml": Internal error occurred: failed calling webhook "mresourceflavor.kb.io": failed to call webhook: Post "https://kueue-webhook-service.kueue-system.svc:443/mutate-kueue-x-k8s-io-v1beta1-resourceflavor?timeout=10s": no endpoints available for service "kueue-webhook-service"
Wait for the endpoints of the kueue-webhook-service to be populated using the kubectl wait command:
kubectl -n kueue-system wait endpoints/kueue-webhook-service --for=jsonpath={.subsets}
-
If SkyPilot refuses to start the cluster because there are no nodes that satisfy the GPU requirement:
Task from YAML spec: train_dws.yaml
No resource satisfying Kubernetes({'L4': 1}) on Kubernetes.
sky.exceptions.ResourcesUnavailableError: Kubernetes cluster does not contain any instances satisfying the request: 1x Kubernetes({'L4': 1}).
To fix: relax or change the resource requirements.
Hint: `sky show-gpus --cloud kubernetes` to list available accelerators.
`sky check` to check the enabled clouds.
Make sure you added autoscaler: gke to the sky config, as described in the Install SkyPilot step. -
Permission denied when trying to write to the mounted gcsfuse volume.
Make sure you added uid=1000,gid=1000 to the mountOptions: in the task YAML file. SkyPilot uses uid and gid 1000 by default.
volumes:
  - name: gcsfuse-test
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: MODEL_BUCKET_NAME
        mountOptions: "implicit-dirs,uid=1000,gid=1000"
-
Denied by autogke-gpu-limitation: when running sky serve on an Autopilot cluster, GKE Warden rejects the pods:
"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"warden-validating.common-webhooks.networking.gke.io\" denied the request: GKE Warden rejected the request because it violates one or more constraints.\nViolations details: {\"[denied by autogke-gpu-limitation]\":[\"The toleration with key 'nvidia.com/gpu' and operator 'Exists' cannot be specified if the pod does not request to use GPU in Autopilot.\"
Update SkyPilot to version 0.8.0 or above.
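For example:
```
pip install -U "skypilot[kubernetes]"
sky -v # confirm the version is 0.8.0 or later
```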