NIM on GKE
Before you begin
Before you proceed further, ensure you have an NVIDIA AI Enterprise (NVAIE) license to access the NIMs. To get started, go to build.nvidia.com and provide your company email address.

- Get access to NVIDIA NIMs.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project with billing enabled.
- Ensure you have the following tools installed on your workstation: the gcloud CLI, kubectl, Helm, the NGC CLI, git, curl, and jq (all of which are used later in this guide).
- Enable the required APIs:

```bash
gcloud services enable \
  container.googleapis.com \
  file.googleapis.com
```
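As a quick sanity check, you can confirm both APIs show up among the enabled services before moving on:

```bash
# List enabled services and check that both APIs appear
gcloud services list --enabled | grep -E 'container.googleapis.com|file.googleapis.com'
```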
Set up your GKE Cluster
- Choose your region and set your project and machine variables:

```bash
export PROJECT_ID=$(gcloud config get project)
export REGION=us-central1
export ZONE=${REGION?}-a
```
- Create a GKE cluster:

```bash
gcloud container clusters create nim-demo --location ${REGION?} \
  --workload-pool ${PROJECT_ID?}.svc.id.goog \
  --enable-image-streaming \
  --enable-ip-alias \
  --node-locations ${ZONE?} \
  --addons=GcpFilestoreCsiDriver \
  --machine-type n2d-standard-4 \
  --enable-autoscaling \
  --num-nodes 1 --min-nodes 1 --max-nodes 5 \
  --ephemeral-storage-local-ssd=count=2 \
  --labels=created-by=ai-on-gke,guide=nim-on-gke
```
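Cluster creation can take several minutes. If you want to confirm it finished, `gcloud container clusters describe` reports the status:

```bash
# Should print RUNNING once the control plane is ready
gcloud container clusters describe nim-demo --region ${REGION?} --format="value(status)"
```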
- Get cluster credentials:

```bash
gcloud container clusters get-credentials nim-demo --region ${REGION?}
```
- Create a node pool:

```bash
gcloud container node-pools create g2-standard-24 --cluster nim-demo \
  --accelerator type=nvidia-l4,count=2,gpu-driver-version=latest \
  --machine-type g2-standard-24 \
  --ephemeral-storage-local-ssd=count=2 \
  --enable-image-streaming \
  --enable-autoscaling \
  --num-nodes=1 --min-nodes=1 --max-nodes=2 \
  --node-locations $REGION-a,$REGION-b --region $REGION
```
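Once the node pool is up, you can verify that the GPU nodes joined the cluster; GKE labels accelerator nodes with `cloud.google.com/gke-accelerator`:

```bash
# List the L4 nodes created by the new node pool
kubectl get nodes -l cloud.google.com/gke-accelerator=nvidia-l4
```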
Set up access to NVIDIA NIMs and prepare your environment
If you have not set up NGC, see NGC Setup to get your access key and begin using NGC.
- Get your API key from NGC and export it as NGC_CLI_API_KEY:

```bash
export NGC_CLI_API_KEY="<YOUR_API_KEY>"
```
- As part of the NGC setup, set your configuration:

```bash
ngc config set
```
- Ensure you have access to the repository by listing the models:

```bash
ngc registry model list
```
- Create a Kubernetes namespace:

```bash
kubectl create namespace nim
```
Deploy a PVC to persist the model
This PVC will dynamically provision a PV with the necessary storage to persist model weights across replicas of your pods.
- Create a PVC to persist the model weights (recommended for deployments with more than one replica). Save the following YAML as `pvc.yaml`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store-pvc
  namespace: nim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  storageClassName: standard-rwx
```
- Apply the PVC:

```bash
kubectl apply -f pvc.yaml
```
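You can watch the claim until it binds. With the Filestore CSI driver, dynamically provisioning the backing instance can take a few minutes, and depending on the storage class's binding mode the claim may stay Pending until a pod consumes it:

```bash
# STATUS should eventually read Bound
kubectl get pvc model-store-pvc -n nim
```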
Deploy the NIM with the generated engine using a Helm chart
- Clone the nim-deploy repository:

```bash
git clone https://github.com/NVIDIA/nim-deploy.git
cd nim-deploy/helm
```
- Deploy the chart with a minimal configuration:

```bash
helm --namespace nim install demo-nim nim-llm/ \
  --set model.ngcAPIKey=$NGC_CLI_API_KEY \
  --set persistence.enabled=true \
  --set persistence.existingClaim=model-store-pvc
```
Test the NIM
Expect the demo-nim deployment to take a few minutes as the Llama3 model downloads.
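To follow progress, you can watch the pods and tail the startup logs. The pod name below assumes the chart names the workload after the release (demo-nim-nim-llm, matching the service name used in the next step); adjust it to whatever `kubectl get pods` reports:

```bash
# Watch pod status until the NIM pod is Running and Ready
kubectl get pods -n nim -w

# Tail startup logs to follow the model download
# (pod name is an assumption based on the release name; verify with `kubectl get pods -n nim`)
kubectl logs -f -n nim demo-nim-nim-llm-0
```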
- Expose the service:

```bash
kubectl port-forward --namespace nim services/demo-nim-nim-llm 8000
```
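Before prompting the model, you can poll the readiness endpoint that NIM LLM microservices expose:

```bash
# Returns a ready status once the model is loaded
curl http://localhost:8000/v1/health/ready
```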
- Send a test prompt:

```bash
curl -X 'POST' \
  'http://localhost:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "content": "You are a polite and respectful poet.",
        "role": "system"
      },
      {
        "content": "Write a limerick about the wonders of GPUs and Kubernetes?",
        "role": "user"
      }
    ],
    "model": "meta/llama3-8b-instruct",
    "max_tokens": 256,
    "top_p": 1,
    "n": 1,
    "stream": false,
    "frequency_penalty": 0.0
  }' | jq '.choices[0].message.content'
```
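The NIM serves an OpenAI-compatible API, so you can also list the models the server reports using the standard model-listing endpoint:

```bash
# OpenAI-style model listing endpoint
curl http://localhost:8000/v1/models | jq
```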
- Browse the API by navigating to http://localhost:8000/docs.
Clean up
Remove the cluster and its deployed resources by running the following command:

```bash
gcloud container clusters delete nim-demo --location ${REGION}
```