Fine-Tuning ESM2 LLM on GKE using BioNeMo Framework 2.0
This sample walks through setting up a Google Cloud GKE environment to fine-tune ESM2 (Evolutionary Scale Modeling) using the NVIDIA BioNeMo Framework 2.0.
Prerequisites
- GCloud SDK: Ensure you have the Google Cloud SDK installed and configured.
- Project: A Google Cloud project with billing enabled.
- Permissions: Sufficient permissions to create GKE clusters and other related resources.
- kubectl: kubectl command-line tool installed and configured.
- NVIDIA GPUs: One of the GPU types used below (e.g., NVIDIA L4, A100 40GB, or A100 80GB) should work.
Note: Google Cloud Shell is recommended for running this sample.
Clone the repository (you can skip this step if it was done previously):
git clone https://github.com/ai-on-gke/nvidia-ai-solutions
cd nvidia-ai-solutions/bionemo
Setup
- Set Project:
gcloud config set project "your-project-id"
Replace "your-project-id" with your actual project ID.
- Set Environment Variables:
export PROJECT_ID="your-project-id"
export PUBLIC_REPOSITORY=$PROJECT_ID
export REGION=us-central1
export ZONE=us-central1-a
export CLUSTER_NAME=bionemo-demo
export NODE_POOL_MACHINE_TYPE=g2-standard-24 # e.g., a2-highgpu-1g (A100 40GB) or a2-ultragpu-1g (A100 80GB)
export CLUSTER_MACHINE_TYPE=e2-standard-2
export GPU_TYPE=nvidia-l4 # e.g., nvidia-tesla-a100 (A100 40GB) or nvidia-a100-80gb (A100 80GB)
export GPU_COUNT=2 # e.g., 1 (A100)
export NETWORK_NAME="default"
Adjust the zone, machine type, accelerator type, count, and number of nodes to match your requirements. Refer to the Google Cloud documentation for available options, or list them directly as shown below. Consider smaller machine types for development to manage costs.
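For example, you can list the machine types and accelerator types available in your chosen zone with:
gcloud compute machine-types list --zones=${ZONE}
gcloud compute accelerator-types list --filter="zone:${ZONE}"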
NOTE: Skip steps 3-5 if you are reusing the same GKE cluster from pretraining.
- Enable the Filestore API
gcloud services enable file.googleapis.com
- Create GKE Cluster
gcloud container clusters create ${CLUSTER_NAME} \
--location=${ZONE} \
--network=${NETWORK_NAME} \
--addons=GcpFilestoreCsiDriver \
--machine-type=${CLUSTER_MACHINE_TYPE} \
--num-nodes=1 \
--workload-pool=${PROJECT_ID}.svc.id.goog
- Create GPU Node Pool:
gcloud container node-pools create gpupool \
--location=${ZONE} \
--cluster=${CLUSTER_NAME} \
--machine-type=${NODE_POOL_MACHINE_TYPE} \
--num-nodes=1 \
--accelerator type=${GPU_TYPE},count=${GPU_COUNT},gpu-driver-version=latest
This creates a node pool specifically for GPU workloads.
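You can optionally confirm that the GPU node pool was created:
gcloud container node-pools list --cluster=${CLUSTER_NAME} --location=${ZONE}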
- Get Cluster Credentials:
gcloud container clusters get-credentials "${CLUSTER_NAME}" \
--location="${ZONE}"
- Create an Artifact Registry repository to store container images
gcloud artifacts repositories create ${PUBLIC_REPOSITORY} \
--repository-format=docker --location=${REGION}
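You can optionally confirm the repository exists:
gcloud artifacts repositories describe ${PUBLIC_REPOSITORY} --location=${REGION}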
- Create a service account to allow GKE to pull images
gcloud iam service-accounts create esm2-inference-gsa
- Create a namespace, the training job, the TensorBoard microservice, and mount Google Cloud Filestore for storage
alias k=kubectl
k create namespace bionemo-training
k create serviceaccount esm2-inference-sa -n bionemo-training
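Optionally verify the namespace and service account were created:
k get serviceaccount esm2-inference-sa -n bionemo-training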
- Create identity binding
This is needed to allow the GKE Pod to pull the custom image from the Artifact Registry repository created in the previous step.
gcloud iam service-accounts add-iam-policy-binding esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:${PROJECT_ID}.svc.id.goog[bionemo-training/esm2-inference-sa]"
k annotate serviceaccount esm2-inference-sa -n bionemo-training \
iam.gke.io/gcp-service-account=esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com
Note: This requires workload identity to be configured at the cluster level.
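Workload Identity was enabled on the cluster via the --workload-pool flag during cluster creation; you can confirm it with:
gcloud container clusters describe ${CLUSTER_NAME} --location=${ZONE} --format="value(workloadIdentityConfig.workloadPool)"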
- Launch the fine-tuning job by applying the kustomize file
k apply -k fine-tuning/job
Check job status by running:
k get job esm2-finetuning -n bionemo-training
The job has succeeded once its status is Complete.
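While the job is running, you can follow its logs with:
k logs -f job/esm2-finetuning -n bionemo-training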
- Build and push the inference server Docker image
docker build -t ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest fine-tuning/inference/.
Authenticate to Artifact Registry:
gcloud auth configure-docker ${REGION}-docker.pkg.dev
docker push ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest
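Optionally confirm the image is now available in the repository:
gcloud artifacts docker images list ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}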
- Launch inference deployment
Ensure the job status is Complete by running:
k get job esm2-finetuning -n bionemo-training
Ensure the environment variables REGION, PROJECT_ID, and PUBLIC_REPOSITORY are set.
envsubst < fine-tuning/inference/kustomization.yaml | sponge fine-tuning/inference/kustomization.yaml
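Note: sponge is provided by the moreutils package. If it is not installed, an equivalent approach is to write to a temporary file first:
envsubst < fine-tuning/inference/kustomization.yaml > /tmp/kustomization.yaml && mv /tmp/kustomization.yaml fine-tuning/inference/kustomization.yaml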
k apply -k fine-tuning/inference
- Port Forwarding (for inference):
List the deployment Pods:
k get pods -l app=esm2-inference -n bionemo-training
Wait a few minutes for the inference Pod to reach Running status, then run:
k port-forward -n bionemo-training svc/esm2-inference 8080:80
In a separate shell window, run:
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"}'
See the documentation for more details.
Cleanup
To delete the cluster and all associated resources:
k delete namespace bionemo-training --cascade=background
gcloud container clusters delete "${CLUSTER_NAME}" --location="${ZONE}" --quiet
gcloud artifacts repositories delete ${PUBLIC_REPOSITORY} \
--location=${REGION} \
--quiet
gcloud iam service-accounts delete esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
--quiet
docker rmi ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest