Fine-Tuning ESM2 LLM on GKE using BioNeMo Framework 2.0

This sample walks through setting up a Google Cloud GKE environment to fine-tune ESM2 (Evolutionary Scale Modeling) using the NVIDIA BioNeMo Framework 2.0.

Prerequisites

Note: Google Cloud Shell is recommended for running this sample.

Clone the repository (you can skip this step if done previously):

git clone https://github.com/ai-on-gke/nvidia-ai-solutions
cd nvidia-ai-solutions/bionemo

Setup

  1. Set Project:
gcloud config set project "your-project-id"

Replace "your-project-id" with your actual project ID.

  1. Set Environment Variables:
export PROJECT_ID="your-project-id"
export PUBLIC_REPOSITORY=$PROJECT_ID
export REGION=us-central1
export ZONE=us-central1-a
export CLUSTER_NAME=bionemo-demo
export NODE_POOL_MACHINE_TYPE=g2-standard-24 #e.g., a2-highgpu-1g (A100 40GB) or a2-ultragpu-1g (A100 80GB)
export CLUSTER_MACHINE_TYPE=e2-standard-2
export GPU_TYPE=nvidia-l4 # e.g., nvidia-tesla-a100 for A100 40GB OR nvidia-a100-80gb (A100 80GB)
export GPU_COUNT=2 # e.g., 1 (A100)

export NETWORK_NAME="default"

Adjust the zone, machine type, accelerator type, count, and number of nodes as per your requirements. Refer to Google Cloud documentation for available options. Consider smaller machine types for development to manage costs.
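
To see which accelerator types are actually available in your chosen zone before setting the variables above, you can list them with gcloud:

gcloud compute accelerator-types list --filter="zone:(${ZONE})"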

NOTE: Skip steps 3-5 if you are reusing the same GKE cluster from pretraining.

  1. Enable the Filestore API
gcloud services enable file.googleapis.com
  1. Create GKE Cluster
gcloud container clusters create ${CLUSTER_NAME} \
    --location=${ZONE} \
    --network=${NETWORK_NAME} \
    --addons=GcpFilestoreCsiDriver \
    --machine-type=${CLUSTER_MACHINE_TYPE} \
    --num-nodes=1 \
    --workload-pool=${PROJECT_ID}.svc.id.goog
  1. Create GPU Node Pool:
gcloud container node-pools create gpupool \
    --location=${ZONE} \
    --cluster=${CLUSTER_NAME} \
    --machine-type=${NODE_POOL_MACHINE_TYPE} \
    --num-nodes=1 \
    --accelerator type=${GPU_TYPE},count=${GPU_COUNT},gpu-driver-version=latest

This creates a node pool specifically for GPU workloads.
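
To confirm the node pool came up with the expected accelerator configuration, you can optionally describe it:

gcloud container node-pools describe gpupool \
    --cluster=${CLUSTER_NAME} \
    --location=${ZONE}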

  1. Get Cluster Credentials:
gcloud container clusters get-credentials "${CLUSTER_NAME}" \
  --location="${ZONE}"
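
A quick check that kubectl now points at the new cluster is to list its nodes; you should see the default node and, once the pool is ready, the GPU node:

kubectl get nodes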
  1. Create an Artifact Registry repository to store container images
gcloud artifacts repositories create ${PUBLIC_REPOSITORY} \
  --repository-format=docker --location=${REGION}
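
Optionally, confirm the repository exists before building images against it:

gcloud artifacts repositories describe ${PUBLIC_REPOSITORY} --location=${REGION}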
  1. Create a service account to allow GKE to pull images
gcloud iam service-accounts create esm2-inference-gsa
  1. Create the namespace and Kubernetes service account used by the training job, TensorBoard microservice, and Filestore-backed storage
alias k=kubectl

k create namespace bionemo-training

k create serviceaccount esm2-inference-sa -n bionemo-training
  1. Create identity binding

This binding is needed to allow the GKE Pod to pull the custom image from the Artifact Registry repository created in a previous step.

gcloud iam service-accounts add-iam-policy-binding esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:${PROJECT_ID}.svc.id.goog[bionemo-training/esm2-inference-sa]"
k annotate serviceaccount esm2-inference-sa -n bionemo-training \
    iam.gke.io/gcp-service-account=esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com

Note: This requires workload identity to be configured at the cluster level.
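
To verify the binding took effect, inspect the Kubernetes service account and check that the iam.gke.io/gcp-service-account annotation points at the Google service account:

k get serviceaccount esm2-inference-sa -n bionemo-training -o yaml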

  1. Launch fine-tuning job by applying the kustomize file
k apply -k fine-tuning/job

Check job status by running:

k get job esm2-finetuning -n bionemo-training

The job has succeeded once its status is Complete.
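
While the job is running, you can follow its logs to watch fine-tuning progress:

k logs -f job/esm2-finetuning -n bionemo-training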

  1. Build and push the inference server Docker image
docker build -t ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest fine-tuning/inference/.

Authenticate to Artifact Registry:

gcloud auth configure-docker ${REGION}-docker.pkg.dev
docker push ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest
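
Optionally, verify the image landed in the repository:

gcloud artifacts docker images list ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}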
  1. Launch inference deployment

Ensure the fine-tuning job status is Complete by running:

k get job esm2-finetuning -n bionemo-training

Ensure the environment variables REGION, PROJECT_ID, and PUBLIC_REPOSITORY are set; they are substituted into the kustomization file below.

envsubst < fine-tuning/inference/kustomization.yaml | sponge fine-tuning/inference/kustomization.yaml
k apply -k fine-tuning/inference
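
Note that sponge comes from the moreutils package; if it is not available in your shell, an equivalent substitution using a temporary file (path chosen here for illustration) works just as well before applying the kustomization:

envsubst < fine-tuning/inference/kustomization.yaml > /tmp/kustomization.yaml
mv /tmp/kustomization.yaml fine-tuning/inference/kustomization.yaml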
  1. Port Forwarding (for inference):

List the deployment Pods:

k get pods -l app=esm2-inference -n bionemo-training

Wait a few minutes for the inference Pod to reach Running status, then run:

k port-forward -n bionemo-training svc/esm2-inference 8080:80

In a separate shell window, run:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"}'
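
If the request fails or returns an unexpected result, tailing the inference Pod logs (selected by the same app=esm2-inference label) is a good first debugging step:

k logs -f -l app=esm2-inference -n bionemo-training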

See the documentation for further reference.

Cleanup

To delete the cluster and all associated resources:

k delete namespace bionemo-training --cascade=background
gcloud container clusters delete "${CLUSTER_NAME}" --location="${ZONE}" --quiet
gcloud artifacts repositories delete ${PUBLIC_REPOSITORY} \
    --location=${REGION} \
    --quiet
gcloud iam service-accounts delete esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
    --quiet
docker rmi ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest
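
Depending on the reclaim policy of the StorageClass used by the Filestore CSI driver, the Filestore instance backing the training volume may outlive the namespace. You can list any remaining instances and delete them manually:

gcloud filestore instances list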