Fine-Tuning ESM2 LLM on GKE using BioNeMo Framework 2.0

This sample walks through setting up a Google Cloud GKE environment to fine-tune ESM2 (Evolutionary Scale Modeling) using the NVIDIA BioNeMo Framework 2.0.

Prerequisites

Note: Google Cloud Shell is recommended for running this sample.

Clone the repository (you can skip this step if done previously):

git clone https://github.com/ai-on-gke/nvidia-ai-solutions
cd nvidia-ai-solutions/bionemo

Setup

  1. Set Project:
gcloud config set project "your-project-id"

Replace "your-project-id" with your actual project ID.

  1. Set Environment Variables:
export PROJECT_ID="your-project-id"
export PUBLIC_REPOSITORY=$PROJECT_ID
export REGION=us-central1
export ZONE=us-central1-a
export CLUSTER_NAME=bionemo-demo
export NODE_POOL_MACHINE_TYPE=g2-standard-24 #e.g., a2-highgpu-1g (A100 40GB) or a2-ultragpu-1g (A100 80GB)
export CLUSTER_MACHINE_TYPE=e2-standard-2
export GPU_TYPE=nvidia-l4 # e.g., nvidia-tesla-a100 for A100 40GB OR nvidia-a100-80gb (A100 80GB)
export GPU_COUNT=2 # e.g., 1 (A100)

export NETWORK_NAME="default"

Adjust the zone, machine type, accelerator type, count, and number of nodes as per your requirements. Refer to Google Cloud documentation for available options. Consider smaller machine types for development to manage costs.
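
To see which accelerator types are actually available in your chosen zone before setting the variables above, you can list them with gcloud:

gcloud compute accelerator-types list --filter="zone:(${ZONE})"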

NOTE: Skip steps 3-5 if you are reusing the same GKE cluster from pretraining.

  1. Enable the Filestore API
gcloud services enable file.googleapis.com
  1. Create GKE Cluster
gcloud container clusters create ${CLUSTER_NAME} \
    --location=${ZONE} \
    --network=${NETWORK_NAME} \
    --addons=GcpFilestoreCsiDriver \
    --machine-type=${CLUSTER_MACHINE_TYPE} \
    --num-nodes=1 \
    --workload-pool=${PROJECT_ID}.svc.id.goog
  1. Create GPU Node Pool:
gcloud container node-pools create gpupool \
    --location=${ZONE} \
    --cluster=${CLUSTER_NAME} \
    --machine-type=${NODE_POOL_MACHINE_TYPE} \
    --num-nodes=1 \
    --accelerator type=${GPU_TYPE},count=${GPU_COUNT},gpu-driver-version=latest

This creates a node pool specifically for GPU workloads.
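
To confirm the node pool came up with the expected accelerator configuration, you can optionally describe it:

gcloud container node-pools describe gpupool \
    --cluster=${CLUSTER_NAME} \
    --location=${ZONE}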

  1. Get Cluster Credentials:
gcloud container clusters get-credentials "${CLUSTER_NAME}" \
  --location="${ZONE}"
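
A quick check that kubectl now points at the new cluster is to list its nodes; you should see the default node and, once the pool is ready, the GPU node:

kubectl get nodes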
  1. Create an Artifact Registry repository to store container images
gcloud artifacts repositories create ${PUBLIC_REPOSITORY} \
  --repository-format=docker --location=${REGION}
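
Optionally, confirm the repository exists before building images against it:

gcloud artifacts repositories describe ${PUBLIC_REPOSITORY} --location=${REGION}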
  1. Create a service account to allow GKE to pull images
gcloud iam service-accounts create esm2-inference-gsa
  1. Create the namespace and Kubernetes service account used by the training job, TensorBoard microservice, and Filestore-backed storage
alias k=kubectl

k create namespace bionemo-training

k create serviceaccount esm2-inference-sa -n bionemo-training
  1. Create identity binding

This binding is needed to allow the GKE Pod to pull the custom image from the Artifact Registry repository created in a previous step.

gcloud iam service-accounts add-iam-policy-binding esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:${PROJECT_ID}.svc.id.goog[bionemo-training/esm2-inference-sa]"
k annotate serviceaccount esm2-inference-sa -n bionemo-training \
    iam.gke.io/gcp-service-account=esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com

Note: This requires workload identity to be configured at the cluster level.
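
To verify the binding took effect, inspect the Kubernetes service account and check that the iam.gke.io/gcp-service-account annotation points at the Google service account:

k get serviceaccount esm2-inference-sa -n bionemo-training -o yaml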

  1. Launch fine-tuning job by applying the kustomize file
k apply -k fine-tuning/job

Check job status by running:

k get job esm2-finetuning -n bionemo-training

The job has succeeded once its status is Complete.
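
While the job is running, you can follow its logs to watch fine-tuning progress:

k logs -f job/esm2-finetuning -n bionemo-training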

  1. Build and push the inference server Docker image
docker build -t ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest fine-tuning/inference/.

Authenticate to Artifact Registry:

gcloud auth configure-docker ${REGION}-docker.pkg.dev
docker push ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest
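
Optionally, verify the image landed in the repository:

gcloud artifacts docker images list ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}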
  1. Launch inference deployment

Ensure the fine-tuning job status is Complete by running:

k get job esm2-finetuning -n bionemo-training

Ensure the environment variables REGION, PROJECT_ID, and PUBLIC_REPOSITORY are set; they are substituted into the kustomization file below.

envsubst < fine-tuning/inference/kustomization.yaml | sponge fine-tuning/inference/kustomization.yaml
k apply -k fine-tuning/inference
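
Note that sponge comes from the moreutils package; if it is not available in your shell, an equivalent substitution using a temporary file (path chosen here for illustration) works just as well before applying the kustomization:

envsubst < fine-tuning/inference/kustomization.yaml > /tmp/kustomization.yaml
mv /tmp/kustomization.yaml fine-tuning/inference/kustomization.yaml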
  1. Port Forwarding (for inference):

List the deployment Pods:

k get pods -l app=esm2-inference -n bionemo-training

Wait a few minutes for the inference Pod to reach Running status, then run:

k port-forward -n bionemo-training svc/esm2-inference 8080:80

In a separate shell window, run:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"}'
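
If the request fails or returns an unexpected result, tailing the inference Pod logs (selected by the same app=esm2-inference label) is a good first debugging step:

k logs -f -l app=esm2-inference -n bionemo-training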

See the documentation for further reference.

Cleanup

To delete the cluster and all associated resources:

k delete namespace bionemo-training --cascade=background
gcloud container clusters delete "${CLUSTER_NAME}" --location="${ZONE}" --quiet
gcloud artifacts repositories delete ${PUBLIC_REPOSITORY} \
    --location=${REGION} \
    --quiet
gcloud iam service-accounts delete esm2-inference-gsa@${PROJECT_ID}.iam.gserviceaccount.com \
    --quiet
docker rmi ${REGION}-docker.pkg.dev/${PROJECT_ID}/${PUBLIC_REPOSITORY}/esm2-inference:latest
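
Depending on the reclaim policy of the StorageClass used by the Filestore CSI driver, the Filestore instance backing the training volume may outlive the namespace. You can list any remaining instances and delete them manually:

gcloud filestore instances list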