Digital Human for Customer Service on GKE

Deploy the digital human blueprint, built from several NVIDIA NIM microservices, on Google Kubernetes Engine (GKE).

Prerequisites

  • GCloud SDK: Ensure you have the Google Cloud SDK installed and configured.
  • Project: A Google Cloud project with billing enabled.
  • NGC API Key: An API key from NVIDIA NGC. See the NVIDIA NGC documentation for the prerequisites to generate and access this key.
  • kubectl: kubectl command-line tool installed and configured.
  • NVIDIA GPUs: GPUs supported by the NIMs in this blueprint, such as NVIDIA L4 or A100 (see the accelerator settings in the Setup section below).
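
A quick optional sanity check that the tools above are installed and pointing at the intended project (these commands only print versions and configuration):

gcloud --version
kubectl version --client
gcloud config get-value project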

Clone the repo before proceeding further:


git clone https://github.com/ai-on-gke/nvidia-ai-solutions
cd nvidia-ai-solutions/nim/blueprints/digitalhuman

Setup

  1. Environment setup: Set several environment variables to make the following steps easier to repeat and adapt. These variables store important information such as cluster names, machine types, and the NGC API key. Update the values to match your project and requirements.

    gcloud config set project "<GCP Project ID>"
    
    export CLUSTER_NAME="gke-nimbp-dighuman"
    export NP_NAME="gke-nimbp-dighuman-gpunp"
    
    export ZONE="us-west4-a"                 # A zone with GPU availability
    export NP_CPU_MACHTYPE="e2-standard-2"   # CPU machine type for the default node pool
    export NP_GPU_MACHTYPE="g2-standard-96"  # GPU machine type, e.g., a2-ultragpu-1g
    
    export ACCELERATOR_TYPE="nvidia-l4"      # e.g., nvidia-a100-80gb
    export ACCELERATOR_COUNT="8"             # Must match the machine type (g2-standard-96 has 8 L4 GPUs)
    export NODE_POOL_NODES=1                 # Or higher, as needed
    
    export NGC_API_KEY="<NGC API Key>"
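
    Optionally, confirm that the chosen accelerator type is available in the selected zone before creating any resources (a quick check using the variables above):

    gcloud compute accelerator-types list \
      --filter="zone:${ZONE} AND name=${ACCELERATOR_TYPE}"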
    
  2. GKE Cluster and Node pool creation:

    gcloud container clusters create "${CLUSTER_NAME}" \
      --num-nodes="1" \
      --location="${ZONE}" \
      --machine-type="${NP_CPU_MACHTYPE}" \
      --addons=GcpFilestoreCsiDriver
    
    gcloud container node-pools create "${NP_NAME}" \
      --cluster="${CLUSTER_NAME}" \
      --location="${ZONE}" \
      --node-locations="${ZONE}" \
      --num-nodes="${NODE_POOL_NODES}" \
      --machine-type="${NP_GPU_MACHTYPE}" \
      --accelerator="type=${ACCELERATOR_TYPE},count=${ACCELERATOR_COUNT},gpu-driver-version=LATEST" \
      --placement-type="COMPACT" \
      --disk-type="pd-ssd" \
      --disk-size="300GB"
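
    Optionally verify that both node pools were created and are healthy:

    gcloud container node-pools list \
      --cluster="${CLUSTER_NAME}" --location="${ZONE}"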
    
  3. Get Cluster Credentials:

    gcloud container clusters get-credentials "${CLUSTER_NAME}" --location="${ZONE}"
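
    With credentials in place, kubectl can reach the cluster. GKE labels GPU nodes with cloud.google.com/gke-accelerator, so a quick way to confirm the GPU nodes registered is:

    kubectl get nodes -L cloud.google.com/gke-accelerator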
    
  4. Set kubectl Alias (Optional):

    alias k=kubectl
    
  5. Create NGC API Key Secrets: one Docker registry secret for pulling NIM images from nvcr.io, and one generic secret for pods that need the API key at startup.

    k create secret docker-registry secret-nvcr \
      --docker-username=\$oauthtoken \
      --docker-password="${NGC_API_KEY}" \
      --docker-server="nvcr.io"
    
    k create secret generic ngc-api-key \
      --from-literal=NGC_API_KEY="${NGC_API_KEY}"
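
    Optionally confirm that both secrets exist before deploying:

    k get secret secret-nvcr ngc-api-key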
    
  6. Deploy NIMs:

    k apply -f digital-human-nimbp.yaml
    

    The NIM deployment takes up to 15 minutes to complete. Check that the pods are in the Running status: k get pods should list the pods below.

    NAME                                READY   STATUS    RESTARTS
    dighum-embedqa-e5v5-aa-aa           1/1     Running   0
    dighum-rerankqa-mistral4bv3-bb-bb   1/1     Running   0
    dighum-llama3-8b-cc-cc              1/1     Running   0
    dighum-audio2face-3d-dd-dd          1/1     Running   0
    dighum-fastpitch-tts-ee-ee          1/1     Running   0
    dighum-maxine-audio2face-2d-ff-ff   1/1     Running   0
    dighum-parakeet-asr-1-1b-gg-gg      1/1     Running   0
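
    Rather than polling manually, you can block until every pod reports Ready (adjust the timeout to your environment):

    k wait --for=condition=Ready pod --all --timeout=15m
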
  7. Access NIM endpoints

    SERVICES=$(k get svc | awk '{print $1}' | grep -v NAME | grep '^dighum')
    
    for service in $SERVICES; do
      # Get the external IP assigned to the LoadBalancer service.
      EXTERNAL_IP=$(k get svc $service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    
      echo "----------------------------------"
      echo "Testing service: $service at ${EXTERNAL_IP}"
      curl http://${EXTERNAL_IP}/v1/health/ready
      echo " "
      echo "----------------------------------"
    done
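
    If EXTERNAL_IP comes back empty, the LoadBalancer for that service may still be provisioning; watch until external IPs are assigned:

    k get svc -w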
    

    Refer to the repository documentation if you need HTTPS endpoints.

Test

Below are sample requests to test each of the endpoints.

  • nv-embedqa-e5-v5

    Set EXTERNAL_IP from above output for dighum-embedqa-e5v5

    export EXTERNAL_IP=<IP>
    curl -X "POST" \
    "http://${EXTERNAL_IP}/v1/embeddings" \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "input": ["Hello world"],
      "model": "nvidia/nv-embedqa-e5-v5",
      "input_type": "query"
    }'
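
    The response follows the OpenAI-compatible embeddings schema; assuming jq is installed, you can confirm an embedding vector came back and check its dimensionality:

    curl -s "http://${EXTERNAL_IP}/v1/embeddings" \
      -H 'Content-Type: application/json' \
      -d '{"input": ["Hello world"], "model": "nvidia/nv-embedqa-e5-v5", "input_type": "query"}' \
      | jq '.data[0].embedding | length'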
    
  • nv-rerankqa-mistral-4b-v3

    Set EXTERNAL_IP from above output for dighum-rerankqa-mistral4bv3

    export EXTERNAL_IP=<IP>
    
    curl -X "POST" \
    "http://${EXTERNAL_IP}/v1/ranking" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
          "model": "nvidia/nv-rerankqa-mistral-4b-v3",
          "query": {"text": "which way should i go?"},
          "passages": [
            {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"}
          ],
          "truncate": "END"
        }'
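
    The reranker should return a rankings array with a relevance logit per passage (an assumed response shape; adjust if your NIM version differs). With jq installed:

    curl -s "http://${EXTERNAL_IP}/v1/ranking" \
      -H 'Content-Type: application/json' \
      -d '{"model": "nvidia/nv-rerankqa-mistral-4b-v3",
           "query": {"text": "which way should i go?"},
           "passages": [{"text": "two roads diverged in a yellow wood"}],
           "truncate": "END"}' \
      | jq '.rankings'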
    
  • llama3-8b-instruct

    Set EXTERNAL_IP from above output for dighum-llama3-8b

    export EXTERNAL_IP=<IP>
    
    curl -X "POST" \
    "http://${EXTERNAL_IP}/v1/chat/completions" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "meta/llama3-8b-instruct",
    "messages": [{"role":"user", "content":"Write a limerick about the wonders of GPU computing."}],
      "max_tokens": 64
    }'
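
    The chat endpoint is OpenAI-compatible, so token streaming should also work by adding "stream": true (a minimal sketch; -N disables curl's output buffering):

    curl -N -X "POST" \
      "http://${EXTERNAL_IP}/v1/chat/completions" \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role":"user", "content":"Say hello in one sentence."}],
        "max_tokens": 32,
        "stream": true
      }'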
    
  • parakeet-ctc-1.1b-asr

    • Install the Riva Python client package

      python3 -m venv venv
      source venv/bin/activate
      pip install nvidia-riva-client
      
    • Download Riva sample clients

      
      git clone https://github.com/nvidia-riva/python-clients.git
      
    • Run Speech-to-Text inference in streaming mode. Riva ASR supports mono, 16-bit audio in WAV, OPUS, and FLAC formats. The example transcribes ./output.wav; substitute any supported audio file (the TTS step below generates output.wav). The port-forward runs in the background so the client can connect.

      k port-forward $(k get pod --selector="app=dighum-parakeet-asr-1-1b" --output jsonpath='{.items[0].metadata.name}') 50051:50051 &
      
      python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 --input-file ./output.wav --language-code en-US
      
      deactivate
      

For more details on getting started with this NIM, visit the Riva ASR NIM documentation.

  • fastpitch-hifigan-tts

    • Install the Riva Python client package

      python3 -m venv venv
      source venv/bin/activate
      pip install nvidia-riva-client
      
    • Download Riva sample clients

      
      git clone https://github.com/nvidia-riva/python-clients.git
      
    • Use kubectl to port-forward the TTS pod (stop any port-forward still using local port 50051 from the ASR step first; the selector follows the same app=<pod name prefix> label pattern as the other NIMs)

      k port-forward $(k get pod --selector="app=dighum-fastpitch-tts" --output jsonpath='{.items[0].metadata.name}') 50051:50051 &
      
    • Run Text-to-Speech inference. Riva TTS synthesizes mono, 16-bit WAV audio.

      python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 --text "Hello, this is a speech synthesizer." --language-code en-US --output output.wav
      
      deactivate
      

    Running the above command creates the synthesized audio file output.wav.

  • audio2face-2d

    • Setup a virtual env

      python3 -m venv venv
      source venv/bin/activate
      
    • Download the Audio2Face-2D client code

      git clone https://github.com/NVIDIA-Maxine/nim-clients.git
      cd nim-clients/audio2face-2d/
      pip install -r python/requirements.txt
      
    • Compile the protos

      cd protos/linux/python
      chmod +x compile_protos.sh
      ./compile_protos.sh
      
    • Run test inference

      cd python/scripts
      
      python audio2face-2d.py --target <server_ip:port> \
        --audio-input <input audio file path> \
        --portrait-input <input portrait image file path> \
        --output <output file path and the file name> \
        --head-rotation-animation-filepath <rotation animation filepath> \
        --head-translation-animation-filepath <translation animation filepath> \
        --ssl-mode <ssl mode value> \
        --ssl-key <ssl key file path> \
        --ssl-cert <ssl cert filepath> \
        --ssl-root-cert <ssl root cert filepath>
      

      Refer to the Audio2Face-2D NIM documentation to set these values.

  • audio2face-3d

    • Setup a virtual env

      python3 -m venv venv
      source venv/bin/activate
      
    • Download the Audio2Face-3D client code

      git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
      cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app
      
      pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
      
      pip3 install -r requirements.txt
      
    • Perform a health check (this assumes the service's gRPC port 52000 is reachable locally; see the port-forward sketch at the end of this section)

      python3 a2f_3d.py health_check --url 0.0.0.0:52000
      
    • Run a test inference

      python3 a2f_3d.py run_inference ../../example_audio/Claire_neutral.wav config/config_claire.yml \
      -u 0.0.0.0:52000
      

      Refer to the Audio2Face-3D NIM documentation for more information.
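
      The health check and inference above assume the Audio2Face-3D gRPC port (52000) is reachable locally. A minimal port-forward sketch, assuming the pod's app label follows the same <pod name prefix> pattern as the other NIMs in this blueprint:

      k port-forward $(k get pod --selector="app=dighum-audio2face-3d" --output jsonpath='{.items[0].metadata.name}') 52000:52000 &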

Tear down

Tear down the environment. NOTE: this deletes all deployed NIMs and the GKE cluster itself.

  k delete -f digital-human-nimbp.yaml
  k delete secret secret-nvcr
  k delete secret ngc-api-key
  gcloud container clusters delete "${CLUSTER_NAME}" \
    --location="${ZONE}" --quiet
