Populate a Hyperdisk ML Disk from Google Cloud Storage
Overview
This guide uses the Google Cloud CLI to create a Hyperdisk ML disk, populate it with data from Cloud Storage, and then consume it from a GKE cluster. Refer to the GKE documentation for instructions on performing the entire workflow through the GKE API.
Before you begin
- Ensure you have a GCP project with billing enabled and that the Compute Engine and GKE APIs are enabled.
- Follow this link to learn how to enable billing for your project.
- The Compute Engine and GKE APIs can be enabled by running:

gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com

- Ensure you have the following tools installed on your workstation: the gcloud CLI and kubectl.
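To confirm both tools are available before you start, a quick check like the following works (a minimal sketch; the exact versions installed do not matter for this guide):

# Confirm the gcloud CLI is installed and pointed at the intended project
gcloud version
gcloud config get project

# Confirm kubectl is installed; it is used later to apply the GKE manifests
kubectl version --client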
Setting up your cluster and Hyperdisk ML Disk
- Set the default environment variables:

export VM_NAME=hydrator
export MACHINE_TYPE=c3-standard-44
export IMAGE_FAMILY=debian-12
export IMAGE_PROJECT=debian-cloud
export ZONE=us-central1-a
export SNAP_SHOT_NAME=hdmlsnapshot
export PROJECT_ID=$(gcloud config get project)
export DISK_NAME=model1
- Create a new GCE instance that you will use to hydrate the new Hyperdisk ML disk with data. A c3-standard-44 instance is used here because it provides the maximum throughput while populating the Hyperdisk (see the instance-to-throughput rates in the Hyperdisk documentation).

gcloud compute instances create $VM_NAME \
    --image-family=$IMAGE_FAMILY \
    --image-project=$IMAGE_PROJECT \
    --zone=$ZONE \
    --machine-type=$MACHINE_TYPE
- Create and attach the disk to the new GCE VM:

SIZE=140
THROUGHPUT=2400
gcloud compute disks create $DISK_NAME --type=hyperdisk-ml \
    --size=$SIZE --provisioned-throughput=$THROUGHPUT \
    --zone=$ZONE
gcloud compute instances attach-disk $VM_NAME --disk=$DISK_NAME --zone=$ZONE
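Optionally, confirm the disk came up with the requested size and is ready before continuing (status and sizeGb are standard fields of the disk resource):

# The disk should report READY and the requested size in GB
gcloud compute disks describe $DISK_NAME --zone=$ZONE --format="value(status,sizeGb)"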
Create a template snapshot of the disk with the content of the GCS bucket
- Get your current IP address, which you will use to create a new firewall rule:

curl ifconfig.me
- Add a firewall rule to your network to allow SSH access into the virtual machine. Replace <replace-your-ip> with the address returned by the previous step:

gcloud compute firewall-rules create allow-ssh-ingress \
    --direction=INGRESS \
    --action=allow \
    --rules=tcp:22 \
    --source-ranges=<replace-your-ip>/32
- Log into the virtual machine:

gcloud compute ssh $VM_NAME --zone=$ZONE
- Update the instance and authenticate gcloud:

sudo apt-get update
sudo apt-get install google-cloud-cli
gcloud init
gcloud auth login
- Identify the device name (e.g. /dev/nvme0n2) by looking at the output of lsblk. This should correspond to the disk that was attached in the previous step.

lsblk
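If the lsblk output alone is ambiguous, the /dev/disk/by-id symlinks that the guest environment creates on Compute Engine images can help map the attached disk to its device node (a quick check, assuming the default device naming applied by attach-disk):

# Each attached disk appears as a google-* symlink pointing at its device node
ls -l /dev/disk/by-id/google-*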
- Save the device name given by the lsblk command (for example /dev/nvme0n2):

DEVICE=/dev/nvme0n2
- Format and mount the disk, then copy the content from the GCS bucket onto it:

GCS_DIR=gs://vertex-model-garden-public-us/llama3.3/Llama-3.3-70B-Instruct
sudo /sbin/mkfs -t ext4 -E lazy_itable_init=0,lazy_journal_init=0,discard $DEVICE
sudo mount $DEVICE /mnt
sudo gcloud storage cp -r $GCS_DIR /mnt
sudo umount /mnt
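Optionally, before closing the connection, you can remount the disk read-only and spot-check that the model files landed (the directory name below follows from $GCS_DIR above; adjust it if you copied different data):

# Remount read-only, confirm the copied files and the space they use, then unmount again
sudo mount -o ro $DEVICE /mnt
ls -lh /mnt/Llama-3.3-70B-Instruct
df -h /mnt
sudo umount /mnt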
- Close the connection to the VM:

exit
- Detach the disk from the hydrator VM and switch it to READ_ONLY_MANY access mode:

gcloud compute instances detach-disk $VM_NAME --disk=$DISK_NAME --zone=$ZONE
gcloud compute disks update $DISK_NAME --access-mode=READ_ONLY_MANY --zone=$ZONE
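You can confirm the access mode switch took effect before snapshotting (accessMode is the field the Compute API uses for Hyperdisk access modes; this check assumes current gcloud output formatting):

# Should print READ_ONLY_MANY once the update has been applied
gcloud compute disks describe $DISK_NAME --zone=$ZONE --format="value(accessMode)"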
- Create a snapshot from the disk to use as a template:

gcloud compute snapshots create $SNAP_SHOT_NAME \
    --source-disk-zone=$ZONE \
    --source-disk=$DISK_NAME \
    --project=$PROJECT_ID
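Before deleting the source disk in the next section, it is worth waiting for the snapshot to finish; a simple check:

# The snapshot should report READY before the source disk is deleted
gcloud compute snapshots describe $SNAP_SHOT_NAME --format="value(status)"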
Delete the VM and connect the snapshot to the GKE cluster
- You now have a Hyperdisk ML snapshot populated with your data from Google Cloud Storage. You can delete the hydrator GCE instance and the original disk:

gcloud compute instances delete $VM_NAME --zone=$ZONE
gcloud compute disks delete $DISK_NAME --project=$PROJECT_ID --zone=$ZONE
- In your GKE cluster, create the Hyperdisk ML multi-zone and Hyperdisk ML storage classes. Hyperdisk ML disks are zonal, and the hyperdisk-ml-multi-zone storage class automatically provisions disks in the zones where the pods that use them are scheduled. Replace the zones in this class with the zones in which you want to allow the Hyperdisk ML snapshot to create disks.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml-multi-zone
parameters:
  type: hyperdisk-ml
  provisioned-throughput-on-create: "2400Mi"
  enable-multi-zone-provisioning: "true"
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
    - us-central1-c
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml
parameters:
  type: hyperdisk-ml
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
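Assuming the two StorageClass manifests are saved to a file such as storageclasses.yaml (the file name is only an example), they can be applied and listed with:

kubectl apply -f storageclasses.yaml

# Both classes should now appear in the cluster
kubectl get storageclass hyperdisk-ml-multi-zone hyperdisk-ml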
- Create a VolumeSnapshotClass, VolumeSnapshot, and VolumeSnapshotContent config to use your snapshot. Replace the VolumeSnapshotContent.spec.source.snapshotHandle with the path to your snapshot.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: my-snapshotclass
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: restored-snapshot
spec:
  volumeSnapshotClassName: my-snapshotclass
  source:
    volumeSnapshotContentName: restored-snapshot-content
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: restored-snapshot-content
spec:
  deletionPolicy: Retain
  driver: pd.csi.storage.gke.io
  source:
    snapshotHandle: projects/[project_ID]/global/snapshots/[snapshotname]
  volumeSnapshotRef:
    kind: VolumeSnapshot
    name: restored-snapshot
    namespace: default
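As with the storage classes, these objects can be applied from a file (snapshots.yaml is an assumed name) and then checked; the VolumeSnapshot should report that it is ready to use once it binds to the content object:

kubectl apply -f snapshots.yaml

# READYTOUSE should become true once the pre-provisioned content is bound
kubectl get volumesnapshot restored-snapshot
kubectl get volumesnapshotcontent restored-snapshot-content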
- Reference your snapshot in the persistent volume claim. Be sure to adjust spec.dataSource.name to your VolumeSnapshot name and spec.resources.requests.storage to your disk size.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hdml-consumer-pvc
spec:
  dataSource:
    name: restored-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadOnlyMany
  storageClassName: hyperdisk-ml-multi-zone
  resources:
    requests:
      storage: 140Gi
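After applying the claim (pvc.yaml is an assumed file name), the Immediate binding mode of the hyperdisk-ml-multi-zone class means the PVC should bind without waiting for a consuming pod:

kubectl apply -f pvc.yaml

# The claim should reach the Bound phase once the volume is provisioned from the snapshot
kubectl get pvc hdml-consumer-pvc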
- Add a reference to this PVC in your deployment's spec.template.spec.volumes[].persistentVolumeClaim.claimName field.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - image: busybox:latest
        name: busybox
        command:
        - "sleep"
        - "infinity"
        volumeMounts:
        - name: busybox-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: busybox-persistent-storage
        persistentVolumeClaim:
          claimName: hdml-consumer-pvc
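Finally, a quick end-to-end check (deployment.yaml is an assumed file name; the mount path matches the manifest above): once the pod is running, the data copied from Cloud Storage should be visible read-only inside the container.

kubectl apply -f deployment.yaml

# Wait for the deployment to become available, then list the mounted model files
kubectl wait deployment/busybox --for=condition=Available --timeout=5m
kubectl exec deploy/busybox -- ls -lh /var/www/html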