Storage

Providing persistent and high-performance storage solutions for AI/ML workloads running on Google Kubernetes Engine (GKE).

Load Hugging Face Models into Cloud Storage

This guide provides instructions for how to hydrate GCS buckets with models from Hugging Face with a Kubernetes Job.

Populate a Hyperdisk ML Disk from Google Cloud Storage

This guide uses the Google Cloud API to create a Hyperdisk ML disk from data in Cloud Storage and then use it in a GKE cluster. Refer to this documentation for instructions all in the GKE API.

Models as OCI

This project allows you to download a Hugging Face model and package it as a Docker image. The Docker image can then be pushed to Google Artifact Registry for deployment or distribution. Build time can be significant for large models, it is recommended to not exceed models above 10 billion parameters. For reference 8b model roughly takes 35 minutes to build and push with this cloudbuild config.

Continue reading:

Agent ADK using GKE Autopilot Cluster with Llama and vLLM

This tutorial demonstrates how to deploy the Llama-3.1-8B-Instruct model on Google Kubernetes Engine (GKE) and vLLM for efficient inference. Additionally, it shows how to integrate an ADK agent to interact with the model, supporting both basic chat completions and tool usage. The setup leverages a GKE Autopilot cluster to handle the computational requirements.

Google ADK Agent with Memory Tools

ADK VertexAI Example

This tutorial guides you through deploying a containerized agent built with the [Google Agent Development Kit (ADK)](https://google.github.io/adk-docs/) to [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine/docs/concepts/kubernetes-engine-overview). The agent uses [VertexAI](https://cloud.google.com/vertex-ai/docs) to access LLMs. GKE provides a managed environment for deploying, managing, and scaling your containerized applications using Google infrastructure.

LlamaFirewall with Google ADK

This tutorial provides instructions on how to securely deploy a self-hosted [ADK](https://google.github.io/adk-docs/) agent with [LlamaFirewall](https://meta-llama.github.io/PurpleLlama/LlamaFirewall/) on [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine/docs/concepts/kubernetes-engine-overview).

Hugging Face to GCS

This guide provides instructions for how to hydrate GCS buckets with models from Hugging Face with a Kubernetes Job.

Hyperdisk ML Disk

This guide uses the Google Cloud API to create a Hyperdisk ML disk from data in Cloud Storage and then use it in a GKE cluster. Refer to [this documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/hyperdisk-ml) for instructions all in the GKE API.