Creating Inference Checkpoints
An overview of how to convert your inference checkpoint for various model servers.
In this tutorial, we will demonstrate how to leverage the open-source software SkyPilot to help GKE customers efficiently obtain accelerators across regions, ensuring workload continuity and optimized resource utilization.
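A minimal sketch of that pattern using SkyPilot's Python API (`sky.Task`, `sky.Resources`, `sky.launch`); the task name, commands, and accelerator request below are illustrative assumptions, not the tutorial's actual configuration.

```python
# Hypothetical SkyPilot task; names, commands, and accelerator type are assumptions.
import sky

task = sky.Task(
    name="capacity-chasing-demo",
    setup="pip install torch",   # one-time environment setup on the provisioned node
    run="python train.py",       # workload to run once capacity is found
)
# Ask SkyPilot for any location that can currently provide one L4 GPU.
task.set_resources(sky.Resources(accelerators="L4:1"))

# SkyPilot iterates over candidate regions/zones until it finds available capacity.
sky.launch(task, cluster_name="sky-l4-demo")
```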
This guide demonstrates deploying an end-to-end Generative AI application on Google Kubernetes Engine (GKE). It utilizes a Hugging Face model with Langchain for prompt engineering, Ray Serve for model inference, a Flask API for the backend, and a React frontend for user interaction. The setup includes infrastructure provisioning with Terraform, model experimentation in Jupyter Notebook, and containerized deployment of the backend and frontend services to GKE, all managed through kubectl.
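The model-serving piece of that stack can be approximated with a short Ray Serve sketch using its FastAPI ingress pattern; the model (`gpt2`), prompt template, and route are placeholders, not the tutorial's actual choices.

```python
# Sketch only: model, prompt, and route names are illustrative assumptions.
from fastapi import FastAPI
from langchain.prompts import PromptTemplate
from ray import serve
from transformers import pipeline

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class TextGenerator:
    def __init__(self):
        # Small stand-in for the Hugging Face model used in the guide.
        self.generator = pipeline("text-generation", model="gpt2")
        self.prompt = PromptTemplate.from_template(
            "Answer the question briefly: {question}"
        )

    @app.get("/generate")
    def generate(self, question: str) -> str:
        text = self.prompt.format(question=question)
        return self.generator(text, max_new_tokens=64)[0]["generated_text"]

# Deploy with Ray Serve; the Flask backend can then call the /generate route.
serve.run(TextGenerator.bind())
```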
This tutorial guides you through fine-tuning the Gemma 3-1B-it language model on Google Kubernetes Engine (GKE) using an L4 GPU, leveraging Parameter-Efficient Fine-Tuning (PEFT) and LoRA. It covers setting up a GKE cluster, containerizing the fine-tuning code, running the fine-tuning job, and uploading the resulting model to Hugging Face. Finally, it demonstrates how to deploy and interact with the fine-tuned model using vLLM on GKE.
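A hedged sketch of the LoRA setup step using the Hugging Face `peft` and `transformers` APIs; the rank, alpha, and target modules are assumed values, and the dataset and Trainer wiring from the tutorial are omitted.

```python
# LoRA attachment only; rank, alpha, and target modules are assumed values.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```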
This guide illustrates the deployment of Flyte on Google Kubernetes Engine (GKE) using Helm, utilizing Google Cloud Storage for scalable data storage and Cloud SQL PostgreSQL for a reliable metadata store. By the end of this tutorial, you will have a fully functional Flyte instance on GKE, offering businesses seamless integration with the GCP ecosystem, improved resource efficiency, and cost-effectiveness.
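A tiny flytekit workflow can serve as a smoke test for the resulting Flyte instance; the task, workflow, and file names below are illustrative.

```python
# Minimal Flyte workflow; names are placeholders, not part of the guide.
from flytekit import task, workflow

@task
def double(x: int) -> int:
    return x * 2

@workflow
def wf(x: int = 3) -> int:
    return double(x=x)

# Once the GKE-hosted Flyte instance is configured, run it remotely with:
#   pyflyte run --remote example.py wf --x 5
if __name__ == "__main__":
    print(wf(x=5))
```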
This guide demonstrates how to deploy a Hugging Face Text Generation Inference (TGI) server on Google Kubernetes Engine (GKE) using NVIDIA L4 GPUs, enabling you to serve large language models like Mistral-7b-instruct. It walks you through creating a GKE cluster, deploying the TGI application, sending prompts to the model, and monitoring the service’s performance using metrics, while also providing instructions for cleaning up the cluster.
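A hedged example of sending a prompt to the deployed TGI server over its `/generate` endpoint; the service hostname and port are assumptions about how the Kubernetes Service is exposed.

```python
# Assumed in-cluster service address; adjust to the actual Service or Ingress.
import requests

TGI_URL = "http://tgi-service:8080/generate"

payload = {
    "inputs": "[INST] What is Kubernetes? [/INST]",  # Mistral instruct format
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}
response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```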
Deploying and managing servers dedicated to performing inference tasks for machine learning models.
This tutorial shows you how to serve a large language model (LLM) using Tensor Processing Units (TPUs) on Google Kubernetes Engine (GKE) with JetStream and MaxText.
This tutorial will guide you step by step through the process of installing KServe in a GKE Autopilot cluster.
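Once KServe is installed, an InferenceService can be queried over KServe's v1 REST protocol; the model name, hostname, and input below are illustrative assumptions (the classic sklearn-iris sample), not part of this tutorial.

```python
# Hypothetical InferenceService query; host and model name are placeholders.
import requests

host = "http://sklearn-iris.default.example.com"   # assumed ingress host
url = f"{host}/v1/models/sklearn-iris:predict"

payload = {"instances": [[6.8, 2.8, 4.8, 1.4]]}     # one feature vector
resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["predictions"])
```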
In this tutorial, you will learn how to deploy a chatbot application using LangChain and Streamlit on Google Cloud Platform (GCP).
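A minimal sketch of the chatbot loop, assuming Streamlit's chat widgets and LangChain's `HuggingFacePipeline` wrapper as a stand-in for whatever LLM backend the tutorial wires up.

```python
# Sketch: the model and wrapper are placeholders for the tutorial's backend.
import streamlit as st
from langchain_community.llms import HuggingFacePipeline

@st.cache_resource
def load_llm():
    # Small local model as a stand-in for the chatbot's real LLM.
    return HuggingFacePipeline.from_model_id(model_id="gpt2", task="text-generation")

llm = load_llm()
st.title("GKE Chatbot (sketch)")

if "history" not in st.session_state:
    st.session_state.history = []

if prompt := st.chat_input("Ask something"):
    st.session_state.history.append(("user", prompt))
    st.session_state.history.append(("assistant", llm.invoke(prompt)))

for role, text in st.session_state.history:
    with st.chat_message(role):
        st.write(text)
```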
This tutorial provides instructions on how to deploy and use the Metaflow framework on GKE (Google Kubernetes Engine) and operate AI/ML workloads using Argo Workflows.
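A tiny Metaflow flow is enough to verify the setup; the flow name and file name are illustrative, and with Metaflow configured for Argo Workflows it can be deployed with `python hello_flow.py argo-workflows create`.

```python
# Minimal flow to smoke-test the Metaflow deployment; names are placeholders.
from metaflow import FlowSpec, step

class HelloFlow(FlowSpec):

    @step
    def start(self):
        self.message = "hello from GKE"
        self.next(self.end)

    @step
    def end(self):
        print(self.message)

if __name__ == "__main__":
    HelloFlow()
```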
In this tutorial, we will fine-tune gemma-2-9b using LoRA and track the run as an experiment in MLFlow. We will deploy MLFlow on a GKE cluster and configure it to store artifacts in a GCS bucket. Finally, we will deploy the fine-tuned model using KServe.
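A hedged sketch of logging the fine-tuning run to the GKE-hosted MLFlow server; the tracking URI, experiment name, parameters, and metric value are assumptions for illustration.

```python
# Values below are placeholders; point the tracking URI at the real service.
import mlflow

mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")
mlflow.set_experiment("gemma-2-9b-lora")

with mlflow.start_run():
    mlflow.log_params({"base_model": "google/gemma-2-9b", "lora_r": 8, "lora_alpha": 16})
    mlflow.log_metric("train_loss", 1.23)              # placeholder value
    mlflow.log_artifact("adapter_model.safetensors")   # stored in the GCS artifact bucket
```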
This tutorial expands on the SkyPilot tutorial by leveraging the Dynamic Workload Scheduler with the help of the open-source project Kueue.
This tutorial shows you how to serve a large language model (LLM) on both Tensor Processing Units (TPUs) and GPUs on Google Kubernetes Engine (GKE) with the same vLLM deployment.
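A minimal offline vLLM sketch of the serving idea; the model ID is an assumption, and on GKE the same container image runs on TPU or GPU node pools depending on the deployment.

```python
# Model ID is a placeholder; vLLM selects the available accelerator backend.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2-2b-it")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is GKE in one sentence?"], params)
print(outputs[0].outputs[0].text)
```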