Securing a Google ADK agent with LlamaFirewall
Overview
This tutorial provides instructions on how to securely deploy a self-hosted ADK agent with LlamaFirewall on Google Kubernetes Engine (GKE).
The goal is to host a general-purpose LLM locally on the cluster and secure interactions with it using LlamaFirewall, whose model is also hosted locally. To sanitize input, LlamaFirewall uses the Llama PromptGuard 2 model.
At the time of writing, LlamaFirewall runs the PromptGuard 2 model internally on the same machine and cannot connect to a remotely hosted PromptGuard 2 (for example, on a separate vLLM deployment). This is why, in this tutorial, we attach a GPU to the pod that runs the ADK application itself.
This example uses Google ADK to interact with the LLM, but the focus is on LlamaFirewall. For a more detailed guide to Google ADK, please refer to our other guide - Agent Development Kit (ADK) on GKE
The tutorial will cover:
- Setting up your Google Cloud environment.
- Building a container image for your agent.
- Deploying the LLMs via vLLM.
- Deploying the agent to a GKE cluster.
- Testing your deployed agent.
Before you begin
Ensure you have the following tools installed on your workstation: the gcloud CLI, kubectl, Terraform, and git.
If you previously installed the gcloud CLI, get the latest version by running:
gcloud components update
Ensure that you are signed in using the gcloud CLI tool. Run the following command:
gcloud auth application-default login
Infrastructure Setup
Clone the repository
Clone the repository with our guides and cd to the LlamaFirewall tutorial directory by running these commands:
git clone https://github.com/ai-on-gke/tutorials-and-examples.git
cd tutorials-and-examples/security/llama-firewall/
Filesystem structure
├── adk-app # ADK application source
│ ├── Dockerfile
│ ├── llama_firewall_secured_agent
│ │ ├── agent.py
│ │ ├── __init__.py
│ ├── main.py
│ └── requirements.txt
├── gen # This folder will contain all files that are generated by the terraform
│ └── secured-agent.yaml # This is the manifest of the ADK app deployment
├── terraform # The terraform config source
│ ├── example.tfvars
│ ├── main.tf
│ ├── network.tf
│ ├── outputs.tf
│ ├── providers.tf
│ ├── templates
│ │ └── secured-agent.yaml.tftpl
│ ├── terraform.tfstate
│ ├── terraform.tfstate.backup
│ ├── variables.tf
│ └── versions.tf
└── vllm # Manifests for the vLLM deployment that hosts the base model
└── vllm-llama.yaml
Enable Necessary APIs
Enable the APIs required for GKE, Artifact Registry, and Cloud Build:
gcloud services enable \
container.googleapis.com \
artifactregistry.googleapis.com \
cloudbuild.googleapis.com
Create cluster and other resources
In this section we use Terraform to automate the creation of infrastructure resources. For more details on how this is done, please refer to the Terraform config in the terraform/
folder. By default, the configuration provisions an Autopilot GKE cluster, but it can be changed to a Standard cluster by setting autopilot_cluster = false
.
It creates the following resources. For more information, such as resource names and other details, please refer to the Terraform config:
- Service Accounts:
  - Cluster IAM Service Account (name is derived from the cluster name, e.g.
tf-gke-<cluster name>
) – manages permissions for the GKE cluster.
- Artifact Registry – stores container images for the application.
-
Go to the terraform directory:
cd terraform
-
Specify the following values inside the
example.tfvars
file (or make a separate copy):
- <PROJECT_ID> – replace with your project ID (you can find it in the project settings).
Other values can be changed if needed, but can be left at their default values.
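For illustration, a filled-in example.tfvars might look like the fragment below. The variable names here are assumptions based on the options mentioned in this tutorial; check the actual file in the repository for the exact names:

```hcl
# example.tfvars – illustrative values only
project_id        = "<PROJECT_ID>"  # your Google Cloud project ID
autopilot_cluster = true            # set to false for a Standard cluster
```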
-
Init terraform modules:
terraform init
-
Optionally run the plan command to view an execution plan:
terraform plan -var-file=example.tfvars
-
Execute the plan:
terraform apply -var-file=example.tfvars
And you should see your resources created:
Apply complete! Resources: 16 added, 0 changed, 0 destroyed. Outputs:
-
Configure your kubectl context:
gcloud container clusters get-credentials $(terraform output -raw gke_cluster_name) --region $(terraform output -raw gke_cluster_location)
Deploy vLLM Models
We need to deploy a vLLM server that will serve the model that LlamaFirewall will secure.
-
Create a secret with your HuggingFace token:
kubectl create secret generic hf-token-secret --from-literal=token="<YOUR_TOKEN>"
-
Apply the vLLM deployment manifest with the base Llama model:
kubectl apply -f ../vllm/vllm-llama.yaml
-
Wait until the deployment is ready:
kubectl rollout status deployment/vllm-llama3
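For reference, the deployment in vllm/vllm-llama.yaml follows the usual vLLM-on-GKE pattern. Below is a trimmed, illustrative sketch; the image, model name, and resource values are assumptions, and the actual manifest in the repository may differ:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama3              # matches the rollout status command above
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama3
  template:
    metadata:
      labels:
        app: vllm-llama3
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]  # illustrative model
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret   # the secret created in the first step
              key: token
        resources:
          limits:
            nvidia.com/gpu: "1"
```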
Deploy and Configure the Agent Application
This application consists of a simple ADK agent that uses Callbacks: a Before Model Callback and an After Model Callback, which invoke LlamaFirewall on the user prompt and on the model’s response, respectively.
For more info, see the adk-app/llama_firewall_secured_agent/agent.py file, which defines the variable secured_agent
, an instance of the LlmAgent class, with two constructor arguments: before_model_callback
and after_model_callback
. These callbacks are used to invoke LlamaFirewall.
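To make the callback pattern concrete, here is a minimal pure-Python sketch of the before/after hooks. It uses a stand-in scan() heuristic instead of the real LlamaFirewall PromptGuard 2 scanner, and plain dicts instead of ADK's request/response types, so all names here are illustrative only; see agent.py for the actual implementation:

```python
# Stand-in for LlamaFirewall's scan: the real scanner classifies prompts
# with the PromptGuard 2 model; this toy heuristic just checks a phrase.
BLOCKED_RESPONSE = {"text": "Request blocked by LlamaFirewall."}

def scan(text: str) -> bool:
    """Return True if the text is considered safe (toy heuristic)."""
    return "ignore all previous instructions" not in text.lower()

def before_model_callback(llm_request: dict):
    """Runs before the model call; returning a response skips the model."""
    if not scan(llm_request["prompt"]):
        return BLOCKED_RESPONSE
    return None  # None means: proceed with the normal model call

def after_model_callback(llm_response: dict):
    """Runs after the model call; can replace an unsafe response."""
    if not scan(llm_response["text"]):
        return BLOCKED_RESPONSE
    return None  # None means: keep the original response

def run_agent(prompt: str, model=lambda p: {"text": f"echo: {p}"}) -> dict:
    """Wire the two hooks around a (mocked) model call."""
    blocked = before_model_callback({"prompt": prompt})
    if blocked is not None:
        return blocked
    response = model(prompt)
    return after_model_callback(response) or response
```

With this wiring, a benign prompt flows through to the model, while a prompt that fails the scan is replaced by the blocked-response message before the model is ever called; this is the behavior you will observe in the Testing section.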
-
Build image with our ADK application:
gcloud builds submit \
  --tag $(terraform output -raw image_repository_full_name)/secured-agent:latest \
  --project=$(terraform output -raw project_id) \
  ../adk-app
-
Deploy the manifest for the ADK application:
kubectl apply -f ../gen/secured-agent.yaml
-
Wait until the deployment is ready:
kubectl rollout status deployment/adk-agent
Testing
-
Open the ADK application’s URL in your web browser. If you enabled Identity-Aware Proxy as described in our other guide, use its URL. Otherwise, run the port-forward command and open http://127.0.0.1:8000:
kubectl port-forward svc/adk-agent 8000:80
-
The ADK application web UI should open with our
llama_firewall_secured_agent
agent already selected. Try some prompting, starting with a normal prompt: -
Then try some malicious prompt:
As you can see, the model’s response is the blocked-request message that we specified in the agent’s code.
Cleaning up
-
Destroy the provisioned infrastructure:
terraform destroy -var-file=example.tfvars