Deploying MCP Servers on GKE: Building AI Agents with ADK and Ray-Served Models
Introduction
This guide shows how to host a Model Context Protocol (MCP) server with Server-Sent Events (SSE) transport on Google Kubernetes Engine (GKE). MCP is an open protocol that standardizes how AI agents interact with their environment and external data sources. MCP clients can communicate with MCP servers using two distinct transport mechanisms:
- Standard Input/Output (stdio) - direct process communication
- Server-Sent Events (SSE) or Streamable HTTP - web-based streaming communication
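To make the distinction concrete, the sketch below shows what the two transports look like from a client's perspective, using the official MCP Python SDK. The server command and URL are placeholders, not part of this tutorial's code.

# Sketch: the same MCP client session over the two transports, using the
# official `mcp` Python SDK. The server command and URL are placeholders.
from mcp import ClientSession, StdioServerParameters
from mcp.client.sse import sse_client
from mcp.client.stdio import stdio_client

async def via_stdio() -> None:
    # stdio: the client spawns the server as a child process and talks over pipes.
    params = StdioServerParameters(command="python", args=["weather_mcp.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

async def via_sse() -> None:
    # SSE: the client connects over HTTP to a server that is already running
    # somewhere else, which is the shape that fits a Kubernetes Service.
    async with sse_client("http://weather-mcp-server:8080/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()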
You have several options for deploying MCP:
- Local Development: Host both MCP clients and servers on the same local machine.
- Hybrid Setup: Run an MCP client locally and have it communicate with remote MCP servers hosted on a cloud platform like GKE.
- Full Cloud Deployment: Host both MCP clients and servers on a cloud platform.
While GKE supports hosting MCP servers with stdio transport (through multi-container pods or sidecar patterns), streamable HTTP transport is the recommended approach for Kubernetes deployments. HTTP-based transport aligns better with Kubernetes networking principles, enables independent scaling of components, provides better observability and debugging capabilities, and offers better security isolation between components.
Before you begin
Ensure you have the following tools installed on your workstation:
- Google Cloud SDK (gcloud CLI)
- kubectl
- Terraform
- Git
- Node.js (for the npx command used to run the MCP Inspector)
If you previously installed the gcloud CLI, get the latest version by running:
gcloud components update
Ensure that you are signed in using the gcloud CLI tool. Run the following command:
gcloud auth application-default login
MCP Server Development
You have two main approaches for implementing an MCP server:
- Developing your own MCP server: We recommend that you use an MCP server SDK to develop your MCP server, such as the official Python, TypeScript, Java, or Kotlin SDKs or FastMCP.
- Existing MCP servers: You can find a list of official and community MCP servers on the MCP servers GitHub repository. Docker Hub also provides a curated list of MCP servers.
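If you develop your own server, a FastMCP server can be quite small. The following is a minimal sketch, not this tutorial's actual weather_mcp.py: the tool name, its canned response, and the port are illustrative assumptions.

# Minimal FastMCP server sketch (illustrative; not the tutorial's weather_mcp.py).
from fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short weather forecast for the given city."""
    # A real server would call a live weather API here.
    return f"Tomorrow in {city}: partly cloudy, high of 18°C."

if __name__ == "__main__":
    # SSE transport exposes the server over HTTP, which is what lets us put it
    # behind a Kubernetes Service later in this tutorial.
    mcp.run(transport="sse", host="0.0.0.0", port=8080)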
Overview:
In the Building Agents with Agent Development Kit (ADK) on GKE Autopilot cluster using Self-Hosted LLM tutorial, we built a weather agent. However, that agent cannot answer questions such as “What’s tomorrow’s weather in Seattle?” because it lacks access to a live weather data source. In this tutorial, we address this limitation by building a custom MCP server with FastMCP and deploying it on GKE. This server will provide our agent with real-time weather capabilities. We continue to use the same LLM backend powered by Ray Serve/vLLM (as set up in the Ray Serve for Self-Hosted LLMs tutorial).
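To preview where we are headed, here is a hedged sketch of how the agent can attach the remote MCP server's tools over SSE with ADK. The class names follow the google-adk and LiteLLM documentation at the time of writing; the model ID, service URLs, and instruction text are assumptions rather than the tutorial's exact weather_agent.py.

# Sketch: an ADK agent wired to a Ray Serve/vLLM backend and a remote MCP server.
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, SseServerParams

root_agent = LlmAgent(
    # The OpenAI-compatible endpoint served by Ray Serve/vLLM (see Step 2).
    model=LiteLlm(
        model="hosted_vllm/meta-llama/Llama-3.1-8B-Instruct",
        api_base="http://llama-31-8b-serve-svc:8000/v1",
    ),
    name="weather_agent",
    instruction="Answer weather questions using the MCP weather tools.",
    tools=[
        MCPToolset(
            # In-cluster DNS name of the MCP Service deployed in Step 3.
            connection_params=SseServerParams(
                url="http://weather-mcp-server.adk-weather-tutorial.svc.cluster.local:8080/sse",
            ),
        ),
    ],
)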
Folder structure:
tutorials-and-examples/adk/ray-mcp/
├── adk_agent/
│   ├── weather_agent/
│   │   ├── __init__.py
│   │   ├── weather_agent.py
│   │   └── deployment_agent.yaml
│   ├── main.py
│   ├── requirements.txt
│   └── Dockerfile
├── mcp_server/
│   ├── weather_mcp.py
│   ├── deployment_weather_mcp.yaml
│   ├── Dockerfile
│   └── requirements.txt
└── terraform/
    ├── artifact_registry.tf
    ├── main.tf
    ├── outputs.tf
    ├── variables.tf
    ├── default_env.tfvars
    ├── network.tf
    ├── providers.tf
    └── workload_identity.tf
Step 1: Set Up the Infrastructure with Terraform
Start by setting up the GKE cluster, service account, IAM roles, and Artifact Registry using Terraform.
Download the code and navigate to the tutorial directory:
git clone https://github.com/ai-on-gke/tutorials-and-examples.git
cd tutorials-and-examples/adk/ray-mcp/terraform
Set the environment variables, replacing <PROJECT_ID> and <MY_HF_TOKEN> with your own values:
gcloud config set project <PROJECT_ID>
export PROJECT_ID=$(gcloud config get project)
export HF_TOKEN=<MY_HF_TOKEN>
Update the <PROJECT_ID> placeholder in default_env.tfvars with your own Google Cloud project ID.
Initialize Terraform, inspect the plan, and apply the configuration:
terraform init
terraform plan --var-file=./default_env.tfvars
terraform apply --var-file=./default_env.tfvars
Review the plan and type yes to confirm. This will create:
- A GKE Autopilot cluster named llama-ray-cluster.
- A service account adk-ray-agent-sa.
- An IAM role binding granting the service account roles/artifactregistry.reader.
- An Artifact Registry repository llama-ray.
Export the cluster name and location from the Terraform outputs, then configure kubectl to communicate with the cluster:
export REGION=$(terraform output -raw gke_cluster_location)
export CLUSTER_NAME=$(terraform output -raw gke_cluster_name)
gcloud container clusters get-credentials $CLUSTER_NAME --region=$REGION --project $PROJECT_ID
Step 2: Containerize and Deploy the Ray Serve Application
Before deploying our MCP server, we need to set up the Ray Serve application that will power our LLM backend. Follow the Ray Serve for Self-Hosted LLMs tutorial to deploy it; specifically, complete Step 2: Containerize and Deploy the Ray Serve Application from that guide. The files for that step live in the following directory:
cd tutorials-and-examples/ray-serve/ray-serve-vllm
After completing Step 2 of the Ray Serve tutorial, verify the deployment:
kubectl get pods | grep ray
kubectl get services | grep ray
You should see Ray head and worker pods running, plus Ray services.
Next: Once Ray Serve is running, proceed to Step 3 to deploy our MCP server that will connect to this LLM backend.
Step 3: Deploy the MCP server
Navigate to the MCP server directory:
cd tutorials-and-examples/adk/ray-mcp/mcp_server
Let’s create a new namespace where we will deploy our ADK application and MCP server:
kubectl create namespace adk-weather-tutorial
Build and push the MCP Server container image:
gcloud builds submit \
--tag us-docker.pkg.dev/$PROJECT_ID/llama-ray/mcp-server:latest \
--project=$PROJECT_ID .
Update the <PROJECT_ID> placeholders in the ./deployment_weather_mcp.yaml file where applicable. Apply the manifest:
kubectl apply -f deployment_weather_mcp.yaml
Test with MCP Inspector
Let’s validate our MCP server using the official MCP Inspector tool.
Run this command to port-forward the MCP Server:
kubectl -n adk-weather-tutorial port-forward svc/weather-mcp-server 8000:8080
In another terminal session, run this command:
npx @modelcontextprotocol/inspector@0.14.2
Expected output:
Starting MCP inspector...
⚙️ Proxy server listening on 127.0.0.1:6277
🔑 Session token: <SESSION_TOKEN>
Use this token to authenticate requests or set DANGEROUSLY_OMIT_AUTH=true to disable auth
🔗 Open inspector with token pre-filled:
http://localhost:6274/?MCP_PROXY_AUTH_TOKEN=<SESSION_TOKEN>
(Auto-open is disabled when authentication is enabled)
🔍 MCP Inspector is up and running at http://127.0.0.1:6274
To connect to your MCP server in the Inspector UI, do the following:
- Transport Type: choose SSE.
- URL: paste http://127.0.0.1:8000/sse.
- Configuration -> Proxy Session Token: paste the <SESSION_TOKEN> from the terminal (see the example logs above).
Press the Connect button and navigate to the Tools tab. There you can click the List Tools button and check how the tools work.
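If you prefer a scripted check over the Inspector UI, a few lines of Python can confirm that the server responds and lists its tools. This is a minimal sketch using the official mcp SDK's SSE client; it assumes the port-forward above is still running.

# Sketch: list the MCP server's tools over the port-forwarded SSE endpoint.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://127.0.0.1:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())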
Now you can cancel the port-forwarding and close the inspector.
Step 4: Deploy the ADK Agent
Navigate to the ADK agent directory:
cd tutorials-and-examples/adk/ray-mcp/adk_agent
Build and push the ADK agent container image:
gcloud builds submit \
--tag us-docker.pkg.dev/$PROJECT_ID/llama-ray/adk-agent:latest \
--project=$PROJECT_ID .
Update the <PROJECT-ID> placeholders in the ./deployment_agent.yaml file where applicable. Apply the manifest:
kubectl apply -f deployment_agent.yaml
Verify the deployment:
- Check the pods:
kubectl -n adk-weather-tutorial get pods
You should see five pods: the ADK agent, the KubeRay operator, the Ray head and worker pods, and the MCP server.
NAME                                                  READY   STATUS    RESTARTS       AGE
adk-agent-6c8488db64-hjt86                            1/1     Running   0              61m
kuberay-operator-bb8d4d9c4-kwjml                      1/1     Running   2 (177m ago)   3h1m
llama-31-8b-raycluster-v8vj4-gpu-group-worker-ttfp7   1/1     Running   0              162m
llama-31-8b-raycluster-v8vj4-head-ppt6t               1/1     Running   0              162m
weather-mcp-server-79748fd6b5-8h4m7                   1/1     Running   0              43m
- Check the services:
kubectl -n adk-weather-tutorial get services
You should see seven services, including the ADK service.
NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
adk-agent                               ClusterIP   34.118.235.225   <none>        80/TCP                                          64m
kuberay-operator                        ClusterIP   34.118.236.198   <none>        8080/TCP                                        3h5m
kubernetes                              ClusterIP   34.118.224.1    <none>        443/TCP                                         3h40m
llama-31-8b-head-svc                    ClusterIP   None             <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   153m
llama-31-8b-raycluster-v8vj4-head-svc   ClusterIP   None             <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   165m
llama-31-8b-serve-svc                   ClusterIP   34.118.233.111   <none>        8000/TCP                                        153m
weather-mcp-server                      ClusterIP   34.118.239.33   <none>        8080/TCP                                        46m
- Access your ADK agent using port-forwarding:
kubectl -n adk-weather-tutorial port-forward svc/adk-agent 8000:80
You should see the following output:
Forwarding from 127.0.0.1:8000 -> 8080
Forwarding from [::1]:8000 -> 8080
Navigate to http://127.0.0.1:8000 and test your agent.
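If you would rather smoke-test the agent from a script than through the web UI, the ADK API server documents REST endpoints for creating a session and running an agent. The sketch below assumes your container exposes those endpoints and that the app name is weather_agent; adjust both if your deployment differs.

# Sketch: exercise the port-forwarded ADK agent via its documented REST endpoints.
import requests

BASE = "http://127.0.0.1:8000"
APP, USER, SESSION = "weather_agent", "u_1", "s_1"

# Create a session, then send one user message to the agent.
requests.post(f"{BASE}/apps/{APP}/users/{USER}/sessions/{SESSION}").raise_for_status()
resp = requests.post(
    f"{BASE}/run",
    json={
        "app_name": APP,
        "user_id": USER,
        "session_id": SESSION,
        "new_message": {
            "role": "user",
            "parts": [{"text": "What's tomorrow's weather in Seattle?"}],
        },
    },
)
resp.raise_for_status()
print(resp.json())  # a list of events; the last one carries the agent's reply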
Step 5: Clean Up
Destroy the provisioned infrastructure:
cd tutorials-and-examples/adk/ray-mcp/terraform
terraform destroy -var-file=default_env.tfvars