Benchmarking


GKE at 65,000 Nodes: Simulated AI Workload Benchmark

This guide describes how to benchmark a 65,000-node Google Kubernetes Engine (GKE) cluster using CPU-only machines to simulate AI workloads and evaluate the Kubernetes control plane's performance. It covers deploying the cluster with Terraform, running diverse simulated AI workloads (including training and inference) with ClusterLoader2, and collecting performance metrics to assess scalability and stability. The benchmark results provide insight into pod state transitions, scheduling throughput, and API server latency under extreme load, giving a comprehensive picture of the control plane's behavior.
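The guide itself drives load with ClusterLoader2; the sketch below is only an illustration of the kind of signal such a benchmark collects. This minimal Go program (a hypothetical example using client-go; the namespace and kubeconfig path are assumptions, not part of the guide) watches pods and reports how long each one takes to reach Running, a rough proxy for the pod state-transition latency the benchmark measures.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config); assumption for this sketch.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch pods in the "default" namespace; a real benchmark would cover the test namespaces it creates.
	watcher, err := clientset.CoreV1().Pods("default").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	for event := range watcher.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		// Report time from pod creation to the Running phase as a crude startup-latency metric.
		if pod.Status.Phase == corev1.PodRunning {
			latency := time.Since(pod.CreationTimestamp.Time)
			fmt.Printf("%s reached Running %v after creation\n", pod.Name, latency.Round(time.Millisecond))
		}
	}
}
```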

Inference Benchmark

A model-server-agnostic inference benchmarking tool for benchmarking LLMs running on different infrastructure, such as GPUs and TPUs. It can also run on a GKE cluster as a container.
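As a rough illustration of what a model-server-agnostic measurement looks like (not the tool itself), the Go sketch below sends a few completion requests to an OpenAI-compatible endpoint and records per-request latency. The endpoint URL, payload, and model name are placeholders chosen for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical server address and request body; substitute the model server under test.
	const endpoint = "http://localhost:8000/v1/completions"
	payload := []byte(`{"model":"example-model","prompt":"Hello","max_tokens":64}`)

	// Issue a small fixed number of requests and print the latency of each.
	for i := 0; i < 10; i++ {
		start := time.Now()
		resp, err := http.Post(endpoint, "application/json", bytes.NewReader(payload))
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("request %d: status %s, latency %v\n", i, resp.Status, time.Since(start).Round(time.Millisecond))
	}
}
```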

Continue reading: