Benchmarking


GKE at 65,000 Nodes: Simulated AI Workload Benchmark

This guide describes how to benchmark a 65,000-node Google Kubernetes Engine (GKE) cluster using CPU-only machines to simulate AI workloads and evaluate the Kubernetes control plane's performance. It covers deploying the cluster with Terraform, running diverse simulated AI workloads (including training and inference) with ClusterLoader2, and collecting performance metrics to assess scalability and stability. The benchmark results provide insight into pod state transitions, scheduling throughput, and API server latency under extreme load, giving a comprehensive picture of the control plane's behavior.
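The guide itself drives load with ClusterLoader2; the sketch below is only an illustration of the kind of signal such a benchmark collects. This minimal Go program (a hypothetical example using client-go; the namespace and kubeconfig path are assumptions, not part of the guide) watches pods and reports how long each one takes to reach Running, a rough proxy for the pod state-transition latency the benchmark measures.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config); assumption for this sketch.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch pods in the "default" namespace; a real benchmark would cover the test namespaces it creates.
	watcher, err := clientset.CoreV1().Pods("default").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	for event := range watcher.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		// Report time from pod creation to the Running phase as a crude startup-latency metric.
		if pod.Status.Phase == corev1.PodRunning {
			latency := time.Since(pod.CreationTimestamp.Time)
			fmt.Printf("%s reached Running %v after creation\n", pod.Name, latency.Round(time.Millisecond))
		}
	}
}
```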

Inference Benchmark

A model-server-agnostic inference benchmarking tool for benchmarking LLMs running on different infrastructure, such as GPUs and TPUs. It can also run on a GKE cluster as a container.
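As a rough illustration of what a model-server-agnostic measurement looks like (not the tool itself), the Go sketch below sends a few completion requests to an OpenAI-compatible endpoint and records per-request latency. The endpoint URL, payload, and model name are placeholders chosen for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical server address and request body; substitute the model server under test.
	const endpoint = "http://localhost:8000/v1/completions"
	payload := []byte(`{"model":"example-model","prompt":"Hello","max_tokens":64}`)

	// Issue a small fixed number of requests and print the latency of each.
	for i := 0; i < 10; i++ {
		start := time.Now()
		resp, err := http.Post(endpoint, "application/json", bytes.NewReader(payload))
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("request %d: status %s, latency %v\n", i, resp.Status, time.Since(start).Round(time.Millisecond))
	}
}
```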

Continue reading: