In this tutorial, we will demonstrate how to leverage the open-source software SkyPilot to help GKE customers efficiently obtain accelerators across regions, ensuring workload continuity and optimized resource utilization.
This tutorial shows you how to serve a large language model (LLM) using Tensor Processing Units (TPUs) on Google Kubernetes Engine (GKE) with JetStream and MaxText.
This tutorial will guide you through creating a robust Retrieval-Augmented Generation (RAG) system using LlamaIndex and deploying it on Google Kubernetes Engine (GKE).
This tutorial shows you how to serve a large language model (LLM) on both Tensor Processing Units (TPUs) and GPUs on Google Kubernetes Engine (GKE) using the same vLLM deployment.