This guide demonstrates how to deploy a Hugging Face Text Generation Inference (TGI) server on Google Kubernetes Engine (GKE) using NVIDIA L4 GPUs, enabling you to serve large language models such as Mistral-7B-Instruct. It walks you through creating a GKE cluster, deploying the TGI application, sending prompts to the model, and monitoring the service's performance with metrics, and it closes with instructions for cleaning up the cluster.
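For example, once the TGI Service is reachable (say, via `kubectl port-forward`), a prompt can be sent to TGI's `/generate` endpoint. The sketch below is a minimal Python client; the local address and prompt are assumptions, not values from the guide:

```python
import requests

# Assumes the TGI Service has been port-forwarded locally, e.g.:
#   kubectl port-forward svc/<your-tgi-service> 8080:8080
TGI_URL = "http://localhost:8080/generate"  # hypothetical local endpoint

payload = {
    # Mistral instruct models expect the [INST] ... [/INST] prompt format.
    "inputs": "[INST] What is Kubernetes? [/INST]",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}

resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```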
This guide explains how to deploy NVIDIA NIM inference microservices on a Google Kubernetes Engine (GKE) cluster; access to the models requires an NVIDIA AI Enterprise license. It details setting up a GKE cluster with GPU-enabled nodes, configuring access to the NVIDIA NGC registry, and deploying a NIM with a Helm chart backed by persistent storage. Finally, it shows how to verify that the inference microservice is working by sending a sample prompt and checking the response.
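As a sketch of that final test, the snippet below posts a chat completion request to a NIM's OpenAI-compatible endpoint. The local address and model name are placeholders; substitute whatever your NIM deployment actually serves:

```python
import requests

# Assumes the NIM Service has been port-forwarded locally, e.g.:
#   kubectl port-forward svc/<your-nim-service> 8000:8000
NIM_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical endpoint
MODEL = "meta/llama3-8b-instruct"  # placeholder; use the model your NIM serves

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_tokens": 64,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```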
This tutorial will guide you through creating a robust Retrieval-Augmented Generation (RAG) system using LlamaIndex and deploying it on Google Kubernetes Engine (GKE).
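Before deploying, it helps to see the core RAG loop in isolation. This is a minimal LlamaIndex sketch, assuming documents in a local `data/` directory and LlamaIndex's default OpenAI-backed models (an `OPENAI_API_KEY` in the environment); the query string is illustrative:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents from a local ./data directory (assumption for this sketch).
documents = SimpleDirectoryReader("data").load_data()

# Embed the documents and build an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)

# Retrieve relevant chunks and generate an answer grounded in them.
query_engine = index.as_query_engine()
response = query_engine.query("What does the deployment guide recommend?")
print(response)
```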
This guide shows you how to deploy [Slurm](https://slurm.schedmd.com/documentation.html) on a Google Kubernetes Engine (GKE) cluster.
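Once Slurm is running, jobs are submitted with the standard Slurm client tools. The sketch below drives `sbatch` from Python; it assumes the Slurm client tools are installed where it runs and can reach the controller, and the job script is a placeholder:

```python
import subprocess
import tempfile

# A trivial Slurm batch script: one task that prints its hostname.
JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=hello-slurm
#SBATCH --output=hello-%j.out
srun hostname
"""

# Write the script to a temporary file and hand it to sbatch.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(JOB_SCRIPT)
    script_path = f.name

result = subprocess.run(
    ["sbatch", script_path], capture_output=True, text=True, check=True
)
print(result.stdout.strip())  # e.g. "Submitted batch job 42"
```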
This tutorial shows you how to serve a large language model (LLM) using Tensor Processing Units (TPUs) on Google Kubernetes Engine (GKE) with [JetStream](https://github.com/google/JetStream) and [MaxText](https://github.com/google/maxtext).
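As an illustration of querying a deployed JetStream server, the sketch below posts a prompt over HTTP. The endpoint, port, and request fields are assumptions modeled on JetStream's HTTP frontend and may differ in your deployment:

```python
import requests

# Assumes a JetStream HTTP frontend port-forwarded to localhost:8000
# (hypothetical); adjust the address to match your Service.
JETSTREAM_URL = "http://localhost:8000/generate"

payload = {
    "prompt": "What are the benefits of TPUs for LLM inference?",
    "priority": 0,
    "max_tokens": 200,
}

resp = requests.post(JETSTREAM_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # response schema varies by server version
```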
In this tutorial, you will learn how to deploy a chatbot application using [LangChain](https://python.langchain.com/) and [Streamlit](https://streamlit.io/) on Google Cloud Platform (GCP).
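To preview the shape of such an app, here is a minimal sketch combining Streamlit's chat widgets with a LangChain chat model. It assumes `langchain-openai` is installed and an `OPENAI_API_KEY` is set; the model name is a placeholder. Run it locally with `streamlit run app.py`:

```python
import streamlit as st
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder; use any LangChain chat model

st.title("Chatbot Demo")

# Keep the conversation history across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for role, text in st.session_state.messages:
    st.chat_message(role).write(text)

# Read a new user message, call the model, and render both turns.
if prompt := st.chat_input("Ask me anything"):
    st.session_state.messages.append(("user", prompt))
    st.chat_message("user").write(prompt)
    answer = llm.invoke(prompt).content
    st.session_state.messages.append(("assistant", answer))
    st.chat_message("assistant").write(answer)
```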