This tutorial guides you through creating a movie recommendation Retrieval-Augmented Generation (RAG) system using Agentic LlamaIndex and deploying it on Google Kubernetes Engine (GKE).
This tutorial guides you through creating a Retrieval-Augmented Generation (RAG) system using LlamaIndex and deploying it on Google Kubernetes Engine (GKE).
This tutorial shows you how to serve a large language model (LLM) with vLLM on Google Kubernetes Engine (GKE), using the same deployment for both Tensor Processing Units (TPUs) and GPUs.