Workflow orchestration
Workflow orchestration in the ai-on-gke project involves managing and automating the execution of complex, multi-step processes, primarily for AI/ML workloads on Google Kubernetes Engine (GKE).
In this tutorial, we demonstrate how to use the open-source tool SkyPilot to help GKE customers efficiently obtain accelerators across regions, ensuring workload continuity and optimized resource utilization.
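As a rough illustration of the approach, SkyPilot can be given several candidate resources and will fail over between them until one provisions; the regions, accelerator type, and commands in this sketch are placeholders rather than values from the tutorial:

```yaml
# task.yaml -- minimal SkyPilot task sketch; regions, accelerator type,
# and commands are illustrative placeholders, not the tutorial's values.
resources:
  # SkyPilot tries each candidate in turn until one can be provisioned.
  any_of:
    - cloud: gcp
      region: us-central1
      accelerators: A100:1
    - cloud: gcp
      region: us-west4
      accelerators: A100:1

setup: |
  pip install -r requirements.txt   # hypothetical setup step

run: |
  python train.py                   # hypothetical workload
```

Launched with `sky launch task.yaml`, SkyPilot attempts each candidate in order and moves on to the next region when the requested accelerator is unavailable.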
This guide provides examples of how to use Dynamic Workload Scheduler (DWS) within Google Kubernetes Engine (GKE), leveraging Kueue for queue management and resource provisioning. It includes sample configurations for Kueue queues with DWS support (dws-queue.yaml) and a sample job definition (job.yaml) that demonstrates how to request resources and set a maximum run duration using DWS.
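A minimal sketch of that pattern is shown below, assuming illustrative resource names; the guide's actual dws-queue.yaml and job.yaml may differ in detail:

```yaml
# Sketch of wiring Kueue to DWS via a ProvisioningRequest; all names
# here are assumptions, not the guide's actual dws-queue.yaml contents.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io  # DWS class on GKE
  managedResources:
    - nvidia.com/gpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
# A Job requests DWS capacity through its LocalQueue and caps its runtime.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-dws-job-
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue  # assumed LocalQueue name
spec:
  suspend: true  # Kueue unsuspends the Job once capacity is provisioned
  backoffLimit: 0
  template:
    metadata:
      annotations:
        provreq.kueue.x-k8s.io/maxRunDurationSeconds: "600"  # max run duration
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sleep", "300"]
          resources:
            limits:
              nvidia.com/gpu: "1"
```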
This guide illustrates the deployment of Flyte on Google Kubernetes Engine (GKE) using Helm, utilizing Google Cloud Storage for scalable data storage and Cloud SQL PostgreSQL for a reliable metadata store. By the end of this tutorial, you will have a fully functional Flyte instance on GKE, offering businesses seamless integration with the GCP ecosystem, improved resource efficiency, and cost-effectiveness.
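In broad strokes, the deployment comes down to pointing Flyte's Helm chart (flyte-binary here, installed with something like `helm install flyte-backend flyteorg/flyte-binary -f values-gke.yaml`) at the Cloud SQL instance and the GCS bucket. The values fragment below is a hedged sketch: the field layout follows the flyte-binary chart's conventions, and every concrete value is a placeholder:

```yaml
# values-gke.yaml -- illustrative fragment for the flyte-binary chart;
# all hosts, bucket names, and credentials below are placeholders.
configuration:
  database:
    username: flyte
    password: "<db-password>"   # placeholder; use a secret in practice
    host: 10.0.0.3              # Cloud SQL private IP (placeholder)
    dbname: flyte
  storage:
    provider: gcs
    metadataContainer: my-flyte-bucket  # GCS bucket for metadata (placeholder)
    userDataContainer: my-flyte-bucket  # GCS bucket for user data (placeholder)
```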
This guide demonstrates how to deploy a Hugging Face Text Generation Inference (TGI) server on Google Kubernetes Engine (GKE) using NVIDIA L4 GPUs, enabling you to serve large language models such as Mistral-7B-Instruct. It walks you through creating a GKE cluster, deploying the TGI application, sending prompts to the model, and monitoring the service's performance using metrics, while also providing instructions for cleaning up the cluster.
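To make the shape of that deployment concrete, here is a trimmed-down manifest along the same lines; the image tag, model ID, and resource sizes are assumptions, not the guide's exact file:

```yaml
# tgi-deploy.yaml -- illustrative TGI Deployment on an L4 node pool;
# image tag and model ID are assumptions, not the guide's exact file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-mistral
spec:
  replicas: 1
  selector:
    matchLabels: { app: tgi }
  template:
    metadata:
      labels: { app: tgi }
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # schedule onto L4 GPU nodes
      containers:
        - name: tgi
          image: ghcr.io/huggingface/text-generation-inference:1.4
          args: ["--model-id", "mistralai/Mistral-7B-Instruct-v0.2"]
          ports:
            - containerPort: 80   # TGI serves HTTP on port 80 by default
          resources:
            limits:
              nvidia.com/gpu: "1"
          env:
            - name: HUGGING_FACE_HUB_TOKEN  # needed for gated models
              valueFrom:
                secretKeyRef: { name: hf-secret, key: hf_api_token }
```

Once the pod is ready, prompts can be sent to the server's `/generate` endpoint through a Service or port-forward.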
This project allows you to download a Hugging Face model and package it as a Docker image, which can then be pushed to Google Artifact Registry for deployment or distribution. Build time can be significant for large models, so it is recommended to stay below roughly 10 billion parameters; for reference, an 8B model takes about 35 minutes to build and push with this Cloud Build config.
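A hedged sketch of what such a Cloud Build config can look like is shown below; the step layout, substitution names, and the assumption that the Dockerfile downloads the model at build time are all illustrative, and the repo's actual config will differ:

```yaml
# cloudbuild.yaml -- illustrative sketch; substitution names and the
# Dockerfile contract (MODEL_ID build arg) are assumptions.
steps:
  # Build an image whose Dockerfile downloads the model at build time.
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - --build-arg
      - MODEL_ID=${_MODEL_ID}
      - -t
      - ${_REGION}-docker.pkg.dev/$PROJECT_ID/models/${_MODEL_ID}:latest
      - .
# Listing the image here makes Cloud Build push it to Artifact Registry.
images:
  - ${_REGION}-docker.pkg.dev/$PROJECT_ID/models/${_MODEL_ID}:latest
substitutions:
  _MODEL_ID: gemma-2b    # placeholder model name
  _REGION: us-central1   # placeholder region
timeout: 7200s           # large models take a long time to build and push
```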
In this guide you will learn how to set up a multi-cluster environment where job computation is distributed across three GKE clusters in different regions using MultiKueue, Dynamic Workload Scheduler (DWS), and GKE Autopilot.
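Directionally, the MultiKueue side of that setup registers each worker cluster with the manager cluster and routes admission through a MultiKueue AdmissionCheck. The sketch below uses assumed names and the v1beta1 API (the API group/version varies across Kueue releases), so treat it as an outline rather than the guide's manifests:

```yaml
# Manager-cluster objects for MultiKueue; names and kubeconfig secrets
# are placeholders, and the API version depends on the Kueue release.
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueCluster
metadata:
  name: worker-us-central1
spec:
  kubeConfig:
    locationType: Secret
    location: worker-us-central1-kubeconfig  # secret holding the kubeconfig
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: multikueue-config
spec:
  clusters:                 # the three regional worker clusters
    - worker-us-central1
    - worker-europe-west4
    - worker-asia-southeast1
---
# The manager's ClusterQueue references this AdmissionCheck so that
# admitted workloads are dispatched to one of the worker clusters.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-config
```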
This tutorial expands on the SkyPilot tutorial above by leveraging Dynamic Workload Scheduler (DWS) through Kueue, an open-source job queueing project for Kubernetes.
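One way to combine the two, sketched below under stated assumptions, is to pass the Kueue queue label and DWS annotation to SkyPilot's Kubernetes backend via `pod_config` in `~/.sky/config.yaml`; the queue name, duration, and timeout are placeholders:

```yaml
# ~/.sky/config.yaml -- sketch of pointing SkyPilot's Kubernetes backend
# at a DWS-enabled Kueue queue; queue name and durations are placeholders.
kubernetes:
  pod_config:
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: dws-local-queue   # assumed LocalQueue name
      annotations:
        provreq.kueue.x-k8s.io/maxRunDurationSeconds: "3600"
  provision_timeout: 900  # wait for DWS to obtain capacity instead of failing fast
```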