Workflow orchestration

Workflow orchestration in the ai-on-gke project involves managing and automating the execution of complex, multi-step processes, primarily for AI/ML workloads on Google Kubernetes Engine (GKE).

DWS

This guide provides examples of how to use Dynamic Workload Scheduler (DWS) within Google Kubernetes Engine (GKE), leveraging Kueue for queue management and resource provisioning. It includes sample configurations for Kueue queues with DWS support (dws-queue.yaml) and a sample job definition (job.yaml) that demonstrates how to request resources and set a maximum run duration using DWS.

Running Flyte on GKE

This guide illustrates the deployment of Flyte on Google Kubernetes Engine (GKE) using Helm, utilizing Google Cloud Storage for scalable data storage and Cloud SQL PostgreSQL for a reliable metadata store. By the end of this tutorial, you will have a fully functional Flyte instance on GKE, offering businesses seamless integration with the GCP ecosystem, improved resource efficiency, and cost-effectiveness.

Multikueue, DWS and GKE Autopilot

In this guide you will learn how to set up a multi-cluster environment where job computation is distributed across three GKE clusters in different regions using MultiKueue, Dynamic Workload Scheduler (DWS), and GKE Autopilot.

Ray on GKE

This guide provides instructions and examples for deploying and managing Ray clusters on Google Kubernetes Engine (GKE) using KubeRay and Terraform. It covers setting up a GKE cluster, deploying a Ray cluster, submitting Ray jobs, and using the Ray Client for interactive sessions. The guide also points to various resources, including tutorials, best practices, and examples for running different types of Ray applications on GKE, such as serving LLMs, using TPUs, and integrating with GCS.

Skypilot

In this tutorial, we will demonstrate how to leverage the open-source software SkyPilot to help GKE customers efficiently obtain accelerators across regions, ensuring workload continuity and optimized resource utilization.

Continue reading: