Tutorials & Notebooks


Fine-tuning

Learn how to fine-tune machine learning and AI models for your specific use cases. This section covers best practices, step-by-step guides, and practical examples for adapting pre-trained models to your own data and tasks to improve performance.

Frameworks & Pipelines

Explore leading frameworks and pipelines for building, training, and deploying AI/ML models. This section provides overviews, best practices, and hands-on guides for integrating tools such as Metaflow, MLflow, LangChain, and LlamaIndex into your workflows, enabling experiment tracking, workflow automation, and scalable model management.

GPU/TPU

Learn how to leverage GPUs and TPUs to accelerate AI/ML workloads. This section covers setup guides, best practices, and practical examples for provisioning and using GPU and TPU resources, enabling faster training, efficient inference, and scalable deployment of advanced models.
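As a minimal illustration of the pattern these guides build on, a GKE Pod requests an accelerator through its resource limits and a node selector. This is only a sketch; the accelerator type (`nvidia-tesla-t4`) and the container image are placeholder assumptions:

```yaml
# Illustrative sketch: a Pod requesting one NVIDIA GPU on GKE.
# The accelerator type and image below are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-pod
spec:
  nodeSelector:
    # Steers the Pod onto a node pool with the matching accelerator.
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/my-repo/trainer:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1  # one GPU attached to this container
```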

Job Schedulers

Learn how to manage and automate AI/ML workloads efficiently with job schedulers. This section covers popular scheduling tools, configuration tips, and practical examples to help you orchestrate complex workflows, optimize resource utilization, and streamline large-scale model training and deployment.
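The simplest scheduled unit on Kubernetes is a batch Job, which runs a task to completion and retries on failure. A minimal hedged sketch follows; the image and command are placeholder assumptions:

```yaml
# Illustrative sketch: a Kubernetes Job running a one-off training task.
# The image and command are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-once
spec:
  backoffLimit: 2          # retry the Pod up to twice on failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: us-docker.pkg.dev/my-project/my-repo/trainer:latest  # hypothetical image
        command: ["python", "train.py"]  # hypothetical entrypoint
```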

Storage

Learn about persistent, high-performance storage options for AI/ML workloads running on Google Kubernetes Engine (GKE).
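Persistent storage on GKE is typically requested through a PersistentVolumeClaim bound to a storage class. A minimal sketch is below; `standard-rwo` is GKE's default balanced persistent disk class, while the size and access mode are placeholder assumptions:

```yaml
# Illustrative sketch: a PersistentVolumeClaim for training data on GKE.
# Size and access mode are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes:
  - ReadWriteOnce            # mounted read-write by a single node
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 100Gi
```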

Workflow orchestration

Learn how the ai-on-gke project manages and automates the execution of complex, multi-step processes, primarily for AI/ML workloads on Google Kubernetes Engine (GKE).

Inference servers

Learn how to deploy and manage servers dedicated to serving inference requests for machine learning models.
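An inference server on GKE is commonly run as a Deployment so that replicas can be scaled independently of training. The sketch below is illustrative only; the image, port, and replica count are placeholder assumptions:

```yaml
# Illustrative sketch: a Deployment running a model-serving container.
# The image, port, and replica count are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
spec:
  replicas: 2                # scale serving capacity independently of training
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      containers:
      - name: server
        image: us-docker.pkg.dev/my-project/my-repo/model-server:latest  # hypothetical image
        ports:
        - containerPort: 8080  # hypothetical serving port
```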

Security

Learn about best practices for securing AI/ML workloads running on Google Kubernetes Engine (GKE).