GPU fungibility with DRA and Custom Compute Classes
This tutorial guides you through how to achieve GPU fungibility on GKE using Custom Compute Classes and Dynamic Resource Allocation (DRA).
Dynamic Resource Allocation (DRA) is a Kubernetes feature designed to modernize how workloads request and share specialized hardware, such as GPUs and other attached accelerators. By providing an experience similar to how Kubernetes handles storage, DRA allows developers to claim the exact hardware they need without getting bogged down in the manual complexities of per-node device management.
Historically, Kubernetes managed accelerators through the static Device Plugin model, which treated hardware as simple integer counts (e.g., “1 GPU”) and required platform teams to pre-configure rigid, dedicated node pools for every hardware variant or sharing configuration.
DRA shifts this paradigm by enabling:
ResourceClaims to dynamically request hardware, decoupling the application requirements from the underlying node configuration.
Ultimately, DRA empowers developers to build high-performance applications more efficiently by providing a consistent, self-service, and scalable way to leverage specialized infrastructure across the entire cluster.
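To make the contrast between integer-count requests and ResourceClaims concrete, the following is a minimal sketch that builds both styles of manifest as plain Python dictionaries and prints them as JSON (which kubectl also accepts). It is illustrative only: the helper names, the Pod names, and the image `example.com/cuda-app:latest` are hypothetical, the `resource.k8s.io/v1` API version assumes a cluster where DRA is generally available (older clusters use beta versions with a slightly different request shape), and the device class name `gpu.nvidia.com` assumes a GPU DRA driver that publishes a DeviceClass under that name.

```python
import json


def device_plugin_pod(name: str, image: str, gpu_count: int) -> dict:
    """Pod that requests GPUs the Device Plugin way: an opaque integer count."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "main",
                "image": image,
                # The scheduler only sees "N units of nvidia.com/gpu".
                "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
            }],
        },
    }


def resource_claim_template(name: str, device_class: str) -> dict:
    """ResourceClaimTemplate asking for one device from a DeviceClass."""
    return {
        # Assumption: DRA GA API; adjust to the version your cluster serves.
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {"spec": {"devices": {"requests": [
            {"name": "gpu", "exactly": {"deviceClassName": device_class}},
        ]}}},
    }


def dra_pod(name: str, image: str, template_name: str) -> dict:
    """Pod that consumes a device through a claim instead of a count."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # A ResourceClaim is generated for this Pod from the template.
            "resourceClaims": [{
                "name": "gpu",
                "resourceClaimTemplateName": template_name,
            }],
            "containers": [{
                "name": "main",
                "image": image,
                # The container refers to the Pod-level claim by name.
                "resources": {"claims": [{"name": "gpu"}]},
            }],
        },
    }


if __name__ == "__main__":
    image = "example.com/cuda-app:latest"  # hypothetical image
    manifests = [
        device_plugin_pod("legacy-gpu-pod", image, 1),
        resource_claim_template("single-gpu", "gpu.nvidia.com"),
        dra_pod("dra-gpu-pod", image, "single-gpu"),
    ]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

In the DRA variant the Pod names a claim rather than a node-level resource count, so what hardware satisfies the claim is decided by the DeviceClass and the driver rather than by how a node pool was pre-configured.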
To learn more about the concepts, specifications, and architecture of DRA, refer to the official documentation:
This tutorial guides you through how to do device sharing of NVIDIA GPUs with Dynamic Resource Allocation on Google Kubernetes Engine (GKE) using time-slicing mode.
This tutorial guides you through how to do device sharing of NVIDIA GPUs with Dynamic Resource Allocation on Google Kubernetes Engine (GKE) using Multi-Process Service (MPS) mode.
This tutorial guides you through how to do device sharing of NVIDIA GPUs with Dynamic Resource Allocation on Google Kubernetes Engine (GKE) using Multi-Instance GPU (MIG) mode.