Dynamic Resource Allocation

Learn how to use Dynamic Resource Allocation (DRA) in Kubernetes to optimize the utilization of GPUs and TPUs.

Dynamic Resource Allocation (DRA) is a Kubernetes feature designed to modernize how workloads request and share specialized hardware, such as GPUs and other attached accelerators. By providing an experience similar to how Kubernetes handles storage, DRA allows developers to claim the exact hardware they need without getting bogged down in the manual complexities of per-node device management.

Why DRA Matters

Historically, Kubernetes managed accelerators through the static Device Plugin model. That model treated hardware as simple integer counts (e.g., “1 GPU”) and required platform teams to pre-configure rigid, dedicated node pools for every hardware variant or sharing configuration.
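For context, a device-plugin request is nothing more than an integer count of an extended resource in a container's limits. The sketch below builds such a Pod manifest as a Python dictionary and prints it as JSON, which kubectl accepts. The Pod name and image are hypothetical placeholders; `nvidia.com/gpu` is the resource name advertised by the NVIDIA device plugin.

```python
import json


def legacy_gpu_pod(name: str, image: str, gpu_count: int) -> dict:
    """Build a Pod manifest that requests GPUs the device-plugin way."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # The only knob is an integer count; there is no way to
                    # express memory size, topology, or sharing mode here.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    pod = legacy_gpu_pod("cuda-job", "example.com/cuda-app:latest", 1)
    print(json.dumps(pod, indent=2))
```

Any distinction between GPU models or sharing modes has to be handled outside this request, typically with node labels, selectors, and separate node pools.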

DRA shifts this paradigm by enabling:

  • Storage-Like Claims: Workloads use ResourceClaims to dynamically request hardware, decoupling the application requirements from the underlying node configuration.
  • Infrastructure Flexibility: The same physical hardware pool can be partitioned or shared on the fly (using time-slicing, MPS, or MIG), depending on active workload requests.
  • Declarative Scheduling Constraints: Developers can use CEL (Common Expression Language) selectors to request specific hardware attributes (like memory sizes or interconnect topologies), ensuring the scheduler automatically matches the application with the most suitable equipment.
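As a rough illustration of the claim-based model, the sketch below builds a ResourceClaimTemplate with a CEL selector and a Pod that references it, then prints both as a JSON List that kubectl can consume. Several details are assumptions rather than anything stated above: the `resource.k8s.io/v1` API version (older clusters serve beta versions with a slightly different request layout), the `gpu.nvidia.com` device class name, the `driver.example.com` capacity domain in the CEL expression, and all object names and the image. Attribute and capacity names are driver-specific, so check what your DRA driver actually publishes.

```python
import json

# Assumption: GA DRA API version; adjust to what your cluster serves.
DRA_API_VERSION = "resource.k8s.io/v1"


def claim_template(name: str, device_class: str, cel_expression: str) -> dict:
    """Build a ResourceClaimTemplate requesting one device matching a CEL selector."""
    return {
        "apiVersion": DRA_API_VERSION,
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {
            "spec": {
                "devices": {
                    "requests": [
                        {
                            "name": "gpu-request",
                            "exactly": {
                                "deviceClassName": device_class,
                                "selectors": [{"cel": {"expression": cel_expression}}],
                            },
                        }
                    ]
                }
            }
        },
    }


def pod_with_claim(name: str, image: str, template_name: str) -> dict:
    """Build a Pod whose container consumes a claim created from the template."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level claim entry; the container refers to it by name.
            "resourceClaims": [
                {"name": "gpu", "resourceClaimTemplateName": template_name}
            ],
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "resources": {"claims": [{"name": "gpu"}]},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical selector: at least 40Gi of device memory.
    expr = 'device.capacity["driver.example.com"].memory.compareTo(quantity("40Gi")) >= 0'
    items = [
        claim_template("large-gpu", "gpu.nvidia.com", expr),
        pod_with_claim("trainer", "example.com/trainer:latest", "large-gpu"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The Pod never names a node pool or a specific device; the scheduler is left to find a device that satisfies the selector, which is the decoupling the list above describes.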

Ultimately, DRA empowers developers to build high-performance applications more efficiently by providing a consistent, self-service, and scalable way to leverage specialized infrastructure across the entire cluster.

Resources

To learn more about the concepts, specifications, and architecture of DRA, refer to the official documentation:


GPU fungibility with DRA and Custom Compute Classes

This tutorial guides you through how to achieve GPU fungibility on GKE using Custom Compute Classes and Dynamic Resource Allocation (DRA).

Time slicing of GPUs with DRA

This tutorial guides you through device sharing of NVIDIA GPUs with Dynamic Resource Allocation on Google Kubernetes Engine (GKE) using time-slicing mode.

Device sharing of GPUs with DRA using Multi-Process Service (MPS)

This tutorial guides you through device sharing of NVIDIA GPUs with Dynamic Resource Allocation on Google Kubernetes Engine (GKE) using Multi-Process Service (MPS) mode.

Device sharing of GPUs with DRA using MIG

This tutorial guides you through device sharing of NVIDIA GPUs with Dynamic Resource Allocation on Google Kubernetes Engine (GKE) using Multi-Instance GPU (MIG) mode.

Device alignment of GPU, NIC, and CPU with DRA

Learn how to achieve optimal hardware alignment for GPUs, NICs, and exclusive CPUs on GKE using Dynamic Resource Allocation (DRA) to maximize performance.

Dynamic Resource Allocation of TPUs

This tutorial guides you through how to dynamically allocate Google Cloud TPUs using Dynamic Resource Allocation (DRA) on Google Kubernetes Engine (GKE).
