

# Allocating compute quota in Amazon SageMaker HyperPod task governance
<a name="sagemaker-hyperpod-eks-operate-console-ui-governance-policies-compute-allocation"></a>

Cluster administrators decide how the organization uses purchased compute, which reduces waste and idle resources. You can allocate compute quota so that teams can borrow unused resources from each other. Compute quota allocation in HyperPod task governance lets administrators allocate resources at the instance level or at a more granular resource level. Granular allocation provides flexible and efficient resource management: instead of requiring entire instance allocations, administrators control individual compute resources, eliminating the inefficiencies of traditional instance-level allocation, optimizing utilization, and reducing idle compute.

Compute quota allocation supports three types of resource allocation: accelerators, vCPU, and memory. Accelerators are components in accelerated computing instances that perform functions such as floating-point calculations, graphics processing, or data pattern matching. Accelerators include GPUs, Trainium accelerators, and Neuron cores. For multi-team GPU sharing, different teams can receive specific GPU allocations from the same instance type, maximizing utilization of accelerator hardware. For memory-intensive workloads that require additional RAM for data preprocessing or model caching, you can allocate memory quota beyond the default GPU-to-memory ratio. For CPU-heavy preprocessing tasks that need substantial CPU resources alongside GPU training, you can allocate vCPU independently.

Once you provide a value, HyperPod task governance calculates the ratio using the formula **allocated resource divided by the total amount of resources available in the instance**. HyperPod task governance then uses this ratio to apply default allocations to other resources, but you can override these defaults and customize them based on your use case. The following are sample scenarios of how HyperPod task governance allocates resources based on your values:
+ **Only accelerator specified** - HyperPod task governance applies the default ratio to vCPU and memory based on the accelerator values.
+ **Only vCPU specified** - HyperPod task governance calculates the ratio and applies it to memory. Accelerators are set to 0.
+ **Only memory specified** - HyperPod task governance calculates the ratio and applies it to vCPU because compute is required to run memory-specified workloads. Accelerators are set to 0.
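The ratio rule above can be sketched as follows. This is an illustration of the arithmetic, not HyperPod's implementation; the instance totals (4 accelerators, 96 vCPU, 384 GiB memory) are hypothetical examples.

```python
def default_allocations(total, accelerators=None, vcpu=None, memory=None):
    """Apply the ratio rule: allocated resource divided by the total amount
    available in the instance, then scale the unspecified resources by the
    same ratio. Accelerators default to 0 when not specified."""
    if accelerators is not None:
        ratio = accelerators / total["accelerators"]
    elif vcpu is not None:
        ratio = vcpu / total["vcpu"]
        accelerators = 0  # only vCPU specified: accelerators are set to 0
    elif memory is not None:
        ratio = memory / total["memory"]
        accelerators = 0  # only memory specified: accelerators are set to 0
    else:
        raise ValueError("specify at least one resource")
    return {
        "accelerators": accelerators,
        "vcpu": vcpu if vcpu is not None else ratio * total["vcpu"],
        "memory": memory if memory is not None else ratio * total["memory"],
    }

# Hypothetical instance totals: 4 accelerators, 96 vCPU, 384 GiB memory
total = {"accelerators": 4, "vcpu": 96.0, "memory": 384.0}
print(default_allocations(total, accelerators=2))  # ratio 0.5: 48 vCPU, 192 GiB
print(default_allocations(total, vcpu=24.0))       # ratio 0.25: 96 GiB, 0 accelerators
```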

To programmatically control quota allocation, you can use the [ComputeQuotaResourceConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ComputeQuotaResourceConfig.html) object and specify your allocations as numeric values.

```
{
    "ComputeQuotaConfig": {
        "ComputeQuotaResources": [{
            "InstanceType": "ml.g5.24xlarge",
            "Accelerators": 16,
            "VCpu": 200.0,
            "MemoryInGiB": 2.0
        }]
    }
}
```

To see all of your allocations, including the defaults, use the [DescribeComputeQuota](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeComputeQuota.html) operation. To update your allocations, use the [UpdateComputeQuota](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateComputeQuota.html) operation.
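As a sketch, these operations map to AWS CLI commands in the usual kebab-case form; the quota ID placeholder below is hypothetical and stays as a placeholder:

```
# Show all allocations for a quota, including ratio-derived defaults
aws sagemaker describe-compute-quota --compute-quota-id <quota-id>

# Update the allocation for an existing quota
aws sagemaker update-compute-quota \
  --compute-quota-id <quota-id> \
  --compute-quota-config '{
    "ComputeQuotaResources": [{
      "InstanceType": "ml.g5.24xlarge",
      "Accelerators": 8
    }]
  }'
```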

You can also use the HyperPod CLI to allocate compute quotas. For more information about the HyperPod CLI, see [Running jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS](sagemaker-hyperpod-eks-run-jobs.md). The following example demonstrates how to set compute quotas using the HyperPod CLI.

```
hyp create hyp-pytorch-job --version 1.1 --job-name sample-job \
--image 123456789012.dkr.ecr.us-west-2.amazonaws.com/ptjob:latest \
--pull-policy "Always" \
--tasks-per-node 1 \
--max-retry 1 \
--priority high-priority \
--namespace hyperpod-ns-team-name \
--queue-name hyperpod-ns-team-name-localqueue \
--instance-type sample-instance-type \
--accelerators 1 \
--vcpu 3 \
--memory 1 \
--accelerators-limit 1 \
--vcpu-limit 4 \
--memory-limit 2
```

To allocate quotas using the AWS console, follow these steps.

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. Under **HyperPod clusters**, choose **Cluster management**.

1. Under **Compute allocations**, choose **Create**.

1. If you don’t already have instances, choose **Add allocation** to add an instance.

1. Under **Allocations**, choose whether to allocate by instances or by individual resources. If you allocate by individual resources, SageMaker AI automatically assigns allocations to the other resources based on the ratio that you chose. To override this ratio-based allocation for a resource, turn on the corresponding toggle.

1. Repeat steps 4 and 5 to configure additional instances.

After allocating compute quota, you can then submit jobs through the HyperPod CLI or `kubectl`. HyperPod efficiently schedules workloads based on available quota. 
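As a sketch of `kubectl` submission against allocated quota, the following manifest reuses the namespace, queue, and priority names from the HyperPod CLI example above (all of them placeholders) and the standard Kueue labels that HyperPod task governance scheduling is built on; the container image and resource figures are illustrative assumptions:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: hyperpod-ns-team-name
  labels:
    # Standard Kueue labels; the queue and priority class names are placeholders
    kueue.x-k8s.io/queue-name: hyperpod-ns-team-name-localqueue
    kueue.x-k8s.io/priority-class: high-priority
spec:
  template:
    spec:
      containers:
      - name: train
        image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ptjob:latest
        resources:
          requests:
            nvidia.com/gpu: 1
            cpu: "3"
            memory: 1Gi
          limits:
            nvidia.com/gpu: 1
            cpu: "4"
            memory: 2Gi
      restartPolicy: Never
```

The requests and limits mirror the `--accelerators`/`--vcpu`/`--memory` and `--accelerators-limit`/`--vcpu-limit`/`--memory-limit` flags in the CLI example.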

# Allocating GPU partition quota
<a name="sagemaker-hyperpod-eks-operate-console-ui-governance-policies-compute-allocation-gpu-partitions"></a>

You can extend compute quota allocation to support GPU partitioning, enabling fine-grained resource sharing at the GPU partition level. When GPU partitioning is enabled on supported GPUs in the cluster, each physical GPU can be partitioned into multiple isolated GPUs with defined compute, memory, and streaming multiprocessor allocations. For more information about GPU partitioning, see [Using GPU partitions in Amazon SageMaker HyperPod](sagemaker-hyperpod-eks-gpu-partitioning.md). You can allocate specific GPU partitions to teams, allowing multiple teams to share a single GPU while maintaining hardware-level isolation and predictable performance.

For example, an ml.p5.48xlarge instance with 8 H100 GPUs can be divided into multiple GPU partitions, and you can allocate individual partitions to different teams based on their task requirements. When you specify GPU partition allocations, HyperPod task governance calculates proportional vCPU and memory quotas based on the GPU partition, similar to GPU-level allocation. This approach maximizes GPU utilization by eliminating idle capacity and enabling cost-effective resource sharing across multiple concurrent tasks on the same physical GPU.
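The proportional calculation can be sketched as follows. The slice count per GPU (up to 7 MIG slices on an H100) and the instance figures (ml.p5.48xlarge: 8 GPUs, 192 vCPU, 2048 GiB memory) are illustrative; the exact ratios HyperPod task governance applies may differ.

```python
def partition_quota(partition_count, slices_per_gpu, gpus_per_instance,
                    instance_vcpu, instance_memory_gib):
    """Scale instance vCPU and memory by the fraction of total GPU
    slices that the allocated partitions represent."""
    fraction = partition_count / (slices_per_gpu * gpus_per_instance)
    return {
        "vcpu": instance_vcpu * fraction,
        "memory_gib": instance_memory_gib * fraction,
    }

# 4 one-slice partitions out of 8 GPUs x 7 slices = 4/56 of the instance
print(partition_quota(4, 7, 8, 192, 2048))
```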

## Creating compute quotas
<a name="sagemaker-hyperpod-eks-operate-console-ui-governance-policies-compute-allocation-gpu-partitions-creating"></a>

The following example creates a compute quota that allocates four GPU partitions of a given partition type, with the `LendAndBorrow` resource sharing strategy.

```
aws sagemaker create-compute-quota \
  --name "fractional-gpu-quota" \
  --compute-quota-config '{
    "ComputeQuotaResources": [
      {
        "InstanceType": "ml.p4d.24xlarge",
        "AcceleratorPartition": {
            "Count": 4,
            "Type": "mig-1g.5gb"
        }
      }
    ],
    "ResourceSharingConfig": { 
      "Strategy": "LendAndBorrow", 
      "BorrowLimit": 100 
    }
  }'
```

## Verifying quota resources
<a name="sagemaker-hyperpod-eks-operate-console-ui-governance-policies-compute-allocation-gpu-partitions-verifying"></a>

After creating a quota, you can verify the underlying Kueue resources that HyperPod task governance manages.

```
# Check ClusterQueue
kubectl get clusterqueues
kubectl describe clusterqueue QUEUE_NAME

# Check ResourceFlavors
kubectl get resourceflavor
kubectl describe resourceflavor FLAVOR_NAME
```