

# Training plans utilization for Amazon SageMaker HyperPod clusters
<a name="training-plan-utilization-for-hyperpod"></a>

To use SageMaker training plans for your Amazon SageMaker HyperPod cluster, specify the training plan at the instance group level when you create or update the cluster.

**Note**  
The training plan must be in the `Scheduled` or `Active` status to be used by a HyperPod cluster.  
Ensure the cluster configuration aligns with the Availability Zone (AZ) specified in your training plan.  
For VPC setup, resource location, and security group configuration, refer to [Setting up SageMaker HyperPod with a custom Amazon VPC](sagemaker-hyperpod-prerequisites.md#sagemaker-hyperpod-prerequisites-optional-vpc) in the SageMaker HyperPod documentation.  
If setting up HyperPod with Amazon FSx for Lustre, learn about Region and AZ selection, review VPC configuration requirements, and understand AZ alignment best practices in [(Optional) Setting up SageMaker HyperPod with Amazon FSx for Lustre](sagemaker-hyperpod-prerequisites.md#sagemaker-hyperpod-prerequisites-optional-fsx).
You can select a plan for each of your instance groups. However, we do not recommend using a training plan for the primary instance group of a cluster, as primary nodes require continuous, stable resources that don't align with the fixed duration and potentially discontinuous nature of training plan capacities.
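
The status requirement in the note above can be checked from the AWS CLI before you attach a plan to an instance group. The following is a minimal sketch, not an official procedure: the helper names `plan_status` and `is_plan_usable` are hypothetical, and it assumes the AWS CLI is configured for the account and Region that own the plan.

```shell
# Sketch: check that a training plan can be used by a HyperPod cluster.
# plan_status and is_plan_usable are hypothetical helper names.

# Print the Status field returned by DescribeTrainingPlan for the given plan name.
plan_status() {
  aws sagemaker describe-training-plan \
    --training-plan-name "$1" \
    --query 'Status' \
    --output text
}

# Succeed only for the two statuses that the note above allows.
is_plan_usable() {
  case "$1" in
    Scheduled|Active) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_plan_usable "$(plan_status my-training-plan)" && echo "plan can be used"`, where `my-training-plan` is a placeholder plan name.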

**Topics**
+ [Create a SageMaker HyperPod cluster on training plans using the SageMaker AI console](use-training-plan-for-hyperpod-creation-using-console.md)
+ [Update a SageMaker HyperPod cluster on training plans using the SageMaker AI console](use-training-plan-for-hyperpod-update-using-console.md)
+ [Create a SageMaker HyperPod cluster on training plans using the SageMaker API, or AWS CLI](use-training-plan-for-hyperpod-creation-using-api-cli-sdk.md)
+ [Update a SageMaker HyperPod cluster on training plans using the SageMaker API, or AWS CLI](use-training-plan-for-hyperpod-update-using-api-cli-sdk.md)

# Create a SageMaker HyperPod cluster on training plans using the SageMaker AI console
<a name="use-training-plan-for-hyperpod-creation-using-console"></a>

To create a SageMaker HyperPod cluster using training plans from the SageMaker AI console UI, follow these steps:

1. Navigate to the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the left navigation pane, choose **Hyperpod**, and then **Create cluster**.

1. When configuring an instance group, you can select a plan that aligns with your compute capacity needs.

![\[SageMaker AI console interface showing a modal window for creating an instance group within a SageMaker HyperPod cluster. The form includes fields for instance group name, instance type, quantity, instance capacity (with options for on-demand and training plans), and a directory path for on-create lifecycle script.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/training-plans/tp-create-hyperpod-cluster.png)


Review and create your cluster. Instance groups using a training plan scale up to the specified target instance count when the training plan becomes `Active`, subject to available capacity. Thirty minutes before each Reserved Capacity period ends, the instance group begins scaling down to zero instances. This scaled-down state persists until the next Reserved Capacity period begins or the plan ends. Throughout this process, a healthy instance group maintains an `InService` status after its initial creation, regardless of the current instance count.
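
To plan checkpoints around the scale-down window described above, it can help to compute when scale-down starts for a given Reserved Capacity period. The following is a rough sketch of that arithmetic only: the helper names are hypothetical, times are Unix epoch seconds, and the 30-minute lead time is taken from the preceding paragraph.

```shell
# Sketch: when does scale-down start for a Reserved Capacity period?
# Helper names are hypothetical; all times are Unix epoch seconds.

# Print the time at which scale-down begins: 30 minutes before the period end.
scale_down_start() {
  echo $(( $1 - 30 * 60 ))
}

# Print the seconds left before scale-down begins (never negative).
# $1: period end, $2: current time
seconds_until_scale_down() {
  remaining=$(( $(scale_down_start "$1") - $2 ))
  [ "$remaining" -gt 0 ] || remaining=0
  echo "$remaining"
}
```

For a period that ends at epoch second 7200, `scale_down_start 7200` prints `5400`. In practice you would pass the period end time converted to epoch seconds and `$(date +%s)` as the current time.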

# Update a SageMaker HyperPod cluster on training plans using the SageMaker AI console
<a name="use-training-plan-for-hyperpod-update-using-console"></a>

You can update, remove, or add a training plan to an existing SageMaker HyperPod cluster using the SageMaker AI console UI. To update the instance group of a SageMaker HyperPod cluster, follow these steps:

1. Navigate to the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the left navigation pane, choose **Hyperpod**.

1. Navigate to the cluster's details page by following the hyperlink associated with the cluster name.

1. When configuring an instance group, you can update your plan to align with your new compute capacity needs.

![\[SageMaker AI console interface showing a modal window for updating an instance group within a SageMaker HyperPod cluster. The form includes fields for instance group name, instance type, quantity, instance capacity (with options for on-demand and training plans), and a directory path for on-create lifecycle script.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/training-plans/tp-update-hyperpod-clusters.png)


Review and update your cluster.

# Create a SageMaker HyperPod cluster on training plans using the SageMaker API, or AWS CLI
<a name="use-training-plan-for-hyperpod-creation-using-api-cli-sdk"></a>

To use SageMaker training plans for your Amazon SageMaker HyperPod cluster, specify the ARN of the training plan you want to use in the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ClusterInstanceGroupSpecification.html#sagemaker-Type-ClusterInstanceGroupSpecification-TrainingPlanArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ClusterInstanceGroupSpecification.html#sagemaker-Type-ClusterInstanceGroupSpecification-TrainingPlanArn) parameter of the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ClusterInstanceGroupSpecification.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ClusterInstanceGroupSpecification.html) when calling the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateCluster.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateCluster.html) API operation. 

Ensure that the subnet associated with the designated AZ of your plan is included in the `VpcConfig` of your cluster configuration. You can retrieve the `AvailabilityZone` of a training plan in the response of a [`DescribeTrainingPlan`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingPlan.html) API call.
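
The AZ alignment described above can be checked before you create the cluster. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes that the plan's `AvailabilityZone` is read from the `ReservedCapacitySummaries` list of the `DescribeTrainingPlan` response and that the subnet is looked up with `aws ec2 describe-subnets`.

```shell
# Sketch: confirm that a cluster subnet sits in the training plan's AZ.
# plan_az, subnet_az, and az_matches are hypothetical helper names.

# Print the Availability Zone(s) of the plan's reserved capacity.
# Assumption: the AZ is exposed under ReservedCapacitySummaries.
plan_az() {
  aws sagemaker describe-training-plan \
    --training-plan-name "$1" \
    --query 'ReservedCapacitySummaries[].AvailabilityZone' \
    --output text
}

# Print the Availability Zone of a single subnet.
subnet_az() {
  aws ec2 describe-subnets \
    --subnet-ids "$1" \
    --query 'Subnets[0].AvailabilityZone' \
    --output text
}

# Succeed when the subnet's AZ appears among the plan's AZs.
# $1: whitespace-separated plan AZs, $2: subnet AZ
az_matches() {
  for az in $1; do
    [ "$az" = "$2" ] && return 0
  done
  return 1
}
```

For example, `az_matches "$(plan_az my-training-plan)" "$(subnet_az subnet-id)" || echo "subnet is not in the plan's AZ"`, where the plan name and subnet ID are placeholders.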

The following sample illustrates how to create a new SageMaker HyperPod cluster and provide an instance group with a training plan in the `--instance-groups` parameter of the `create-cluster` AWS CLI command.

```
# Create a cluster. Replace the placeholder values (cluster-name, source_s3_uri,
# customer_account_id, execution_role, training_plan_arn) with your own.
aws sagemaker create-cluster \
  --cluster-name cluster-name \
  --instance-groups '[
        {
            "InstanceCount": 1,
            "InstanceGroupName": "controller-nodes",
            "InstanceType": "ml.t3.xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1
        },
        {
            "InstanceCount": 2,
            "InstanceGroupName": "worker-nodes",
            "InstanceType": "ml.p4d.24xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1,
            "TrainingPlanArn": "training_plan_arn"
        }]'
```

For information about how to create a HyperPod cluster using the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-cluster.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-cluster.html).

After creating the cluster, you can verify that your instance group was properly assigned capacity from the training plan by calling the `DescribeCluster` API.

```
aws sagemaker describe-cluster --cluster-name cluster-name
```
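
To focus on the training plan assignment rather than the full response, you can filter the output with `--query`. The following is a sketch only: the helper name `instance_group_summary` is hypothetical, and it assumes that each entry in `InstanceGroups` carries the `CurrentCount`, `TargetCount`, `TrainingPlanStatus`, and `TrainingPlanArn` fields.

```shell
# Sketch: summarize each instance group's plan assignment and instance counts.
# instance_group_summary is a hypothetical helper name.
# Assumption: the fields named in --query are present in the DescribeCluster response.
instance_group_summary() {
  aws sagemaker describe-cluster \
    --cluster-name "$1" \
    --query 'InstanceGroups[].[InstanceGroupName,CurrentCount,TargetCount,TrainingPlanStatus,TrainingPlanArn]' \
    --output table
}
```

For example, `instance_group_summary cluster-name` prints one row per instance group. An instance group whose current count is below its target count might still be waiting for the plan to become `Active` or for capacity to become available.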

# Update a SageMaker HyperPod cluster on training plans using the SageMaker API, or AWS CLI
<a name="use-training-plan-for-hyperpod-update-using-api-cli-sdk"></a>

You can add, update, or remove a training plan by updating the instance group of an existing cluster using the `update-cluster` AWS CLI command. The following sample illustrates how to update a SageMaker HyperPod cluster and provide an instance group with a new training plan.

```
# Update a cluster. Replace the placeholder values (cluster-name, source_s3_uri,
# customer_account_id, execution_role, training_plan_arn) with your own.
aws sagemaker update-cluster \
  --cluster-name cluster-name \
  --instance-groups '[
        {
            "InstanceCount": 1,
            "InstanceGroupName": "controller-nodes",
            "InstanceType": "ml.t3.xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1
        },
        {
            "InstanceCount": 2,
            "InstanceGroupName": "worker-nodes",
            "InstanceType": "ml.p4d.24xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1,
            "TrainingPlanArn": "training_plan_arn"
        },
        {
            "InstanceCount": 1,
            "InstanceGroupName": "worker-nodes-2",
            "InstanceType": "ml.p4d.24xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1,
            "TrainingPlanArn": "training_plan_arn"
        }
    ]'
```