

# Tutorials on how to use AWS ParallelCluster
<a name="tutorials-v3"></a>

The following tutorials show you how to get started with AWS ParallelCluster version 3, and provide best practice guidance for some common tasks.

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Topics**
+ [Running your first job on AWS ParallelCluster](tutorials-running-your-first-job-on-version-3.md)
+ [Building a custom AWS ParallelCluster AMI](building-custom-ami-v3.md)
+ [Integrating Active Directory](tutorials_05_multi-user-ad.md)
+ [Configuring shared storage encryption with an AWS KMS key](tutorials_04_encrypted_kms_fs-v3.md)
+ [Running jobs in a multiple queue mode cluster](multi-queue-tutorial-v3.md)
+ [Using the AWS ParallelCluster API](tutorials_06_API_use.md)
+ [Creating a cluster with Slurm accounting](tutorials_07_slurm-accounting-v3.md)
+ [Creating a cluster with an external Slurmdbd accounting](external-slurmdb-accounting.md)
+ [Reverting to a previous AWS Systems Manager document version](tutorials_08_ssm-document-version-rev-v3.md)
+ [Creating a cluster with CloudFormation](tutorials_09_cfn-custom-resource-v3.md)
+ [Deploy ParallelCluster API with Terraform](tutorial-deploy-terraform.md)
+ [Creating a cluster with Terraform](tutorial-create-cluster-terraform.md)
+ [Creating a custom AMI with Terraform](tutorial-create-ami-terraform.md)
+ [AWS ParallelCluster UI Integration with Identity Center](tutorials_10_pcui-aws-ic-integration-v3.md)
+ [Running containerized jobs with Pyxis](tutorials_11_running-containerized-jobs-with-pyxis.md)
+ [Creating a cluster with an EFA-enabled FSx Lustre](tutorial-efa-enabled-fsx-lustre.md)
+ [Support NVIDIA-Imex with p6e-gb200 instance](support-nvidia-imex-p6e-gb200-instance.md)
+ [Customize compute node network interfaces with launch template overrides](tutorial-network-customization-v3.md)

# Running your first job on AWS ParallelCluster
<a name="tutorials-running-your-first-job-on-version-3"></a>

This tutorial walks you through running your first Hello World job on AWS ParallelCluster.

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured.](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) required to run the [`pcluster`](pcluster-v3.md) CLI.

## Verifying your installation
<a name="tutorial-1stjob-verify-install"></a>

First, verify that AWS ParallelCluster, including its Node.js dependency, is correctly installed and configured.

```
$ node --version
v16.8.0
$ pcluster version
{
  "version": "3.15.0"
}
```

These commands return the installed Node.js version and the running version of AWS ParallelCluster.

## Creating your first cluster
<a name="tutorial-1stjob-first-cluster"></a>

Now it's time to create your first cluster. Because the workload for this tutorial isn't performance intensive, we can use the default instance type, `t2.micro`. (For production workloads, choose an instance type that best fits your needs.) Let's call your cluster `hello-world`.

```
$ pcluster create-cluster \
    --cluster-name hello-world \
    --cluster-configuration hello-world.yaml
```

**Note**  
The AWS Region to use must be specified for most `pcluster` commands. If it's not specified in the `AWS_DEFAULT_REGION` environment variable, or the `region` setting in the `[default]` section of the `~/.aws/config` file, then the `--region` parameter must be provided on the `pcluster` command line.

If the output gives you a message about configuration, you need to run the following to configure AWS ParallelCluster: 

```
$ pcluster configure --config hello-world.yaml
```
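After running `pcluster configure`, the generated `hello-world.yaml` resembles the following sketch. The Region, subnet IDs, and key pair name shown here are placeholders, not values from this tutorial; replace them with your own.

```
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-aaaa1111           # placeholder: your head node subnet
  Ssh:
    KeyName: my-key-pair                # placeholder: your EC2 key pair name
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: queue1t2micro
          InstanceType: t2.micro
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-bbbb2222             # placeholder: your compute subnet
```

With `MinCount: 0` and `MaxCount: 10`, compute nodes are launched only when jobs need them, which matches the ten dynamic nodes shown later in the `sinfo` output.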

 If the [`pcluster create-cluster`](pcluster.create-cluster-v3.md) command succeeds, you see output similar to the following: 

```
{
  "cluster": {
    "clusterName": "hello-world",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:xxx:stack/xxx",
    "region": "...",
    "version": "...",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
}
```

You can monitor the creation of the cluster by using:

```
$ pcluster describe-cluster --cluster-name hello-world
```

The `clusterStatus` field reports `CREATE_IN_PROGRESS` while the cluster is being created, and transitions to `CREATE_COMPLETE` when the cluster is created successfully. The output also provides the `publicIpAddress` and `privateIpAddress` of the head node.
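If you poll from a script, the `describe-cluster` JSON response can be parsed directly. The following minimal Python sketch assumes the IP addresses are nested under a `headNode` key; the exact nesting may vary by release, so treat it as an illustration rather than a contract.

```python
import json

def cluster_status(describe_output):
    """Return (clusterStatus, publicIpAddress) from `pcluster describe-cluster` JSON."""
    data = json.loads(describe_output)
    # Assumption: a headNode block holds the IP addresses once the node is running.
    head = data.get("headNode", {})
    return data["clusterStatus"], head.get("publicIpAddress")

# Sample shaped like the response shown above.
sample = '{"clusterName": "hello-world", "clusterStatus": "CREATE_COMPLETE", "headNode": {"publicIpAddress": "198.51.100.10"}}'
print(cluster_status(sample))  # ('CREATE_COMPLETE', '198.51.100.10')
```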

## Logging into your head node
<a name="tutorial-1stjob-logging-in-head-node"></a>

 Use your OpenSSH pem file to log into your head node. 

```
$ pcluster ssh --cluster-name hello-world -i /path/to/keyfile.pem
```

 After you log in, run the command `sinfo` to verify that your compute nodes are set up and configured. 

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
queue1*      up   infinite     10  idle~ queue1-dy-queue1t2micro-[1-10]
```

The output shows one queue in the cluster, with up to ten compute nodes. The `~` suffix on the `idle` state indicates that the dynamic nodes are powered down until jobs require them.

## Running your first job using Slurm
<a name="tutorial-1stjob-first-slurm-job"></a>

Next, we create a job that sleeps for a little while and then outputs its own hostname. Create a file called `hellojob.sh`, with the following contents.

```
#!/bin/bash
sleep 30
echo "Hello World from $(hostname)"
```

 Next, submit the job using `sbatch`, and verify that it runs. 

```
$ sbatch hellojob.sh
Submitted batch job 2
```

 Now, you can view your queue and check the status of the job. The provisioning of a new Amazon EC2 instance is started in the background. You can monitor the status of the cluster instances with the `sinfo` command.

```
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 2    queue1 hellojob ec2-user CF       3:30      1 queue1-dy-queue1t2micro-1
```

The output shows that the job has been submitted to `queue1`. Wait for the job to finish (the job itself sleeps for 30 seconds), and then run `squeue` again.

```
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
```
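If you script around `squeue`, its whitespace-aligned output can be parsed by splitting each line on whitespace. A small illustrative Python helper, using the column names from the sample output above:

```python
def parse_squeue(output):
    """Parse `squeue` output into a list of dicts keyed by the header columns."""
    lines = output.strip().splitlines()
    if len(lines) < 2:
        return []  # header only: the queue is empty
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

sample = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 2    queue1 hellojob ec2-user CF       3:30      1 queue1-dy-queue1t2micro-1
"""
jobs = parse_squeue(sample)
print(jobs[0]["ST"])  # CF
```

Note that this simple split breaks if a field contains spaces; for robust automation, `squeue --format` or `squeue --json` (where available) is a better choice.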

 Now that there are no jobs in the queue, we can check for output in our current directory. 

```
$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 57 Sep  1 14:25 hellojob.sh
-rw-rw-r-- 1 ec2-user ec2-user 43 Sep  1 14:30 slurm-2.out
```

The current directory now contains a `slurm-2.out` file, which holds the output from our job:

```
$ cat slurm-2.out
Hello World from queue1-dy-queue1t2micro-1
```

The output also shows that our job ran successfully on instance `queue1-dy-queue1t2micro-1`.

In the cluster you just created, only the home directory is shared among all nodes of the cluster.

To learn more about creating and using clusters, see [Best practices](best-practices-v3.md).

If your application requires shared software, libraries, or data, consider the following options:
+ Build an AWS ParallelCluster-enabled custom AMI that includes your software, as described in [Building a custom AWS ParallelCluster AMI](building-custom-ami-v3.md).
+ Use the [StorageSettings](SharedStorage-v3.md) option in the AWS ParallelCluster configuration file to specify a shared filesystem and store your installed software in the specified mount location.
+ Use [Custom bootstrap actions](custom-bootstrap-actions-v3.md) to automate the bootstrap procedure of each node of your cluster.
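For the shared file system option, the following sketch mounts a 50 GiB Amazon EBS volume at `/shared` on all nodes; the name and size are illustrative values, not requirements:

```
SharedStorage:
  - MountDir: /shared        # available on the head node and all compute nodes
    Name: shared-ebs         # illustrative name
    StorageType: Ebs
    EbsSettings:
      Size: 50               # size in GiB; illustrative value
```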

# Building a custom AWS ParallelCluster AMI
<a name="building-custom-ami-v3"></a>

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Important**  
If you build a custom AMI, you must repeat the steps that you used to create your custom AMI with each new AWS ParallelCluster release.

Before reading further, we recommend that you first review the [Custom bootstrap actions](custom-bootstrap-actions-v3.md) section. Determine if the modifications that you want to make can be scripted and supported with future AWS ParallelCluster releases.

Although building a custom AMI generally isn't the ideal approach, there are specific scenarios where building a custom AMI for AWS ParallelCluster is necessary. This tutorial covers how to build a custom AMI for these scenarios.

**Prerequisites**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured.](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) required to run the [`pcluster`](pcluster-v3.md) CLI and build images.

## How to customize the AWS ParallelCluster AMI
<a name="how-to-customize-the-aws-parallelcluster-ami-v3"></a>

There are two ways to build a custom AWS ParallelCluster AMI. The first is to build a new AMI by using the AWS ParallelCluster CLI. The second requires you to make manual modifications to build a new AMI that's available under your AWS account.

## Build a custom AWS ParallelCluster AMI
<a name="build-a-custom-aws-parallelcluster-ami-v3"></a>

If you have a customized AMI and software, you can apply the changes that are needed by AWS ParallelCluster on top of it. AWS ParallelCluster relies on the EC2 Image Builder service to build customized AMIs. For more information, see the [Image Builder User Guide](https://docs.aws.amazon.com/imagebuilder/latest/userguide/what-is-image-builder.html).

Key points:
+ The process takes about 1 hour. This time can vary if there are additional [`Build`](Build-v3.md) / [`Components`](Build-v3.md#Build-v3-Components) to be installed at build time.
+ The AMI is tagged with the versions of the main components. These include the kernel, scheduler, and [EFA](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html) driver. A subset of the component versions are also reported in the AMI description.
+ Starting from AWS ParallelCluster 3.0.0, a new set of CLI commands can be used to manage the lifecycle of images. This includes [`build-image`](pcluster.build-image-v3.md), [`list-images`](pcluster.list-images-v3.md), [`describe-image`](pcluster.describe-image-v3.md), and [`delete-image`](pcluster.delete-image-v3.md).
+ This method is repeatable. You can re-run it to keep AMIs updated (for example, OS updates), and then use them when you update an existing cluster.

**Note**  
If you use this method in the AWS China Partition, you might get network errors. For example, you might see these errors from the `pcluster build-image` command when it downloads packages from GitHub or from an OS repository. If this happens, we recommend that you use one of the following alternative methods:  
+ Follow the [Modify an AWS ParallelCluster AMI](#modify-an-aws-parallelcluster-ami-v3) approach that bypasses this command.
+ Build the image in another Partition and Region, such as `us-east-1`, and then store and restore it to move it to the China Region. For more information, see [Store and restore an AMI using S3](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ami-store-restore.html) in the *Amazon EC2 User Guide*.

Steps:

1. Configure your AWS account credentials so that the AWS ParallelCluster client can make calls to AWS API operations on your behalf. For a list of the required permissions, see [AWS Identity and Access Management permissions in AWS ParallelCluster](iam-roles-in-parallelcluster-v3.md).

1. Create a basic *build image* configuration file. To do this, specify the [`InstanceType`](Build-v3.md#yaml-build-image-Build-InstanceType) to be used to build the image and the [`ParentImage`](Build-v3.md#yaml-build-image-Build-ParentImage). These are used as the starting point to create the AMI. For more information about optional build parameters, see [Image Configuration](image-builder-configuration-file-v3.md).

   ```
   Build:
    InstanceType: <BUILD_INSTANCE_TYPE>
    ParentImage: <BASE_AMI_ID>
   ```
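   Beyond the two required fields, the same file can optionally specify a build subnet and extra Image Builder components. A hedged sketch, in which every value is a placeholder:

   ```
   Build:
     InstanceType: c5.xlarge                  # placeholder build instance type
     ParentImage: ami-0123456789abcdef0       # placeholder base AMI ID
     # SubnetId: subnet-aaaa1111              # needed if no default VPC exists
     Components:
       - Type: arn
         Value: arn:aws:imagebuilder:us-east-1:123456789012:component/my-component/1.0.0/1
   ```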

1. Use the CLI command [`pcluster build-image`](pcluster.build-image-v3.md) to build an AWS ParallelCluster AMI starting from the AMI that you provide as the base.

   ```
   $ pcluster build-image --image-id IMAGE_ID --image-configuration IMAGE_CONFIG.yaml --region REGION
   {
     "image": {
       "imageId": "IMAGE_ID",
       "imageBuildStatus": "BUILD_IN_PROGRESS",
       "cloudformationStackStatus": "CREATE_IN_PROGRESS",
       "cloudformationStackArn": "arn:aws:cloudformation:us-east-1:123456789012:stack/IMAGE_ID/abcd1234-ef56-gh78-ij90-1234abcd5678",
       "region": "us-east-1",
       "version": "3.15.0"
     }
   }
   ```
**Warning**  
`pcluster build-image` uses the default VPC. If you delete the default VPC using AWS Control Tower or AWS Landing Zone, the subnet ID must be specified in the image configuration file. For more information, see [`SubnetId`](HeadNode-v3.md#yaml-HeadNode-Networking-SubnetId).

   For a list of other parameters, see the [`pcluster build-image`](pcluster.build-image-v3.md) command reference page. The results of the preceding command are as follows:
   + A CloudFormation stack is created based on the image configuration. The stack includes all of the EC2 Image Builder resources required for the build.
   + The created resources include the official Image Builder AWS ParallelCluster components that custom Image Builder components can be added to. For more information, see [Create a custom component with Image Builder](https://docs.aws.amazon.com/imagebuilder/latest/userguide/create-component.html) in the *EC2 Image Builder User Guide*.
   + EC2 Image Builder launches a build instance, applies the AWS ParallelCluster cookbook, installs the AWS ParallelCluster software stack, and performs necessary configuration tasks. The AWS ParallelCluster cookbook is used to build and bootstrap AWS ParallelCluster.
   + The instance is stopped and a new AMI is created from it.
   + Another instance is launched from the newly created AMI. During the test phase, EC2 Image Builder runs tests that are defined in the Image Builder components.
   + If the build is successful, the stack is deleted. If the build fails, the stack is retained and available for inspection.

1. You can monitor the status of the build process by running the following command. After the build completes, you can run it to retrieve the AMI ID given in the response.

   ```
   $ pcluster describe-image --image-id IMAGE_ID --region REGION

   # BEFORE COMPLETE
   {
    "imageConfiguration": {
      "url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.amazonaws.com/parallelcluster/3.15.0/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?...",
    },
    "imageId": "IMAGE_ID",
    "imagebuilderImageStatus": "BUILDING",
    "imageBuildStatus": "BUILD_IN_PROGRESS",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:us-east-1:123456789012:stack/IMAGE_ID/abcd1234-ef56-gh78-ij90-1234abcd5678",
    "region": "us-east-1",
    "version": "3.15.0",
    "cloudformationStackTags": [
      {
        "value": "3.15.0",
        "key": "parallelcluster:version"
      },
      {
        "value": "IMAGE_ID",
        "key": "parallelcluster:image_name"
      },
      ...
    ],
    "imageBuildLogsArn": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/imagebuilder/ParallelClusterImage-IMAGE_ID",
    "cloudformationStackCreationTime": "2022-04-05T21:36:26.176Z"
   }
   
   # AFTER COMPLETE
   {
    "imageConfiguration": {
      "url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.us-east-1.amazonaws.com/parallelcluster/3.15.0/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?Signature=..."
    },
    "imageId": "IMAGE_ID",
    "imageBuildStatus": "BUILD_COMPLETE",
    "region": "us-east-1",
    "ec2AmiInfo": {
        "amiName": "IMAGE_ID 2022-04-05T21-39-24.020Z",
        "amiId": "ami-1234stuv5678wxyz",
        "description": "AWS ParallelCluster AMI for alinux2, kernel-4.14.238-182.422.amzn2.x86_64, lustre-2.10.8-5.amzn2.x86_64, efa-1.13.0-1.amzn2.x86_64, dcv-2021.1.10598-1.el7.x86_64, slurm-20-11-8-1",
        "state": "AVAILABLE",
        "tags": [
         {
           "value": "2021.3.11591-1.el7.x86_64",
           "key": "parallelcluster:dcv_version"
         },
         ...
        ],
      "architecture": "x86_64"
    },
    "version": "3.15.0"      
   }
   ```

1. To create your cluster, enter the AMI ID in the [`CustomAmi`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-Image-CustomAmi) field in your cluster configuration.
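   For example, at the cluster level the AMI can be set in the `Image` section of the cluster configuration; the AMI ID below is the sample value from the build output:

   ```
   Image:
     Os: alinux2
     CustomAmi: ami-1234stuv5678wxyz    # AMI ID reported by pcluster describe-image
   ```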

**Troubleshooting and monitoring AMI creation process**

Image creation completes in about an hour. You can monitor the process by running the [`pcluster describe-image`](pcluster.describe-image-v3.md) command or log retrieval commands.

```
$ pcluster describe-image --image-id IMAGE_ID --region REGION
```

The [`build-image`](pcluster.build-image-v3.md) command creates a CloudFormation stack with all the Amazon EC2 resources that are required to build the image, and launches the EC2 Image Builder process.

After running the [`build-image`](pcluster.build-image-v3.md) command, it's possible to retrieve CloudFormation stack events by using [`pcluster get-image-stack-events`](pcluster.get-image-stack-events-v3.md). You can filter results with the `--query` parameter to see the latest events. For more information, see [Filtering AWS CLI output](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-filter.html) in the *AWS Command Line Interface User Guide*.

```
$ pcluster get-image-stack-events --image-id IMAGE_ID --region REGION --query "events[0]"
{
 "eventId": "ParallelClusterImage-CREATE_IN_PROGRESS-2022-04-05T21:39:24.725Z",
 "physicalResourceId": "arn:aws:imagebuilder:us-east-1:123456789012:image/parallelclusterimage-IMAGE_ID/3.15.0/1",
 "resourceStatus": "CREATE_IN_PROGRESS",
 "resourceStatusReason": "Resource creation Initiated",
 "resourceProperties": "{\"InfrastructureConfigurationArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:infrastructure-configuration/parallelclusterimage-abcd1234-ef56-gh78-ij90-1234abcd5678\",\"ImageRecipeArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:image-recipe/parallelclusterimage-IMAGE_ID/3.15.0\",\"DistributionConfigurationArn\":\"arn:aws:imagebuilder:us-east-1:123456789012:distribution-configuration/parallelclusterimage-abcd1234-ef56-gh78-ij90-1234abcd5678\",\"Tags\":{\"parallelcluster:image_name\":\"IMAGE_ID\",\"parallelcluster:image_id\":\"IMAGE_ID\"}}",
 "stackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/IMAGE_ID/abcd1234-ef56-gh78-ij90-1234abcd5678",
 "stackName": "IMAGE_ID",
 "logicalResourceId": "ParallelClusterImage",
 "resourceType": "AWS::ImageBuilder::Image",
 "timestamp": "2022-04-05T21:39:24.725Z"
}
```

After about 15 minutes, the stack events appear in the log event entry related to Image Builder creation. You can now list image log streams and monitor the Image Builder steps by using [`pcluster list-image-log-streams`](pcluster.list-image-log-streams-v3.md) and [`pcluster get-image-log-events`](pcluster.get-image-log-events-v3.md) commands.

```
$ pcluster list-image-log-streams --image-id IMAGE_ID --region REGION \
    --query 'logStreams[*].logStreamName'

[
 "3.15.0/1"
]

$ pcluster get-image-log-events --image-id IMAGE_ID --region REGION \
--log-stream-name 3.15.0/1 --limit 3
{
 "nextToken": "f/36295977202298886557255241372854078762600452615936671762",
 "prevToken": "b/36295977196879805474012299949460899222346900769983430672",
 "events": [
   {
     "message": "ExecuteBash: FINISHED EXECUTION",
     "timestamp": "2022-04-05T22:13:26.633Z"
   },
   {
     "message": "Document arn:aws:imagebuilder:us-east-1:123456789012:component/parallelclusterimage-test-abcd1234-ef56-gh78-ij90-1234abcd5678/3.15.0/1",
     "timestamp": "2022-04-05T22:13:26.741Z"
   },
   {
     "message": "TOE has completed execution successfully",
     "timestamp": "2022-04-05T22:13:26.819Z"
   }
 ]
}
```

Continue to check with the [`describe-image`](pcluster.describe-image-v3.md) command until you see the `BUILD_COMPLETE` status.

```
$ pcluster describe-image --image-id IMAGE_ID --region REGION
{
 "imageConfiguration": {
   "url": "https://parallelcluster-1234abcd5678efgh-v1-do-not-delete.s3.us-east-1.amazonaws.com/parallelcluster/3.15.0/images/IMAGE_ID-abcd1234efgh5678/configs/image-config.yaml?Signature=..."
 },
 "imageId": "IMAGE_ID",
 "imageBuildStatus": "BUILD_COMPLETE",
 "region": "us-east-1",
 "ec2AmiInfo": {
     "amiName": "IMAGE_ID 2022-04-05T21-39-24.020Z",
     "amiId": "ami-1234stuv5678wxyz",
     "description": "AWS ParallelCluster AMI for alinux2, kernel-4.14.238-182.422.amzn2.x86_64, lustre-2.10.8-5.amzn2.x86_64, efa-1.13.0-1.amzn2.x86_64, dcv-2021.1.10598-1.el7.x86_64, slurm-20-11-8-1",
     "state": "AVAILABLE",
     "tags": [
      {
        "value": "2021.3.11591-1.el7.x86_64",
        "key": "parallelcluster:dcv_version"
      },
      ...
     ],
   "architecture": "x86_64"
 },
 "version": "3.15.0"      
}
```
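When scripting the polling loop, the AMI ID can be extracted from the `describe-image` JSON once the build is done. A minimal Python sketch based on the sample response above; the field names are taken from that output and may vary by release:

```python
import json

def ami_id_if_complete(describe_output):
    """Return the AMI ID from `pcluster describe-image` JSON, or None while building."""
    data = json.loads(describe_output)
    if data.get("imageBuildStatus") != "BUILD_COMPLETE":
        return None
    return data["ec2AmiInfo"]["amiId"]

# Trimmed sample shaped like the BUILD_COMPLETE response above.
sample = '{"imageId": "IMAGE_ID", "imageBuildStatus": "BUILD_COMPLETE", "ec2AmiInfo": {"amiId": "ami-1234stuv5678wxyz"}}'
print(ami_id_if_complete(sample))  # ami-1234stuv5678wxyz
```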

If you need to troubleshoot a custom AMI creation issue, create an archive of the image logs as described in the following steps.

It's possible to archive the logs in an Amazon S3 bucket or in a local file, depending on the `--output` parameter.

```
$ pcluster export-image-logs --image-id IMAGE_ID --region REGION \
--bucket BUCKET_NAME --bucket-prefix BUCKET_FOLDER
{
 "url": "https://BUCKET_NAME.s3.us-east-1.amazonaws.com/BUCKET-FOLDER/IMAGE_ID-logs-202209071136.tar.gz?AWSAccessKeyId=..."
}

$ pcluster export-image-logs --image-id IMAGE_ID \
--region REGION --bucket BUCKET_NAME --bucket-prefix BUCKET_FOLDER --output-file /tmp/archive.tar.gz
{
 "path": "/tmp/archive.tar.gz"
}
```

The archive contains the CloudWatch Logs Streams related to the Image Builder process and CloudFormation stack events. The command might take several minutes to run.

**Managing Custom AMIs**

Starting from AWS ParallelCluster 3.0.0, a new set of commands has been added in the CLI to build, monitor, and manage the image lifecycle. For more information about the commands, see [pcluster commands](pcluster-v3.md).

## Modify an AWS ParallelCluster AMI
<a name="modify-an-aws-parallelcluster-ami-v3"></a>

This method consists of modifying an official AWS ParallelCluster AMI by adding customization on top of it. The base AWS ParallelCluster AMIs are updated with new releases. These AMIs have all of the components that are required for AWS ParallelCluster to function when it's installed and configured. You can start with one of these as your base.

Key points:
+ This method is faster than the [`build-image`](pcluster.build-image-v3.md) command. However, it's a manual process and not automatically repeatable.
+ With this method, you don't have access to the log retrieval and image lifecycle management commands that are available through the CLI.

Steps:

------
#### [ New Amazon EC2 console ]

1. Find the AMI that corresponds to the specific AWS Region that you use. To find it, use the [`pcluster list-official-images`](pcluster.list-official-images-v3.md) command with the `--region` parameter to select the specific AWS Region and `--os` and `--architecture` parameters to filter for the desired AMI with the OS and architecture that you want to use. From the output, retrieve the Amazon EC2 Image ID.

1. Sign in to the AWS Management Console and open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the navigation pane, choose **Images**, and then **AMIs**. Search for the retrieved EC2 Image ID, select the AMI, and choose **Launch instance from AMI**.

1. Scroll down and choose your **Instance type**.

1. Choose your **Key pair** and **Launch Instance**.

1. Log in to your instance using the OS user and your SSH key.

1. Manually customize your instance to meet your requirements.

1. Run the following command to prepare your instance for AMI creation.

   ```
   sudo /usr/local/sbin/ami_cleanup.sh
   ```

1. From the Amazon EC2 console, navigate to **Instances**, choose the new instance, and then choose **Instance state**, **Stop instance**.

1. Create a new AMI from the instance using the Amazon EC2 console or AWS CLI [create-image](https://docs.aws.amazon.com/cli/latest/reference/ec2/create-image.html).

**From the Amazon EC2 console**

   1. Choose **Instances** in the navigation pane.

   1. Choose the instance that you created and modified.

   1. In **Actions**, choose **Image** and then **Create image**.

   1. Choose **Create image**.

1. Enter the new AMI ID in the [`CustomAmi`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-Image-CustomAmi) field in your cluster configuration and create a cluster.

------
#### [ Old Amazon EC2 console ]

1. Find the AWS ParallelCluster AMI that corresponds to the specific AWS Region that you use. To find it, use the [`pcluster list-official-images`](pcluster.list-official-images-v3.md) command with the `--region` parameter to select the specific AWS Region, and the `--os` and `--architecture` parameters to filter for the AMI with the OS and architecture that you want to use. From the output, retrieve the Amazon EC2 Image ID.

1. Sign in to the AWS Management Console and open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the navigation pane, choose **Images**, and then **AMIs**. Set the filter for **Public images** and search for the retrieved EC2 Image ID, select the AMI, and choose **Launch**.

1. Choose your instance type and select **Next: Configure Instance Details** or **Review and Launch** to launch your instance.

1. Choose **Launch**, select your **Key pair**, and **Launch Instances**.

1. Log in to your instance using the OS user and your SSH key. For connection details, navigate to **Instances**, select the new instance, and choose **Connect**.

1. Manually customize your instance to meet your requirements.

1. Run the following command to prepare your instance for AMI creation:

   ```
   sudo /usr/local/sbin/ami_cleanup.sh
   ```

1. From the Amazon EC2 console, choose **Instances** in the navigation pane, select your new instance and choose **Actions**, **Instance State** and **Stop**.

1. Create a new AMI from the instance using the Amazon EC2 console or AWS CLI [create-image](https://docs.aws.amazon.com/cli/latest/reference/ec2/create-image.html).

**From the Amazon EC2 console**

   1. Choose **Instances** in the navigation pane.

   1. Choose the instance you created and modified.

   1. In **Actions**, choose **Image** and then **Create Image**.

   1. Choose **Create Image**.

1. Enter the new AMI ID in the [`CustomAmi`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-Image-CustomAmi) field in your cluster configuration and create a cluster.

------

# Integrating Active Directory
<a name="tutorials_05_multi-user-ad"></a>

In this tutorial, you create a multiple user environment. This environment includes an AWS ParallelCluster cluster that's integrated with an AWS Managed Microsoft AD (Active Directory) at `corp.example.com`. You configure an `Admin` user to manage the directory, a `ReadOnly` user to read the directory, and a `user000` user to log in to the cluster. You can use either the automated path or the manual path to create the networking resources, an Active Directory (AD), and the Amazon EC2 instance that you use to configure the AD. Regardless of the path, the infrastructure that you create is pre-configured to integrate with AWS ParallelCluster using one of the following methods:
+ LDAPS with certificate verification (recommended as the most secure option)
+ LDAPS without certificate verification
+ LDAP

LDAP by itself *doesn't* provide encryption. To ensure secure transmission of potentially sensitive information, we strongly recommend that you use LDAPS (LDAP over TLS/SSL) for clusters integrated with ADs. For more information, see [Enable server-side LDAPS using AWS Managed Microsoft AD](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/ms_ad_ldap_server_side.html) in the Directory Service *Administration Guide*.

After you create these resources, proceed to configure and create your cluster integrated with your Active Directory (AD). After the cluster is created, log in as the user you created. For more information about the configuration that you create in this tutorial, see [Multiple user access to clusters](multi-user-v3.md) and the [`DirectoryService`](DirectoryService-v3.md) configuration section.

This tutorial covers how to create an environment that supports multiple user access to clusters. It doesn't cover how to create and use a Directory Service AD. The steps that you take to set up an AWS Managed Microsoft AD in this tutorial are provided for testing purposes only. They *aren't* provided to replace the official documentation and best practices you can find at [AWS Managed Microsoft AD](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/directory_microsoft_ad.html) and [Simple AD](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/directory_simple_ad.html) in the *Directory Service Administration Guide*.

**Note**  
Directory user passwords expire according to the directory password policy property definitions. To reset directory passwords with AWS ParallelCluster, see [How to reset a user password and expired passwords](troubleshooting-v3-multi-user.md#troubleshooting-v3-multi-user-reset-passwd).

**Note**  
The directory domain controller IP addresses can change due to domain controller changes and directory maintenance. If you used the automated quick create method to create the directory infrastructure, the directory IP addresses aren't automatically aligned with the load balancer. When the directory IP addresses change, you must manually update the load balancer in front of the domain controllers.

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured.](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) required to run the [`pcluster`](pcluster-v3.md) CLI.

As you go through the tutorial, replace `inputs highlighted in red`, such as `region-id` and `d-abcdef01234567890`, with your own names and IDs. Replace `123456789012` with your AWS account number.

# Create the AD infrastructure
<a name="tutorials_05_multi-user-ad-step1"></a>

Choose the *Automated* tab to create the Active Directory (AD) infrastructure with a CloudFormation quick create template.

Choose the *Manual* tab to manually create the AD infrastructure.

## Automated
<a name="tutorials_05_multi-user-ad-step1-automated"></a>

1. Sign in to the AWS Management Console.

1. Open [CloudFormation Quick Create (region us-east-1)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=pcluster-ad&templateURL=https://us-east-1-aws-parallelcluster.s3.amazonaws.com/templates/1-click/ad-integration.yaml) to create the following resources in the CloudFormation console:
   + A VPC with two subnets and routing for public access, if no VPC is specified.
   + An AWS Managed Microsoft AD.
   + An Amazon EC2 instance that's joined to the AD that you can use to manage the directory.

1. In the **Quick create stack** page **Parameters** section, enter passwords for the following parameters:
   + **AdminPassword**
   + **ReadOnlyPassword**
   + **UserPassword**

   Make note of the passwords. You use them later on in this tutorial.

1. For **DomainName**, enter **corp.example.com**.

1. For **Keypair**, enter the name of an Amazon EC2 key pair.

1. Check the boxes to acknowledge each of the access capabilities at the bottom of the page.

1. Choose **Create stack**.

1. After the CloudFormation stack has reached the `CREATE_COMPLETE` state, choose the **Outputs** tab of the stack. Make a note of the output resource names and IDs because you need to use them in later steps. The outputs provide the information that's needed to create the cluster.  
![\[A diagram that shows the created stack outputs in the AWS Management Console.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/ad-cfn.png)

1. To complete the exercises [(Optional) Manage AD users and groups](tutorials_05_multi-user-ad-step2.md), you need the directory ID. Choose **Resources** and scroll down to make note of the directory ID.

1. Continue at [(Optional) Manage AD users and groups](tutorials_05_multi-user-ad-step2.md) or [Create the cluster](tutorials_05_multi-user-ad-step3.md).
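
If you prefer to capture the stack outputs programmatically rather than copying them from the console, the following sketch flattens them into a dictionary with boto3. The stack name `pcluster-ad` is the default from the quick create link; if you renamed the stack, adjust accordingly.

```python
def outputs_to_dict(outputs):
    """Flatten a CloudFormation Outputs list into {OutputKey: OutputValue}."""
    return {output["OutputKey"]: output["OutputValue"] for output in outputs}


def main():
    # Call main() with credentials for the account that owns the stack.
    import boto3

    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName="pcluster-ad")["Stacks"][0]
    print(outputs_to_dict(stack.get("Outputs", [])))
```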

## Manual
<a name="tutorials_05_multi-user-ad-step1-manual"></a>

Create a VPC for the directory service with two subnets in different Availability Zones and an AWS Managed Microsoft AD.

### Create the AD
<a name="tutorials_05_multi-user-ad-step1-manual-ad"></a>

**Note**  
The directory and domain name is `corp.example.com`. The short name is `CORP`.
Change the `Admin` password in the script.
The Active Directory (AD) takes at least 15 minutes to create.

Use the following Python script to create the VPC, subnets, and AD resources in your default AWS Region. Save the script as `ad.py` and run it.

```
import boto3
import time
from pprint import pprint

vpc_name = "PclusterVPC"
ad_domain = "corp.example.com"
admin_password = "asdfASDF1234"

ec2 = boto3.client("ec2")
ds = boto3.client("ds")
region = boto3.Session().region_name

# Create the VPC, Subnets, IGW, Routes
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
vpc_id = vpc["VpcId"]
time.sleep(30)  # give the new resource time to propagate before tagging it
ec2.create_tags(Resources=[vpc_id], Tags=[{"Key": "Name", "Value": vpc_name}])
subnet1 = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.0.0/17", AvailabilityZone=f"{region}a")["Subnet"]
subnet1_id = subnet1["SubnetId"]
time.sleep(30)
ec2.create_tags(Resources=[subnet1_id], Tags=[{"Key": "Name", "Value": f"{vpc_name}/subnet1"}])
ec2.modify_subnet_attribute(SubnetId=subnet1_id, MapPublicIpOnLaunch={"Value": True})
subnet2 = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.128.0/17", AvailabilityZone=f"{region}b")["Subnet"]
subnet2_id = subnet2["SubnetId"]
time.sleep(30)
ec2.create_tags(Resources=[subnet2_id], Tags=[{"Key": "Name", "Value": f"{vpc_name}/subnet2"}])
ec2.modify_subnet_attribute(SubnetId=subnet2_id, MapPublicIpOnLaunch={"Value": True})
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc_id)
route_table = ec2.describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["RouteTables"][0]
ec2.create_route(RouteTableId=route_table["RouteTableId"], DestinationCidrBlock="0.0.0.0/0", GatewayId=igw["InternetGatewayId"])
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

# Create the Active Directory
ad = ds.create_microsoft_ad(
    Name=ad_domain,
    Password=admin_password,
    Description="ParallelCluster AD",
    VpcSettings={"VpcId": vpc_id, "SubnetIds": [subnet1_id, subnet2_id]},
    Edition="Standard",
)
directory_id = ad["DirectoryId"]

# Wait for completion
print("Waiting for the directory to be created...")
directories = ds.describe_directories(DirectoryIds=[directory_id])["DirectoryDescriptions"]
directory = directories[0]
while directory["Stage"] in {"Requested", "Creating"}:
    time.sleep(3)
    directories = ds.describe_directories(DirectoryIds=[directory_id])["DirectoryDescriptions"]
    directory = directories[0]
    
dns_ip_addrs = directory["DnsIpAddrs"]

pprint({"directory_id": directory_id,
        "vpc_id": vpc_id,
        "subnet1_id": subnet1_id,
        "subnet2_id": subnet2_id,
        "dns_ip_addrs": dns_ip_addrs})
```

The following is example output from the Python script.

```
{
  "directory_id": "d-abcdef01234567890",
  "dns_ip_addrs": ["192.0.2.254", "203.0.113.237"],
  "subnet1_id": "subnet-021345abcdef6789",
  "subnet2_id": "subnet-1234567890abcdef0",
  "vpc_id": "vpc-021345abcdef6789"
}
```

Make a note of the output resource names and IDs. You use them in later steps.

After the script completes, continue to the next step.
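
Because the directory DNS IP addresses can change over time (see the note at the start of this tutorial), it can be useful to re-fetch them on demand instead of relying on the values that you noted. The following sketch does that with boto3; the helper name `directory_dns_ips` and the example directory ID are ours.

```python
def directory_dns_ips(ds_client, directory_id):
    """Return the current DNS IP addresses of a Directory Service directory."""
    descriptions = ds_client.describe_directories(
        DirectoryIds=[directory_id]
    )["DirectoryDescriptions"]
    return descriptions[0]["DnsIpAddrs"]


def main():
    # Call main() with credentials for the account that owns the directory.
    import boto3

    print(directory_dns_ips(boto3.client("ds"), "d-abcdef01234567890"))
```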

### Create an Amazon EC2 instance
<a name="tutorials_05_multi-user-ad-step1-manual-instance"></a>

------
#### [ New Amazon EC2 console ]

1. Sign in to the AWS Management Console.

1. If you don't have a role with the policies listed in step 4 attached, open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Otherwise, skip to step 5.

1. Create the `ResetUserPassword` policy, replacing the red highlighted content with your AWS Region ID, Account ID, and the directory ID from the output of the script you ran to create the AD.

   ResetUserPassword

   ```
   {
       "Statement": [
           {
               "Action": [
                   "ds:ResetUserPassword"
               ],
               "Resource": "arn:aws:ds:region-id:123456789012:directory/d-abcdef01234567890",
               "Effect": "Allow"
           }
       ]
   }
   ```

1. Create an IAM role with the following policies attached.
   + AWS managed policy [AmazonSSMManagedInstanceCore](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore)
   + AWS managed policy [AmazonSSMDirectoryServiceAccess](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AmazonSSMDirectoryServiceAccess)
   + ResetUserPassword policy

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the **Amazon EC2 Dashboard**, choose **Launch Instance**.

1. In **Application and OS Images**, select a recent Amazon Linux 2 AMI.

1. For **Instance type**, choose t2.micro.

1. For **Key pair**, choose a key pair.

1. For **Network settings**, choose **Edit**.

1. For **VPC**, select the directory VPC.

1. Scroll down and select **Advanced details**.

1. In **Advanced details**, **Domain join directory**, choose **corp.example.com**.

1. For **IAM Instance profile**, choose the role that you created in step 4, or another role that has the policies listed in step 4 attached.

1. In **Summary** choose **Launch instance**.

1. Make note of the Instance ID (for example, i-1234567890abcdef0) and wait for the instance to finish launching.

1. After the instance has launched, continue to the next step.

------
#### [ Old Amazon EC2 console ]

1. Sign in to the AWS Management Console.

1. If you don't have a role with the policies listed in step 4 attached, open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/). Otherwise, skip to step 5.

1. Create the `ResetUserPassword` policy. Replace the red highlighted content with your AWS Region ID, AWS account ID, and the directory ID from the output of the script you ran to create the Active Directory (AD).

   ResetUserPassword

   ```
   {
       "Statement": [
           {
               "Action": [
                   "ds:ResetUserPassword"
               ],
               "Resource": "arn:aws:ds:region-id:123456789012:directory/d-abcdef01234567890",
               "Effect": "Allow"
           }
       ]
   }
   ```

1. Create an IAM role with the following policies attached.
   + AWS managed policy [AmazonSSMManagedInstanceCore](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore)
   + AWS managed policy [AmazonSSMDirectoryServiceAccess](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AmazonSSMDirectoryServiceAccess)
   + ResetUserPassword policy

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the **Amazon EC2 Dashboard**, choose **Launch Instance**.

1. In **Application and OS Images**, select a recent Amazon Linux 2 AMI.

1. For **Instance type**, choose t2.micro.

1. For **Key pair**, choose a key pair.

1. In **Network settings**, choose **Edit**.

1. In **Network settings**, **VPC**, select the directory VPC.

1. Scroll down and select **Advanced details**.

1. In **Advanced details**, **Domain join directory**, choose **corp.example.com**.

1. In **Advanced details**, **Instance profile**, choose the role that you created in step 4, or another role that has the policies listed in step 4 attached.

1. In **Summary** choose **Launch instance**.

1. Make note of the Instance ID (for example, i-1234567890abcdef0) and wait for the instance to finish launching.

1. After the instance has launched, continue to the next step.

------

### Join your instance to the AD
<a name="tutorials_05_multi-user-ad-step1-manual-join"></a>

1. 

**Connect to your instance and join the AD realm as `Admin`.**

   Run the following commands to connect to the instance.

   ```
   $ INSTANCE_ID="i-1234567890abcdef0"
   ```

   ```
   $ PUBLIC_IP=$(aws ec2 describe-instances \
   --instance-ids $INSTANCE_ID \
   --query "Reservations[0].Instances[0].PublicIpAddress" \
   --output text)
   ```

   ```
   $ ssh -i ~/.ssh/keys/keypair.pem ec2-user@$PUBLIC_IP
   ```

1. 

**Install the necessary packages.**

   ```
   $ sudo yum -y install sssd realmd oddjob oddjob-mkhomedir adcli samba-common samba-common-tools krb5-workstation openldap-clients policycoreutils-python
   ```

1. 

**Join the realm. Replace the example password with your `Admin` password.**

   ```
   $ ADMIN_PW="asdfASDF1234"
   ```

   ```
   $ echo $ADMIN_PW | sudo realm join -U Admin corp.example.com
   Password for Admin:
   ```

   If the preceding command succeeds, you've joined the realm and can proceed to the next step.

### Add users to the AD
<a name="tutorials_05_multi-user-ad-step1-manual-join-add-users"></a>

1. 

**Create the ReadOnlyUser and an additional user.**

   In this step, you use [adcli](https://www.mankier.com/package/adcli) and [openldap-clients](https://www.mankier.com/package/openldap-clients) tools that you installed in a preceding step.

   ```
   $ echo $ADMIN_PW | adcli create-user -x -U Admin --domain=corp.example.com --display-name=ReadOnlyUser ReadOnlyUser
   ```

   ```
   $ echo $ADMIN_PW | adcli create-user -x -U Admin --domain=corp.example.com --display-name=user000 user000
   ```

1. **Verify that the users were created.**

   The directory DNS IP addresses are outputs of the Python script.

   ```
   $ DIRECTORY_IP="192.0.2.254"
   ```

   ```
   $ ldapsearch -x -h $DIRECTORY_IP -D Admin -w $ADMIN_PW -b "cn=ReadOnlyUser,ou=Users,ou=CORP,dc=corp,dc=example,dc=com"
   ```

   ```
   $ ldapsearch -x -h $DIRECTORY_IP -D Admin -w $ADMIN_PW -b "cn=user000,ou=Users,ou=CORP,dc=corp,dc=example,dc=com"
   ```

   By default, when you create a user with `adcli`, the user is disabled.

1. 

**Reset and activate the user passwords from your local machine.**

   Log out of your Amazon EC2 instance.
**Note**  
`ro-p@ssw0rd` is the password of `ReadOnlyUser`, retrieved from AWS Secrets Manager.
`user-p@ssw0rd` is the password of a cluster user that's provided when you connect (`ssh`) to the cluster.

   The `directory-id` is an output of the Python script.

   ```
   $ DIRECTORY_ID="d-abcdef01234567890"
   ```

   ```
   $ aws ds reset-user-password \
   --directory-id $DIRECTORY_ID \
   --user-name "ReadOnlyUser" \
   --new-password "ro-p@ssw0rd" \
   --region "region-id"
   ```

   ```
   $ aws ds reset-user-password \
   --directory-id $DIRECTORY_ID \
   --user-name "user000" \
   --new-password "user-p@ssw0rd" \
   --region "region-id"
   ```

1. **Add the password to a Secrets Manager secret.**

   Now that you created a `ReadOnlyUser` and set the password, store it in a secret that AWS ParallelCluster uses for validating logins.

   Use Secrets Manager to create a new secret to hold the password for the `ReadOnlyUser` as the value. The secret value format must be plain text only (not JSON format). Make note of the secret ARN for future steps.

   ```
   $ aws secretsmanager create-secret --name "ADSecretPassword" \
   --region region_id \
   --secret-string "ro-p@ssw0rd" \
   --query ARN \
   --output text
   arn:aws:secretsmanager:region-id:123456789012:secret:ADSecretPassword-1234
   ```
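
The password resets and the secret creation can also be scripted with boto3. The sketch below adds a loose sanity check that the secret value is plain text rather than JSON, which is the format AWS ParallelCluster expects; the helper name `is_plain_text_secret` is ours, and the passwords and IDs are the example values from above.

```python
import json


def is_plain_text_secret(value):
    """Loose check that a secret value is plain text, not a JSON document."""
    try:
        json.loads(value)
    except ValueError:
        return True  # not parseable as JSON, so it's plain text
    return False


def main():
    # Call main() with credentials and the IDs from your own environment.
    import boto3

    password = "ro-p@ssw0rd"
    assert is_plain_text_secret(password)
    ds = boto3.client("ds", region_name="region-id")
    ds.reset_user_password(DirectoryId="d-abcdef01234567890",
                           UserName="ReadOnlyUser", NewPassword=password)
    secretsmanager = boto3.client("secretsmanager", region_name="region-id")
    secret = secretsmanager.create_secret(Name="ADSecretPassword",
                                          SecretString=password)
    print(secret["ARN"])
```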

### Set up LDAPS with certificate verification (recommended)
<a name="tutorials_05_multi-user-ad-step1-manual-ldaps"></a>

Make a note of the resource IDs. You use them in later steps.

1. 

**Generate the domain certificate locally.**

   ```
   $ PRIVATE_KEY="corp-example-com.key"
   CERTIFICATE="corp-example-com.crt"
   printf ".\n.\n.\n.\n.\ncorp.example.com\n.\n" | openssl req -x509 -sha256 -nodes -newkey rsa:2048 -keyout $PRIVATE_KEY -days 365 -out $CERTIFICATE
   ```

1. 

**Store the certificate in Secrets Manager so that it can be retrieved from within the cluster later on.**

   ```
   $ aws secretsmanager create-secret --name example-cert \
     --secret-string file://$CERTIFICATE \
     --region region-id
   {
     "ARN": "arn:aws:secretsmanager:region-id:123456789012:secret:example-cert-123abc",
     "Name": "example-cert",
     "VersionId": "14866070-092a-4d5a-bcdd-9219d0566b9c"
   }
   ```

1. Add the following policy to the IAM role that you created to join the Amazon EC2 instance to the AD domain.

   `PutDomainCertificateSecrets`

   ```
   {
       "Statement": [
           {
               "Action": [
                   "secretsmanager:PutSecretValue"
               ],
               "Resource": [
                   "arn:aws:secretsmanager:region-id:123456789012:secret:example-cert-123abc"
               ],
               "Effect": "Allow"
           }
       ]
   }
   ```

1. 

**Import the certificate to AWS Certificate Manager (ACM).**

   ```
   $ aws acm import-certificate --certificate fileb://$CERTIFICATE \
     --private-key fileb://$PRIVATE_KEY \
     --region region-id
   {
     "CertificateArn": "arn:aws:acm:region-id:123456789012:certificate/343db133-490f-4077-b8d4-3da5bfd89e72"
   }
   ```

1. 

**Create the load balancer that's placed in front of the Active Directory endpoints.**

   ```
   $ aws elbv2 create-load-balancer --name CorpExampleCom-NLB \
     --type network \
     --scheme internal \
     --subnets subnet-1234567890abcdef0 subnet-021345abcdef6789 \
     --region region-id
   {
     "LoadBalancers": [
       {
         "LoadBalancerArn": "arn:aws:elasticloadbalancing:region-id:123456789012:loadbalancer/net/CorpExampleCom-NLB/3afe296bf4ba80d4",
         "DNSName": "CorpExampleCom-NLB-3afe296bf4ba80d4.elb.region-id.amazonaws.com",
         "CanonicalHostedZoneId": "Z2IFOLAFXWLO4F",
         "CreatedTime": "2022-05-05T12:56:55.988000+00:00",
         "LoadBalancerName": "CorpExampleCom-NLB",
         "Scheme": "internal",
         "VpcId": "vpc-021345abcdef6789",
         "State": {
           "Code": "provisioning"
          },
          "Type": "network",
          "AvailabilityZones": [
            {
              "ZoneName": "region-idb",
              "SubnetId": "subnet-021345abcdef6789",
              "LoadBalancerAddresses": []
            },
            {
              "ZoneName": "region-ida",
              "SubnetId": "subnet-1234567890abcdef0",
              "LoadBalancerAddresses": []
            }
          ],
          "IpAddressType": "ipv4"
       }   
     ]
   }
   ```

1. 

**Create the target group that targets the Active Directory endpoints.**

   ```
   $ aws elbv2 create-target-group --name CorpExampleCom-Targets --protocol TCP \
     --port 389 \
     --target-type ip \
     --vpc-id vpc-021345abcdef6789 \
     --region region-id
   {
     "TargetGroups": [
       {
         "TargetGroupArn": "arn:aws:elasticloadbalancing:region-id:123456789012:targetgroup/CorpExampleCom-Targets/44577c583b695e81",
         "TargetGroupName": "CorpExampleCom-Targets",
         "Protocol": "TCP",
         "Port": 389,
         "VpcId": "vpc-021345abcdef6789",
         "HealthCheckProtocol": "TCP",
         "HealthCheckPort": "traffic-port",
         "HealthCheckEnabled": true,
         "HealthCheckIntervalSeconds": 30,
         "HealthCheckTimeoutSeconds": 10,
         "HealthyThresholdCount": 3,
         "UnhealthyThresholdCount": 3,
         "TargetType": "ip",
         "IpAddressType": "ipv4"
       }
     ]
   }
   ```

1. 

**Register the Active Directory (AD) endpoints into the target group.**

   ```
   $ aws elbv2 register-targets --target-group-arn arn:aws:elasticloadbalancing:region-id:123456789012:targetgroup/CorpExampleCom-Targets/44577c583b695e81 \
     --targets Id=192.0.2.254,Port=389 Id=203.0.113.237,Port=389 \
     --region region-id
   ```

1. 

**Create the LB listener with the certificate.**

   ```
   $ aws elbv2 create-listener --load-balancer-arn arn:aws:elasticloadbalancing:region-id:123456789012:loadbalancer/net/CorpExampleCom-NLB/3afe296bf4ba80d4 \
     --protocol TLS \
     --port 636 \
     --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region-id:123456789012:targetgroup/CorpExampleCom-Targets/44577c583b695e81 \
     --ssl-policy ELBSecurityPolicy-TLS-1-2-2017-01 \
     --certificates CertificateArn=arn:aws:acm:region-id:123456789012:certificate/343db133-490f-4077-b8d4-3da5bfd89e72 \
     --region region-id
   {
     "Listeners": [
     {
       "ListenerArn": "arn:aws:elasticloadbalancing:region-id:123456789012:listener/net/CorpExampleCom-NLB/3afe296bf4ba80d4/a8f9d97318743d4b",
       "LoadBalancerArn": "arn:aws:elasticloadbalancing:region-id:123456789012:loadbalancer/net/CorpExampleCom-NLB/3afe296bf4ba80d4",
       "Port": 636,
       "Protocol": "TLS",
       "Certificates": [
         {
           "CertificateArn": "arn:aws:acm:region-id:123456789012:certificate/343db133-490f-4077-b8d4-3da5bfd89e72"
          }
        ],
        "SslPolicy": "ELBSecurityPolicy-TLS-1-2-2017-01",
        "DefaultActions": [
          {
            "Type": "forward",
            "TargetGroupArn": "arn:aws:elasticloadbalancing:region-id:123456789012:targetgroup/CorpExampleCom-Targets/44577c583b695e81",
            "ForwardConfig": {
              "TargetGroups": [
                {
                   "TargetGroupArn": "arn:aws:elasticloadbalancing:region-id:123456789012:targetgroup/CorpExampleCom-Targets/44577c583b695e81"
                 }
               ]
             }
           }
         ]
       }
     ]
   }
   ```

1. 

**Create the hosted zone to make the domain discoverable within the cluster VPC.**

   ```
   $ aws route53 create-hosted-zone --name corp.example.com \
     --vpc VPCRegion=region-id,VPCId=vpc-021345abcdef6789 \
     --caller-reference "ParallelCluster AD Tutorial"
   {
     "Location": "https://route53.amazonaws.com/2013-04-01/hostedzone/Z09020002B5MZQNXMSJUB",
     "HostedZone": {
       "Id": "/hostedzone/Z09020002B5MZQNXMSJUB",
       "Name": "corp.example.com.",
       "CallerReference": "ParallelCluster AD Tutorial",
       "Config": {
            "PrivateZone": true
       },
       "ResourceRecordSetCount": 2
     },
     "ChangeInfo": {
       "Id": "/change/C05533343BF3IKSORW1TQ",
       "Status": "PENDING",
       "SubmittedAt": "2022-05-05T13:21:53.863000+00:00"
     },
     "VPC": {
       "VPCRegion": "region-id",
       "VPCId": "vpc-021345abcdef6789"
     }
   }
   ```

1. 

**Create a file that's named `recordset-change.json` with the following content. `HostedZoneId` is the canonical hosted zone ID of the load balancer.**

   ```
   {
     "Changes": [
       {
         "Action": "CREATE",
         "ResourceRecordSet": {
           "Name": "corp.example.com",
           "Type": "A",
           "Region": "region-id",
           "SetIdentifier": "example-active-directory",
           "AliasTarget": {
             "HostedZoneId": "Z2IFOLAFXWLO4F",
             "DNSName": "CorpExampleCom-NLB-3afe296bf4ba80d4.elb.region-id.amazonaws.com",
             "EvaluateTargetHealth": true
           }
         }
       }
     ]
   }
   ```

1. 

**Submit the record set change to the hosted zone. This time, use the ID of the hosted zone itself.**

   ```
   $ aws route53 change-resource-record-sets --hosted-zone-id Z09020002B5MZQNXMSJUB \
     --change-batch file://recordset-change.json
   {
     "ChangeInfo": {
       "Id": "/change/C0137926I56R3GC7XW2Y",
       "Status": "PENDING",
       "SubmittedAt": "2022-05-05T13:40:36.553000+00:00"
     }
   }
   ```

1. 

**Create a policy document `policy.json` with the following content.**

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Action": [
                   "secretsmanager:GetSecretValue"
               ],
               "Resource": [
                "arn:aws:secretsmanager:region-id:123456789012:secret:example-cert-123abc"
               ],
               "Effect": "Allow"
           }
       ]
   }
   ```

------

1. 

**Create the `ReadCertExample` IAM policy from the `policy.json` policy document.**

   ```
   $ aws iam create-policy --policy-name ReadCertExample \
     --policy-document file://policy.json
   {
     "Policy": {
       "PolicyName": "ReadCertExample",
       "PolicyId": "ANPAUUXUVBC42VZSI4LDY",
       "Arn": "arn:aws:iam::123456789012:policy/ReadCertExample",
       "Path": "/",
       "DefaultVersionId": "v1",
       "AttachmentCount": 0,
       "PermissionsBoundaryUsageCount": 0,
       "IsAttachable": true,
       "CreateDate": "2022-05-05T13:42:18+00:00",
       "UpdateDate": "2022-05-05T13:42:18+00:00"
     }
   }
   ```

1. Continue to follow the steps at [(Optional) Manage AD users and groups](tutorials_05_multi-user-ad-step2.md) or [Create the cluster](tutorials_05_multi-user-ad-step3.md).
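
As an alternative to hand-editing JSON, you can generate the `recordset-change.json` document from the load balancer's canonical hosted zone ID and DNS name. This is a sketch; the helper name `alias_change_batch` is ours, and the values in `main` are the example values from the preceding output.

```python
import json


def alias_change_batch(domain, region, lb_hosted_zone_id, lb_dns_name,
                       set_identifier="example-active-directory"):
    """Build the Route 53 change batch that aliases the domain to the NLB."""
    return {
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "Region": region,
                "SetIdentifier": set_identifier,
                "AliasTarget": {
                    "HostedZoneId": lb_hosted_zone_id,
                    "DNSName": lb_dns_name,
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    }


def main():
    batch = alias_change_batch(
        "corp.example.com", "region-id", "Z2IFOLAFXWLO4F",
        "CorpExampleCom-NLB-3afe296bf4ba80d4.elb.region-id.amazonaws.com")
    with open("recordset-change.json", "w") as f:
        json.dump(batch, f, indent=2)
```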

# (Optional) Manage AD users and groups
<a name="tutorials_05_multi-user-ad-step2"></a>

In this step, you manage users and groups from an Amazon EC2 Amazon Linux 2 instance that's joined to the Active Directory (AD) domain.

If you followed the *automated* path, restart and log in to the AD joined instance that was created as part of the automation.

If you followed the *manual* path, restart and log in to the instance that you created and joined to the AD in preceding steps.

In these steps, you use the [adcli](https://www.mankier.com/package/adcli) and [openldap-clients](https://www.mankier.com/package/openldap-clients) tools that were installed in the instance as part of a preceding step.

**Log in to an Amazon EC2 instance that is joined to the AD domain**

1. From the Amazon EC2 console, select the untitled Amazon EC2 instance that was created in previous steps. The instance state might be **Stopped**.

1. If the instance state is **Stopped**, choose **Instance state** and then **Start instance**.

1. After the status checks pass, select the instance and choose **Connect** and SSH in to the instance.

**Manage users and groups when logged in to an Amazon EC2 Amazon Linux 2 instance that's joined to the AD**

When you run the `adcli` commands with the `-U "Admin"` option, you're prompted to enter the AD `Admin` password. With the `ldapsearch` commands, you include the AD `Admin` password directly in the command.

1. 

**Create a user.**

   ```
   $ adcli create-user "clusteruser" --domain "corp.example.com" -U "Admin"
   ```

1. 

**Set a user password.**

   ```
   $ aws --region "region-id" ds reset-user-password --directory-id "d-abcdef01234567890" --user-name "clusteruser" --new-password "new-p@ssw0rd"
   ```

1. 

**Create a group.**

   ```
   $ adcli create-group "clusterteam" --domain "corp.example.com" -U "Admin"
   ```

1. 

**Add a user to a group.**

   ```
   $ adcli add-member "clusterteam" "clusteruser" --domain "corp.example.com" -U "Admin"
   ```

1. 

**Describe users and groups.**

   Describe all users.

   ```
   $ ldapsearch "(&(objectClass=user))" -x -h "192.0.2.254" -b "DC=corp,DC=example,DC=com" -D "CN=Admin,OU=Users,OU=CORP,DC=corp,DC=example,DC=com" -w "p@ssw0rd"
   ```

   Describe a specific user.

   ```
   $ ldapsearch "(&(objectClass=user)(cn=clusteruser))" -x -h "192.0.2.254" -b "DC=corp,DC=example,DC=com" -D "CN=Admin,OU=Users,OU=CORP,DC=corp,DC=example,DC=com" -w "p@ssw0rd"
   ```

   Describe all users with a name pattern.

   ```
   $ ldapsearch "(&(objectClass=user)(cn=user*))" -x -h "192.0.2.254" -b "DC=corp,DC=example,DC=com" -D "CN=Admin,OU=Users,OU=CORP,DC=corp,DC=example,DC=com" -w "p@ssw0rd"
   ```

   Describe all users that are part of a specific group.

   ```
   $ ldapsearch "(&(objectClass=user)(memberOf=CN=clusterteam,OU=Users,OU=CORP,DC=corp,DC=example,DC=com))" -x -h "192.0.2.254" -b "DC=corp,DC=example,DC=com" -D "CN=Admin,OU=Users,OU=CORP,DC=corp,DC=example,DC=com" -w "p@ssw0rd"
   ```

   Describe all groups.

   ```
   $ ldapsearch "objectClass=group" -x -h "192.0.2.254" -b "DC=corp,DC=example,DC=com" -D "CN=Admin,OU=Users,OU=CORP,DC=corp,DC=example,DC=com" -w "p@ssw0rd"
   ```

   Describe a specific group.

   ```
   $ ldapsearch "(&(objectClass=group)(cn=clusterteam))" -x -h "192.0.2.254" -b "DC=corp,DC=example,DC=com" -D "CN=Admin,OU=Users,OU=CORP,DC=corp,DC=example,DC=com" -w "p@ssw0rd"
   ```

1. 

**Remove a user from a group.**

   ```
   $ adcli remove-member "clusterteam" "clusteruser" --domain "corp.example.com" -U "Admin"
   ```

1. 

**Delete a user.**

   ```
   $ adcli delete-user "clusteruser" --domain "corp.example.com" -U "Admin"
   ```

1. 

**Delete a group.**

   ```
   $ adcli delete-group "clusterteam" --domain "corp.example.com" -U "Admin"
   ```
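
If you manage many users and groups, the `adcli` commands above can be scripted. The following sketch builds the same command lines and runs them with `subprocess`; the helper name `adcli_command` is ours.

```python
import subprocess


def adcli_command(subcommand, *names, domain="corp.example.com", admin="Admin"):
    """Build the argv for an adcli subcommand, mirroring the commands above."""
    return ["adcli", subcommand, *names, f"--domain={domain}", "-U", admin]


def main():
    # Call main() on the AD-joined instance; each command prompts for the
    # AD Admin password, as the interactive commands above do.
    for argv in (
        adcli_command("create-user", "clusteruser"),
        adcli_command("create-group", "clusterteam"),
        adcli_command("add-member", "clusterteam", "clusteruser"),
    ):
        subprocess.run(argv, check=True)
```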

# Create the cluster
<a name="tutorials_05_multi-user-ad-step3"></a>

If you haven't exited the Amazon EC2 instance, do so now.

The environment is set up to create a cluster that can authenticate users against the Active Directory (AD).

Create a simple cluster configuration and provide the settings relevant to connecting to the AD. For more information, see the [`DirectoryService`](DirectoryService-v3.md) section.

Choose one of the following cluster configurations and copy it to a file that's named `ldaps_config.yaml`, `ldaps_nocert_config.yaml`, or `ldap_config.yaml`.

We recommend that you choose the LDAPS configuration with certificate verification. If you choose this configuration, you must also copy the bootstrap script to a file that's named `active-directory.head.post.sh` and store it in an Amazon S3 bucket, as indicated in the configuration file.

## LDAPS with certificate verification configuration (recommended)
<a name="tutorials_05_multi-user-ad-step3-ldaps"></a>

**Note**  
`KeyName`: One of your Amazon EC2 keypairs.
`SubnetId / SubnetIds`: One of the subnet IDs provided in the output of the CloudFormation quick create stack (automated tutorial) or Python script (manual tutorial).
`Region`: The Region where you created the AD infrastructure.
`DomainAddr`: This IP address is one of the DNS addresses of your AD service.
`PasswordSecretArn`: The Amazon Resource Name (ARN) of the secret that contains the password for the `DomainReadOnlyUser`.
`BucketName`: The name of the bucket that holds the bootstrap script.
`AdditionalPolicies` / `Policy`: The Amazon Resource Name (ARN) of the `ReadCertExample` policy that grants read access to the domain certificate secret.
`CustomActions` / `OnNodeConfigured` / `Args`: The Amazon Resource Name (ARN) of the secret that holds the domain certificate, followed by the path where the bootstrap script writes the certificate on the cluster.
For a better security posture, we recommend that you use the `HeadNode` / `Ssh` / `AllowedIps` configuration to limit SSH access to the head node.

**Hard Requirements**  
The certificate specified in `LdapTlsCaCert` must be accessible to all cluster nodes.  
A node without access to the certificate will not be able to resolve users from the directory.

```
Region: region-id
Image:
  Os: alinux2
HeadNode: 
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-abcdef01234567890
  Ssh:
    KeyName: keypair
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::123456789012:policy/ReadCertExample
    S3Access:
      - BucketName: amzn-s3-demo-bucket
        EnableWriteAccess: false
        KeyName: bootstrap/active-directory/active-directory.head.post.sh
  CustomActions:
    OnNodeConfigured:
      Script: s3://amzn-s3-demo-bucket/bootstrap/active-directory/active-directory.head.post.sh
      Args:
        - arn:aws:secretsmanager:region-id:123456789012:secret:example-cert-123abc
        - /opt/parallelcluster/shared/directory_service/domain-certificate.crt
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue0
      ComputeResources:
        - Name: queue0-t2-micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10         
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainName: corp.example.com
  DomainAddr: ldaps://corp.example.com
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:ADSecretPassword-1234
  DomainReadOnlyUser: cn=ReadOnlyUser,ou=Users,ou=CORP,dc=corp,dc=example,dc=com
  LdapTlsCaCert: /opt/parallelcluster/shared/directory_service/domain-certificate.crt
  LdapTlsReqCert: hard
```

**Bootstrap script**

After you create the bootstrap file, and before you upload it to your S3 bucket, run `chmod +x active-directory.head.post.sh` to grant AWS ParallelCluster permission to run it.

```
#!/bin/bash
set -e

CERTIFICATE_SECRET_ARN="$1"
CERTIFICATE_PATH="$2"

[[ -z $CERTIFICATE_SECRET_ARN ]] && echo "[ERROR] Missing CERTIFICATE_SECRET_ARN" && exit 1
[[ -z $CERTIFICATE_PATH ]] && echo "[ERROR] Missing CERTIFICATE_PATH" && exit 1

source /etc/parallelcluster/cfnconfig
REGION="${cfn_region:?}"

mkdir -p $(dirname $CERTIFICATE_PATH)
aws secretsmanager get-secret-value --region $REGION --secret-id $CERTIFICATE_SECRET_ARN --query SecretString --output text > $CERTIFICATE_PATH
```
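The script's core logic can also be sketched in Python for testing outside the cluster. In this sketch, `fetch_secret` is a hypothetical callable that stands in for the `aws secretsmanager get-secret-value` call, so the argument checks and file writing can be exercised without AWS credentials.

```python
import os

def write_certificate(fetch_secret, secret_arn, cert_path):
    """Fetch a PEM certificate from a secret store and write it to cert_path.

    fetch_secret is any callable that maps a secret ARN to its string value,
    injected so the logic can be tested without calling AWS.
    """
    # Validate arguments, mirroring the checks in the bash script above.
    if not secret_arn or not cert_path:
        raise ValueError("secret ARN and certificate path are both required")
    # Create the parent directory, then write the fetched PEM data.
    os.makedirs(os.path.dirname(cert_path), exist_ok=True)
    with open(cert_path, "w") as f:
        f.write(fetch_secret(secret_arn))
    return cert_path
```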

## LDAPS without certificate verification configuration
<a name="tutorials_05_multi-user-ad-step3-ldaps-no-cert"></a>

**Note**  
`KeyName`: One of your Amazon EC2 keypairs.
`SubnetId / SubnetIds`: One of the subnet IDs that's in the output of the CloudFormation quick create stack (automated tutorial) or Python script (manual tutorial).
`Region`: The Region where you created the AD infrastructure.
`DomainAddr`: This IP address is one of the DNS addresses of your AD service.
`PasswordSecretArn`: The Amazon Resource Name (ARN) of the secret that contains the password for the `DomainReadOnlyUser`.
For a better security posture, we suggest that you use the `HeadNode` / `Ssh` / `AllowedIps` configuration to limit SSH access to the head node.

```
Region: region-id
Image:
  Os: alinux2
HeadNode: 
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-abcdef01234567890
  Ssh:
    KeyName: keypair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue0
      ComputeResources:
        - Name: queue0-t2-micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10         
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainName: corp.example.com
  DomainAddr: ldaps://corp.example.com
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:ADSecretPassword-1234
  DomainReadOnlyUser: cn=ReadOnlyUser,ou=Users,ou=CORP,dc=corp,dc=example,dc=com
  LdapTlsReqCert: never
```

## LDAP configuration
<a name="tutorials_05_multi-user-ad-step3-ldap"></a>

**Note**  
`KeyName`: One of your Amazon EC2 keypairs.
`SubnetId / SubnetIds`: One of the subnet IDs provided in the output of the CloudFormation quick create stack (automated tutorial) or Python script (manual tutorial).
`Region`: The Region where you created the AD infrastructure.
`DomainAddr`: This IP address is one of the DNS addresses of your AD service.
`PasswordSecretArn`: The Amazon Resource Name (ARN) of the secret that contains the password for the `DomainReadOnlyUser`.
For a better security posture, we suggest that you use the `HeadNode` / `Ssh` / `AllowedIps` configuration to limit SSH access to the head node.

```
Region: region-id
Image:
  Os: alinux2
HeadNode: 
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-abcdef01234567890
  Ssh:
    KeyName: keypair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue0
      ComputeResources:
        - Name: queue0-t2-micro
          InstanceType: t2.micro
          MinCount: 1
          MaxCount: 10         
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
DirectoryService:
  DomainName: dc=corp,dc=example,dc=com
  DomainAddr: ldap://192.0.2.254,ldap://203.0.113.237
  PasswordSecretArn: arn:aws:secretsmanager:region-id:123456789012:secret:ADSecretPassword-1234
  DomainReadOnlyUser: cn=ReadOnlyUser,ou=Users,ou=CORP,dc=corp,dc=example,dc=com
  AdditionalSssdConfigs:
    ldap_auth_disable_tls_never_use_in_production: True
```

Create your cluster with the following command, specifying the configuration file that you created for the option that you chose.

```
$ pcluster create-cluster --cluster-name "ad-cluster" --cluster-configuration "./ldaps_config.yaml"
{
  "cluster": {
    "clusterName": "ad-cluster",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:region-id:123456789012:stack/ad-cluster/1234567-abcd-0123-def0-abcdef0123456",
    "region": "region-id",
    "version": "3.15.0",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
}
```

# Connect to the cluster as a user
<a name="tutorials_05_multi-user-ad-step4"></a>

You can determine the status of the cluster with the following command.

```
$ pcluster describe-cluster -n ad-cluster --region "region-id" --query "clusterStatus"
```

The output is as follows.

```
"CREATE_IN_PROGRESS" / "CREATE_COMPLETE"
```

When the status reaches `"CREATE_COMPLETE"`, log in with the created user name and password.

```
$ HEAD_NODE_IP=$(pcluster describe-cluster -n "ad-cluster" --region "region-id" --query headNode.publicIpAddress | xargs echo)
```

```
$ ssh user000@$HEAD_NODE_IP
```

You can log in without the password by providing the SSH key that was created for the new user at `/home/user000@HEAD_NODE_IP/.ssh/id_rsa`.

If the `ssh` command succeeded, you have successfully connected to the cluster as a user that's authenticated to use the Active Directory (AD).

# Clean up
<a name="tutorials_05_multi-user-ad-step5"></a>

1. 

**From your local machine, delete the cluster.**

   ```
   $ pcluster delete-cluster --cluster-name "ad-cluster" --region "region-id"
   {
     "cluster": {
       "clusterName": "ad-cluster",
       "cloudformationStackStatus": "DELETE_IN_PROGRESS",
       "cloudformationStackArn": "arn:aws:cloudformation:region-id:123456789012:stack/ad-cluster/1234567-abcd-0123-def0-abcdef0123456",
       "region": "region-id",
       "version": "3.15.0",
       "clusterStatus": "DELETE_IN_PROGRESS"
     }
   }
   ```

1. 

**Check the progress of the cluster being deleted.**

   ```
   $ pcluster describe-cluster --cluster-name "ad-cluster" --region "region-id" --query "clusterStatus"
   "DELETE_IN_PROGRESS"
   ```

   After the cluster is successfully deleted, proceed to the next step.

## Automated
<a name="tutorials_05_multi-user-ad-step5-automated"></a>

**Delete the Active Directory resources**

1. Open the AWS CloudFormation console at [https://console.aws.amazon.com/cloudformation/](https://console.aws.amazon.com/cloudformation/).

1. In the navigation pane, choose **Stacks**.

1. From the list of stacks, choose the AD stack (for example, `pcluster-ad`).

1. Choose **Delete**.

## Manual
<a name="tutorials_05_multi-user-ad-step5-manual"></a>

1. 

**Delete the Amazon EC2 instance.**

   1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/), and choose **Instances** in the navigation pane.

   1. From the list of instances, choose the instance that you created to add users to the directory.

   1. Choose **Instance state**, then **Terminate instance**.

1. 

**Delete the hosted zone.**

   1. Create a file named `recordset-delete.json` with the following content. In this example, `HostedZoneId` is the canonical hosted zone ID of the load balancer.

      ```
      {
        "Changes": [
          {
            "Action": "DELETE",
            "ResourceRecordSet": {
              "Name": "corp.example.com",
              "Type": "A",
              "Region": "region-id",
              "SetIdentifier": "pcluster-active-directory",
              "AliasTarget": {
                "HostedZoneId": "Z2IFOLAFXWLO4F",
                "DNSName": "CorpExampleCom-NLB-3afe296bf4ba80d4.elb.region-id.amazonaws.com",
                "EvaluateTargetHealth": true
              }
            }
          }
        ]
      }
      ```

   1. Submit the recordset change to the hosted zone using the hosted zone ID.

      ```
      $ aws route53 change-resource-record-sets --hosted-zone-id Z09020002B5MZQNXMSJUB \
        --change-batch file://recordset-delete.json
      {
       "ChangeInfo": {
           "Id": "/change/C04853642A0TH2TJ5NLNI",
           "Status": "PENDING",
           "SubmittedAt": "2022-05-05T14:25:51.046000+00:00"
       }
      }
      ```

   1. Delete the hosted zone.

      ```
      $ aws route53 delete-hosted-zone --id Z09020002B5MZQNXMSJUB
      {
       "ChangeInfo": {
           "Id": "/change/C0468051QFABTVHMDEG9",
           "Status": "PENDING",
           "SubmittedAt": "2022-05-05T14:26:13.814000+00:00"
       }
      }
      ```

1. 

**Delete the LB listener.**

   ```
   $ aws elbv2 delete-listener \
     --listener-arn arn:aws:elasticloadbalancing:region-id:123456789012:listener/net/CorpExampleCom-NLB/3afe296bf4ba80d4/a8f9d97318743d4b --region region-id
   ```

1. 

**Delete the target group.**

   ```
   $ aws elbv2 delete-target-group \
     --target-group-arn arn:aws:elasticloadbalancing:region-id:123456789012:targetgroup/CorpExampleCom-Targets/44577c583b695e81 --region region-id
   ```

1. 

**Delete the load balancer.**

   ```
   $ aws elbv2 delete-load-balancer \
     --load-balancer-arn arn:aws:elasticloadbalancing:region-id:123456789012:loadbalancer/net/CorpExampleCom-NLB/3afe296bf4ba80d4 --region region-id
   ```

1. 

**Delete the policy that the cluster uses to read the certificate from Secrets Manager.**

   ```
   $ aws iam delete-policy --policy-arn arn:aws:iam::123456789012:policy/ReadCertExample
   ```

1. 

**Delete the secret that contains the domain certificate.**

   ```
   $ aws secretsmanager delete-secret \
     --secret-id arn:aws:secretsmanager:region-id:123456789012:secret:example-cert-123abc \
     --region region-id
   {
    "ARN": "arn:aws:secretsmanager:region-id:123456789012:secret:example-cert-123abc",
    "Name": "example-cert",
    "DeletionDate": "2022-06-04T16:27:36.183000+02:00"
   }
   ```

1. 

**Delete the certificate from ACM.**

   ```
   $ aws acm delete-certificate \
     --certificate-arn arn:aws:acm:region-id:123456789012:certificate/343db133-490f-4077-b8d4-3da5bfd89e72 --region region-id
   ```

1. 

**Delete the Active Directory (AD) resources.**

   1. Get the following resource IDs from the output of the Python script `ad.py`:
      + AD ID
      + AD subnet IDs
      + AD VPC ID

   1. Delete the directory by running the following command.

      ```
      $ aws ds delete-directory --directory-id d-abcdef0123456789 --region region-id
      {
         "DirectoryId": "d-abcdef0123456789"
      }
      ```

   1. List the security groups in the VPC.

      ```
      $ aws ec2 describe-security-groups --filters '[{"Name":"vpc-id","Values":["vpc-07614ade95ebad1bc"]}]' --region region-id
      ```

   1. Delete the custom security group.

      ```
      $ aws ec2 delete-security-group --group-id sg-021345abcdef6789 --region region-id
      ```

   1. Delete the subnets.

      ```
      $ aws ec2 delete-subnet --subnet-id subnet-1234567890abcdef --region region-id
      ```

      ```
      $ aws ec2 delete-subnet --subnet-id subnet-021345abcdef6789 --region region-id
      ```

   1. Describe internet gateway.

      ```
      $ aws ec2 describe-internet-gateways \
        --filters Name=attachment.vpc-id,Values=vpc-021345abcdef6789 \
        --region region-id
      {
        "InternetGateways": [
          {
            "Attachments": [
              {
                "State": "available",
                "VpcId": "vpc-021345abcdef6789"
              }
            ],
            "InternetGatewayId": "igw-1234567890abcdef",
            "OwnerId": "123456789012",
            "Tags": []
          }
        ]  
      }
      ```

   1. Detach the internet gateway.

      ```
      $ aws ec2 detach-internet-gateway \
        --internet-gateway-id igw-1234567890abcdef \
        --vpc-id vpc-021345abcdef6789 \
        --region region-id
      ```

   1. Delete the internet gateway.

      ```
      $ aws ec2 delete-internet-gateway \
        --internet-gateway-id igw-1234567890abcdef \
        --region region-id
      ```

   1. Delete the VPC.

      ```
      $ aws ec2 delete-vpc \
        --vpc-id vpc-021345abcdef6789 \
        --region region-id
      ```

   1. Delete the secret that contains the `ReadOnlyUser` password.

      ```
      $ aws secretsmanager delete-secret \
        --secret-id arn:aws:secretsmanager:region-id:123456789012:secret:ADSecretPassword-1234 \
        --region region-id
      ```

# Configuring shared storage encryption with an AWS KMS key
<a name="tutorials_04_encrypted_kms_fs-v3"></a>

Learn how to set up a customer managed AWS KMS key to encrypt and protect your data in the cluster file storage systems that are configured for AWS ParallelCluster.

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

AWS ParallelCluster supports the following shared storage encryption configuration options:
+ [`SharedStorage`](SharedStorage-v3.md) / [`EbsSettings`](SharedStorage-v3.md#SharedStorage-v3-EbsSettings) / [`KmsKeyId`](SharedStorage-v3.md#yaml-SharedStorage-EbsSettings-KmsKeyId)
+ [`SharedStorage`](SharedStorage-v3.md) / [`EfsSettings`](SharedStorage-v3.md#SharedStorage-v3-EfsSettings) / [`KmsKeyId`](SharedStorage-v3.md#yaml-SharedStorage-EfsSettings-KmsKeyId)
+ [`SharedStorage`](SharedStorage-v3.md) / [`FsxLustreSettings`](SharedStorage-v3.md#SharedStorage-v3-FsxLustreSettings) / [`KmsKeyId`](SharedStorage-v3.md#yaml-SharedStorage-FsxLustreSettings-KmsKeyId)

You can use these options to provide a customer managed AWS KMS key for Amazon EBS, Amazon EFS, and FSx for Lustre shared storage system encryption. To use them, you must create and configure an IAM policy for the following:
+ [`HeadNode`](HeadNode-v3.md) / [`Iam`](HeadNode-v3.md#HeadNode-v3-Iam) / [`AdditionalIamPolicies`](HeadNode-v3.md#yaml-HeadNode-Iam-AdditionalIamPolicies) / [`Policy`](HeadNode-v3.md#yaml-HeadNode-Iam-AdditionalIamPolicies-Policy)
+ [`Scheduler`](Scheduling-v3.md#yaml-Scheduling-Scheduler) / [`SlurmQueues`](Scheduling-v3.md#Scheduling-v3-SlurmQueues) / [`Iam`](Scheduling-v3.md#Scheduling-v3-SlurmQueues-Iam) / [`AdditionalIamPolicies`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-Iam-AdditionalIamPolicies) / [`Policy`](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-Iam-AdditionalIamPolicies-Policy) 
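The policy document that you create in the next topic can also be generated programmatically. The following sketch assumes an illustrative `kms_policy` helper name and mirrors the KMS actions shown in the tutorial's example policy; adjust the action list to your own security requirements.

```python
import json

def kms_policy(key_arns):
    """Build an IAM policy document granting the KMS permissions that
    AWS ParallelCluster needs for shared storage encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kms:DescribeKey",
                    "kms:ReEncrypt*",
                    "kms:CreateGrant",
                    "kms:Decrypt",
                ],
                # One or more KMS key ARNs that the cluster can use.
                "Resource": list(key_arns),
            }
        ],
    }

print(json.dumps(kms_policy(["arn:aws:kms:us-east-1:123456789012:key/example"]), indent=4))
```

You could write the result to a file and pass it to `aws iam create-policy` with `--policy-document file://policy.json`.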

**Prerequisites**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.

**Topics**
+ [Create the policy](creating-the-role-v3.md)
+ [Configure and create the cluster](creating-the-cluster-v3.md)

# Create the policy
<a name="creating-the-role-v3"></a>

In this tutorial, you create a policy that grants the permissions required to configure shared storage encryption with an AWS KMS key.

**Create a policy.**

1. Go to the IAM console: [https://console.aws.amazon.com/iam/home](https://console.aws.amazon.com/iam/home).

1. Choose **Policies**.

1. Choose **Create policy**.

1. Choose the **JSON** tab and paste in the following policy. Make sure to replace all occurrences of `123456789012` with your AWS account ID, and replace the key Amazon Resource Name (ARN) and AWS Region with your own.

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "kms:DescribeKey",
                   "kms:ReEncrypt*",
                   "kms:CreateGrant",
                   "kms:Decrypt"
               ],
               "Resource": [
                   "arn:aws:kms:us-east-1:123456789012:key/abcd1234-ef56-gh78-ij90-abcd1234efgh5678"
               ]
           }
       ]
   }
   ```

------

1. For this tutorial, name the policy `ParallelClusterKmsPolicy`, and then choose **Create Policy**.

1. Make a note of the policy ARN. You need it to configure your cluster.

# Configure and create the cluster
<a name="creating-the-cluster-v3"></a>

The following is an example cluster configuration that includes an Amazon Elastic Block Store shared file system with encryption.

```
Region: eu-west-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t2.micro
  Networking:
    SubnetId: subnet-abcdef01234567890
  Ssh:
    KeyName: my-ssh-key
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::123456789012:policy/ParallelClusterKmsPolicy
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: q1
      ComputeResources:
        - Name: t2micro
          InstanceType: t2.micro
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-abcdef01234567890
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::123456789012:policy/ParallelClusterKmsPolicy
SharedStorage:
  - MountDir: /shared/ebs1
    Name: shared-ebs1
    StorageType: Ebs
    EbsSettings:
      Encrypted: True
      KmsKeyId: abcd1234-ef56-gh78-ij90-abcd1234efgh5678
```

Replace the example values, such as the subnet IDs, key name, and policy ARN, with your own. Then, create a cluster that uses your AWS KMS key to encrypt your data in Amazon EBS.

The configuration is similar for Amazon EFS and FSx for Lustre file systems.

The Amazon EFS `SharedStorage` configuration is as follows.

```
...
SharedStorage:
  - MountDir: /shared/efs1
    Name: shared-efs1
    StorageType: Efs
    EfsSettings:
      Encrypted: True
      KmsKeyId: abcd1234-ef56-gh78-ij90-abcd1234efgh5678
```

The FSx for Lustre `SharedStorage` configuration is as follows.

```
...
SharedStorage:
  - MountDir: /shared/fsx1
    Name: shared-fsx1
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      DeploymentType: PERSISTENT_1
      PerUnitStorageThroughput: 200
      KmsKeyId: abcd1234-ef56-gh78-ij90-abcd1234efgh5678
```
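For `PERSISTENT_1` deployments, FSx for Lustre provisions baseline throughput per TiB of storage, so the `StorageCapacity` and `PerUnitStorageThroughput` settings above together determine the file system's aggregate throughput. A rough sketch of that arithmetic (the helper name is illustrative):

```python
def fsx_baseline_throughput(storage_capacity_gib, per_unit_mb_s_per_tib):
    """Baseline aggregate throughput in MB/s: FSx for Lustre provisions
    PerUnitStorageThroughput for each TiB of storage capacity."""
    return storage_capacity_gib / 1024 * per_unit_mb_s_per_tib

# 1200 GiB at 200 MB/s per TiB, as in the configuration above
print(fsx_baseline_throughput(1200, 200))  # → 234.375
```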

# Running jobs in a multiple queue mode cluster
<a name="multi-queue-tutorial-v3"></a>

This tutorial covers how to run your first "Hello World" job on AWS ParallelCluster with [multiple queue mode](configuration-of-multiple-queues-v3.md).

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.

## Configure your cluster
<a name="multi-queue-tutorial-v3-configure-cluster"></a>

First, verify that AWS ParallelCluster is correctly installed by running the following command.

```
$ pcluster version
```

For more information about `pcluster version`, see [`pcluster version`](pcluster.version-v3.md).

This command returns the running version of AWS ParallelCluster.

Next, run `pcluster configure` to generate a basic configuration file, and follow the prompts.

```
$ pcluster configure --config multi-queue-mode.yaml
```

For more information about the `pcluster configure` command, see [`pcluster configure`](pcluster.configure-v3.md).

After you complete this step, a basic configuration file named `multi-queue-mode.yaml` appears. This file contains a basic cluster configuration.

In the next step, you modify your new configuration file and launch a cluster with multiple queues.

**Note**  
Some instances that this tutorial uses aren't free-tier eligible.

For this tutorial, modify your configuration file to match the following configuration. Replace the example values, such as the Region, subnet IDs, and key pair name, with your own.

```
Region: region-id
Image:
 Os: alinux2
HeadNode:
 InstanceType: c5.xlarge
 Networking:
   SubnetId: subnet-abcdef01234567890
 Ssh:
   KeyName: yourkeypair
Scheduling:
 Scheduler: slurm
 SlurmQueues:
 - Name: spot
   ComputeResources:
   - Name: c5xlarge
     InstanceType: c5.xlarge
     MinCount: 1
     MaxCount: 10
   - Name: t2micro
     InstanceType: t2.micro
     MinCount: 1
     MaxCount: 10
   Networking:
     SubnetIds:
     - subnet-abcdef01234567890
 - Name: ondemand
   ComputeResources:
   - Name: c52xlarge
     InstanceType: c5.2xlarge
     MinCount: 0
     MaxCount: 10
   Networking:
     SubnetIds:
     - subnet-021345abcdef6789
```

## Create your cluster
<a name="multi-queue-tutorial-v3-create-cluster"></a>

Create a cluster that's named `multi-queue-cluster` based on your configuration file.

```
$ pcluster create-cluster --cluster-name multi-queue-cluster --cluster-configuration multi-queue-mode.yaml
{
 "cluster": {
   "clusterName": "multi-queue-cluster",
   "cloudformationStackStatus": "CREATE_IN_PROGRESS",
   "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:123456789012:stack/multi-queue-cluster/1234567-abcd-0123-def0-abcdef0123456",
   "region": "eu-west-1",
   "version": "3.15.0",
   "clusterStatus": "CREATE_IN_PROGRESS"
 }
}
```

For more information about the `pcluster create-cluster` command, see [`pcluster create-cluster`](pcluster.create-cluster-v3.md).

To check the status of the cluster, run the following command.

```
$ pcluster list-clusters
{
 "cluster": {
   "clusterName": "multi-queue-cluster",
   "cloudformationStackStatus": "CREATE_IN_PROGRESS",
   "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:123456789012:stack/multi-queue-cluster/1234567-abcd-0123-def0-abcdef0123456",
   "region": "eu-west-1",
   "version": "3.15.0",
   "clusterStatus": "CREATE_IN_PROGRESS"
 }
}
```

When the cluster is created, the `clusterStatus` field shows `CREATE_COMPLETE`.
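If you script the wait instead of polling by hand, a small helper can classify the `pcluster describe-cluster` output. This is a sketch; the `cluster_status` name is illustrative, and the input is the JSON that the CLI prints.

```python
import json

# Terminal states: polling can stop when one of these is reached.
TERMINAL = {"CREATE_COMPLETE", "CREATE_FAILED", "DELETE_COMPLETE", "DELETE_FAILED"}

def cluster_status(describe_output):
    """Return (status, done) from `pcluster describe-cluster` JSON output."""
    status = json.loads(describe_output)["clusterStatus"]
    return status, status in TERMINAL
```

For example, you could call this in a loop on the CLI output and sleep between attempts until `done` is true.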

## Log in to the head node
<a name="multi-queue-tutorial-v3-log-into-head-node"></a>

Use your private SSH key file to log in to the head node.

```
$ pcluster ssh --cluster-name multi-queue-cluster -i ~/path/to/yourkeyfile.pem
```

For more information about `pcluster ssh`, see [`pcluster ssh`](pcluster.ssh-v3.md).

After logging in, run the `sinfo` command to verify that your scheduler queues are set up and configured.

For more information about `sinfo`, see [sinfo](https://slurm.schedmd.com/sinfo.html) in the *Slurm documentation*.

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
spot*        up   infinite     18  idle~ spot-dy-c5xlarge-[1-9],spot-dy-t2micro-[1-9]
spot*        up   infinite      2  idle  spot-st-c5xlarge-1,spot-st-t2micro-1
ondemand     up   infinite     10  idle~ ondemand-dy-c52xlarge-[1-10]
```

The output shows that you have one `t2.micro` and one `c5.xlarge` compute node in the `idle` state that are available in your cluster.

The other nodes are all in the power saving state, as indicated by the `~` suffix on the node state, with no Amazon EC2 instances backing them. The default queue is indicated by the `*` suffix after its name, so `spot` is your default job queue.
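You can decode these state suffixes programmatically when post-processing `sinfo` output. The following sketch covers the cloud-node suffixes that Slurm appends (the helper name is illustrative):

```python
# Power-management suffixes that Slurm appends to cloud node states.
SUFFIXES = {
    "~": "power saving (no EC2 instance backing)",
    "#": "powering up",
    "%": "powering down",
}

def describe_state(state):
    """Split a node state such as 'idle~' into (base state, power note)."""
    if state and state[-1] in SUFFIXES:
        return state[:-1], SUFFIXES[state[-1]]
    return state, "running"
```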

## Run job in multiple queue mode
<a name="multi-queue-tutorial-v3-running-job-mqm"></a>

Next, create a job that sleeps for 30 seconds and then outputs its own hostname. Make sure that the current user can run the script.

```
$ tee <<EOF hellojob.sh
#!/bin/bash
sleep 30
echo "Hello World from \$(hostname)"
EOF

$ chmod +x hellojob.sh
$ ls -l hellojob.sh
-rwxrwxr-x 1 ec2-user ec2-user 57 Sep 23 21:57 hellojob.sh
```

Submit the job using the `sbatch` command. Request two nodes for this job with the `-N 2` option, and verify that the job submits successfully. For more information about `sbatch`, see [https://slurm.schedmd.com/sbatch.html](https://slurm.schedmd.com/sbatch.html) in the *Slurm documentation*.

```
$ sbatch -N 2 --wrap "srun hellojob.sh"
Submitted batch job 1
```

You can view your queue and check the status of the job with the `squeue` command. Because you didn't specify a specific queue, the default queue (`spot`) is used. For more information about `squeue`, see [https://slurm.schedmd.com/squeue.html](https://slurm.schedmd.com/squeue.html) in the *Slurm documentation*.

```
$ squeue
JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST(REASON)
   1      spot     wrap ec2-user  R       0:10      2 spot-st-c5xlarge-1,spot-st-t2micro-1
```

The output shows that the job is currently in a running state. Wait for the job to finish. This takes about 30 seconds. Then, run `squeue` again.

```
$ squeue
JOBID PARTITION     NAME     USER          ST       TIME  NODES NODELIST(REASON)
```

Now that the jobs in the queue have all finished, look for the output file that's named `slurm-1.out` in your current directory.

```
$ cat slurm-1.out
Hello World from spot-st-t2micro-1
Hello World from spot-st-c5xlarge-1
```

The output shows that the job ran successfully on the `spot-st-t2micro-1` and `spot-st-c5xlarge-1` nodes.

Now submit the same job by specifying constraints for specific instances with the following commands.

```
$ sbatch -N 3 -p spot -C "[c5.xlarge*1&t2.micro*2]" --wrap "srun hellojob.sh"
Submitted batch job 2
```

You used these parameters for `sbatch`:
+ `-N 3` – requests three nodes.
+ `-p spot` – submits the job to the `spot` queue. You can also submit a job to the `ondemand` queue by specifying `-p ondemand`.
+ `-C "[c5.xlarge*1&t2.micro*2]"` – specifies the node constraints for this job, requesting one `c5.xlarge` node and two `t2.micro` nodes.
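The constraint string follows a simple pattern: instance-type features joined by `&`, each with a `*count` multiplier, wrapped in brackets. A small helper can build it (the function name is illustrative):

```python
def constraint(counts):
    """Build an sbatch -C string such as "[c5.xlarge*1&t2.micro*2]" from a
    mapping of instance type (the node feature) to requested node count."""
    return "[" + "&".join(f"{itype}*{n}" for itype, n in counts.items()) + "]"

print(constraint({"c5.xlarge": 1, "t2.micro": 2}))  # → [c5.xlarge*1&t2.micro*2]
```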

Run the `sinfo` command to view the nodes and queues. Queues in AWS ParallelCluster are called partitions in Slurm.

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
spot*        up   infinite      1  alloc# spot-dy-t2micro-1
spot*        up   infinite     17  idle~ spot-dy-c5xlarge-[2-10],spot-dy-t2micro-[2-9]
spot*        up   infinite      1  mix   spot-st-c5xlarge-1
spot*        up   infinite      1  alloc spot-st-t2micro-1
ondemand     up   infinite     10  idle~ ondemand-dy-c52xlarge-[1-10]
```

The nodes are powering up, as indicated by the `#` suffix on the node state. Run the `squeue` command to view information about the jobs in the cluster.

```
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   2      spot     wrap ec2-user CF       0:04      3 spot-dy-c5xlarge-1,spot-dy-t2micro-1,spot-st-t2micro-1
```

Your job is in the `CF` (CONFIGURING) state, waiting for instances to scale up and join the cluster.

After about three minutes, the nodes are available and the job enters the `R` (RUNNING) state.

```
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   2      spot     wrap ec2-user  R       0:07      3 spot-dy-t2micro-1,spot-st-c5xlarge-1,spot-st-t2micro-1
```

The job finishes, and all three nodes are in the `idle` state.

```
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
spot*        up   infinite     17  idle~ spot-dy-c5xlarge-[1-9],spot-dy-t2micro-[2-9]
spot*        up   infinite      3  idle  spot-dy-t2micro-1,spot-st-c5xlarge-1,spot-st-t2micro-1
ondemand     up   infinite     10  idle~ ondemand-dy-c52xlarge-[1-10]
```

Then, after no jobs remain in the queue, check for `slurm-2.out` in your current directory.

```
$ cat slurm-2.out 
Hello World from spot-st-t2micro-1
Hello World from spot-dy-t2micro-1
Hello World from spot-st-c5xlarge-1
```

This is the final state of the cluster.

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
spot*        up   infinite     17  idle~ spot-dy-c5xlarge-[1-9],spot-dy-t2micro-[2-9]
spot*        up   infinite      3  idle  spot-dy-t2micro-1,spot-st-c5xlarge-1,spot-st-t2micro-1
ondemand     up   infinite     10  idle~ ondemand-dy-c52xlarge-[1-10]
```

After you log off of the cluster, you can clean up by running `pcluster delete-cluster`. For more information, see [`pcluster list-clusters`](pcluster.list-clusters-v3.md) and [`pcluster delete-cluster`](pcluster.delete-cluster-v3.md).

```
$ pcluster list-clusters
{
 "clusters": [
   {
     "clusterName": "multi-queue-cluster",
     "cloudformationStackStatus": "CREATE_COMPLETE",
     "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:123456789012:stack/multi-queue-cluster/1234567-abcd-0123-def0-abcdef0123456",
     "region": "eu-west-1",
      "version": "3.15.0",
     "clusterStatus": "CREATE_COMPLETE"
   }
 ]
}
$ pcluster delete-cluster -n multi-queue-cluster
{
 "cluster": {
   "clusterName": "multi-queue-cluster",
   "cloudformationStackStatus": "DELETE_IN_PROGRESS",
   "cloudformationStackArn": "arn:aws:cloudformation:eu-west-1:123456789012:stack/multi-queue-cluster/1234567-abcd-0123-def0-abcdef0123456",
   "region": "eu-west-1",
    "version": "3.15.0",
   "clusterStatus": "DELETE_IN_PROGRESS"
 }
}
```

# Using the AWS ParallelCluster API
<a name="tutorials_06_API_use"></a>

In this tutorial, you build and test the API with [Amazon API Gateway](https://docs.aws.amazon.com/apigateway/latest/developerguide/welcome.html) and an AWS ParallelCluster CloudFormation template. Then, you use the example client that's available on GitHub to call the API. For more information about using the API, see the [AWS ParallelCluster API](api-reference-v3.md).

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites**
+ The AWS CLI is [installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and configured in your compute environment.
+ AWS ParallelCluster is installed in a virtual environment. For more information, see [Install AWS ParallelCluster in a virtual environment (recommended)](install-v3-virtual-environment.md).
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.



## Step 1: Build the API with Amazon API Gateway
<a name="tutorials_06_multi-API-use-step1"></a>

**Stay in your home user directory and activate your virtual environment:**

1. Install development tools and `jq`, a command line JSON processor.

   ```
   $ sudo yum groupinstall -y "Development Tools"
    sudo yum install -y jq python3-devel
   ```

1. Run the following command to get your AWS ParallelCluster version and assign it to an environment variable.

   ```
   $ PCLUSTER_VERSION=$(pcluster version | jq -r '.version')
    echo "export PCLUSTER_VERSION=${PCLUSTER_VERSION}" |tee -a ~/.bashrc
   ```

1. Create an environment variable and assign your Region ID to it.

   ```
   $ export AWS_DEFAULT_REGION="us-east-1"
    echo "export AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}" |tee -a ~/.bashrc
   ```

1. Run the following commands to deploy the API.

   ```
   API_STACK_NAME="pc-api-stack"
    echo "export API_STACK_NAME=${API_STACK_NAME}" |tee -a ~/.bashrc
   ```

   ```
   aws cloudformation create-stack \
      --region ${AWS_DEFAULT_REGION} \
      --stack-name ${API_STACK_NAME} \
      --template-url https://${AWS_DEFAULT_REGION}-aws-parallelcluster.s3.${AWS_DEFAULT_REGION}.amazonaws.com/parallelcluster/${PCLUSTER_VERSION}/api/parallelcluster-api.yaml \
      --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
      --parameters ParameterKey=EnableIamAdminAccess,ParameterValue=true
        
       {
          "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/pc-api-stack/abcd1234-ef56-gh78-ei90-1234abcd5678"
       }
   ```
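
   The `create-stack` call returns immediately, before the deployment finishes. If you want to block until the stack reaches a terminal state, you can use the CloudFormation wait helper:

   ```
   $ aws cloudformation wait stack-create-complete \
      --region ${AWS_DEFAULT_REGION} \
      --stack-name ${API_STACK_NAME}
   ```

   The command exits successfully when the stack status is `CREATE_COMPLETE`, and exits with an error if the creation fails.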

   After the process completes, proceed to the next step.

## Step 2: Test the API in the Amazon API Gateway console
<a name="tutorials_06_multi-API-use-step2"></a>

1. Sign in to the AWS Management Console.

1. Navigate to the [Amazon API Gateway console](https://console.aws.amazon.com/apigateway/home).

1. Choose your API deployment.  
![\[Amazon API Gateway console with list of your gateways that you can choose from.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/gateway_choose.png)

1. Choose **Stages** and select a stage.  
![\[A console view of the stages that you can choose from. You can also view the URL that API Gateway provides for your API.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/gateway_address.png)

1. Note the URL that API Gateway provides for accessing or invoking your API. It's highlighted in blue.

1. Choose **Resources**, and select **`GET`** under **`/clusters`**.

1. Choose the **TEST** icon, and then scroll down and choose the **TEST** button.  
![\[A console view of the API resources and test mechanisms.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/gateway_test.png)

   The response to your `/clusters GET` appears.  
![\[A console view of the API resources, test mechanisms, and the response from your test request.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/gateway.png)

## Step 3: Prepare and test an example client to invoke the API
<a name="tutorials_06_multi-API-use-step3"></a>



1. Clone the AWS ParallelCluster source code, `cd` to the `api` directory, and install the Python client libraries.

   ```
   $ git clone -b v${PCLUSTER_VERSION} https://github.com/aws/aws-parallelcluster aws-parallelcluster-v${PCLUSTER_VERSION}
    cd aws-parallelcluster-v${PCLUSTER_VERSION}/api
   ```

   ```
   $ pip3 install client/src
   ```

1. Navigate back to your home user directory.

1. Export the API Gateway base URL that the client uses when running.

   ```
   $ export PCLUSTER_API_URL=$( aws cloudformation describe-stacks --stack-name ${API_STACK_NAME} --query 'Stacks[0].Outputs[?OutputKey==`ParallelClusterApiInvokeUrl`].OutputValue' --output text )
    echo "export PCLUSTER_API_URL=${PCLUSTER_API_URL}" |tee -a ~/.bashrc
   ```

1. Export a cluster name that the client uses to create a cluster.

   ```
   $ export CLUSTER_NAME="test-api-cluster"
    echo "export CLUSTER_NAME=${CLUSTER_NAME}" |tee -a ~/.bashrc
   ```

1. Run the following commands to store the credentials that the example client uses to access the API.

   ```
   $ export PCLUSTER_API_USER_ROLE=$( aws cloudformation describe-stacks --stack-name ${API_STACK_NAME} --query 'Stacks[0].Outputs[?OutputKey==`ParallelClusterApiUserRole`].OutputValue' --output text )
    echo "export PCLUSTER_API_USER_ROLE=${PCLUSTER_API_USER_ROLE}" |tee -a ~/.bashrc
   ```

## Step 4: Copy client code script and run cluster tests
<a name="tutorials_06_multi-API-use-step4"></a>

1. Copy the following example client code to `test_pcluster_client.py` in your home user directory. The client code makes requests to do the following:
   + Create the cluster.
   + Describe the cluster.
   + List the clusters.
   + Describe the compute fleet.
   + Describe the cluster instances.

   ```
   # Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier: MIT-0
   #
   # Permission is hereby granted, free of charge, to any person obtaining a copy of this
   # software and associated documentation files (the "Software"), to deal in the Software
   # without restriction, including without limitation the rights to use, copy, modify,
   # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
   # permit persons to whom the Software is furnished to do so.
   #
   # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
   # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
   # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
   # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
   # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
   # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
   #
   #  Author: Evan F. Bollig (Github: bollig)
   
   import time, datetime
   import os
   import pcluster_client
   from pprint import pprint
   from pcluster_client.api import (
       cluster_compute_fleet_api,
       cluster_instances_api,
       cluster_operations_api
   )
   from pcluster_client.model.create_cluster_request_content import CreateClusterRequestContent
   from pcluster_client.model.cluster_status import ClusterStatus
   region=os.environ.get("AWS_DEFAULT_REGION")
   
   # Defining the host is optional and defaults to http://localhost
   # See configuration.py for a list of all supported configuration parameters.
   configuration = pcluster_client.Configuration(
       host = os.environ.get("PCLUSTER_API_URL")
   )
   cluster_name=os.environ.get("CLUSTER_NAME")
   
   # Enter a context with an instance of the API client
   with pcluster_client.ApiClient(configuration) as api_client:
       cluster_ops = cluster_operations_api.ClusterOperationsApi(api_client)
       fleet_ops = cluster_compute_fleet_api.ClusterComputeFleetApi(api_client)
       instance_ops = cluster_instances_api.ClusterInstancesApi(api_client)
       
       # Create cluster
       build_done = False
       try:
           with open('cluster-config.yaml', encoding="utf-8") as f:
               body = CreateClusterRequestContent(cluster_name=cluster_name, cluster_configuration=f.read())
               api_response = cluster_ops.create_cluster(body, region=region)
       except pcluster_client.ApiException as e:
           print("Exception when calling create_cluster: %s\n" % e)
           build_done = True
       time.sleep(60)
       
       # Confirm cluster status with describe_cluster
       while not build_done:
           try:
               api_response = cluster_ops.describe_cluster(cluster_name, region=region)
               pprint(api_response)
               if api_response.cluster_status == ClusterStatus('CREATE_IN_PROGRESS'):
                   print('. . . working . . .', end='', flush=True)
                   time.sleep(60)
               elif api_response.cluster_status == ClusterStatus('CREATE_COMPLETE'):
                   print('READY!')
                   build_done = True
               else:
                   print('ERROR!!!!')
                   build_done = True    
           except pcluster_client.ApiException as e:
               print("Exception when calling describe_cluster: %s\n" % e)  
    
       # List clusters
       try:
           api_response = cluster_ops.list_clusters(region=region)
           pprint(api_response)
       except pcluster_client.ApiException as e:
           print("Exception when calling list_clusters: %s\n" % e)
                   
       # DescribeComputeFleet
       try:
           api_response = fleet_ops.describe_compute_fleet(cluster_name, region=region)
           pprint(api_response)
       except pcluster_client.ApiException as e:
           print("Exception when calling compute fleet: %s\n" % e)
   
       # DescribeClusterInstances
       try:
           api_response = instance_ops.describe_cluster_instances(cluster_name, region=region)
           pprint(api_response)
       except pcluster_client.ApiException as e:
           print("Exception when calling describe_cluster_instances: %s\n" % e)
   ```

1. Create a cluster configuration.

   ```
   $ pcluster configure --config cluster-config.yaml
   ```

1. The API client library automatically detects credentials from your environment variables (for example, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or `AWS_SESSION_TOKEN`) or from `$HOME/.aws`. The following command assumes the designated ParallelClusterApiUserRole and exports its temporary credentials to your environment.

   ```
   $  eval $(aws sts assume-role --role-arn ${PCLUSTER_API_USER_ROLE} --role-session-name ApiTestSession | jq -r '.Credentials | "export AWS_ACCESS_KEY_ID=\(.AccessKeyId)\nexport AWS_SECRET_ACCESS_KEY=\(.SecretAccessKey)\nexport AWS_SESSION_TOKEN=\(.SessionToken)\n"')
   ```
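
   To confirm that the role switch worked, you can inspect the identity behind your current credentials. The returned ARN should reference the assumed ParallelClusterApiUserRole:

   ```
   $ aws sts get-caller-identity --query 'Arn' --output text
   ```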

   **Error to watch for:**

   If you see an error similar to the following, you already assumed the ParallelClusterApiUserRole and your `AWS_SESSION_TOKEN` has expired.

   ```
   An error occurred (AccessDenied) when calling the AssumeRole operation: 
   User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/ParallelClusterApiUserRole-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/ApiTestSession 
   is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXX:role/ParallelClusterApiUserRole-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
   ```

   Drop the role and then re-run the `aws sts assume-role` command to use the ParallelClusterApiUserRole.

   ```
   $ unset AWS_SESSION_TOKEN
   unset AWS_SECRET_ACCESS_KEY
   unset AWS_ACCESS_KEY_ID
   ```

   To provide your current user permissions for API access, you must [expand the Resource Policy](https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-resource-policies.html).

1. Run the following command to test the example client.

   ```
   $ python3 test_pcluster_client.py
   {'cluster_configuration': 'Region: us-east-1\n'
                             'Image:\n'
                             '  Os: alinux2\n'
                             'HeadNode:\n'
                             '  InstanceType: t2.micro\n'
                             '  Networking . . . :\n'
                             '    SubnetId: subnet-1234567890abcdef0\n'
                             '  Ssh:\n'
                             '    KeyName: adpc\n'
                             'Scheduling:\n'
                             '  Scheduler: slurm\n'
                             '  SlurmQueues:\n'
                             '  - Name: queue1\n'
                             '    ComputeResources:\n'
                             '    - Name: t2micro\n'
                             '      InstanceType: t2.micro\n'
                             '      MinCount: 0\n'
                             '      MaxCount: 10\n'
                             '    Networking . . . :\n'
                             '      SubnetIds:\n'
                             '      - subnet-1234567890abcdef0\n',
    'cluster_name': 'test-api-cluster'}
   {'cloud_formation_stack_status': 'CREATE_IN_PROGRESS',
    'cloudformation_stack_arn': 'arn:aws:cloudformation:us-east-1:123456789012:stack/test-api-cluster/abcd1234-ef56-gh78-ij90-1234abcd5678',
    'cluster_configuration': {'url': 'https://parallelcluster-021345abcdef6789-v1-do-not-delete...},
    'cluster_name': 'test-api-cluster',
    'cluster_status': 'CREATE_IN_PROGRESS',
    'compute_fleet_status': 'UNKNOWN',
    'creation_time': datetime.datetime(2022, 4, 28, 16, 18, 47, 972000, tzinfo=tzlocal()),
    'last_updated_time': datetime.datetime(2022, 4, 28, 16, 18, 47, 972000, tzinfo=tzlocal()),
    'region': 'us-east-1',
    'tags': [{'key': 'parallelcluster:version', 'value': '3.1.3'}],
    'version': '3.1.3'}
           .
           . 
           .
   . . . working . . . {'cloud_formation_stack_status': 'CREATE_COMPLETE',
    'cloudformation_stack_arn': 'arn:aws:cloudformation:us-east-1:123456789012:stack/test-api-cluster/abcd1234-ef56-gh78-ij90-1234abcd5678',
    'cluster_configuration': {'url': 'https://parallelcluster-021345abcdef6789-v1-do-not-delete...},
    'cluster_name': 'test-api-cluster',
    'cluster_status': 'CREATE_COMPLETE',
    'compute_fleet_status': 'RUNNING',
    'creation_time': datetime.datetime(2022, 4, 28, 16, 18, 47, 972000, tzinfo=tzlocal()),
    'head_node': {'instance_id': 'i-abcdef01234567890',
                  'instance_type': 't2.micro',
                  'launch_time': datetime.datetime(2022, 4, 28, 16, 21, 46, tzinfo=tzlocal()),
                  'private_ip_address': '172.31.27.153',
                  'public_ip_address': '52.90.156.51',
                  'state': 'running'},
    'last_updated_time': datetime.datetime(2022, 4, 28, 16, 18, 47, 972000, tzinfo=tzlocal()),
    'region': 'us-east-1',
    'tags': [{'key': 'parallelcluster:version', 'value': '3.1.3'}],
    'version': '3.1.3'}
   READY!
   ```

## Step 5: Copy client code script and delete cluster
<a name="tutorials_06_multi-API-use-step5"></a>

1. Copy the following example client code to `delete_cluster_client.py`. The client code makes a request to delete the cluster.

   ```
   # Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier: MIT-0
   #
   # Permission is hereby granted, free of charge, to any person obtaining a copy of this
   # software and associated documentation files (the "Software"), to deal in the Software
   # without restriction, including without limitation the rights to use, copy, modify,
   # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
   # permit persons to whom the Software is furnished to do so.
   #
   # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
   # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
   # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
   # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
   # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
   # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
   #
   #  Author: Evan F. Bollig (Github: bollig)
   
   import time, datetime
   import os
   import pcluster_client
   from pprint import pprint
   from pcluster_client.api import (
       cluster_compute_fleet_api,
       cluster_instances_api,
       cluster_operations_api
   )
   from pcluster_client.model.create_cluster_request_content import CreateClusterRequestContent
   from pcluster_client.model.cluster_status import ClusterStatus
   region=os.environ.get("AWS_DEFAULT_REGION")
   
   # Defining the host is optional and defaults to http://localhost
   # See configuration.py for a list of all supported configuration parameters.
   configuration = pcluster_client.Configuration(
       host = os.environ.get("PCLUSTER_API_URL")
   )
   cluster_name=os.environ.get("CLUSTER_NAME")
   
   # Enter a context with an instance of the API client
   with pcluster_client.ApiClient(configuration) as api_client:
       cluster_ops = cluster_operations_api.ClusterOperationsApi(api_client)
       
       # Delete the cluster
       gone = False
       try:
           api_response = cluster_ops.delete_cluster(cluster_name, region=region)
       except pcluster_client.ApiException as e:
           print("Exception when calling delete_cluster: %s\n" % e)
       time.sleep(60)
       
       # Confirm cluster status with describe_cluster
       while not gone:
           try:
               api_response = cluster_ops.describe_cluster(cluster_name, region=region)
               pprint(api_response)
               if api_response.cluster_status == ClusterStatus('DELETE_IN_PROGRESS'):
                   print('. . . working . . .', end='', flush=True)
                   time.sleep(60)    
           except pcluster_client.ApiException as e:
               gone = True
               print("DELETE COMPLETE or Exception when calling describe_cluster: %s\n" % e)
   ```

1. Run the following command to delete the cluster.

   ```
   $ python3 delete_cluster_client.py
   {'cloud_formation_stack_status': 'DELETE_IN_PROGRESS',
   'cloudformation_stack_arn': 'arn:aws:cloudformation:us-east-1:123456789012:stack/test-api-cluster/abcd1234-ef56-gh78-ij90-1234abcd5678',
   'cluster_configuration': {'url': 'https://parallelcluster-021345abcdef6789-v1-do-not-delete...},
   'cluster_name': 'test-api-cluster',
   'cluster_status': 'DELETE_IN_PROGRESS',
   'compute_fleet_status': 'UNKNOWN',
   'creation_time': datetime.datetime(2022, 4, 28, 16, 50, 47, 943000, tzinfo=tzlocal()),
   'head_node': {'instance_id': 'i-abcdef01234567890',
                 'instance_type': 't2.micro',
                 'launch_time': datetime.datetime(2022, 4, 28, 16, 53, 48, tzinfo=tzlocal()),
                 'private_ip_address': '172.31.17.132',
                 'public_ip_address': '34.201.100.37',
                 'state': 'running'},
   'last_updated_time': datetime.datetime(2022, 4, 28, 16, 50, 47, 943000, tzinfo=tzlocal()),
   'region': 'us-east-1',
   'tags': [{'key': 'parallelcluster:version', 'value': '3.1.3'}],
   'version': '3.1.3'}
          .
          . 
          .
   . . . working . . . {'cloud_formation_stack_status': 'DELETE_IN_PROGRESS',
   'cloudformation_stack_arn': 'arn:aws:cloudformation:us-east-1:123456789012:stack/test-api-cluster/abcd1234-ef56-gh78-ij90-1234abcd5678',
   'cluster_configuration': {'url': 'https://parallelcluster-021345abcdef6789-v1-do-not-delete...},
   'cluster_name': 'test-api-cluster',
   'cluster_status': 'DELETE_IN_PROGRESS',
   'compute_fleet_status': 'UNKNOWN',
   'creation_time': datetime.datetime(2022, 4, 28, 16, 50, 47, 943000, tzinfo=tzlocal()),
   'last_updated_time': datetime.datetime(2022, 4, 28, 16, 50, 47, 943000, tzinfo=tzlocal()),
   'region': 'us-east-1',
   'tags': [{'key': 'parallelcluster:version', 'value': '3.1.3'}],
   'version': '3.1.3'}
   . . . working . . . DELETE COMPLETE or Exception when calling describe_cluster: (404)
   Reason: Not Found
    	      .
    	      .
    	      .
   HTTP response body: {"message":"Cluster 'test-api-cluster' does not exist or belongs to an incompatible ParallelCluster major version."}
   ```

1. After you are finished testing, unset the environment variables.

   ```
   $ unset AWS_SESSION_TOKEN
   unset AWS_SECRET_ACCESS_KEY
   unset AWS_ACCESS_KEY_ID
   ```

## Step 6: Clean up
<a name="tutorials_06_multi-API-use-step6"></a>

You can use the AWS Management Console or AWS CLI to delete your API.

1. From the CloudFormation console, choose the API stack and then choose **Delete**.

1. If you're using the AWS CLI, run the following command.

   ```
   $ aws cloudformation delete-stack --stack-name ${API_STACK_NAME}
   ```
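
   Optionally, wait for the deletion to finish:

   ```
   $ aws cloudformation wait stack-delete-complete --stack-name ${API_STACK_NAME}
   ```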

# Creating a cluster with Slurm accounting
<a name="tutorials_07_slurm-accounting-v3"></a>

Learn how to configure and create a cluster with Slurm accounting. For more information, see [Slurm accounting with AWS ParallelCluster](slurm-accounting-v3.md).

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

In this tutorial, you use a [CloudFormation quick-create template (us-east-1)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=pcluster-slurm-db&templateURL=https://us-east-1-aws-parallelcluster.s3.amazonaws.com/templates/1-click/serverless-database.yaml) to create an [Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html) MySQL serverless database. The template instructs CloudFormation to create all the necessary components to deploy an Amazon Aurora serverless database on the same VPC as the cluster. The template also creates a basic networking and security configuration for the connection between the cluster and the database.

**Note**  
Starting with version 3.3.0, AWS ParallelCluster supports Slurm accounting with the cluster configuration parameter [SlurmSettings](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [Database](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database).

**Note**  
The quick-create template serves as an example. This template doesn't cover all possible use cases for a Slurm accounting database server. It's your responsibility to create a database server with the configuration and capacity appropriate for your production workloads.

**Prerequisites:**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured.](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.
+ The region that you deploy the quick-create template in supports Amazon Aurora MySQL serverless v2. For more information, see [Aurora Serverless v2 with Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV2.html#Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV2.amy).

## Step 1: Create the VPC and subnets for AWS ParallelCluster
<a name="slurm-accounting-vpc-v3"></a>

To use the provided CloudFormation template for the Slurm accounting database, you must have the VPC for the cluster ready. You can do this manually or as part of the [Configure and create a cluster with the AWS ParallelCluster command line interface](install-v3-configuring.md) procedure. If you already used AWS ParallelCluster, you might have a VPC ready for the deployment of the cluster and the database server.

## Step 2: Create the database stack
<a name="slurm-accounting-db-stack-v3"></a>

Use the [CloudFormation quick-create template (us-east-1)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=pcluster-slurm-db&templateURL=https://us-east-1-aws-parallelcluster.s3.amazonaws.com/templates/1-click/serverless-database.yaml) to create a database stack for Slurm accounting. The template requires the following inputs:
+ Database server credentials, specifically the admin user name and password.
+ Sizing of the Amazon Aurora serverless cluster. This depends on the expected cluster load.
+ Networking parameters, specifically the target VPC and subnets or CIDR blocks for the creation of the subnets.

Select appropriate credentials and a size for your database server. For the networking options, you must use the same VPC that the cluster is deployed to. You can either create the subnets for the database yourself and pass them as input to the template, or provide two disjoint CIDR blocks and let the CloudFormation template create the two subnets from them. Make sure that the CIDR blocks don't overlap with existing subnets; if they do, stack creation fails.

The database server takes several minutes to be created.

## Step 3: Create a cluster with Slurm accounting enabled
<a name="slurm-accounting-create-cluster-v3"></a>

The provided CloudFormation template generates a CloudFormation stack with some defined outputs. From the AWS Management Console, you can view the outputs in the **Outputs** tab in the CloudFormation stack view. To enable the Slurm accounting, some of these outputs must be used in the AWS ParallelCluster cluster configuration file:
+ `DatabaseHost`: Used for the [`SlurmSettings`](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [`Database`](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [`Uri`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-Uri) cluster configuration parameter.
+ `DatabaseAdminUser`: Used for the [`SlurmSettings`](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [`Database`](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [`UserName`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-UserName) cluster configuration parameter.
+ `DatabaseSecretArn`: Used for the [`SlurmSettings`](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [`Database`](Scheduling-v3.md#Scheduling-v3-SlurmSettings-Database) / [`PasswordSecretArn`](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-Database-PasswordSecretArn) cluster configuration parameter.
+ `DatabaseClientSecurityGroup`: This is the security group that's attached to the head node of the cluster that's defined in the [`HeadNode`](HeadNode-v3.md) / [`Networking`](HeadNode-v3.md#HeadNode-v3-Networking) / [`SecurityGroups`](HeadNode-v3.md#yaml-HeadNode-Networking-SecurityGroups) configuration parameter.

Update your cluster configuration file `Database` parameters with the output values. Use the [`pcluster`](pcluster-v3.md) CLI to create the cluster.

```
$ pcluster create-cluster -n cluster-3.x -c path/to/cluster-config.yaml
```
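
For reference, a cluster configuration with the `Database` parameters filled in from the stack outputs might look like the following sketch. The URI, user name, secret ARN, and security group ID shown here are placeholders; use the values from your own stack's **Outputs** tab. Other required sections, such as the head node instance type and subnets, are omitted.

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Database:
      # DatabaseHost output, including the MySQL port
      Uri: pcluster-slurm-db.cluster-abcdef123456.us-east-1.rds.amazonaws.com:3306
      # DatabaseAdminUser output
      UserName: clusteradmin
      # DatabaseSecretArn output
      PasswordSecretArn: arn:aws:secretsmanager:us-east-1:123456789012:secret:SlurmDbPassword-abc123
HeadNode:
  Networking:
    SecurityGroups:
      # DatabaseClientSecurityGroup output
      - sg-1234567890abcdef0
```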

After the cluster is created, you can start using Slurm accounting commands such as `sacctmgr` or `sacct`.
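
For example, after connecting to the head node, you might confirm that accounting is active. Exact output depends on your cluster and job history:

```
$ sacctmgr show cluster
$ sacct --starttime today --allusers
```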

# Creating a cluster with an external Slurmdbd accounting
<a name="external-slurmdb-accounting"></a>

Learn how to configure and create a cluster with external Slurmdbd accounting. For more information, see [Slurm accounting with AWS ParallelCluster](slurm-accounting-v3.md).

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

The AWS ParallelCluster UI is built on a serverless architecture and you can use it within the AWS Free Tier category for most cases. For more information, see [AWS ParallelCluster UI costs](install-pcui-costs-v3.md).

In this tutorial, you use an AWS CloudFormation quick-create template to create the necessary components to deploy a Slurmdbd instance on the same VPC as the cluster. The template creates a basic networking and security configuration for the connection between the cluster and the database.

**Note**  
Starting with version 3.10.0, AWS ParallelCluster supports an external Slurmdbd with the cluster configuration parameter `SlurmSettings` / `ExternalSlurmdbd`.

**Note**  
The quick-create template serves as an example. This template doesn't cover all possible use cases. It's your responsibility to create an external Slurmdbd with the configuration and capacity appropriate for your production workloads.

**Prerequisites:**
+ AWS ParallelCluster [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured.](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+ You have an [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ You have an AWS Identity and Access Management role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.
+ You have a Slurm accounting database. To create one, follow Steps 1 and 2 in [Create the Slurm Accounting Database stack](tutorials_07_slurm-accounting-v3.md).

## Step 1: Create the Slurmdbd stack
<a name="external-slurmdb-accounting-step1"></a>

In this tutorial, use a [CloudFormation quick-create template (`us-east-1`)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=pcluster-slurm-dbd&templateURL=https://us-east-1-aws-parallelcluster.s3.amazonaws.com/templates/1-click/external-slurmdbd.json) to create a Slurmdbd stack. The template requires the following inputs:

**Networking**
+ **VPCId**: The VPC ID to launch the Slurmdbd instance.
+ **SubnetId**: The Subnet ID to launch the Slurmdbd instance.
+ **PrivatePrefix**: The CIDR prefix of the VPC.
+ **PrivateIp**: A secondary private IP to assign to the Slurmdbd instance.

**Database connection**
+ **DBMSClientSG**: The security group to attach to the Slurmdbd instance. This security group should allow connections between the database server and the Slurmdbd instance.
+ **DBMSDatabaseName**: The name of the database.
+ **DBMSUsername**: The username to the database.
+ **DBMSPasswordSecretArn**: The secret containing the password to the database.
+ **DBMSUri**: The URI of the database server.

**Instance settings**
+ **InstanceType**: An instance type to use for the slurmdbd instance.
+ **KeyName**: An Amazon EC2 key pair to use for the slurmdbd instance.

**Slurmdbd settings**
+ **AMIID**: An AMI of the Slurmdbd instance. The AMI should be a ParallelCluster AMI. The version of the ParallelCluster AMI determines the version of Slurmdbd.
+ **MungeKeySecretArn**: The secret containing the munge key to use for authenticating communications between Slurmdbd and clusters.
+ **SlurmdbdPort**: A port number that the slurmdbd uses. 
+ **EnableSlurmdbdSystemService**: Enables slurmdbd as a system service so that it runs when the instance launches.

**Warning**  
If the database was created by a different version of slurmdbd, don't run slurmdbd as a system service.  
If the database contains a large number of entries, the Slurm Database Daemon (slurmdbd) might require tens of minutes to upgrade the database and is unresponsive during this time interval.  
Before upgrading slurmdbd, make a backup of the database. For more information, see the [Slurm documentation](https://slurm.schedmd.com/quickstart_admin.html#upgrade).

## Step 2: Create a cluster with external Slurmdbd enabled
<a name="external-slurmdb-accounting-step2"></a>

The provided CloudFormation template generates a CloudFormation stack with some defined outputs.

From the AWS Management Console, view the **Outputs** tab in the CloudFormation stack to review the entities created. To enable Slurm accounting, some of these outputs must be used in the AWS ParallelCluster configuration file:
+ **SlurmdbdPrivateIp**: Used for the [SlurmSettings](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [ExternalSlurmdbd](Scheduling-v3.md#Scheduling-v3-SlurmSettings-ExternalSlurmdbd) / [Host](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ExternalSlurmdbd-Host) cluster configuration parameter.
+ **SlurmdbdPort**: Used for the [SlurmSettings](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [ExternalSlurmdbd](Scheduling-v3.md#Scheduling-v3-SlurmSettings-ExternalSlurmdbd) / [Port](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-ExternalSlurmdbd-Port) cluster configuration parameter.
+ **AccountingClientSecurityGroup**: The security group that's attached to the head node of the cluster, defined in the [HeadNode](HeadNode-v3.md) / [Networking](HeadNode-v3.md#HeadNode-v3-Networking) / [AdditionalSecurityGroups](HeadNode-v3.md#yaml-HeadNode-Networking-AdditionalSecurityGroups) configuration parameter.

Additionally, from the **Parameters** tab in the CloudFormation stack view:
+ **MungeKeySecretArn**: Used for the [SlurmSettings](Scheduling-v3.md#Scheduling-v3-SlurmSettings) / [MungeKeySecretArn](Scheduling-v3.md#yaml-Scheduling-SlurmSettings-MungeKeySecretArn) cluster configuration parameter.

Update your cluster configuration file `ExternalSlurmdbd` parameters with the output values. Use the [`pcluster`](pcluster-v3.md) CLI to create the cluster.

```
$ pcluster create-cluster -n cluster-3.x -c path/to/cluster-config.yaml
```
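
For reference, the relevant configuration fragment might look like the following sketch. The host IP, port, secret ARN, and security group ID are placeholders for your own stack's output and parameter values; other required sections are omitted.

```
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ExternalSlurmdbd:
      # SlurmdbdPrivateIp output
      Host: 172.31.1.100
      # SlurmdbdPort output
      Port: 6819
    # MungeKeySecretArn parameter
    MungeKeySecretArn: arn:aws:secretsmanager:us-east-1:123456789012:secret:MungeKey-abc123
HeadNode:
  Networking:
    AdditionalSecurityGroups:
      # AccountingClientSecurityGroup output
      - sg-1234567890abcdef0
```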

After the cluster is created, you can start using Slurm accounting commands such as `sacctmgr` or `sacct`.

**Warning**  
Traffic between the AWS ParallelCluster cluster and the external Slurmdbd instance isn't encrypted. We recommend that you run the cluster and the external Slurmdbd instance in a trusted network.





# Reverting to a previous AWS Systems Manager document version
<a name="tutorials_08_ssm-document-version-rev-v3"></a>

Learn how to revert to a previous AWS Systems Manager document version. For more information about SSM documents, see [AWS Systems Manager Documents](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-ssm-docs.html) in the *AWS Systems Manager User Guide*.

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites:**
+ An AWS account with permissions to manage SSM documents.
+ The AWS CLI is [installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

## Revert to a previous SSM document version
<a name="tutorials_08_ssm-document-version-rev-steps"></a>

1. In your terminal, run the following command to get the list of existing SSM documents that you own.

   ```
   $ aws ssm list-documents --document-filter "key=Owner,value=Self"
   ```
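
   To narrow the output to the fields that you need, you can add a JMESPath query, for example:

   ```
   $ aws ssm list-documents --document-filter "key=Owner,value=Self" \
       --query "DocumentIdentifiers[].[Name,DocumentVersion]" --output table
   ```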

1. Revert an SSM document to a previous version. In this example, we revert to a previous version of the `SessionManagerRunShell` document. You can use the SSM `SessionManagerRunShell` document to customize every SSM shell session that you initiate.

   1. Find the `DocumentVersion` parameter for `SessionManagerRunShell` by running the following command:

      ```
      $ aws ssm describe-document --name "SSM-SessionManagerRunShell"
      {
          "Document": {
              "Hash": "...",
              "HashType": "Sha256",
              "Name": "SSM-SessionManagerRunShell",
              "Owner": "123456789012",
              "CreatedDate": "2023-02-20T19:04:32.390000+00:00",
              "Status": "Active",
              "DocumentVersion": "1",
              "Parameters": [
                  {
                      "Name": "linuxcmd",
                      "Type": "String",
                      "Description": "The command to run on connection...",
                      "DefaultValue": "if [ -d '/opt/parallelcluster' ]; then source /opt/parallelcluster/cfnconfig; sudo su - $cfn_cluster_user; fi; /bin/bash"
                  }
              ],
              "PlatformTypes": [
                  "Windows",
                  "Linux",
                  "MacOS"
              ],
              "DocumentType": "Session",
              "SchemaVersion": "1.0",
              "LatestVersion": "2",
              "DefaultVersion": "1",
              "DocumentFormat": "JSON",
              "Tags": []
          }
      }
      ```

      The latest version is `2`.

   1. Revert to the previous version by deleting the latest version. When version `2` is deleted, version `1` becomes the latest and default version again:

      ```
      $ aws ssm delete-document --name "SSM-SessionManagerRunShell" --document-version 2
      ```

1. Verify that the document version has been reverted by running the `describe-document` command again:

   ```
   $ aws ssm describe-document --name "SSM-SessionManagerRunShell"
   {
       "Document": {
           "Hash": "...",
           "HashType": "Sha256",
           "Name": "SSM-SessionManagerRunShell",
           "Owner": "123456789012",
           "CreatedDate": "2023-02-20T19:04:32.390000+00:00",
           "Status": "Active",
           "DocumentVersion": "1",
           "Parameters": [
               {
                   "Name": "linuxcmd",
                   "Type": "String",
                   "Description": "The command to run on connection...",
                   "DefaultValue": "if [ -d '/opt/parallelcluster' ]; then source /opt/parallelcluster/cfnconfig; sudo su - $cfn_cluster_user; fi; /bin/bash"
               }
           ],
           "PlatformTypes": [
               "Windows",
               "Linux",
               "MacOS"
           ],
           "DocumentType": "Session",
           "SchemaVersion": "1.0",
           "LatestVersion": "1",
           "DefaultVersion": "1",
           "DocumentFormat": "JSON",
           "Tags": []
       }
   }
   ```

   The latest version is `1`.

# Creating a cluster with CloudFormation
<a name="tutorials_09_cfn-custom-resource-v3"></a>

Learn how to create a cluster with an AWS ParallelCluster CloudFormation custom resource. For more information, see [AWS CloudFormation custom resource](cloudformation-v3.md).

When using AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites:**
+ The AWS CLI is [installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
+ An [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ An IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.

## Cluster creation with a CloudFormation quick-create stack
<a name="cfn-custom-resource-quick-v3"></a>

In this tutorial, you use a quick-create stack to deploy a CloudFormation template that creates a cluster and the following AWS resources:
+ A root CloudFormation stack created by using a CloudFormation quick-create stack.
+ Nested CloudFormation stacks that include default policies, default VPC set up, and a custom resource provider.
+ An example AWS ParallelCluster cluster stack and a cluster that you can log in to and run jobs.

**Create a cluster with AWS CloudFormation**

1. Sign in to the AWS Management Console.

1. Open the CloudFormation [quick-create link](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=mycluster&templateURL=https://us-east-1-aws-parallelcluster.s3.amazonaws.com/parallelcluster/3.15.0/templates/1-click/cluster-example.yaml) to create the following resources in the CloudFormation console:
   + A nested CloudFormation stack with a VPC with a public and private subnet for running the cluster head node and compute nodes, respectively.
   + A nested CloudFormation stack with an AWS ParallelCluster custom resource for managing the cluster.
   + A nested CloudFormation stack with the default policies for managing the cluster.
   + A root CloudFormation stack for the nested stacks.
   + An AWS ParallelCluster cluster with the Slurm scheduler and a defined number of compute nodes.  
![\[The console CloudFormation quick-create user interface.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/cfn-quick-create.png)

1. In the **Quick create stack** **Parameters** section, enter values for the following parameters:

   1. For **KeyName**, enter the name of your Amazon EC2 key pair.

   1. For **AvailabilityZone**, choose an AZ for your cluster nodes, for example, `us-east-1a`.

1. Check the boxes to acknowledge each of the access capabilities at the bottom of the page.

1. Choose **Create stack**.

1. Wait for the CloudFormation stack to reach the `CREATE_COMPLETE` state.

## Cluster creation with the CloudFormation Command Line Interface (CLI)
<a name="cfn-custom-resource-cli-v3"></a>

In this tutorial, you use the AWS Command Line Interface (CLI) for CloudFormation to deploy a CloudFormation template that creates a cluster.

**This tutorial creates the following AWS resources:**
+ A root CloudFormation stack created by using the AWS CLI.
+ Nested CloudFormation stacks that include default policies, default VPC setup, and a custom resource provider.
+ An example AWS ParallelCluster cluster stack and a cluster that you can log in to and run jobs.

Replace *inputs highlighted in red*, such as *keypair*, with your own values.

**Create a cluster with AWS CloudFormation**

1. Create a CloudFormation template named `cluster_template.yaml` with the following content:

   ```
   AWSTemplateFormatVersion: '2010-09-09'
   Description: >
     AWSParallelCluster CloudFormation Template
   
   Parameters:
     KeyName:
       Description: KeyPair to login to the head node
       Type: AWS::EC2::KeyPair::KeyName
   
     AvailabilityZone:
       Description: Availability zone where instances will be launched
       Type: AWS::EC2::AvailabilityZone::Name
       Default: us-east-2a
   
   Mappings:
     ParallelCluster:
       Constants:
         Version: 3.15.0
   
   Resources:
     PclusterClusterProvider:
       Type: AWS::CloudFormation::Stack
       Properties:
         TemplateURL: !Sub
           - https://${AWS::Region}-aws-parallelcluster.s3.${AWS::Region}.${AWS::URLSuffix}/parallelcluster/${Version}/templates/custom_resource/cluster.yaml
           - { Version: !FindInMap [ParallelCluster, Constants, Version] }
   
     PclusterVpc:
       Type: AWS::CloudFormation::Stack
       Properties:
         Parameters:
           PublicCIDR: 10.0.0.0/24
           PrivateCIDR: 10.0.16.0/20
           AvailabilityZone: !Ref AvailabilityZone
         TemplateURL: !Sub
           - https://${AWS::Region}-aws-parallelcluster.s3.${AWS::Region}.${AWS::URLSuffix}/parallelcluster/${Version}/templates/networking/public-private-${Version}.cfn.json
           - { Version: !FindInMap [ParallelCluster, Constants, Version ] }
   
     PclusterCluster:
       Type: Custom::PclusterCluster
       Properties:
         ServiceToken: !GetAtt [ PclusterClusterProvider , Outputs.ServiceToken ]
         ClusterName: !Sub 'c-${AWS::StackName}'
         ClusterConfiguration:
           Image:
             Os: alinux2
           HeadNode:
             InstanceType: t2.medium
             Networking:
               SubnetId: !GetAtt [ PclusterVpc , Outputs.PublicSubnetId ]
             Ssh:
               KeyName: !Ref KeyName
           Scheduling:
             Scheduler: slurm
             SlurmQueues:
             - Name: queue0
               ComputeResources:
               - Name: queue0-cr0
                 InstanceType: t2.micro
               Networking:
                 SubnetIds:
                 -  !GetAtt [ PclusterVpc , Outputs.PrivateSubnetId ]
   Outputs:
     HeadNodeIp:
       Description: The Public IP address of the HeadNode
       Value: !GetAtt [ PclusterCluster, headNode.publicIpAddress ]
   ```

1. Run the following AWS CLI command to deploy the CloudFormation stack for cluster creation and management.

   ```
   $ aws cloudformation deploy --template-file ./cluster_template.yaml \
     --stack-name mycluster \
     --parameter-overrides KeyName=keypair \
                           AvailabilityZone=us-east-2b \
     --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND
   ```
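
   When the command returns, confirm that the stack reached the `CREATE_COMPLETE` state:

   ```
   $ aws cloudformation describe-stacks --stack-name mycluster \
       --query "Stacks[0].StackStatus" --output text
   ```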

## View CloudFormation cluster output
<a name="cfn-custom-resource-view-v3"></a>

View the CloudFormation stack outputs to obtain useful cluster details. The `ValidationMessages` output provides access to validation messages from cluster create and update operations.

1. Navigate to the [CloudFormation console](https://console.aws.amazon.com/cloudformation/home) and select the stack that includes your AWS ParallelCluster custom resource.

1. Choose **Stack details**, and select the **Outputs** tab.  
![\[The console CloudFormation outputs table showing values for HeadNodeIp and ValidationMessages.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/cfn-outputs.png)

   Validation messages might be truncated. For more information about how to retrieve logs, see [AWS ParallelCluster troubleshooting](troubleshooting-v3.md).
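
   If the stack exposes `ValidationMessages` as a stack output, you can read the full value with the AWS CLI instead (a sketch; assumes the stack name `mycluster` from this tutorial):

   ```
   $ aws cloudformation describe-stacks --stack-name mycluster \
       --query "Stacks[0].Outputs[?OutputKey=='ValidationMessages']|[0].OutputValue" \
       --output text
   ```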

## Access your cluster
<a name="cfn-custom-resource-access-v3"></a>

Access the cluster.

**`ssh` into the cluster head node**

1. After the CloudFormation stack deployment is complete, obtain the IP address of the head node with the following command:

   ```
   $ HEAD_NODE_IP=$(aws cloudformation describe-stacks --stack-name=mycluster --query "Stacks|[0].Outputs[?OutputKey=='HeadNodeIp']|[0].OutputValue" --output=text)
   ```

   You can also retrieve the head node IP address from the **HeadNodeIp** output in the cluster stack **Outputs** tab in the CloudFormation console.

   The head node IP address is available here because the `Outputs` section of this example cluster template defines it.

1. Connect to the cluster head node by running the following command:

   ```
   $ ssh -i keyname.pem ec2-user@$HEAD_NODE_IP
   ```

## Clean up
<a name="cfn-custom-resource-cleanup-v3"></a>

Delete the cluster.

1. Run the following AWS CLI command to delete the CloudFormation stack and cluster.

   ```
   $ aws cloudformation delete-stack --stack-name=mycluster
   ```

1. Check the stack delete status by running the following command.

   ```
   $ aws cloudformation describe-stacks --stack-name=mycluster
   ```
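
   Alternatively, you can block until the deletion finishes; the `wait` command returns once the stack is deleted (after that, `describe-stacks` fails because the stack no longer exists):

   ```
   $ aws cloudformation wait stack-delete-complete --stack-name mycluster
   ```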

# Deploy ParallelCluster API with Terraform
<a name="tutorial-deploy-terraform"></a>

In this tutorial, you will define a simple Terraform project to deploy a ParallelCluster API. 

**Prerequisites**
+ Terraform v1.5.7 or later is installed.
+ An IAM role with the permissions to deploy the ParallelCluster API. See [Required permissions](tutorial-deploy-terraform-permissions.md).

# Define a Terraform project
<a name="tutorial-deploy-terraform-define"></a>

In this tutorial, you will define a Terraform project.

1. Create a directory called `my-pcluster-api`.

   All files that you create will be within this directory.

1. Create the file `providers.tf` to configure the AWS provider.

   ```
   provider "aws" {
     region  = var.region
     profile = var.profile
   }
   ```

1. Create the file `main.tf` to define the resources using the ParallelCluster module.

   ```
   module "parallelcluster_pcluster_api" {
     source = "aws-tf/parallelcluster/aws//modules/pcluster_api"
     version = "1.1.0"
   
     region                = var.region
     api_stack_name        = var.api_stack_name
     api_version           = var.api_version
   
     parameters = {
       EnableIamAdminAccess = "true"
     }
   }
   ```

1. Create the file `variables.tf` to define the variables that can be injected for this project.

   ```
   variable "region" {
     description = "The region the ParallelCluster API is deployed in."
     type        = string
     default     = "us-east-1"
   }
   
   variable "profile" {
     type        = string
     description = "The AWS profile used to deploy the clusters."
     default     = null
   }
   
   variable "api_stack_name" {
     type        = string
     description = "The name of the CloudFormation stack used to deploy the ParallelCluster API."
     default     = "ParallelCluster"
   }
   
   variable "api_version" {
     type        = string
     description = "The version of the ParallelCluster API."
   }
   ```

1. Create the file `terraform.tfvars` to set values for the variables.

   The file below deploys a ParallelCluster API 3.11.1 in `us-east-1` using the stack name `MyParallelClusterAPI-3111`. You'll be able to reference this ParallelCluster API deployment using its stack name. 
**Note**  
The `api_version` assignment in the following code can be replaced with any supported AWS ParallelCluster version. 

   ```
   region = "us-east-1"
   api_stack_name = "MyParallelClusterAPI-3111"
   api_version = "3.11.1"
   ```

1. Create the file `outputs.tf` to define the outputs returned by this project.

   ```
   output "pcluster_api_stack_outputs" {
     value = module.parallelcluster_pcluster_api.stack_outputs
   }
   ```

   The project directory is:

   ```
   my-pcluster-api
   ├── main.tf - Terraform entrypoint to define the resources using the ParallelCluster module.
   ├── outputs.tf - Defines the outputs returned by Terraform.
   ├── providers.tf - Configures the AWS provider.
   ├── terraform.tfvars - Sets values for the variables, e.g. region, PCAPI version, PCAPI stack name.
   └── variables.tf - Defines the variables, e.g. region, PCAPI version, PCAPI stack name.
   ```

# Deploy the API
<a name="tutorial-deploy-terraform-deploy-api"></a>

To deploy the API, run the standard Terraform commands in order.

1. Build the project:

   ```
   terraform init
   ```

1. Define the deployment plan:

   ```
   terraform plan -out tfplan
   ```

1. Deploy the plan:

   ```
   terraform apply tfplan
   ```
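
   When the apply completes, you can read back the stack outputs defined in `outputs.tf`, which include details such as the API invoke URL:

   ```
   terraform output pcluster_api_stack_outputs
   ```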

# Required permissions
<a name="tutorial-deploy-terraform-permissions"></a>

You need the following permissions to deploy the ParallelCluster API with Terraform:

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudformation:DescribeStacks",
                "cloudformation:GetTemplate"
            ],
            "Resource": "arn:aws:cloudformation:us-east-1:111122223333:stack/*",
            "Effect": "Allow",
            "Sid": "CloudFormationRead"
        },
        {
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "cloudformation:CreateChangeSet"
            ],
            "Resource": "arn:aws:cloudformation:us-east-1:111122223333:stack/MyParallelClusterAPI*",
            "Effect": "Allow",
            "Sid": "CloudFormationWrite"
        },
        {
            "Action": [
                "cloudformation:CreateChangeSet"
            ],
            "Resource": [
                "arn:aws:cloudformation:us-east-1:111122223333:aws:transform/Include",
                "arn:aws:cloudformation:us-east-1:111122223333:aws:transform/Serverless-2016-10-31"
            ],
            "Effect": "Allow",
            "Sid": "CloudFormationTransformWrite"
        },
        {
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::*-aws-parallelcluster/parallelcluster/*/api/ParallelCluster.openapi.yaml",
                "arn:aws:s3:::*-aws-parallelcluster/parallelcluster/*/layers/aws-parallelcluster/lambda-layer.zip"
            ],
            "Effect": "Allow",
            "Sid": "S3ParallelClusterArtifacts"
        },
        {
            "Action": [
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:GetRole",
                "iam:CreatePolicy",
                "iam:DeletePolicy",
                "iam:GetPolicy",
                "iam:GetRolePolicy",
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy",
                "iam:PutRolePolicy",
                "iam:DeleteRolePolicy",
                "iam:ListPolicyVersions"
            ],
            "Resource": [
                "arn:aws:iam::111122223333:role/*",
                "arn:aws:iam::111122223333:policy/*"
            ],
            "Effect": "Allow",
            "Sid": "IAM"
        },
        {
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::111122223333:role/ParallelClusterLambdaRole-*",
                "arn:aws:iam::111122223333:role/APIGatewayExecutionRole-*"
            ],
            "Effect": "Allow",
            "Sid": "IAMPassRole"
        },
        {
            "Action": [
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:GetFunction",
                "lambda:PublishLayerVersion",
                "lambda:DeleteLayerVersion",
                "lambda:GetLayerVersion",
                "lambda:TagResource",
                "lambda:UntagResource"
            ],
            "Resource": [
                "arn:aws:lambda:us-east-1:111122223333:layer:PCLayer-*",
                "arn:aws:lambda:us-east-1:111122223333:function:*-ParallelClusterFunction-*"
            ],
            "Effect": "Allow",
            "Sid": "Lambda"
        },
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:DeleteLogGroup",
                "logs:DescribeLogGroups",
                "logs:PutRetentionPolicy",
                "logs:TagLogGroup",
                "logs:UntagLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:111122223333:log-group:/aws/lambda/*-ParallelClusterFunction-*"
            ],
            "Effect": "Allow",
            "Sid": "Logs"
        },
        {
            "Action": [
                "apigateway:DELETE",
                "apigateway:GET",
                "apigateway:PATCH",
                "apigateway:POST",
                "apigateway:PUT",
                "apigateway:UpdateRestApiPolicy"
            ],
            "Resource": [
                "arn:aws:apigateway:us-east-1::/restapis",
                "arn:aws:apigateway:us-east-1::/restapis/*",
                "arn:aws:apigateway:us-east-1::/tags/*"
            ],
            "Effect": "Allow",
            "Sid": "APIGateway"
        }
    ]
}
```

------

# Creating a cluster with Terraform
<a name="tutorial-create-cluster-terraform"></a>

When using AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites**
+ Terraform v1.5.7 or later is installed.
+ [AWS ParallelCluster API](api-reference-v3.md) v3.8.0 or later is deployed in your account. See [Deploy ParallelCluster API with Terraform](tutorial-deploy-terraform.md).
+ An IAM role with the permissions to invoke the ParallelCluster API. See [Required permissions](tutorial-create-cluster-terraform-permissions.md).

# Define a Terraform project
<a name="tutorial-create-cluster-terraform-define"></a>

In this tutorial, you will define a simple Terraform project to deploy a cluster.

1. Create a directory called `my-clusters`. 

   All files that you create will be within this directory.

1. Create the file `terraform.tf` to import the ParallelCluster provider.

   ```
   terraform {
     required_version = ">= 1.5.7"
     required_providers {
       aws-parallelcluster = {
         source  = "aws-tf/aws-parallelcluster"
         version = "~> 1.0"
       }
     }
   }
   ```

1. Create the file `providers.tf` to configure the ParallelCluster and AWS providers.

   ```
   provider "aws" {
     region  = var.region
     profile = var.profile
   }
   
   provider "aws-parallelcluster" {
     region         = var.region
     profile        = var.profile
     api_stack_name = var.api_stack_name
     use_user_role  = true
   }
   ```

1. Create the file `main.tf` to define the resources using the ParallelCluster module.

   ```
   module "pcluster" {
     source  = "aws-tf/parallelcluster/aws"
     version = "1.1.0"
   
     region                = var.region
     api_stack_name        = var.api_stack_name
     api_version           = var.api_version
     deploy_pcluster_api   = false
   
     template_vars         = local.config_vars
     cluster_configs       = local.cluster_configs
     config_path           = "config/clusters.yaml"
   }
   ```

1. Create the file `clusters.tf` to define multiple clusters as Terraform local variables. 
**Note**  
You can define multiple clusters within the `cluster_configs` element. For every cluster, you can explicitly define the cluster properties within the local variables (see `DemoCluster01`) or reference an external file (see `DemoCluster02`).

   To review the cluster properties that you can set within the configuration element, see [Cluster configuration file](cluster-configuration-file-v3.md).

   To review the options that you can set for cluster creation, see [`pcluster create-cluster`](pcluster.create-cluster-v3.md).

   ```
   locals {
     cluster_configs = {
       DemoCluster01 : {
         region : local.config_vars.region
         rollbackOnFailure : false
         validationFailureLevel : "WARNING"
         suppressValidators : [
           "type:KeyPairValidator"
         ]
         configuration : {
           Region : local.config_vars.region
           Image : {
             Os : "alinux2"
           }
           HeadNode : {
             InstanceType : "t3.small"
             Networking : {
               SubnetId : local.config_vars.subnet
             }
             Iam : {
               AdditionalIamPolicies : [
                 { Policy : "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" }
               ]
             }
           }
           Scheduling : {
             Scheduler : "slurm"
             SlurmQueues : [{
               Name : "queue1"
               CapacityType : "ONDEMAND"
               Networking : {
                 SubnetIds : [local.config_vars.subnet]
               }
               Iam : {
                 AdditionalIamPolicies : [
                   { Policy : "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" }
                 ]
               }
               ComputeResources : [{
                 Name : "compute"
                 InstanceType : "t3.small"
                 MinCount : "1"
                 MaxCount : "4"
               }]
             }]
             SlurmSettings : {
               QueueUpdateStrategy : "TERMINATE"
             }
           }
         }
       }
       DemoCluster02 : {
         configuration : "config/cluster_config.yaml"
       }
     }
   }
   ```

1. Create the file `config/clusters.yaml` to define multiple clusters as YAML configuration.

   ```
   DemoCluster03:
     region: ${region}
     rollbackOnFailure: true
     validationFailureLevel: WARNING
     suppressValidators:
       - type:KeyPairValidator
     configuration: config/cluster_config.yaml
   DemoCluster04:
     region: ${region}
     rollbackOnFailure: false
     configuration: config/cluster_config.yaml
   ```

1. Create the file `config/cluster_config.yaml`, which is a standard ParallelCluster config file where Terraform variables can be injected.

   To review the cluster properties that you can set within the configuration element, see [Cluster configuration file](cluster-configuration-file-v3.md).

   ```
   Region: ${region}
   Image:
    Os: alinux2
   HeadNode:
    InstanceType: t3.small
    Networking:
      SubnetId: ${subnet}
    Iam:
      AdditionalIamPolicies:
        - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
   Scheduling:
    Scheduler: slurm
    SlurmQueues:
      - Name: queue1
        CapacityType: ONDEMAND
        Networking:
          SubnetIds:
            - ${subnet}
        Iam:
          AdditionalIamPolicies:
            - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        ComputeResources:
          - Name: compute
            InstanceType: t3.small
            MinCount: 1
            MaxCount: 5
    SlurmSettings:
      QueueUpdateStrategy: TERMINATE
   ```

1. Create the file `clusters_vars.tf` to define the variables that can be injected into cluster configurations.

   This file allows you to define dynamic values that can be used in cluster configurations, such as region and subnet.

   This example retrieves values directly from the project variables, but you may need to use custom logic to determine them.

   ```
   locals {
     config_vars = {
       subnet = var.subnet_id
       region = var.cluster_region
     }
   }
   ```

1. Create the file `variables.tf` to define the variables that can be injected for this project.

   ```
   variable "region" {
     description = "The region the ParallelCluster API is deployed in."
     type        = string
     default     = "us-east-1"
   }
   
   variable "cluster_region" {
     description = "The region the clusters will be deployed in."
     type        = string
     default     = "us-east-1"
   }
   
   variable "profile" {
     type        = string
     description = "The AWS profile used to deploy the clusters."
     default     = null
   }
   
   variable "subnet_id" {
     type        = string
     description = "The id of the subnet to be used for the ParallelCluster instances."
   }
   
   variable "api_stack_name" {
     type        = string
     description = "The name of the CloudFormation stack used to deploy the ParallelCluster API."
     default     = "ParallelCluster"
   }
   
   variable "api_version" {
     type        = string
     description = "The version of the ParallelCluster API."
   }
   ```

1. Create the file `terraform.tfvars` to set values for the variables.

   The file below deploys the clusters in `eu-west-1` within the subnet `subnet-123456789`, using the existing ParallelCluster API 3.11.1, which is already deployed in `us-east-1` with stack name `MyParallelClusterAPI-3111`.

   ```
   region = "us-east-1"
   api_stack_name = "MyParallelClusterAPI-3111"
   api_version = "3.11.1"
   
   cluster_region = "eu-west-1"
   subnet_id = "subnet-123456789"
   ```

1. Create the file `outputs.tf` to define the outputs returned by this project.

   ```
   output "clusters" {
     value = module.pcluster.clusters
   }
   ```

   The project directory is:

   ```
   my-clusters
   ├── config
   │   ├── cluster_config.yaml - Cluster configuration, where Terraform variables can be injected.
   │   └── clusters.yaml - File listing all the clusters to deploy.
   ├── clusters.tf - Clusters defined as Terraform local variables.
   ├── clusters_vars.tf - Variables that can be injected into cluster configurations.
   ├── main.tf - Terraform entrypoint where the ParallelCluster module is configured.
   ├── outputs.tf - Defines the cluster as a Terraform output.
   ├── providers.tf - Configures the providers: ParallelCluster and AWS.
   ├── terraform.tf - Import the ParallelCluster provider.
   ├── terraform.tfvars - Defines values for variables, e.g. region, PCAPI stack name.
   └── variables.tf - Defines the variables, e.g. region, PCAPI stack name.
   ```

# Deploy the cluster
<a name="tutorial-create-cluster-terraform-deploy"></a>

To deploy the cluster, run the standard Terraform commands in order.

**Note**  
This example assumes that you've already deployed the ParallelCluster API in your account.

1. Build the project:

   ```
   terraform init
   ```

1. Define the deployment plan:

   ```
   terraform plan -out tfplan
   ```

1. Deploy the plan:

   ```
   terraform apply tfplan
   ```
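
   When the apply completes, you can inspect the created clusters through the `clusters` output defined in `outputs.tf`:

   ```
   terraform output clusters
   ```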

## Deploy the ParallelCluster API with clusters
<a name="tutorial-create-cluster-terraform-deploy-api"></a>

If you haven't deployed the ParallelCluster API and you want to deploy it with the clusters, change the following files:
+ `main.tf`

  ```
  module "pcluster" {
    source  = "aws-tf/parallelcluster/aws"
    version = "1.1.0"
  
    region                = var.region
    api_stack_name        = var.api_stack_name
    api_version           = var.api_version
    deploy_pcluster_api   = true
    parameters = {
      EnableIamAdminAccess = "true"
    }
    
    template_vars         = local.config_vars
    cluster_configs       = local.cluster_configs
    config_path           = "config/clusters.yaml"
  }
  ```
+ `providers.tf`

  ```
  provider "aws-parallelcluster" {
    region   = var.region
    profile  = var.profile
    endpoint = module.pcluster.pcluster_api_stack_outputs.ParallelClusterApiInvokeUrl
    role_arn = module.pcluster.pcluster_api_stack_outputs.ParallelClusterApiUserRole
  }
  ```

# Required permissions
<a name="tutorial-create-cluster-terraform-permissions"></a>

You need the following permissions to deploy a cluster with Terraform:
+ Assume the ParallelCluster API role, which is in charge of interacting with the ParallelCluster API.
+ Describe the CloudFormation stack of the ParallelCluster API, to verify that it exists and to retrieve its parameters and outputs.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:sts::111122223333:role/PCAPIUserRole-*",
            "Effect": "Allow",
            "Sid": "AssumePCAPIUserRole"
        },
        {
            "Action": [
                "cloudformation:DescribeStacks"
            ],
            "Resource": "arn:aws:cloudformation:us-east-1:111122223333:stack/*",
            "Effect": "Allow",
            "Sid": "CloudFormation"
        }
    ]
}
```

------
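
One way to grant these permissions is to attach the policy above to the IAM role or user that runs Terraform. The following sketch writes the policy document to a file, validates it locally, and shows the attach step; the file name and the role name `TerraformDeployRole` are hypothetical, and the final `put-role-policy` call is commented out because it requires valid credentials.

```shell
# Write the policy document shown above to a local file, then validate
# that it is well-formed JSON before attaching it.
cat > pcluster-terraform-policy.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:sts::111122223333:role/PCAPIUserRole-*",
            "Effect": "Allow",
            "Sid": "AssumePCAPIUserRole"
        },
        {
            "Action": ["cloudformation:DescribeStacks"],
            "Resource": "arn:aws:cloudformation:us-east-1:111122223333:stack/*",
            "Effect": "Allow",
            "Sid": "CloudFormation"
        }
    ]
}
EOF
python3 -m json.tool pcluster-terraform-policy.json > /dev/null && echo "policy OK"

# Attach it to the (hypothetical) role used to run Terraform:
# aws iam put-role-policy --role-name TerraformDeployRole \
#     --policy-name pcluster-terraform \
#     --policy-document file://pcluster-terraform-policy.json
```

Remember to replace the account id and Region in the ARNs with your own values.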

# Creating a custom AMI with Terraform
<a name="tutorial-create-ami-terraform"></a>

When using AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites**
+ Terraform v1.5.7 or later is installed.
+ [AWS ParallelCluster API](api-reference-v3.md) v3.8.0 or later is deployed in your account. See [Creating a cluster with Terraform](tutorial-create-cluster-terraform.md).
+ IAM role with the permissions to invoke the ParallelCluster API. See [Required permissions](tutorial-create-ami-terraform-permissions.md).

# Define a Terraform project
<a name="tutorial-create-ami-terraform-define"></a>

In this tutorial, you will define a simple Terraform project to deploy a ParallelCluster custom AMI.

1. Create a directory called `my-amis`. 

   All files that you create will be within this directory.

1. Create the file `terraform.tf` to import the ParallelCluster provider.

   ```
   terraform {
     required_version = ">= 1.5.7"
     required_providers {
       aws-parallelcluster = {
         source  = "aws-tf/aws-parallelcluster"
         version = "~> 1.0"
       }
     }
   }
   ```

1. Create the file `providers.tf` to configure the ParallelCluster and AWS providers.

   ```
   provider "aws" {
     region  = var.region
     profile = var.profile
   }
   
   provider "aws-parallelcluster" {
     region         = var.region
     profile        = var.profile
     api_stack_name = var.api_stack_name
     use_user_role  = true
   }
   ```

1. Create the file `main.tf` to define the resources using the ParallelCluster module.

   To review the image properties that you can set within the `image_configuration` element, see [Build image configuration files](image-builder-configuration-file-v3.md).

   To review the options that you can set for image creation, for example `image_id` and `rollback_on_failure`, see [`pcluster build-image`](pcluster.build-image-v3.md). 

   ```
   data "aws-parallelcluster_list_official_images" "parent_image" {
     region = var.region
     os = var.os
     architecture = var.architecture
   }
   
   resource "aws-parallelcluster_image" "demo01" {
     image_id            = "demo01"
     image_configuration = yamlencode({
       "Build":{
         "InstanceType": "c5.2xlarge",
         "ParentImage": data.aws-parallelcluster_list_official_images.parent_image.official_images[0].amiId,
         "UpdateOsPackages": {"Enabled": false}
       }
     })
     rollback_on_failure = false
   }
   ```

1. Create the file `variables.tf` to define the variables that can be injected for this project.

   ```
   variable "region" {
     description = "The region the ParallelCluster API is deployed in."
     type        = string
     default     = "us-east-1"
   }
   
   variable "profile" {
     type        = string
     description = "The AWS profile used to deploy the clusters."
     default     = null
   }
   
   variable "api_stack_name" {
     type        = string
     description = "The name of the CloudFormation stack used to deploy the ParallelCluster API."
     default     = "ParallelCluster"
   }
   
   variable "api_version" {
     type        = string
     description = "The version of the ParallelCluster API."
   }
   
   variable "os" {
     type        = string
     description = "The OS of the ParallelCluster image."
   }
   
   variable "architecture" {
     type        = string
     description = "The architecture of the ParallelCluster image."
   }
   ```

1. Create the file `terraform.tfvars` to set your arbitrary values for the variables. 

   The file below deploys the custom AMI in `us-east-1`, based on Amazon Linux 2 for the x86_64 architecture, using the existing ParallelCluster API 3.11.1, which is already deployed in `us-east-1` with the stack name `MyParallelClusterAPI-3111`.

   ```
   region = "us-east-1"
   api_stack_name = "MyParallelClusterAPI-3111"
   api_version = "3.11.1"
   
   os = "alinux2"
   architecture = "x86_64"
   ```

1. Create the file `outputs.tf` to define the outputs returned by this project.

   ```
   output "parent_image" {
     value = data.aws-parallelcluster_list_official_images.parent_image.official_images[0]
   }
   
   output "custom_image" {
     value = aws-parallelcluster_image.demo01
   }
   ```

   The project directory is:

   ```
   my-amis
   ├── main.tf - Terraform entrypoint where the ParallelCluster module is configured.
   ├── outputs.tf - Defines the cluster as a Terraform output.
   ├── providers.tf - Configures the providers: ParallelCluster and AWS.
   ├── terraform.tf - Import the ParallelCluster provider.
   ├── terraform.tfvars - Defines values for variables, e.g. region, PCAPI stack name.
   └── variables.tf - Defines the variables, e.g. region, PCAPI stack name.
   ```

# Deploy the AMI
<a name="tutorial-create-ami-terraform-deploy"></a>

To deploy the AMI, run the standard Terraform commands in order.

1. Build the project:

   ```
   terraform init
   ```

1. Define the deployment plan:

   ```
   terraform plan -out tfplan
   ```

1. Deploy the plan:

   ```
   terraform apply tfplan
   ```
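
Once `terraform apply` has started the build, you can follow its progress outside Terraform with the pcluster CLI. A small sketch, assuming the image id `demo01` from `main.tf`; the helper name is ours, and the `--query` flag is assumed to be available in your pcluster CLI version.

```shell
# Hypothetical helper: check the status of the image build started by
# `terraform apply`. Requires the pcluster CLI installed and configured.
image_status() {
  pcluster describe-image --image-id "$1" --region "$2" --query imageBuildStatus
}

# Example invocation once the apply has started (the build takes a while):
# image_status demo01 us-east-1
```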

# Required permissions
<a name="tutorial-create-ami-terraform-permissions"></a>

You need the following permissions to deploy a custom AMI with Terraform:
+ assume the ParallelCluster API role, which interacts with the ParallelCluster API on your behalf
+ describe the CloudFormation stack of the ParallelCluster API, to verify that it exists and to retrieve its parameters and outputs

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:sts::111122223333:role/PCAPIUserRole-*",
            "Effect": "Allow",
            "Sid": "AssumePCAPIUserRole"
        },
        {
            "Action": [
                "cloudformation:DescribeStacks"
            ],
            "Resource": "arn:aws:cloudformation:us-east-1:111122223333:stack/*",
            "Effect": "Allow",
            "Sid": "CloudFormation"
        }
    ]
}
```

------

# AWS ParallelCluster UI Integration with Identity Center
<a name="tutorials_10_pcui-aws-ic-integration-v3"></a>

This tutorial demonstrates how to integrate the AWS ParallelCluster UI with IAM Identity Center for a single sign-on solution based on the users in an Active Directory that can also be shared with AWS ParallelCluster clusters.

When using AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites:**
+ An existing AWS ParallelCluster UI, which can be installed by following [these instructions](install-pcui-v3.md).
+ An existing AWS Managed Microsoft AD, preferably one that you will also use for [integrating with AWS ParallelCluster](tutorials_05_multi-user-ad.md).

## Enable IAM Identity Center
<a name="enable-iam-identity-center-v3"></a>

If you already have an IAM Identity Center instance connected to your AWS Managed Microsoft AD (Active Directory), you can use it and skip to the section **Adding your Application to IAM Identity Center**.

If you do not already have an IAM Identity Center instance connected to an AWS Managed Microsoft AD, follow the steps below to set it up.

**Enabling Identity Center**

1. In the console, navigate to IAM Identity Center. (Make sure that you are in the Region that contains your AWS Managed Microsoft AD.)

1. Click the **Enable** button. You may be asked whether you want to enable AWS Organizations; this is a requirement, so select the option to enable it. **Note**: This sends a confirmation email to the administrator of your account, who must follow the link in it to confirm.

**Connecting Identity Center to Managed AD**

1. On the page that appears after you enable IAM Identity Center, you should see **Recommended Set Up Steps**; under Step 1, select **Choose Your Identity Source**.

1. In the Identity Source section, open the **Actions** dropdown menu (in the top right), then select **Change Identity Source**.

1. Select **Active Directory**.

1. Under **Existing Directories**, choose your directory.

1. Click **Next**.

1. Review your changes, scroll to the bottom, type ACCEPT into the text box to confirm, then click **Change Identity Source**.

1. Wait for the changes to complete; you should then see a green banner at the top.

**Syncing users and groups to Identity Center**

1. In the green banner, click **Start Guided Setup** (the button in the top right)  
![\[Screenshot highlighting the Start Guided Setup button.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/IAC_start_guided_setup_1.png)

1. On the **Configure Attribute Mappings** page, click **Next**

1. In the **Configure sync scope** section, type the names of the users and groups that you want synced to IAM Identity Center, then click **Add**

1. When you finish adding users and groups, click **Next**  
![\[Screenshot highlighting Next button.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/IAC_guided_setup_add_users_groups_2.png)

1. Review your changes, then click **Save configuration**

1. If the next screen shows a warning about users not being synced, click the **Resume sync** button in the top right.

1. Next, to enable users: in the **Users** tab on the left, select a user and then click **Enable user access** > **Enable user access**.

   **Note**: You may need to select **Resume sync** if there is a warning banner at the top, and then wait for users to sync (try the refresh button to see whether they have synced yet).  
![\[Screenshot highlighting Users tab.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/IAC_enable_user_access_3.png)

## Adding your Application to IAM Identity Center
<a name="adding-apps-to-iam-identity-center-v3"></a>

Once you have synced your users with IAM Identity Center, you need to add a new application. This configures which SSO-enabled applications are available from your IAM Identity Center portal. In this case, we add the AWS ParallelCluster UI as an application, with IAM Identity Center acting as the identity provider.

The next step adds the AWS ParallelCluster UI as an application in IAM Identity Center. The AWS ParallelCluster UI is a web portal that helps users manage their clusters. For more information, see [AWS ParallelCluster UI](pcui-using-v3.md).

**Setting up the application in Identity Center**

1. Navigate to **IAM Identity Center** > **Applications** (in the left menu bar, click **Applications**)

1. Click **Add Application**

1. Select **Add custom SAML 2.0 application**

1. Click **Next**

1. Enter the display name and description that you would like to use (for example, PCUI and AWS ParallelCluster UI)

1. Under **IAM Identity Center metadata**, copy the link for the IAM Identity Center SAML metadata file and save it for later; it will be used when configuring SSO on the web app

1. Under **Application properties**, in **Application start URL**, enter your PCUI address. You can find it in the CloudFormation console: select the stack that corresponds to PCUI (for example, parallelcluster-ui) and look for ParallelClusterUIUrl on the **Outputs** tab

   For example, https://m2iwazsi1j.execute-api.us-east-1.amazonaws.com

1. Under **Application metadata**, choose **Manually type your metadata values**. Then provide the following values.

   1. **Important**: Make sure to replace the domain-prefix, region, and userpool-id values with information that's specific to your environment.

   1. The domain prefix, region and userpool-id can be obtained by opening the **Amazon Cognito** > **User pools console**  
![\[Screenshot highlighting User Pool Name under Cognito user pools\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_user_pools_4.png)

   1. Select the user pool that corresponds to PCUI (which will have a User pool name like pcui-cd8a2-Cognito-153EK3TO45S98-userpool)

   1. Navigate to **App Integration**  
![\[Screenshot highlighting the Cognito Domain in the App Integration tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_app_integration_5.png)

1. Application Assertion Consumer Service (ACS) URL: `https://<domain-prefix>.auth.<region>.amazoncognito.com/saml2/idpresponse`

   Application SAML audience: `urn:amazon:cognito:sp:<userpool-id>`

1. Choose **Submit**. Then, go to the **Details** page for the application that you added.

1. Select the **Actions** dropdown list and choose **Edit attribute mappings**. Then, provide the following attributes.

   1. User attribute in the application: **subject** (Note: **subject** is prefilled.) → Maps to this string value or user attribute in IAM Identity Center: **${user:email}**, Format: **emailAddress**

   1. User attribute in the application: **email** → Maps to this string value or user attribute in IAM Identity Center: **${user:email}**, Format: **unspecified**  
![\[Screenshot highlighting the Attribute Mappings for PCUI section\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/IAC_attribute_mappings_PCUI_6.png)

1. Save your changes.

1. Choose the **Assign Users** button and then assign your users to the application. These are the users in your Active Directory that will have access to the PCUI interface.  
![\[Screenshot highlighting Assign users for the application.\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/IAC_PCUI_App_7.png)
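
The Cognito user pool id and domain prefix referenced in the steps above can also be looked up with the AWS CLI instead of the console. A sketch under some assumptions: the helper names are ours, and the `pcui` name filter assumes the default PCUI user pool naming.

```shell
# Hypothetical helper: find the id of the user pool whose name contains "pcui".
pcui_user_pool_id() {
  aws cognito-idp list-user-pools --max-results 60 \
      --query "UserPools[?contains(Name, 'pcui')].Id" --output text
}

# Hypothetical helper: return the Cognito domain prefix for a user pool,
# which is the <domain-prefix> part of the ACS URL above.
pcui_domain_prefix() {
  aws cognito-idp describe-user-pool --user-pool-id "$1" \
      --query "UserPool.Domain" --output text
}

# Example (replace with your pool id):
# pcui_domain_prefix us-east-1_EXAMPLE
```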

**Configure IAM Identity Center as a SAML IdP in your user pool**

1. In your user pool settings, select **Sign-in experience** > **Add identity provider**  
![\[Screenshot highlighting Sign-in experience tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_sign_in_expereince_8.png)

1. Choose a SAML IdP

1. For **Provider name** provide **IdentityCenter**

1. Under **Metadata document source** choose **Enter metadata document endpoint URL** and provide the URL copied during the Application setup of Identity Center

1. Under **Attributes**, for **email**, choose **email**  
![\[Screenshot highlighting Sign-in experience tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazonw_cognito_SAML_9.png)

1. Select **Add identity provider**.

**Integrate the IdP with the user pool app client**

1. Next, under the **App Integration** section of your user pool, choose the client listed under **App client list**  
![\[Screenshot highlighting Sign-in experience tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_user_pool_app_client_10.png)

1. Under **Hosted UI** choose **Edit**

1. Under **Identity providers** choose **IdentityCenter** as well.

1. Choose **Save changes**

**Validate your setup**

1. Next, validate the setup that you just created by logging in to PCUI. Sign in to your PCUI portal; you should now see an option to sign in with your Corporate ID:  
![\[Screenshot highlighting Sign-in experience tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_validate_step_11.png)

1. Clicking the **IdentityCenter** button should take you to the IAM Identity Center IdP login, followed by a page that lists your applications, including PCUI. Open that application.

1. Once you reach the following screen, your user has been added to the Cognito user pool.  
![\[Screenshot highlighting Sign-in experience tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_continue_with_IC_12.png)

**Make your user an administrator**

1. Now navigate to the **Amazon Cognito** > **User pools** console and select the newly created user, which should have a prefix of identitycenter  
![\[Screenshot highlighting Sign-in experience tab\]](http://docs.aws.amazon.com/parallelcluster/latest/ug/images/tutorials/pcui_awsic_integration/Amazon_cognito_user_pools_new_created_user_13.png)

1. Under **Group memberships**, select **Add user to group**, choose **admin**, and click **Add**.

1. Now, when you click **Continue with IdentityCenter**, you are taken to the AWS ParallelCluster UI page.
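
The group assignment above can also be scripted with the AWS CLI, which is convenient when onboarding several users. A sketch; the helper name and the example pool id and username are placeholders.

```shell
# Hypothetical helper: add a federated PCUI user to the "admin" group of the
# PCUI Cognito user pool. Requires the AWS CLI to be configured.
make_pcui_admin() {
  aws cognito-idp admin-add-user-to-group \
      --user-pool-id "$1" \
      --username "$2" \
      --group-name admin
}

# Example (the username is the Cognito username with the identitycenter prefix):
# make_pcui_admin us-east-1_EXAMPLE identitycenter_user@example.com
```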

# Running containerized jobs with Pyxis
<a name="tutorials_11_running-containerized-jobs-with-pyxis"></a>

Learn how to create a cluster that can run containerized jobs using Pyxis, a SPANK plugin for managing containerized jobs in Slurm. Containers in Pyxis are managed by Enroot, a tool that turns traditional container/OS images into unprivileged sandboxes. For more information, see [NVIDIA Pyxis](https://github.com/NVIDIA/pyxis) and [NVIDIA Enroot](https://github.com/NVIDIA/enroot).

**Note**  
This feature is available starting with AWS ParallelCluster v3.11.1.  
The scripts in this tutorial move (`mv`) some files, which deletes them from their original locations. If you want to keep copies of these files in their original locations, change the scripts to use the copy (`cp`) command instead.

When using AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

**Prerequisites:**
+ The AWS CLI is [installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
+ An [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html).
+ An IAM role with the permissions that are required to run the [pcluster CLI](install-v3-parallelcluster.md).

## Create the cluster
<a name="create-the-cluster"></a>

Starting with AWS ParallelCluster 3.11.1, all official AMIs come with Pyxis and Enroot pre-installed. In particular, Slurm is recompiled with Pyxis support and Enroot is installed as a binary on the system. However, you must configure them according to your specific needs. The folders used by Enroot and Pyxis have a critical impact on cluster performance. For more information, see the [Pyxis documentation](https://github.com/NVIDIA/pyxis/wiki/Setup#slurm-plugstack-configuration) and the [Enroot documentation](https://github.com/NVIDIA/pyxis/wiki/Setup#enroot-configuration-example).

For your convenience, you can find sample configurations for Pyxis, Enroot, and SPANK in `/opt/parallelcluster/examples/`.

To deploy a cluster using the sample configurations we have provided, complete the following steps.

**To create the cluster with sample configuration**

Pyxis and Enroot must be configured on the head node by first creating the persistent and volatile directories for Enroot, then creating the runtime directory for Pyxis, and finally enabling Pyxis as a SPANK plugin across the whole cluster.

1. Execute the following script as an [OnNodeConfigured](HeadNode-v3.md#yaml-HeadNode-CustomActions-OnNodeConfigured) custom action to configure Pyxis and Enroot on the head node.

   ```
   #!/bin/bash
   set -e
   
   echo "Executing $0"
   
   # Configure Enroot
   ENROOT_PERSISTENT_DIR="/var/enroot"
   ENROOT_VOLATILE_DIR="/run/enroot"
   
   sudo mkdir -p $ENROOT_PERSISTENT_DIR
   sudo chmod 1777 $ENROOT_PERSISTENT_DIR
   sudo mkdir -p $ENROOT_VOLATILE_DIR
   sudo chmod 1777 $ENROOT_VOLATILE_DIR
   sudo mv /opt/parallelcluster/examples/enroot/enroot.conf /etc/enroot/enroot.conf
   sudo chmod 0644 /etc/enroot/enroot.conf
   
   # Configure Pyxis
   PYXIS_RUNTIME_DIR="/run/pyxis"
   
   sudo mkdir -p $PYXIS_RUNTIME_DIR
   sudo chmod 1777 $PYXIS_RUNTIME_DIR
   
   sudo mkdir -p /opt/slurm/etc/plugstack.conf.d/
   sudo mv /opt/parallelcluster/examples/spank/plugstack.conf /opt/slurm/etc/
   sudo mv /opt/parallelcluster/examples/pyxis/pyxis.conf /opt/slurm/etc/plugstack.conf.d/
   sudo -i scontrol reconfigure
   ```

1. Pyxis and Enroot must also be configured on the compute fleet by creating the persistent and volatile directories for Enroot and the runtime directory for Pyxis. Execute the following script as an [OnNodeStart](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-CustomActions-OnNodeStart) custom action to configure Pyxis and Enroot on the compute nodes.

   ```
   #!/bin/bash
   set -e
   
   echo "Executing $0"
   
   # Configure Enroot
   ENROOT_PERSISTENT_DIR="/var/enroot"
   ENROOT_VOLATILE_DIR="/run/enroot"
   ENROOT_CONF_DIR="/etc/enroot"
   
   sudo mkdir -p $ENROOT_PERSISTENT_DIR
   sudo chmod 1777 $ENROOT_PERSISTENT_DIR
   sudo mkdir -p $ENROOT_VOLATILE_DIR
   sudo chmod 1777 $ENROOT_VOLATILE_DIR
   sudo mkdir -p $ENROOT_CONF_DIR
   sudo chmod 1777 $ENROOT_CONF_DIR
   sudo mv /opt/parallelcluster/examples/enroot/enroot.conf /etc/enroot/enroot.conf
   sudo chmod 0644 /etc/enroot/enroot.conf
   
   # Configure Pyxis
   PYXIS_RUNTIME_DIR="/run/pyxis"
   
   sudo mkdir -p $PYXIS_RUNTIME_DIR
   sudo chmod 1777 $PYXIS_RUNTIME_DIR 
   
   # On Ubuntu 24.04, AppArmor blocks the creation of unprivileged user namespaces,
   # which Enroot requires, so this restriction must be disabled to run Enroot.
   # See https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged-user-namespaces
   source /etc/os-release
   if [ "${ID}${VERSION_ID}" == "ubuntu24.04" ]; then
       echo "kernel.apparmor_restrict_unprivileged_userns = 0" | sudo tee /etc/sysctl.d/99-pcluster-disable-apparmor-restrict-unprivileged-userns.conf
       sudo sysctl --system
   fi
   ```
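
Putting the two scripts together, the relevant portion of a cluster configuration might look like the following sketch; the S3 bucket and script names are placeholders, and the scripts above must be uploaded to a bucket that the cluster can read.

```
HeadNode:
  CustomActions:
    OnNodeConfigured:
      Script: s3://amzn-s3-demo-bucket/pyxis/configure-head-node.sh
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: q1
      CustomActions:
        OnNodeStart:
          Script: s3://amzn-s3-demo-bucket/pyxis/configure-compute-node.sh
```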

## Submit jobs
<a name="submit-jobs"></a>

Now that Pyxis is configured in your cluster, you can submit containerized jobs using the `sbatch` and `srun` commands, which are now enriched with container-specific options.

```
# Submitting an interactive job
srun -N 2 --container-image docker://ubuntu:22.04 hostname

# Submitting a batch job
sbatch -N 2 --wrap='srun --container-image docker://ubuntu:22.04 hostname'
```

# Creating a cluster with an EFA-enabled FSx Lustre
<a name="tutorial-efa-enabled-fsx-lustre"></a>

In this tutorial, you will create a cluster that uses an EFA-enabled FSx Lustre file system as shared storage. Using an FSx Lustre file system with EFA enabled can improve performance by up to 8x. To verify whether an EFA-enabled file system is what you need, see [Working with EFA-enabled file systems](https://docs.aws.amazon.com/fsx/latest/LustreGuide/efa-file-systems.html) in the *FSx for Lustre User Guide*.

When you use AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see [AWS services used by AWS ParallelCluster](aws-services-v3.md).

## Requirements
<a name="tutorial-efa-enabled-fsx-lustre-requirements"></a>
+ The AWS CLI is [installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
+ The ParallelCluster CLI is [installed and configured](install-v3-parallelcluster.md).
+ An [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) to log into the cluster.
+ An IAM role with the permissions that are required to run the ParallelCluster CLI.

## Create Security Groups
<a name="tutorial-efa-enabled-fsx-lustre-security-groups"></a>

Create two security groups in the same VPC where the cluster and the file system will be deployed: one for the client running on cluster nodes and one for the file system.

```
# Create security group for the FSx client
aws ec2 create-security-group \
    --group-name Fsx-Client-SecurityGroup \
    --description "Allow traffic for the FSx Lustre client" \
    --vpc-id vpc-cluster \
    --region region

# Create security group for the FSx file system
aws ec2 create-security-group \
    --group-name Fsx-FileSystem-SecurityGroup \
    --description "Allow traffic for the FSx Lustre File System" \
    --vpc-id vpc-cluster \
    --region region
```

In the remainder of the tutorial, we assume that `sg-client` and `sg-file-system` are the security group IDs of the client and the file system, respectively.

Configure the security group for the client to allow all outbound traffic to the file system, as [required by EFA](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-security).

```
# Allow all outbound traffic from the client to the file system
aws ec2 authorize-security-group-egress \
    --group-id sg-client \
    --protocol -1 \
    --port -1 \
    --source-group sg-file-system \
    --region region
```

Configure the security group for the file system to allow all inbound/outbound traffic within itself and all inbound traffic from the client, as [required by EFA](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-security). 

```
# Allow all inbound traffic within this security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-file-system \
    --protocol -1 \
    --port -1 \
    --source-group sg-file-system \
    --region region

# Allow all outbound traffic within this security group
aws ec2 authorize-security-group-egress \
    --group-id sg-file-system \
    --protocol -1 \
    --port -1 \
    --source-group sg-file-system \
    --region region

# Allow all inbound traffic from the client
aws ec2 authorize-security-group-ingress \
    --group-id sg-file-system \
    --protocol -1 \
    --port -1 \
    --source-group sg-client \
    --region region

# Allow all outbound traffic to the client
aws ec2 authorize-security-group-egress \
    --group-id sg-file-system \
    --protocol -1 \
    --port -1 \
    --source-group sg-client \
    --region region
```

## Create the file system
<a name="tutorial-efa-enabled-fsx-lustre-create-filesystem"></a>

Create the file system in the same Availability Zone (AZ) where the compute nodes will be, and replace `subnet-compute-nodes` with the subnet ID in the following code. This is required for EFA to work with your file system. Note that, as part of the file system creation, we enable EFA using the `EfaEnabled` property.

```
aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 38400 \
    --storage-type SSD \
    --subnet-ids subnet-compute-nodes \
    --security-group-ids sg-file-system \
    --lustre-configuration DeploymentType=PERSISTENT_2,PerUnitStorageThroughput=125,EfaEnabled=true,MetadataConfiguration={Mode=AUTOMATIC} \
    --region region
```

Take note of the file system ID returned by the previous command. In the remainder of the tutorial, replace `fs-id` with this file system ID.
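
The file system takes some time to become usable. Before creating the cluster, you can wait for it to reach the `AVAILABLE` lifecycle state, for example with a helper like this sketch (the helper name is ours):

```shell
# Hypothetical helper: return the lifecycle state of an FSx file system,
# e.g. CREATING or AVAILABLE. Requires the AWS CLI to be configured.
fsx_lifecycle() {
  aws fsx describe-file-systems --file-system-ids "$1" --region "$2" \
      --query "FileSystems[0].Lifecycle" --output text
}

# Example polling loop (replace fs-id and region with your values):
# until [ "$(fsx_lifecycle fs-id region)" = "AVAILABLE" ]; do sleep 30; done
```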

## Create the cluster
<a name="tutorial-efa-enabled-fsx-lustre-create-cluster"></a>

1. Create the cluster with the following configurations set in the AWS ParallelCluster YAML configuration file:

   1. AMI based on a supported OS, such as Ubuntu 22.04.

   1. Compute nodes must use an [EFA supported instance type](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types) with [Nitro v4 or later](https://docs.aws.amazon.com/ec2/latest/instancetypes/ec2-nitro-instances.html), such as g6.16xlarge.
      + Compute nodes must be in the same AZ where the file system is.
      + Compute nodes must have [Efa/Enabled](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-Efa-Enabled) set to true.
      + Compute nodes must run the configuration script `configure-efa-fsx-lustre-client.sh` as an [OnNodeStart](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-CustomActions-OnNodeStart) custom action. The script, provided in the [FSx official documentation](https://docs.aws.amazon.com/fsx/latest/LustreGuide/configure-efa-clients.html) and offered in our public bucket for your convenience, is meant to configure the FSx Lustre client on compute nodes to let them use EFA.

1. Create a cluster configuration file `config.yaml`:

   ```
   Region: region
   Image:
     Os: ubuntu2204
   HeadNode:
     InstanceType: c5.xlarge
     Networking:
       SubnetId: subnet-xxxxxxxxxx
       AdditionalSecurityGroups:
           - sg-client
     Ssh:
       KeyName: my-ssh-key
   Scheduling:
     Scheduler: slurm
     SlurmQueues:
       - Name: q1
         ComputeResources:
           - Name: cr1
             Instances:
               - InstanceType: g6.16xlarge
             MinCount: 1
             MaxCount: 3
             Efa:
               Enabled: true
         Networking:
           SubnetIds:
             - subnet-xxxxxxxxxx # Subnet in the same AZ where the file system is
           AdditionalSecurityGroups:
             - sg-client
           PlacementGroup:
             Enabled: false
         CustomActions:
           OnNodeStart:
             Script: https://us-east-1-aws-parallelcluster.s3.us-east-1.amazonaws.com/scripts/fsx-lustre-efa/configure-efa-fsx-lustre-client.sh
   SharedStorage:
     - MountDir: /fsx
       Name: my-fsxlustre-efa-external
       StorageType: FsxLustre
       FsxLustreSettings:
         FileSystemId: fs-id
   ```

   Then create a cluster using that configuration:

   ```
   pcluster create-cluster \
       --cluster-name fsx-efa-tutorial \
       --cluster-configuration config.yaml \
       --region region
   ```
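
Cluster creation takes several minutes. You can watch its status with the pcluster CLI; a sketch follows (the helper name is ours, and the `--query` flag is assumed to be available in your pcluster CLI version):

```shell
# Hypothetical helper: show the status of the cluster stack,
# e.g. CREATE_IN_PROGRESS, then CREATE_COMPLETE when it is ready.
cluster_status() {
  pcluster describe-cluster --cluster-name "$1" --region "$2" \
      --query clusterStatus
}

# Example:
# cluster_status fsx-efa-tutorial region
```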

## Validate FSx with EFA is working
<a name="tutorial-efa-enabled-fsx-lustre-validate"></a>

To verify that Lustre network traffic is using EFA, use the Lustre `lnetctl` tool, which can show the network traffic for a given network interface. To do so, run the following commands on a compute node:

```
# Take note of the number of packets flowing through the interface, 
# which are specified in statistics:send_count and statistics:recv_count
sudo lnetctl net show --net efa -v

# Generate traffic to the file system
echo 'Hello World' > /fsx/hello-world.txt

# Take note of the number of packets flowing through the interface, 
# which are specified in statistics:send_count and statistics:recv_count
sudo lnetctl net show --net efa -v
```

If the feature is working, the number of packets flowing through the interface is expected to increase.
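
If you want to script this check, the counters can be extracted from the `lnetctl` output with a small helper like the following sketch. The exact output layout of `lnetctl net show -v` can vary between Lustre versions, so treat the parsing as an assumption.

```shell
# The `lnetctl net show -v` output includes per-interface statistics.
# This helper reads that output on stdin and prints the first send_count
# value, so the before/after comparison can be scripted.
efa_send_count() {
  awk '/send_count:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Usage on a compute node:
#   before=$(sudo lnetctl net show --net efa -v | efa_send_count)
#   echo 'Hello World' > /fsx/hello-world.txt
#   after=$(sudo lnetctl net show --net efa -v | efa_send_count)
#   [ "$after" -gt "$before" ] && echo "Lustre traffic is flowing over EFA"
```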

# Support NVIDIA-Imex with p6e-gb200 instance
<a name="support-nvidia-imex-p6e-gb200-instance"></a>

This tutorial shows you how to get started with AWS ParallelCluster on P6e-GB200 to leverage the highest GPU performance for AI training and inference. [p6e-gb200.36xlarge instances are only available via P6e-GB200 UltraServers](https://aws.amazon.com/ec2/instance-types/p6/), where `u-p6e-gb200x72` is the UltraServer size and `p6e-gb200.36xlarge` is the [InstanceType](Scheduling-v3.md#yaml-Scheduling-SlurmQueues-ComputeResources-InstanceType) that forms the UltraServer. When you purchase a `u-p6e-gb200x72` UltraServer, it is made available through an [EC2 Capacity Block for ML](https://aws.amazon.com/ec2/capacityblocks/) that contains 18 `p6e-gb200.36xlarge` instances. To learn more, see [P6e-GB200](https://aws.amazon.com/ec2/instance-types/p6/).

AWS ParallelCluster version 3.14.0:
+ provides the complete NVIDIA software stack (drivers, CUDA, EFA, NVIDIA IMEX) required by this instance type
+ creates the nvidia-imex configuration for P6e-GB200 UltraServers
+ enables and starts the `nvidia-imex` service for P6e-GB200 UltraServers
+ configures the Slurm Block topology plugin so that every P6e-GB200 UltraServer (an EC2 Capacity Block) is a Slurm Block of the right size (see the [Release notes and document history](document_history.md) entry for version 3.14.0).

However, GPU-to-GPU communication over NVLink requires additional configuration: a [nodes_config.cfg](https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/config.html#imex-service-node-configuration-file-location) file that lists the IP addresses of the compute nodes in an IMEX domain, which ParallelCluster does not generate automatically. To generate this file, we provide a prolog script that automatically discovers compute node IPs and writes the [nodes_config.cfg](https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/config.html#imex-service-node-configuration-file-location) file following the [NVIDIA IMEX Slurm Job Scheduler Integration](https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/deployment.html#job-scheduler-integration) recommendations. This tutorial walks you through creating the prolog script, deploying it with a HeadNode custom action, and validating the IMEX setup.

**Note**  
P6e-GB200 is supported starting with AWS ParallelCluster v3.14.0 on Amazon Linux 2023, Ubuntu 22.04, and Ubuntu 24.04. For detailed software versions and an updated list of supported distributions, see the [AWS ParallelCluster changelog](https://github.com/aws/aws-parallelcluster/blob/develop/CHANGELOG.md).

## Create a Prolog Script to manage NVIDIA-Imex
<a name="support-nvidia-imex-p6e-gb200-instance-prolog"></a>

**Limitation:**
+ This prolog script runs only when an exclusive job is submitted. This ensures that an IMEX restart does not disrupt any jobs already running on P6e-GB200 nodes that belong to the same IMEX domain.

Below is the `91_nvidia_imex_prolog.sh` script to configure as a prolog in Slurm. It automatically updates the nvidia-imex configuration on compute nodes. The script name is prefixed with `91` to adhere to [SchedMD's naming convention](https://slurm.schedmd.com/prolog_epilog.html), which ensures it runs before any other prolog scripts in the sequence. The script regenerates the NVIDIA IMEX node configuration when a job starts and reloads the necessary NVIDIA daemons.

**Note**  
This script is not executed when multiple jobs start concurrently on the same nodes, so we suggest using the `--exclusive` flag at submission.

```
#!/usr/bin/env bash

# This prolog script configures the NVIDIA IMEX on compute nodes involved in the job execution.
#
# In particular:
# - Checks whether the job is executed exclusively.
#   If not, it exits immediately because it requires jobs to be executed exclusively.
# - Checks if it is running on a p6e-gb200 instance type.
#   If not, it exits immediately because IMEX must be configured only on that instance type.
# - Checks if the IMEX service is enabled.
#   If not, it exits immediately because IMEX must be enabled to get configured.
# - Creates the IMEX default channel.
#   For more information about IMEX channels, see https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/imexchannels.html
# - Writes the private IP addresses of compute nodes into /etc/nvidia-imex/nodes_config.cfg.
# - Restarts the IMEX system service.
#
# REQUIREMENTS:
#  - This prolog assumes to be run only with exclusive jobs.

LOG_FILE_PATH="/var/log/parallelcluster/nvidia-imex-prolog.log"
SCONTROL_CMD="/opt/slurm/bin/scontrol"
IMEX_START_TIMEOUT=60
IMEX_STOP_TIMEOUT=15
ALLOWED_INSTANCE_TYPES="^(p6e-gb200)"
IMEX_SERVICE="nvidia-imex"
IMEX_NODES_CONFIG="/etc/nvidia-imex/nodes_config.cfg"

function info() {
  echo "$(date "+%Y-%m-%dT%H:%M:%S.%3N") [INFO] [PID:$$] [JOB:${SLURM_JOB_ID}] $1"
}

function warn() {
  echo "$(date "+%Y-%m-%dT%H:%M:%S.%3N") [WARN] [PID:$$] [JOB:${SLURM_JOB_ID}] $1"
}

function error() {
  echo "$(date "+%Y-%m-%dT%H:%M:%S.%3N") [ERROR] [PID:$$] [JOB:${SLURM_JOB_ID}] $1"
}

function error_exit() {
  error "$1" && exit 1
}

function prolog_end() {
    info "PROLOG End JobId=${SLURM_JOB_ID}: $0"
    info "----------------"
    exit 0
}

function get_instance_type() {
  local token=$(curl -X PUT -s "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  curl -s -H "X-aws-ec2-metadata-token: ${token}" http://169.254.169.254/latest/meta-data/instance-type
}

function return_if_unsupported_instance_type() {
  local instance_type=$(get_instance_type)

  if [[ ! ${instance_type} =~ ${ALLOWED_INSTANCE_TYPES} ]]; then
    info "Skipping IMEX configuration because instance type ${instance_type} does not support it"
    prolog_end
  fi
}

function return_if_imex_disabled() {
  if ! systemctl is-enabled "${IMEX_SERVICE}" &>/dev/null; then
    warn "Skipping IMEX configuration because system service ${IMEX_SERVICE} is not enabled"
    prolog_end
  fi
}

function return_if_job_is_not_exclusive() {
  if [[ "${SLURM_JOB_OVERSUBSCRIBE}" =~ ^(NO|TOPO)$  ]]; then
    info "Job is exclusive, proceeding with IMEX configuration"
  else
    info "Skipping IMEX configuration because the job is not exclusive"
    prolog_end
  fi
}

function get_ips_from_node_names() {
  local _nodes=$1
  ${SCONTROL_CMD} -ao show node "${_nodes}" | sed 's/^.* NodeAddr=\([^ ]*\).*/\1/'
}

function get_compute_resource_name() {
  local _queue_name_prefix=$1
  local _slurmd_node_name=$2
  echo "${_slurmd_node_name}" | sed -E "s/${_queue_name_prefix}(.+)-[0-9]+$/\1/"
}

function reload_imex() {
  info "Stopping IMEX"
  timeout ${IMEX_STOP_TIMEOUT} systemctl stop ${IMEX_SERVICE}
  pkill -9 ${IMEX_SERVICE}

  info "Restarting IMEX"
  timeout ${IMEX_START_TIMEOUT} systemctl start ${IMEX_SERVICE}
}

function create_default_imex_channel() {
  info "Creating IMEX default channel"
  MAJOR_NUMBER=$(cat /proc/devices | grep nvidia-caps-imex-channels | cut -d' ' -f1)
  if [ ! -d "/dev/nvidia-caps-imex-channels" ]; then
    sudo mkdir /dev/nvidia-caps-imex-channels
  fi

  # Then check and create device node
  if [ ! -e "/dev/nvidia-caps-imex-channels/channel0" ]; then
    sudo mknod /dev/nvidia-caps-imex-channels/channel0 c $MAJOR_NUMBER 0
    info "IMEX default channel created"
  else
    info "IMEX default channel already exists"
  fi
}

{
  info "PROLOG Start JobId=${SLURM_JOB_ID}: $0"

  return_if_job_is_not_exclusive
  return_if_unsupported_instance_type
  return_if_imex_disabled

  create_default_imex_channel

  IPS_FROM_CR=$(get_ips_from_node_names "${SLURM_NODELIST}")

  info "Node Names: ${SLURM_NODELIST}"
  info "Node IPs: ${IPS_FROM_CR}"
  info "IMEX Nodes Config: ${IMEX_NODES_CONFIG}"

  info "Updating IMEX nodes config ${IMEX_NODES_CONFIG}"
  echo "${IPS_FROM_CR}" > "${IMEX_NODES_CONFIG}"
  reload_imex

  prolog_end

} 2>&1 | tee -a "${LOG_FILE_PATH}" | logger -t "91_nvidia_imex_prolog"
```
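
The script's exclusivity check relies on the `SLURM_JOB_OVERSUBSCRIBE` environment variable, treating the `NO` and `TOPO` values as exclusive, as shown in `return_if_job_is_not_exclusive` above. A standalone sketch of that logic:

```shell
# Mirror of the prolog's exclusivity test: NO and TOPO mean the job
# holds its nodes exclusively; anything else means oversubscription.
is_exclusive() {
  if [[ "$1" =~ ^(NO|TOPO)$ ]]; then echo "exclusive"; else echo "not exclusive"; fi
}

is_exclusive "NO"     # exclusive
is_exclusive "YES"    # not exclusive
```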

## Create the HeadNode OnNodeStart Custom Action Script
<a name="support-nvidia-imex-p6e-gb200-instance-action-script"></a>

Create an `install_custom_action.sh` custom action script that downloads the aforementioned prolog script to the shared directory `/opt/slurm/etc/scripts/prolog.d/`, which is accessible to compute nodes, and sets the permissions required to execute it.

```
#!/bin/bash
set -e

echo "Executing $0"

PROLOG_NVIDIA_IMEX=/opt/slurm/etc/scripts/prolog.d/91_nvidia_imex_prolog.sh
aws s3 cp "s3://<Bucket>/91_nvidia_imex_prolog.sh" "${PROLOG_NVIDIA_IMEX}"
chmod 0755 "${PROLOG_NVIDIA_IMEX}"
```
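
Before creating the cluster, upload both scripts to your S3 bucket; the `<Bucket>` placeholder must match the bucket referenced in the custom action script and granted in the cluster configuration's `S3Access` section:

```
aws s3 cp 91_nvidia_imex_prolog.sh "s3://<Bucket>/91_nvidia_imex_prolog.sh"
aws s3 cp install_custom_action.sh "s3://<Bucket>/install_custom_action.sh"
```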

## Create the cluster
<a name="support-nvidia-imex-p6e-gb200-instance-cluster"></a>

Create a cluster that includes P6e-GB200 instances. Below is an example configuration with a Slurm queue for UltraServer type `u-p6e-gb200x72`.

P6e-GB200 is currently available only in Local Zones. Some [Local Zones do not support a NAT gateway](https://docs.aws.amazon.com/local-zones/latest/ug/local-zones-connectivity-nat.html), so follow [Connectivity options for Local Zones](https://docs.aws.amazon.com/local-zones/latest/ug/local-zones-connectivity.html); ParallelCluster needs connectivity to AWS services as described in [Configuring security groups for restricted environments](security-groups-configuration.md). Because UltraServers are available only as Capacity Blocks, also follow [Launch instances with Capacity Blocks (CB)](launch-instances-capacity-blocks.md).

```
HeadNode:
  CustomActions:
    OnNodeStart:
      Script: s3://<s3-bucket-name>/install_custom_action.sh
    S3Access:
      - BucketName: <s3-bucket-name>
  InstanceType: <HeadNode-instance-type>
  Networking:
    SubnetId: <subnet-abcd78901234567890>
  Ssh:
    KeyName: <Key-name>
Image:
  Os: ubuntu2404
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    CustomSlurmSettings:
      - PrologFlags: "Alloc,NoHold"
      - MessageTimeout: 240
  SlurmQueues:
    - CapacityReservationTarget:
        CapacityReservationId: <cr-123456789012345678>
      CapacityType: CAPACITY_BLOCK
      ComputeResources: ### u-p6e-gb200x72
        - DisableSimultaneousMultithreading: true
          Efa:
            Enabled: true
          InstanceType: p6e-gb200.36xlarge  
          MaxCount: 18
          MinCount: 18
          Name: cr1
      Name: q1
      Networking:
        SubnetIds:
          - <subnet-1234567890123456>
```
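
With the configuration saved to a file (here `cluster-config.yaml`, a hypothetical name), create the cluster with the `pcluster` CLI:

```
pcluster create-cluster \
    --cluster-name p6e-gb200-cluster \
    --cluster-configuration cluster-config.yaml \
    --region region
```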

## Validate IMEX Setup
<a name="support-nvidia-imex-p6e-gb200-instance-validate"></a>

The `91_nvidia_imex_prolog.sh` prolog runs when you submit a Slurm job. Below is an example job that checks the status of the NVIDIA IMEX domain.

```
#!/bin/bash
#SBATCH --job-name=nvidia-imex-status-job
#SBATCH --ntasks-per-node=1
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err

QUEUE_NAME="q1"
COMPUTE_RES_NAME="cr1"
IMEX_CONFIG_FILE="/opt/parallelcluster/shared/nvidia-imex/config_${QUEUE_NAME}_${COMPUTE_RES_NAME}.cfg"

srun bash -c "/usr/bin/nvidia-imex-ctl -N -c ${IMEX_CONFIG_FILE} > result_\${SLURM_JOB_ID}_\$(hostname).out 2> result_\${SLURM_JOB_ID}_\$(hostname).err"
```
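
Save the job script (for example as `nvidia-imex-status.sh`, a hypothetical name) and submit it with the `--exclusive` flag so that the prolog runs, requesting all 18 nodes of the queue defined in the example configuration:

```
sbatch --exclusive --nodes=18 --partition=q1 nvidia-imex-status.sh
```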

Check the output of the job:

```
Connectivity Table Legend:
I - Invalid - Node wasn't reachable, no connection status available
N - Never Connected
R - Recovering - Connection was lost, but clean up has not yet been triggered.
D - Disconnected - Connection was lost, and clean up has been triggreed.
A - Authenticating - If GSSAPI enabled, client has initiated mutual authentication.
!V! - Version mismatch, communication disabled.
!M! - Node map mismatch, communication disabled.
C - Connected - Ready for operation

5/12/2025 06:08:10.580
Nodes:
Node #0   - 172.31.48.81    - READY                - Version: 570.172
Node #1   - 172.31.48.98    - READY                - Version: 570.172
Node #2   - 172.31.48.221   - READY                - Version: 570.172
Node #3   - 172.31.49.228   - READY                - Version: 570.172
Node #4   - 172.31.50.39    - READY                - Version: 570.172
Node #5   - 172.31.50.44    - READY                - Version: 570.172
Node #6   - 172.31.51.66    - READY                - Version: 570.172
Node #7   - 172.31.51.157   - READY                - Version: 570.172
Node #8   - 172.31.52.239   - READY                - Version: 570.172
Node #9   - 172.31.53.80    - READY                - Version: 570.172
Node #10  - 172.31.54.95    - READY                - Version: 570.172
Node #11  - 172.31.54.183   - READY                - Version: 570.172
Node #12  - 172.31.54.203   - READY                - Version: 570.172
Node #13  - 172.31.54.241   - READY                - Version: 570.172
Node #14  - 172.31.55.59    - READY                - Version: 570.172
Node #15  - 172.31.55.187   - READY                - Version: 570.172
Node #16  - 172.31.55.197   - READY                - Version: 570.172
Node #17  - 172.31.56.47    - READY                - Version: 570.172

 Nodes From\To  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
       0        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C 
       1        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C 
       2        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
       3        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C  
       4        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C 
       5        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C 
       6        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
       7        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C 
       8        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C 
       9        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   
      10        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   
      11        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   
      12        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   
      13        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   
      14        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C  
      15        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C  
      16        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C  
      17        C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C  

Domain State: UP
```

# Customize compute node network interfaces with launch template overrides
<a name="tutorial-network-customization-v3"></a>

Starting with AWS ParallelCluster 3.15.0, the `LaunchTemplateOverrides` parameter lets you customize the network interfaces of compute nodes by overriding the default network interface configuration with the configuration in a referenced launch template. The network interface section of the launch template completely replaces the network interface configuration that AWS ParallelCluster would otherwise generate for the compute nodes.

This tutorial walks through an example of overriding the default network configuration of `p6-b300.48xlarge` compute nodes. This customization is useful when you need a specific network interface configuration that differs from what AWS ParallelCluster configures by default. In this example, we configure use case 2 for P6-B300 instances as outlined in the [Amazon EC2 EFA-supported instance types documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-acc-inst-types.html).

**Note**  
We recommend creating the launch template with the AWS CLI instead of the console for maximum flexibility.

**Note**  
The launch template must contain only network interface overrides. AWS ParallelCluster has a validation that prevents overriding other parameters.

**Warning**  
If you use the override to configure network interfaces in a way that is not supported by the instance type, the instances fail to launch.

**Prerequisites**
+ AWS ParallelCluster version 3.15.0 or later [is installed](install-v3-parallelcluster.md).
+ The AWS CLI [is installed and configured](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).
+ You have an IAM role with the [permissions](iam-roles-in-parallelcluster-v3.md#iam-roles-in-parallelcluster-v3-example-user-policies) that are required to run the [`pcluster`](pcluster-v3.md) CLI.

## Step 1: Create security groups
<a name="tutorial-network-customization-v3-security-groups"></a>

When creating the launch template to use in the override, you must reference a security group. The default AWS ParallelCluster security group for the compute resource does not exist until cluster creation, so you must create a custom security group. This security group must then be referenced by the head node security group to allow traffic between the head node and compute nodes.

If you are updating an existing cluster to customize new capacity, you can use the default AWS ParallelCluster compute node security group in the launch template instead of creating a custom one.

Create the following two security groups:
+ **Head node additional security group** (`sg-1234abcd`):
  + Ingress: all traffic from compute security group
+ **Compute security group** (`sg-abcd1234`):
  + Ingress: all traffic from head node security group
  + Ingress: all traffic from self (compute-to-compute)
  + Egress: default allow-all
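
A sketch of creating the two groups with the AWS CLI; the VPC ID and group names are assumptions, and the ingress rules implement the cross-references listed above (new security groups already have an allow-all egress rule by default):

```
VPC_ID=vpc-11112222
HEAD_SG=$(aws ec2 create-security-group --vpc-id "$VPC_ID" \
  --group-name head-additional-sg --description "Head node additional SG" \
  --query GroupId --output text)
COMPUTE_SG=$(aws ec2 create-security-group --vpc-id "$VPC_ID" \
  --group-name compute-sg --description "Compute SG" \
  --query GroupId --output text)

# Head node: allow all traffic from the compute SG
aws ec2 authorize-security-group-ingress --group-id "$HEAD_SG" \
  --protocol -1 --source-group "$COMPUTE_SG"
# Compute: allow all traffic from the head node SG and from itself
aws ec2 authorize-security-group-ingress --group-id "$COMPUTE_SG" \
  --protocol -1 --source-group "$HEAD_SG"
aws ec2 authorize-security-group-ingress --group-id "$COMPUTE_SG" \
  --protocol -1 --source-group "$COMPUTE_SG"
```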

## Step 2: Create the launch template
<a name="tutorial-network-customization-v3-launch-template"></a>

Create a launch template that defines the network interface configuration for `p6-b300.48xlarge` compute nodes. For the primary network interface (network card index 0, device index 0), use an ENA (default) network interface. For the remaining network cards, create an EFA-only interface (network card indexes 1-16, device index 0) and an ENA (default) interface (network card indexes 1-16, device index 1).

Run the following AWS CLI command to create the launch template (`lt-123456789`):

```
aws ec2 create-launch-template \
  --region us-east-1 \
  --launch-template-name override-lt \
  --launch-template-data '{
    "NetworkInterfaces": [
      {"NetworkCardIndex":0,  "DeviceIndex":0, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":1,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":1,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":2,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":2,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":3,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":3,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":4,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":4,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":5,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":5,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":6,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":6,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":7,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":7,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":8,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":8,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":9,  "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":9,  "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":10, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":10, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":11, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":11, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":12, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":12, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":13, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":13, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":14, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":14, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":15, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":15, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":16, "DeviceIndex":0, "InterfaceType":"efa-only", "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"},
      {"NetworkCardIndex":16, "DeviceIndex":1, "Groups":["sg-abcd1234"], "SubnetId":"subnet-123456789"}
    ]
  }'
```
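
Writing the 33 interface entries by hand is error-prone; a short script can generate the same launch template data instead (a sketch using the example security group and subnet IDs from this tutorial):

```shell
# Generate the NetworkInterfaces JSON: one ENA interface on card 0,
# then an EFA-only (device 0) and an ENA (device 1) interface on cards 1-16.
SG="sg-abcd1234"
SUBNET="subnet-123456789"
{
  printf '{"NetworkInterfaces":[\n'
  printf '{"NetworkCardIndex":0,"DeviceIndex":0,"Groups":["%s"],"SubnetId":"%s"}' "$SG" "$SUBNET"
  for i in $(seq 1 16); do
    printf ',\n{"NetworkCardIndex":%d,"DeviceIndex":0,"InterfaceType":"efa-only","Groups":["%s"],"SubnetId":"%s"}' "$i" "$SG" "$SUBNET"
    printf ',\n{"NetworkCardIndex":%d,"DeviceIndex":1,"Groups":["%s"],"SubnetId":"%s"}' "$i" "$SG" "$SUBNET"
  done
  printf '\n]}\n'
} > launch-template-data.json
```

You can then pass the generated file to `aws ec2 create-launch-template` with `--launch-template-data file://launch-template-data.json` instead of the inline JSON shown above.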

## Step 3: Create the cluster with launch template overrides
<a name="tutorial-network-customization-v3-create-cluster"></a>

Create a cluster configuration that uses the `LaunchTemplateOverrides` parameter to reference the launch template you created.

```
Region: us-east-1
HeadNode:
  InstanceType: c5.xlarge
  Networking:
    SubnetId: subnet-abcdefghi
    AdditionalSecurityGroups:
      # Add the head node SG that allows traffic from the compute node SG
      - sg-1234abcd
...

Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: queue0
    Networking:
      SubnetIds:
        - subnet-123456789
    ComputeResources:
      - Name: compute-resource1
        InstanceType: p6-b300.48xlarge
        Efa:
          Enabled: false # The override replaces all network interface configuration, so this setting is ignored
        LaunchTemplateOverrides:
          LaunchTemplateId: lt-123456789
          Version: 1 # If the launch template is updated, then the new version should be specified here.
```