

本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# AWS PCS CloudFormation 範本的一部分
<a name="get-started-cfn-template-parts"></a>

CloudFormation 範本有 1 個或多個區段，每個區段都用於特定用途。 會在範本中 CloudFormation 定義標準格式、語法和語言。如需詳細資訊，請參閱*AWS CloudFormation 《 使用者指南*》中的[使用 CloudFormation 範本](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-guide.html)。

CloudFormation 範本可高度自訂，因此其格式可能會有所不同。若要了解 CloudFormation 範本建立 AWS PCS 叢集的必要部分，建議您檢查我們提供的範例範本，以建立範例叢集。本主題簡短說明該範例範本的章節。

**重要**  
本主題中的程式碼範例**尚未完成**。出現省略符號 (`[...]`) 表示未顯示其他程式碼。若要下載完整的 YAML 格式 CloudFormation 範本，請參閱 [CloudFormation 建立範例 AWS PCS 叢集的 範本](get-started-cfn-sample-templates.md)。

**Contents**
+ [標頭](#get-started-cfn-template-parts-header)
+ [中繼資料](#get-started-cfn-template-parts-metadata)
+ [Parameters](#get-started-cfn-template-parts-parameters)
+ [映射項目](#get-started-cfn-template-parts-mappings)
+ [Resources](#get-started-cfn-template-parts-resources)
+ [輸出](#get-started-cfn-template-parts-outputs)

## 標頭
<a name="get-started-cfn-template-parts-header"></a>

```
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: AWS Parallel Computing Service "getting started" cluster
```

`AWSTemplateFormatVersion` 會識別範本符合的範本格式版本。如需詳細資訊，請參閱*AWS CloudFormation 《 使用者指南*》中的 [CloudFormation 範本格式版本語法](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/format-version-structure.html)。

`Transform` 指定 CloudFormation 用來處理範本的巨集。如需詳細資訊，請參閱*AWS CloudFormation 《 使用者指南*》中的 [CloudFormation 範本轉換一節](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/transform-section-structure.html)。`AWS::Serverless-2016-10-31` 轉換可讓 CloudFormation 處理以 AWS Serverless Application Model (AWS SAM) 語法撰寫的範本。如需詳細資訊，請參閱*AWS CloudFormation 《 使用者指南*》中的[`AWS::Serverless`轉換](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/transform-aws-serverless.html)。

## 中繼資料
<a name="get-started-cfn-template-parts-metadata"></a>

```
### Stack metadata
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: PCS Cluster configuration
        Parameters:
          - SlurmVersion
          - ManagedAccounting
          - AccountingPolicyEnforcement
      - Label:
          default: PCS ComputeNodeGroups configuration
        Parameters:
          - NodeArchitecture
          - KeyName
          - ClientIpCidr
      - Label:
          default: HPC Recipes configuration
        Parameters:
          - HpcRecipesS3Bucket
          - HpcRecipesBranch
```

CloudFormation 範本的 `metadata`區段提供範本本身的相關資訊。範例範本會建立使用 AWS PCS 的完整高效能運算 (HPC) 叢集。範例範本的中繼資料區段會宣告參數，以控制 如何 CloudFormation 啟動 （佈建） 對應的堆疊。有些參數可控制架構選擇 (`NodeArchitecture`)、Slurm 版本 (`SlurmVersion`) 和存取控制 (`KeyName` 和 `ClientIpCidr`)。

## Parameters
<a name="get-started-cfn-template-parts-parameters"></a>

`Parameters` 本節定義範本的自訂參數。 CloudFormation 使用這些參數定義來建構和驗證您從此範本啟動堆疊時與 互動的格式。

```
Parameters:

  NodeArchitecture:
    Type: String
    Default: x86
    AllowedValues:
      - x86
      - Graviton
    Description: Processor architecture for the login and compute node instances

  SlurmVersion:
    Type: String
    Default: 25.05
    Description: Version of Slurm to use
    AllowedValues:
         - 24.11
         - 25.05

  ManagedAccounting:
    Type: String
    Default: 'disabled'
    AllowedValues:
      - 'enabled'
      - 'disabled'
    Description: Monitor cluster usage, manage access control, and enforce resource limits with Slurm accounting. Requires Slurm 24.11 or newer.

  AccountingPolicyEnforcement:
    Description: Specify which Slurm accounting policies to enforce
    Type: String
    Default: none
    AllowedValues:
      - none
      - 'associations,limits,safe'

  KeyName:
    Description: SSH keypair to log in to the head node
    Type: AWS::EC2::KeyPair::KeyName
    AllowedPattern: ".+"  # Required

  ClientIpCidr:
    Description: IP(s) allowed to access the login node over SSH. We recommend that you restrict it with your own IP/subnet (x.x.x.x/32 for your own ip or x.x.x.x/24 for range. Replace x.x.x.x with your own PUBLIC IP. You can get your public IP using tools such as https://ifconfig.co/)
    Default: 127.0.0.1/32
    Type: String
    AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
    ConstraintDescription: Value must be a valid IP or network range of the form x.x.x.x/x. 

  HpcRecipesS3Bucket:
    Type: String
    Default: aws-hpc-recipes
    Description: HPC Recipes for AWS S3 bucket
    AllowedValues:
         - aws-hpc-recipes
         - aws-hpc-recipes-dev
  HpcRecipesBranch:
    Type: String
    Default: main
    Description: HPC Recipes for AWS release branch
    AllowedPattern: '^(?!.*/\.git$)(?!.*/\.)(?!.*\\.\.)[a-zA-Z0-9-_\.]+$'
```

## 映射項目
<a name="get-started-cfn-template-parts-mappings"></a>

`Mappings` 區段定義鍵/值對，根據特定條件或相依性指定值。

```
Mappings:

  Architecture:
    AmiArchParameter:
      Graviton: arm64
      x86: x86_64
    LoginNodeInstances:
      Graviton: c7g.xlarge
      x86: c6i.xlarge
    ComputeNodeInstances:
      Graviton: c7g.xlarge
      x86: c6i.xlarge
```

## Resources
<a name="get-started-cfn-template-parts-resources"></a>

`Resources` 區段會宣告要佈建和設定作為堆疊一部分 AWS 的資源。

```
Resources:

  [...]
```

範本會以 layer 形式佈建範例叢集基礎設施。它從 VPC 組態`Networking`的 開始。儲存由雙系統提供： `EfsStorage` 用於共用儲存， `FSxLStorage`用於高效能儲存。核心叢集是透過 建立`PCSCluster`。

```
  Networking:
    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        ProvisionSubnetsC: "False"
      TemplateURL: !Sub 'https://${HpcRecipesS3Bucket}.s3.amazonaws.com/${HpcRecipesBranch}/recipes/net/hpc_large_scale/assets/main.yaml'

  EfsStorage:
    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        SubnetIds: !GetAtt [ Networking, Outputs.DefaultPrivateSubnet ]
        SubnetCount: 1
        VpcId: !GetAtt [ Networking, Outputs.VPC ]
      TemplateURL: !Sub 'https://${HpcRecipesS3Bucket}.s3.amazonaws.com/${HpcRecipesBranch}/recipes/storage/efs_simple/assets/main.yaml'

  FSxLStorage:
    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        PerUnitStorageThroughput: 125
        SubnetId: !GetAtt [ Networking, Outputs.DefaultPrivateSubnet ]
        VpcId: !GetAtt [ Networking, Outputs.VPC ]
      TemplateURL: !Sub 'https://${HpcRecipesS3Bucket}.s3.amazonaws.com/${HpcRecipesBranch}/recipes/storage/fsx_lustre/assets/persistent.yaml'

  [...]
  
  # Cluster
  PCSCluster:
    Type: AWS::PCS::Cluster
    Properties:
      Name: !Sub '${AWS::StackName}'
      Size: SMALL
      Scheduler:
        Type: SLURM
        Version: !Ref SlurmVersion
      Networking:
        SubnetIds:
          - !GetAtt [ Networking, Outputs.DefaultPrivateSubnet ]
        SecurityGroupIds:
          - !GetAtt [ PCSSecurityGroup, Outputs.ClusterSecurityGroupId ]
```

對於運算資源，範本會建立兩個節點群組：`PCSNodeGroupLogin`適用於單一登入節點，以及`PCSNodeGroupCompute`適用於最多四個運算節點。`PCSInstanceProfile` 許可和`PCSLaunchTemplate`執行個體組態支援這些節點群組。

```
  # Compute Node groups
  PCSInstanceProfile:
    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        # We have to regionalize this in case CX use the template in more than one region. Otherwise,
        # the create action will fail since instance-role-${AWS::StackName} already exists!
        RoleName: !Sub '${AWS::StackName}-${AWS::Region}'
      TemplateURL: !Sub 'https://${HpcRecipesS3Bucket}.s3.amazonaws.com/${HpcRecipesBranch}/recipes/pcs/getting_started/assets/pcs-iip-minimal.yaml'

  PCSLaunchTemplate:
    Type: AWS::CloudFormation::Stack
    Properties:
      Parameters:
        VpcDefaultSecurityGroupId: !GetAtt [ Networking, Outputs.SecurityGroup ]
        ClusterSecurityGroupId: !GetAtt [ PCSSecurityGroup, Outputs.ClusterSecurityGroupId ]
        SshSecurityGroupId: !GetAtt [ PCSSecurityGroup, Outputs.InboundSshSecurityGroupId ]
        EfsFilesystemSecurityGroupId: !GetAtt [ EfsStorage, Outputs.SecurityGroupId ]
        FSxLustreFilesystemSecurityGroupId: !GetAtt [ FSxLStorage, Outputs.FSxLustreSecurityGroupId ]
        SshKeyName: !Ref KeyName
        EfsFilesystemId: !GetAtt [ EfsStorage, Outputs.EFSFilesystemId ]
        FSxLustreFilesystemId: !GetAtt [ FSxLStorage, Outputs.FSxLustreFilesystemId ]
        FSxLustreFilesystemMountName: !GetAtt [ FSxLStorage, Outputs.FSxLustreMountName ]
      TemplateURL: !Sub 'https://${HpcRecipesS3Bucket}.s3.amazonaws.com/${HpcRecipesBranch}/recipes/pcs/getting_started/assets/cfn-pcs-lt-efs-fsxl.yaml'

  # Compute Node groups - Login Nodes
  PCSNodeGroupLogin:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      ClusterId: !GetAtt [PCSCluster, Id]
      Name: login
      ScalingConfiguration:
        MinInstanceCount: 1
        MaxInstanceCount: 1
      IamInstanceProfileArn: !GetAtt [ PCSInstanceProfile, Outputs.InstanceProfileArn ]
      CustomLaunchTemplate:
        TemplateId: !GetAtt [ PCSLaunchTemplate, Outputs.LoginLaunchTemplateId ]
        Version: 1
      SubnetIds:
        - !GetAtt [ Networking, Outputs.DefaultPublicSubnet ]
      AmiId: !GetAtt [PcsSampleAmi, AmiId]
      InstanceConfigs:
        - InstanceType: !FindInMap [ Architecture, LoginNodeInstances, !Ref NodeArchitecture ]

  # Compute Node groups - Compute Nodes
  PCSNodeGroupCompute:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      ClusterId: !GetAtt [PCSCluster, Id]
      Name: compute-1
      ScalingConfiguration:
        MinInstanceCount: 0
        MaxInstanceCount: 4
      IamInstanceProfileArn: !GetAtt [ PCSInstanceProfile, Outputs.InstanceProfileArn ]
      CustomLaunchTemplate:
        TemplateId: !GetAtt [ PCSLaunchTemplate, Outputs.ComputeLaunchTemplateId ]
        Version: 1
      SubnetIds:
        - !GetAtt [ Networking, Outputs.DefaultPrivateSubnet ]
      AmiId: !GetAtt [PcsSampleAmi, AmiId]
      InstanceConfigs:
        - InstanceType: !FindInMap [ Architecture, ComputeNodeInstances, !Ref NodeArchitecture ]
```

任務排程是透過 處理`PCSQueueCompute`。

```
  PCSQueueCompute:
    Type: AWS::PCS::Queue
    Properties:
      ClusterId: !GetAtt [PCSCluster, Id]
      Name: demo
      ComputeNodeGroupConfigurations:
        - ComputeNodeGroupId: !GetAtt [PCSNodeGroupCompute, Id]
```

AMI 選擇會透過 PcsAMILookupFn Lambda 函數和相關資源自動執行。

```
        
  PcsAMILookupRole:
    Type: AWS::IAM::Role
    [...]
    
  PcsAMILookupFn:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: !GetAtt PcsAMILookupRole.Arn
      Code:
        [...]
      Timeout: 30
      MemorySize: 128

  # Example of using the custom resource to look up an AMI
  PcsSampleAmi:
    Type: Custom::AMILookup
    Properties:
      ServiceToken: !GetAtt PcsAMILookupFn.Arn
      OperatingSystem: 'amzn2'
      Architecture: !FindInMap [ Architecture, AmiArchParameter, !Ref NodeArchitecture ]
      SlurmVersion: !Ref SlurmVersion
```

## 輸出
<a name="get-started-cfn-template-parts-outputs"></a>

範本會透過 `ClusterId`、 和 輸出叢集識別和管理 URLs`PcsConsoleUrl``Ec2ConsoleUrl`。

```
Outputs:
  ClusterId:
    Description: The Id of the PCS cluster
    Value: !GetAtt [ PCSCluster, Id ]
    
  PcsConsoleUrl:
    Description: URL to access the cluster in the PCS console
    Value: !Sub
      - https://${ConsoleDomain}/pcs/home?region=${AWS::Region}#/clusters/${ClusterId}
      - { ConsoleDomain: !If [ GovCloud, 'console.amazonaws-us-gov.com', !If [ China, 'console.amazonaws.cn', !Sub '${AWS::Region}.console.aws.amazon.com']],
          ClusterId: !GetAtt [ PCSCluster, Id ] 
        }
    Export:
      Name: !Sub ${AWS::StackName}-PcsConsoleUrl
      
  Ec2ConsoleUrl:
    Description: URL to access instance(s) in the login node group via Session Manager
    Value: !Sub
      - https://${ConsoleDomain}/ec2/home?region=${AWS::Region}#Instances:instanceState=running;tag:aws:pcs:compute-node-group-id=${NodeGroupLoginId}
      - { ConsoleDomain: !If [ GovCloud, 'console.amazonaws-us-gov.com', !If [ China, 'console.amazonaws.cn', !Sub '${AWS::Region}.console.aws.amazon.com']],
          NodeGroupLoginId: !GetAtt [ PCSNodeGroupLogin, Id ] 
        }
    Export:
      Name: !Sub ${AWS::StackName}-Ec2ConsoleUrl
```