

本文属于机器翻译版本。若本译文内容与英语原文存在差异，则一律以英文原文为准。

# AWS Identity and Access Management 对于 SageMaker HyperPod
<a name="sagemaker-hyperpod-prerequisites-iam"></a>

AWS Identity and Access Management (IAM) 是一项 AWS 服务，可帮助管理员安全地控制对 AWS 资源的访问。IAM 管理员控制谁可以通过*身份验证*（登录）和*授权*（具有权限）使用 Amazon EKS 资源。IAM 是一项无需额外付费即可使用的 AWS 服务。

**重要**  
允许 Amazon SageMaker Studio 或 Amazon SageMaker Studio Classic 创建亚马逊 SageMaker资源的自定义 IAM 策略还必须授予向这些资源添加标签的权限。之所以需要为资源添加标签的权限，是因为 Studio 和 Studio Classic 会自动为创建的任何资源添加标签。如果 IAM 策略允许 Studio 和 Studio Classic 创建资源但不允许标记，则在尝试创建资源时可能会出现 AccessDenied “” 错误。有关更多信息，请参阅 [提供标记 A SageMaker I 资源的权限](security_iam_id-based-policy-examples.md#grant-tagging-permissions)。  
[AWS 亚马逊 A SageMaker I 的托管策略](security-iam-awsmanpol.md)授予创建 SageMaker 资源的权限已经包括在创建这些资源时添加标签的权限。

假设 SageMaker HyperPod 用户主要分为两层：*集群管理员用户*和*数据科学家用户*。
+ **集群管理员用户**-负责创建和管理 SageMaker HyperPod 集群。这包括配置 HyperPod 集群和管理用户对集群的访问权限。
  + 使用 Slurm 或 Amazon EKS 创建和配置 SageMaker HyperPod 集群。
  + 为数据科学家用户和 HyperPod 集群资源创建和配置 IAM 角色。
  + 要使用 Amazon EKS 进行 SageMaker HyperPod 编排，请创建和配置 [EKS 访问条目](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html)、[基于角色的访问控制 (RBAC)](sagemaker-hyperpod-eks-setup-rbac.md) 和 Pod Identity 以满足数据科学用例。
+ **数据科学家用户**：专注于 ML 模型训练。他们使用开源协调器或 CL SageMaker HyperPod I 来提交和管理训练作业。
  + 假设并使用集群管理员用户提供的 IAM 角色。
  + 与 SageMaker HyperPod（Slurm 或 Kubernetes） CLIs 支持的开源协调器或 SageMaker HyperPod CLI 进行交互，以检查集群容量、连接到集群并提交工作负载。

通过附加正确的权限或策略来操作 SageMaker HyperPod 集群，为集群管理员设置 IAM 角色。集群管理员还应创建 IAM 角色以提供给 SageMaker HyperPod 资源，让其假设运行并与必要的 AWS 资源（例如 Amazon S3、Amazon 和 AWS Systems Manager (SSM)）进行通信。 CloudWatch最后， AWS 账户管理员或集群管理员应向科学家授予访问集 SageMaker HyperPod 群和运行机器学习工作负载的权限。

根据您选择的编排工具，集群管理员和科学家所需的权限可能会有所不同。您还可以使用每个服务的条件键来控制角色中各种操作的权限范围。使用以下服务授权参考为相关服务添加详细范围 SageMaker HyperPod。
+ [Amazon Elastic Compute Cloud](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonec2.html)
+ [亚马逊弹性容器注册表](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticcontainerregistry.html)（用于使用 Amazon EKS 进行 SageMaker HyperPod 集群编排）
+ [亚马逊 Elastic Kubernetes Service（用于使用亚马逊 EK](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelastickubernetesservice.html) S 进行 SageMaker HyperPod 集群编排）
+ [Amazon FSx](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonfsx.html)
+ [AWS IAM 身份中心（ AWS 单点登录的继任者）](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsiamidentitycentersuccessortoawssinglesign-on.html)
+ [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsidentityandaccessmanagementiam.html)
+ [Amazon Simple Storage Service](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazons3.html)
+ [亚马逊 SageMaker AI](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonsagemaker.html)
+ [AWS Systems Manager](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awssystemsmanager.html)

**Topics**
+ [创建集群所需的 IAM 权限](#sagemaker-hyperpod-prerequisites-iam-cluster-creation)
+ [集群管理员的 IAM 用户](#sagemaker-hyperpod-prerequisites-iam-cluster-admin)
+ [科学家的 IAM 用户](#sagemaker-hyperpod-prerequisites-iam-cluster-user)
+ [的 IAM 角色适用于 SageMaker HyperPod](#sagemaker-hyperpod-prerequisites-iam-role-for-hyperpod)

## 创建集群所需的 IAM 权限
<a name="sagemaker-hyperpod-prerequisites-iam-cluster-creation"></a>

创建 HyperPod 集群需要以下策略示例中概述的 IAM 权限。如果您 AWS 账户 拥有[https://docs.aws.amazon.com//aws-managed-policy/latest/reference/AdministratorAccess.html](https://docs.aws.amazon.com//aws-managed-policy/latest/reference/AdministratorAccess.html)权限，则默认情况下会授予这些权限。

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateCluster",
                "sagemaker:DeleteCluster",
                "sagemaker:UpdateCluster"
            ],
            "Resource": "arn:aws:sagemaker:*:*:cluster/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:AddTags"
            ],
            "Resource": "arn:aws:sagemaker:*:*:cluster/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:ListTags",
                "sagemaker:ListClusters",
                "sagemaker:ListClusterNodes",
                "sagemaker:ListComputeQuotas",
                "sagemaker:ListTrainingPlans",
                "sagemaker:DescribeCluster",
                "sagemaker:DescribeClusterNode"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:UpdateStack",
                "cloudformation:DeleteStack",
                "cloudformation:ContinueUpdateRollback",
                "cloudformation:SetStackPolicy",
                "cloudformation:ValidateTemplate",
                "cloudformation:DescribeStacks",
                "cloudformation:DescribeStackEvents",
                "cloudformation:Get*",
                "cloudformation:List*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/sagemaker-*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [
                        "sagemaker.amazonaws.com",
                        "eks.amazonaws.com",
                        "lambda.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:GetRole"
            ],
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [
                        "sagemaker.amazonaws.com",
                        "eks.amazonaws.com",
                        "lambda.amazonaws.com",
                        "cloudformation.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "AmazonVPCFullAccess",
            "Effect": "Allow",
            "Action": [
                "ec2:AcceptVpcPeeringConnection",
                "ec2:AcceptVpcEndpointConnections",
                "ec2:AllocateAddress",
                "ec2:AssignIpv6Addresses",
                "ec2:AssignPrivateIpAddresses",
                "ec2:AssociateAddress",
                "ec2:AssociateDhcpOptions",
                "ec2:AssociateRouteTable",
                "ec2:AssociateSecurityGroupVpc",
                "ec2:AssociateSubnetCidrBlock",
                "ec2:AssociateVpcCidrBlock",
                "ec2:AttachClassicLinkVpc",
                "ec2:AttachInternetGateway",
                "ec2:AttachNetworkInterface",
                "ec2:AttachVpnGateway",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CreateCarrierGateway",
                "ec2:CreateCustomerGateway",
                "ec2:CreateDefaultSubnet",
                "ec2:CreateDefaultVpc",
                "ec2:CreateDhcpOptions",
                "ec2:CreateEgressOnlyInternetGateway",
                "ec2:CreateFlowLogs",
                "ec2:CreateInternetGateway",
                "ec2:CreateLocalGatewayRouteTableVpcAssociation",
                "ec2:CreateNatGateway",
                "ec2:CreateNetworkAcl",
                "ec2:CreateNetworkAclEntry",
                "ec2:CreateNetworkInterface",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:CreateRoute",
                "ec2:CreateRouteTable",
                "ec2:CreateSecurityGroup",
                "ec2:CreateSubnet",
                "ec2:CreateTags",
                "ec2:CreateVpc",
                "ec2:CreateVpcEndpoint",
                "ec2:CreateVpcEndpointConnectionNotification",
                "ec2:CreateVpcEndpointServiceConfiguration",
                "ec2:CreateVpcPeeringConnection",
                "ec2:CreateVpnConnection",
                "ec2:CreateVpnConnectionRoute",
                "ec2:CreateVpnGateway",
                "ec2:DeleteCarrierGateway",
                "ec2:DeleteCustomerGateway",
                "ec2:DeleteDhcpOptions",
                "ec2:DeleteEgressOnlyInternetGateway",
                "ec2:DeleteFlowLogs",
                "ec2:DeleteInternetGateway",
                "ec2:DeleteLocalGatewayRouteTableVpcAssociation",
                "ec2:DeleteNatGateway",
                "ec2:DeleteNetworkAcl",
                "ec2:DeleteNetworkAclEntry",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
                "ec2:DeleteRoute",
                "ec2:DeleteRouteTable",
                "ec2:DeleteSecurityGroup",
                "ec2:DeleteSubnet",
                "ec2:DeleteTags",
                "ec2:DeleteVpc",
                "ec2:DeleteVpcEndpoints",
                "ec2:DeleteVpcEndpointConnectionNotifications",
                "ec2:DeleteVpcEndpointServiceConfigurations",
                "ec2:DeleteVpcPeeringConnection",
                "ec2:DeleteVpnConnection",
                "ec2:DeleteVpnConnectionRoute",
                "ec2:DeleteVpnGateway",
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAddresses",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeCarrierGateways",
                "ec2:DescribeClassicLinkInstances",
                "ec2:DescribeCustomerGateways",
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeEgressOnlyInternetGateways",
                "ec2:DescribeFlowLogs",
                "ec2:DescribeInstances",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeIpv6Pools",
                "ec2:DescribeLocalGatewayRouteTables",
                "ec2:DescribeLocalGatewayRouteTableVpcAssociations",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeMovingAddresses",
                "ec2:DescribeNatGateways",
                "ec2:DescribeNetworkAcls",
                "ec2:DescribeNetworkInterfaceAttribute",
                "ec2:DescribeNetworkInterfacePermissions",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribePrefixLists",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroupReferences",
                "ec2:DescribeSecurityGroupRules",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSecurityGroupVpcAssociations",
                "ec2:DescribeStaleSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeTags",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcClassicLink",
                "ec2:DescribeVpcClassicLinkDnsSupport",
                "ec2:DescribeVpcEndpointConnectionNotifications",
                "ec2:DescribeVpcEndpointConnections",
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeVpcEndpointServiceConfigurations",
                "ec2:DescribeVpcEndpointServicePermissions",
                "ec2:DescribeVpcEndpointServices",
                "ec2:DescribeVpcPeeringConnections",
                "ec2:DescribeVpcs",
                "ec2:DescribeVpnConnections",
                "ec2:DescribeVpnGateways",
                "ec2:DetachClassicLinkVpc",
                "ec2:DetachInternetGateway",
                "ec2:DetachNetworkInterface",
                "ec2:DetachVpnGateway",
                "ec2:DisableVgwRoutePropagation",
                "ec2:DisableVpcClassicLink",
                "ec2:DisableVpcClassicLinkDnsSupport",
                "ec2:DisassociateAddress",
                "ec2:DisassociateRouteTable",
                "ec2:DisassociateSecurityGroupVpc",
                "ec2:DisassociateSubnetCidrBlock",
                "ec2:DisassociateVpcCidrBlock",
                "ec2:EnableVgwRoutePropagation",
                "ec2:EnableVpcClassicLink",
                "ec2:EnableVpcClassicLinkDnsSupport",
                "ec2:GetSecurityGroupsForVpc",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:ModifySecurityGroupRules",
                "ec2:ModifySubnetAttribute",
                "ec2:ModifyVpcAttribute",
                "ec2:ModifyVpcEndpoint",
                "ec2:ModifyVpcEndpointConnectionNotification",
                "ec2:ModifyVpcEndpointServiceConfiguration",
                "ec2:ModifyVpcEndpointServicePermissions",
                "ec2:ModifyVpcPeeringConnectionOptions",
                "ec2:ModifyVpcTenancy",
                "ec2:MoveAddressToVpc",
                "ec2:RejectVpcEndpointConnections",
                "ec2:RejectVpcPeeringConnection",
                "ec2:ReleaseAddress",
                "ec2:ReplaceNetworkAclAssociation",
                "ec2:ReplaceNetworkAclEntry",
                "ec2:ReplaceRoute",
                "ec2:ReplaceRouteTableAssociation",
                "ec2:ResetNetworkInterfaceAttribute",
                "ec2:RestoreAddressToClassic",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:UnassignIpv6Addresses",
                "ec2:UnassignPrivateIpAddresses",
                "ec2:UpdateSecurityGroupRuleDescriptionsEgress",
                "ec2:UpdateSecurityGroupRuleDescriptionsIngress"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CloudWatchPermissions",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:*",
                "logs:*",
                "sns:CreateTopic",
                "sns:ListSubscriptions",
                "sns:ListSubscriptionsByTopic",
                "sns:ListTopics",
                "sns:Subscribe",
                "iam:GetPolicy",
                "iam:GetPolicyVersion",
                "iam:GetRole",
                "oam:ListSinks",
                "rum:*",
                "synthetics:*",
                "xray:*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:DeleteBucket",
                "s3:PutBucketPolicy",
                "s3:PutBucketTagging",
                "s3:PutBucketPublicAccessBlock",
                "s3:PutBucketLogging",
                "s3:DeleteBucketPolicy",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:PutEncryptionConfiguration",
                "s3:AbortMultipartUpload",
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::*",
                "arn:aws:s3:::*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "eks:CreateCluster",
                "eks:DeleteCluster",
                "eks:CreateNodegroup",
                "eks:DeleteNodegroup",
                "eks:UpdateNodegroupConfig",
                "eks:UpdateNodegroupVersion",
                "eks:UpdateClusterConfig",
                "eks:UpdateClusterVersion",
                "eks:CreateFargateProfile",
                "eks:DeleteFargateProfile",
                "eks:CreateAddon",
                "eks:DeleteAddon",
                "eks:UpdateAddon",
                "eks:CreateAccessEntry",
                "eks:DeleteAccessEntry",
                "eks:UpdateAccessEntry",
                "eks:AssociateAccessPolicy",
                "eks:AssociateIdentityProviderConfig",
                "eks:DisassociateIdentityProviderConfig",
                "eks:TagResource",
                "eks:UntagResource",
                "eks:AccessKubernetesApi",
                "eks:Describe*",
                "eks:List*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter",
                "ssm:PutParameter",
                "ssm:DeleteParameter",
                "ssm:DescribeParameters"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "kms:ViaService": [
                        "sagemaker.*.amazonaws.com",
                        "ec2.*.amazonaws.com",
                        "s3.*.amazonaws.com",
                        "eks.*.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:GetFunction",
                "lambda:UpdateFunctionCode",
                "lambda:UpdateFunctionConfiguration",
                "lambda:AddPermission",
                "lambda:RemovePermission",
                "lambda:PublishLayerVersion",
                "lambda:DeleteLayerVersion",
                "lambda:InvokeFunction",
                "lambda:Get*",
                "lambda:List*",
                "lambda:TagResource"
            ],
            "Resource": [
                "arn:aws:lambda:*:*:function:*",
                "arn:aws:lambda:*:*:layer:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:DeleteRole",
                "iam:DeleteRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::*:role/*sagemaker*",
                "arn:aws:iam::*:role/*eks*",
                "arn:aws:iam::*:role/*hyperpod*",
                "arn:aws:iam::*:policy/*sagemaker*",
                "arn:aws:iam::*:policy/*hyperpod*",
                "arn:aws:iam::*:role/*LifeCycleScriptStack*",
                "arn:aws:iam::*:role/*LifeCycleScript*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:TagRole",
                "iam:PutRolePolicy",
                "iam:Get*",
                "iam:List*",
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::*:role/*",
                "arn:aws:iam::*:policy/*"
            ]
        },
        {
            "Sid": "FullAccessToFSx",
            "Effect": "Allow",
            "Action": [
                "fsx:AssociateFileGateway",
                "fsx:AssociateFileSystemAliases",
                "fsx:CancelDataRepositoryTask",
                "fsx:CopyBackup",
                "fsx:CopySnapshotAndUpdateVolume",
                "fsx:CreateAndAttachS3AccessPoint",
                "fsx:CreateBackup",
                "fsx:CreateDataRepositoryAssociation",
                "fsx:CreateDataRepositoryTask",
                "fsx:CreateFileCache",
                "fsx:CreateFileSystem",
                "fsx:CreateFileSystemFromBackup",
                "fsx:CreateSnapshot",
                "fsx:CreateStorageVirtualMachine",
                "fsx:CreateVolume",
                "fsx:CreateVolumeFromBackup",
                "fsx:DetachAndDeleteS3AccessPoint",
                "fsx:DeleteBackup",
                "fsx:DeleteDataRepositoryAssociation",
                "fsx:DeleteFileCache",
                "fsx:DeleteFileSystem",
                "fsx:DeleteSnapshot",
                "fsx:DeleteStorageVirtualMachine",
                "fsx:DeleteVolume",
                "fsx:DescribeAssociatedFileGateways",
                "fsx:DescribeBackups",
                "fsx:DescribeDataRepositoryAssociations",
                "fsx:DescribeDataRepositoryTasks",
                "fsx:DescribeFileCaches",
                "fsx:DescribeFileSystemAliases",
                "fsx:DescribeFileSystems",
                "fsx:DescribeS3AccessPointAttachments",
                "fsx:DescribeSharedVpcConfiguration",
                "fsx:DescribeSnapshots",
                "fsx:DescribeStorageVirtualMachines",
                "fsx:DescribeVolumes",
                "fsx:DisassociateFileGateway",
                "fsx:DisassociateFileSystemAliases",
                "fsx:ListTagsForResource",
                "fsx:ManageBackupPrincipalAssociations",
                "fsx:ReleaseFileSystemNfsV3Locks",
                "fsx:RestoreVolumeFromSnapshot",
                "fsx:TagResource",
                "fsx:UntagResource",
                "fsx:UpdateDataRepositoryAssociation",
                "fsx:UpdateFileCache",
                "fsx:UpdateFileSystem",
                "fsx:UpdateSharedVpcConfiguration",
                "fsx:UpdateSnapshot",
                "fsx:UpdateStorageVirtualMachine",
                "fsx:UpdateVolume"
            ],
            "Resource": "*"
        }
    ]
}
```

------

## 集群管理员的 IAM 用户
<a name="sagemaker-hyperpod-prerequisites-iam-cluster-admin"></a>

集群管理员（管理员）操作和配置 SageMaker HyperPod 集群，执行中的任务。[SageMaker HyperPod Slurm 集群操作](sagemaker-hyperpod-operate-slurm.md)以下策略示例包括集群管理员在您的 AWS 账户中运行 SageMaker HyperPod 核心 APIs 和管理 SageMaker HyperPod 集群的最低权限集。

**注意**  
具有集群管理员角色的 IAM 用户在管理专门针对`CreateCluster`和`UpdateCluster`操作的 SageMaker HyperPod 集群资源时，可以使用条件键来提供精细的访问控制。要查找这些操作支持的条件键，请在 [ SageMaker AI 定义的操作`UpdateCluster`](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonsagemaker.html#amazonsagemaker-actions-as-permissions)中搜索`CreateCluster`或。

------
#### [ Slurm ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateCluster",
                "sagemaker:ListClusters"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DeleteCluster",
                "sagemaker:DescribeCluster",
                "sagemaker:DescribeClusterNode",
                "sagemaker:ListClusterNodes",
                "sagemaker:UpdateCluster",
                "sagemaker:UpdateClusterSoftware",
                "sagemaker:BatchDeleteClusterNodes"
            ],
            "Resource": "arn:aws:sagemaker:us-east-1:111122223333:cluster/*"
        }
    ]
}
```

------
#### [ Amazon EKS ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::111122223333:role/execution-role-name"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateCluster",
                "sagemaker:DeleteCluster",
                "sagemaker:DescribeCluster",
                "sagemaker:DescribeClusterNode",
                "sagemaker:ListClusterNodes",
                "sagemaker:ListClusters",
                "sagemaker:UpdateCluster",
                "sagemaker:UpdateClusterSoftware",
                "sagemaker:BatchAddClusterNodes",
                "sagemaker:BatchDeleteClusterNodes",
                "sagemaker:ListComputeQuotas",
                "sagemaker:ListClusterSchedulerConfigs",
                "sagemaker:DeleteClusterSchedulerConfig",
                "sagemaker:DeleteComputeQuota",
                "eks:DescribeCluster",
                "eks:CreateAccessEntry",
                "eks:DescribeAccessEntry",
                "eks:DeleteAccessEntry",
                "eks:AssociateAccessPolicy",
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*"
        }
    ]
}
```

------

要授予访问 SageMaker AI 控制台的权限，请使用使用 A [mazon A SageMaker I 控制台所需的权限中](https://docs.aws.amazon.com/sagemaker/latest/dg/security_iam_id-based-policy-examples.html#console-permissions)提供的示例策略。

要授予访问 Amazon EC2 Systems Manager 控制台的权限，请[使用* AWS Systems Manager 用户指南*中使用 AWS Systems Manager 控制台](https://docs.aws.amazon.com/systems-manager/latest/userguide/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-console)中提供的示例策略。

您也可以考虑将[`AmazonSageMakerFullAccess`](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonSageMakerFullAccess)策略附加到该角色；但是，请注意，该`AmazonSageMakerFullAccess`策略授予对整个 SageMaker API 调用、功能和资源的权限。

有关 IAM 用户的一般指导，请参阅*《AWS Identity and Access Management 用户指南》*中的 [IAM 用户](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html)。

## 科学家的 IAM 用户
<a name="sagemaker-hyperpod-prerequisites-iam-cluster-user"></a>

科学家在集群管理员配置的 SageMaker HyperPod 集群节点上登录并运行机器学习工作负载。对于您 AWS 账户中的科学家，您应授`"ssm:StartSession"`予运行 SSM `start-session` 命令的权限。以下是 IAM 用户的策略示例。

------
#### [ Slurm ]

添加以下策略，授予 SSM 会话连接到所有资源的 SSM 目标的权限。这允许您访问 HyperPod集群。

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	             
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession",
                "ssm:TerminateSession"
            ],
            "Resource": "*"    
        }
    ]
}
```

------

------
#### [ Amazon EKS ]

授予以下 IAM 角色权限供数据科学家运行`hyperpod list-clusters`和 HyperPod CLI 命令之间的`hyperpod connect-cluster`命令。要了解有关 HyperPod CLI 的更多信息，请参阅[在 Amazon EKS 编排的 SageMaker HyperPod 集群上运行作业](sagemaker-hyperpod-eks-run-jobs.md)。它还包括 SSM 会话权限，以便连接到所有资源的 SSM 目标。这允许您访问 HyperPod 集群。

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "DescribeHyerpodClusterPermissions",
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeCluster"
            ],
            "Resource": "arn:aws:sagemaker:us-east-2:111122223333:cluster/hyperpod-cluster-name"
        },
        {
            "Sid": "UseEksClusterPermissions",
            "Effect": "Allow",
            "Action": [
                "eks:DescribeCluster"
            ],
            "Resource": "arn:aws:sagemaker:us-east-2:111122223333:cluster/eks-cluster-name"
        },
        {
            "Sid": "ListClustersPermission",
            "Effect": "Allow",
            "Action": [
                "sagemaker:ListClusters"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession",
                "ssm:TerminateSession"
            ],
            "Resource": "*"    
        }
    ]
}
```

------

*要向数据科学家 IAM 用户或角色授予访问集群中 Kubernetes 的权限，另请参阅 Amazon EKS [用户指南 APIs 中的授予 IAM 用户和角色访问 Kubernetes APIs](https://docs.aws.amazon.com/eks/latest/userguide/grant-k8s-access.html) 的权限。*

------

## 的 IAM 角色适用于 SageMaker HyperPod
<a name="sagemaker-hyperpod-prerequisites-iam-role-for-hyperpod"></a>

为了使 SageMaker HyperPod 集群能够运行并与必要的 AWS 资源通信，您需要为 HyperPod 集群创建一个 IAM 角色来代入。

首先附加托管角色 [AWS 托管策略： AmazonSageMakerHyperPodServiceRolePolicy](security-iam-awsmanpol-AmazonSageMakerHyperPodServiceRolePolicy.md)。根据此 AWS 托管策略， SageMaker HyperPod 集群实例组将扮演与 Amazon CloudWatch、Amazon S3 和 AWS Systems Manager 代理（SSM 代理）通信的角色。此托管策略是 SageMaker HyperPod 资源正常运行的最低要求，因此您必须向所有实例组提供具有此策略的 IAM 角色。

**提示**  
您还可以根据自己对多个实例组权限级别的偏好，设置多个 IAM 角色，并将它们附加到不同的实例组。当您设置集群用户对特定 SageMaker HyperPod 群集节点的访问权限时，这些节点将使用您手动附加的选择性权限来扮演该角色。  
当您通过 [AWS Systems Manager](https://aws.amazon.com/systems-manager/)（另请参阅 [为集群用户访问控制设置 AWS Systems Manager 和运行方式](sagemaker-hyperpod-prerequisites.md#sagemaker-hyperpod-prerequisites-ssm)）设置科学家对特定集群节点的访问权限时，集群节点会根据您手动附加的选择性权限承担角色。

创建 IAM 角色后，记下他们的名字和 ARNs。您在创建 SageMaker HyperPod 集群时使用这些角色，授予每个实例组与必要 AWS 资源通信所需的正确权限。

------
#### [ Slurm ]

要使用 Slurm 进行 HyperPod 编排，您必须将以下托管策略附加到 IAM 角色。 SageMaker HyperPod 
+ [AmazonSageMakerClusterInstanceRolePolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerClusterInstanceRolePolicy.html)

**（可选）用于亚马逊 Virtual Priv SageMaker HyperPod ate Cloud 的额外权限**

如果您想使用自己的亚马逊虚拟私有云 (VPC) 而不是默认 SageMaker AI VPC，则应向 IAM 角色添加以下额外权限。 SageMaker HyperPod

```
{
    "Effect": "Allow",
    "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcs",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DetachNetworkInterface"
    ],
    "Resource": "*"
}
{
    "Effect": "Allow",
    "Action": "ec2:CreateTags",
    "Resource": [
        "arn:aws:ec2:*:*:network-interface/*"
    ]
}
```

以下列表详细列出了在使用自己的 Amazon VPC 配置 SageMaker HyperPod 集群时需要哪些权限才能启用集群功能。
+ 需要以下`ec2`权限才能使用您的 VPC 配置 SageMaker HyperPod 集群。

  ```
  {
      "Effect": "Allow",
      "Action": [
          "ec2:CreateNetworkInterface",
          "ec2:CreateNetworkInterfacePermission",
          "ec2:DeleteNetworkInterface",
          "ec2:DeleteNetworkInterfacePermission",
          "ec2:DescribeNetworkInterfaces",
          "ec2:DescribeVpcs",
          "ec2:DescribeDhcpOptions",
          "ec2:DescribeSubnets",
          "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
  }
  ```
+ 启用[SageMaker HyperPod 自动恢复功能](sagemaker-hyperpod-resiliency-slurm-auto-resume.md)需要以下`ec2`权限。

  ```
  {
      "Effect": "Allow",
      "Action": [
          "ec2:DetachNetworkInterface"
      ],
      "Resource": "*"
  }
  ```
+ 以下`ec2`权限 SageMaker HyperPod 允许在您账户内的网络接口上创建标签。

  ```
  {
      "Effect": "Allow",
      "Action": "ec2:CreateTags",
      "Resource": [
          "arn:aws:ec2:*:*:network-interface/*"
      ]
  }
  ```

------
#### [ Amazon EKS ]

要使用 Amazon EKS 进行 HyperPod 编排，您必须将以下托管策略附加到 SageMaker HyperPod IAM 角色。
+ [AmazonSageMakerClusterInstanceRolePolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerClusterInstanceRolePolicy.html)

除托管式策略外，还可为角色附加以下权限策略。

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AssignPrivateIpAddresses",
        "ec2:AttachNetworkInterface",
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeTags",
        "ec2:DescribeVpcs",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DetachNetworkInterface",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:UnassignPrivateIpAddresses",
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetAuthorizationToken",
        "ecr:GetDownloadUrlForLayer",
        "eks-auth:AssumeRoleForPodIdentity"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:network-interface/*"
      ]
    }
  ]
}
```

------

**注意**  
`"eks-auth:AssumeRoleForPodIdentity"` 权限是可选的。如果您打算使用 EKS 容器组身份，则必须使用该功能。

**SageMaker HyperPod 服务相关角色**

要获得中的 Amazon EKS 支持 SageMaker HyperPod，[AWS 托管策略： AmazonSageMakerHyperPodServiceRolePolicy](security-iam-awsmanpol-AmazonSageMakerHyperPodServiceRolePolicy.md)请使用 HyperPod 创建一个服务相关角色来监控和支持 EKS 集群的弹性，例如更换节点和重启作业。

**适用于带受限实例组（RIG）的 Amazon EKS 集群的其他 IAM 策略**

在受限实例组中运行的工作负载依赖执行角色从 Amazon S3 加载数据。您必须向执行角色添加其他 Amazon S3 权限，以使受限实例组中运行的自定义作业能够正常获取输入数据。

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket"      
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ]
    }
  ]
}
```

------

------