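IAM trust relationship issues
The HyperPod inference operator fails to start with an STS AssumeRoleWithWebIdentity error, which indicates a problem with the IAM trust relationship configuration.
Error message: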
failed to enable inference watcher for HyperPod cluster *****: operation error SageMaker: UpdateClusterInference, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, https response error StatusCode: 403, RequestID: ****, api error AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
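Resolution:
Update the trust relationship of the inference operator's IAM execution role with the following configuration.
Replace the following placeholders:
- <ACCOUNT_ID>: Your AWS account ID
- <REGION>: Your AWS Region
- <OIDC_ID>: The OIDC provider ID of your Amazon EKS cluster
- <namespace> and <service-account-name>: The Kubernetes namespace and service account used by the inference operator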
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<namespace>:<service-account-name>",
          "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:aud": "sts.amazonaws.com"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "sagemaker.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
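Filling in the placeholders by hand is error-prone, so the following is a minimal Python sketch of one way to generate the same trust policy and apply it. It is illustrative only: the function names, the example account ID, Region, OIDC ID, namespace, and service account name are all hypothetical, and it assumes boto3 is installed with credentials that are allowed to call iam:UpdateAssumeRolePolicy. The OIDC ID is taken to be the last path segment of the cluster's OIDC issuer URL.

```python
import json


def oidc_id_from_issuer(issuer_url):
    """Return the trailing ID segment of an EKS OIDC issuer URL.

    Example input: https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE
    """
    return issuer_url.rstrip("/").rsplit("/", 1)[-1]


def build_trust_policy(account_id, region, oidc_id, namespace, service_account):
    """Build the trust policy shown above with all placeholders filled in."""
    provider = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringLike": {
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            },
            {
                "Effect": "Allow",
                "Principal": {"Service": ["sagemaker.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            },
        ],
    }


def apply_trust_policy(role_name, policy):
    """Replace the trust policy of an existing IAM role (assumes boto3)."""
    import boto3  # imported here so the builder works without boto3 installed

    iam = boto3.client("iam")
    iam.update_assume_role_policy(
        RoleName=role_name, PolicyDocument=json.dumps(policy)
    )


if __name__ == "__main__":
    # All values below are made-up examples; substitute your own.
    oidc_id = oidc_id_from_issuer(
        "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE"
    )
    policy = build_trust_policy(
        "111122223333", "us-west-2", oidc_id, "example-namespace", "example-sa"
    )
    print(json.dumps(policy, indent=2))
```

Calling apply_trust_policy overwrites the role's entire trust policy, so review the printed document before applying it to a role that other principals also rely on.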
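Verification:
After updating the trust relationship:
- Verify the role configuration in the IAM console
- Restart the inference operator if necessary
- Monitor the operator logs to confirm a successful startup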
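As an alternative to checking the role in the console, the trust policy can also be inspected programmatically. The sketch below is one possible approach, not an official tool: the function names and the role name passed to them are hypothetical, and it assumes boto3 with permission to call iam:GetRole. It only checks that an Allow statement grants sts:AssumeRoleWithWebIdentity to the expected OIDC provider; it does not evaluate the sub or aud conditions.

```python
def trusts_web_identity(policy, provider):
    """Return True if an Allow statement lets `provider` assume the role.

    `provider` is the OIDC provider path without the scheme, for example
    oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE. Conditions are not checked.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict):
            continue
        federated = principal.get("Federated", [])
        if isinstance(federated, str):
            federated = [federated]
        suffix = ":oidc-provider/" + provider
        if any(arn.endswith(suffix) for arn in federated):
            return True
    return False


def fetch_trust_policy(role_name):
    """Fetch a role's trust policy document (assumes boto3 is installed)."""
    import boto3  # imported here so the checker works without boto3 installed

    role = boto3.client("iam").get_role(RoleName=role_name)["Role"]
    # boto3 normally returns this document already parsed into a dict.
    return role["AssumeRolePolicyDocument"]
```

A typical use would be trusts_web_identity(fetch_trust_policy("example-role"), "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE"), where both arguments are placeholders for your own role and provider. A False result suggests the federated statement is missing or points at a different OIDC provider.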