

本文属于机器翻译版本。若本译文内容与英语原文存在差异，则一律以英文原文为准。

# 使用服务账户的 IAM 角色（IRSA）设置集群访问权限
<a name="spark-operator-security-irsa"></a>

本节使用一个示例来演示如何配置 Kubernetes 服务账号来代入角色。 AWS Identity and Access Management 然后，使用该服务账号的 Pod 可以访问该角色有权访问的任何 AWS 服务。

以下示例运行一个 Spark 应用程序来统计 Amazon S3 中某个文件的字数。为此，您可以设置服务账户的 IAM 角色（IRSA），对 Kubernetes 服务账户进行身份验证和授权。

**注意**  
此示例将“spark-operator”命名空间用于 Spark Operator 和提交 Spark 应用程序的命名空间。

## 先决条件
<a name="spark-operator-security-irsa-prereqs"></a>

在尝试本页的示例之前，请先完成下述先决条件：
+ [设置好 Spark Operator]()。
+ [安装 Spark Operator](spark-operator-gs.md#spark-operator-install).
+ [创建 Amazon S3 存储桶](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html)。
+ 将您最喜欢的诗歌保存在名为 `poem.txt` 的文本文件中，然后将该文件上传到 S3 存储桶。您在此页面上创建的 Spark 应用程序会读取该文本文件中的内容。有关如何将文件上传到 S3 的信息，请参阅《Amazon Simple Storage Service 用户指南》**中的[将对象上传到存储桶](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html)。

## 配置 Kubernetes 服务账户来代入 IAM 角色
<a name="spark-operator-security-irsa-config"></a>

使用以下步骤将 Kubernetes 服务账户配置为代入一个 IAM 角色，Pod 可以使用该角色访问该角色有权访问的 AWS 服务。

1. 完成后[先决条件](#spark-operator-security-irsa-prereqs)，使用创建允许 AWS Command Line Interface 对您上传到 Amazon S3 的文件进行只读访问的文件：`example-policy.json`

   ```
   cat >example-policy.json <<EOF
   {
       "Version": "2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject",
                   "s3:ListBucket"
               ],
               "Resource": [
                   "arn:aws:s3:::my-pod-bucket",
                   "arn:aws:s3:::my-pod-bucket/*"
               ]
           }
       ]
   }
   EOF
   ```

1. 然后，创建 IAM policy `example-policy`：

   ```
   aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
   ```

1. 接下来，创建一个 IAM 角色 `example-role`，将该角色与 Spark 驱动程序的 Kubernetes 服务账户关联起来：

   ```
   eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
   --cluster my-cluster --role-name "example-role" \
   --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
   ```

1. 创建一个 yaml 文件，其中包含 Spark 驱动程序服务账户所需的集群角色绑定：

   ```
   cat >spark-rbac.yaml <<EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: driver-account-sa
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: spark-role
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: edit
   subjects:
     - kind: ServiceAccount
       name: driver-account-sa
       namespace: spark-operator
   EOF
   ```

1. 应用集群角色绑定配置：

   ```
   kubectl apply -f spark-rbac.yaml
   ```

kubectl 命令应确认成功创建账户：

```
serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
```

## 通过 Spark Operator 运行应用程序
<a name="spark-operator-security-irsa-run"></a>

[配置 Kubernetes 服务账户]()后，您可以运行 Spark 应用程序来统计作为 [先决条件](#spark-operator-security-irsa-prereqs) 的一部分上传的文本文件中的字数。

1. 基于 Amazon EMR 版本 6 创建一个新文件 `word-count.yaml`，其中包含字数统计应用程序的 `SparkApplication` 定义。

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: my-spark-driver-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

   如果您使用的是 Spark Operator 版本 7，则需要调整一些配置值：

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.7.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: my-spark-driver-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

1. 提交 Spark 应用程序。

   ```
   kubectl apply -f word-count.yaml
   ```

   kubectl 命令应返回您已成功创建名为 `word-count` 的 `SparkApplication` 对象的确认信息。

   ```
   sparkapplication.sparkoperator.k8s.io/word-count configured
   ```

1. 运行以下命令检查 `SparkApplication` 对象的事件：

   ```
   kubectl describe sparkapplication word-count -n spark-operator
   ```

   kubectl 命令应返回 `SparkApplication` 的描述和事件：

   ```
   Events:
     Type     Reason                               Age                    From            Message
     ----     ------                               ----                   ----            -------
     Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
     Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
     Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
     Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
     Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
     Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
     Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
     Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
     Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
   ```

应用程序现在正在统计 S3 文件中的字数。要查找统计的字数，请参阅驱动程序的日志文件：

```
kubectl logs pod/word-count-driver -n spark-operator
```

kubectl 命令应返回日志文件的内容以及字数统计应用程序的结果。

```
INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
                Software: 1
```

有关如何通过 Spark 运算符向 Spark 提交应用程序的更多信息，请参阅 SparkApplication在 *Apache Spark 的 Kubernetes 运算符（spark-on-k8s 操作员*）文档中[使用](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) a。 GitHub