

本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# 使用服務帳戶的 IAM 角色 (IRSA) 來設定叢集存取許可
<a name="spark-operator-security-irsa"></a>

本節使用範例示範如何設定 Kubernetes 服務帳戶以擔任 AWS Identity and Access Management 角色。然後，使用服務帳戶的 Pod 可以存取角色有權存取的任何 AWS 服務。

下列範例會執行 Spark 應用程式，以計算 Amazon S3 中檔案的字數。為此，您可以設定服務帳戶的 IAM 角色 (IRSA)，以驗證和授權 Kubernetes 服務帳戶。

**注意**  
此範例將 "spark-operator" 命名空間用於 Spark Operator 以及您在其中提交 Spark 應用程式的命名空間。

## 先決條件
<a name="spark-operator-security-irsa-prereqs"></a>

嘗試此頁面的範例之前，請先完成下列先決條件：
+ [設定 Spark Operator]()。
+ [安裝 Spark Operator](spark-operator-gs.md#spark-operator-install).
+ [建立 Amazon S3 儲存貯體](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html)。
+ 將您最喜愛的詩歌儲存在名為 `poem.txt` 的文字檔案中，然後將檔案上傳到 S3 儲存貯體。在此頁面中建立的 Spark 應用程式將讀取文字檔案的內容。如需有關將檔案上傳到 S3 的詳細資訊，請參閱 *Amazon Simple Storage Service 使用者指南*中的[上傳物件至儲存貯體](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html)。

## 設定要擔任 IAM 角色的 Kubernetes 服務帳戶
<a name="spark-operator-security-irsa-config"></a>

使用下列步驟來設定 Kubernetes 服務帳戶，以擔任 IAM 角色，讓 Pod 可用來存取該角色有權存取 AWS 的服務。

1. 完成 後[先決條件](#spark-operator-security-irsa-prereqs)，請使用 AWS Command Line Interface 建立 `example-policy.json` 檔案，允許唯讀存取您上傳至 Amazon S3 的檔案：

   ```
   cat >example-policy.json <<EOF
   {
       "Version": "2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject",
                   "s3:ListBucket"
               ],
               "Resource": [
                   "arn:aws:s3:::my-pod-bucket",
                   "arn:aws:s3:::my-pod-bucket/*"
               ]
           }
       ]
   }
   EOF
   ```

1. 然後，建立 IAM 政策 `example-policy`：

   ```
   aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
   ```

1. 接下來，建立 IAM 角色 `example-role`，並將其與 Spark 驅動程式的 Kubernetes 服務帳戶建立關聯：

   ```
   eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
   --cluster my-cluster --role-name "example-role" \
   --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
   ```

1. 使用 Spark 驅動程式服務帳戶所需的叢集角色連結來建立 yaml 檔案：

   ```
   cat >spark-rbac.yaml <<EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: driver-account-sa
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: spark-role
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: edit
   subjects:
     - kind: ServiceAccount
       name: driver-account-sa
       namespace: spark-operator
   EOF
   ```

1. 套用叢集角色連結組態：

   ```
   kubectl apply -f spark-rbac.yaml
   ```

kubectl 命令應該確認成功建立帳戶：

```
serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
```

## 從 Spark Operator 中執行應用程式
<a name="spark-operator-security-irsa-run"></a>

[設定 Kubernetes 服務帳戶]()之後，可以執行 Spark 應用程式來計算作為 [先決條件](#spark-operator-security-irsa-prereqs) 的一部分上傳的文字檔案中的字數。

1. 根據 Amazon EMR 第 6 版`word-count.yaml`，使用文字計數應用程式`SparkApplication`的定義建立新的檔案 。

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: my-spark-driver-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

   如果您使用版本 7 的 spark Operator，請調整一些組態值：

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.7.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: my-spark-driver-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

1. 提交 Spark 應用程式。

   ```
   kubectl apply -f word-count.yaml
   ```

   kubectl 命令應該傳回您已成功建立名為 `word-count` 的 `SparkApplication` 物件的確認資訊。

   ```
   sparkapplication.sparkoperator.k8s.io/word-count configured
   ```

1. 若要檢查 `SparkApplication` 物件的事件，請執行下列命令：

   ```
   kubectl describe sparkapplication word-count -n spark-operator
   ```

   kubectl 命令應傳回 `SparkApplication` 的描述和事件：

   ```
   Events:
     Type     Reason                               Age                    From            Message
     ----     ------                               ----                   ----            -------
     Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
     Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
     Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
     Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
     Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
     Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
     Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
     Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
     Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
   ```

應用程式現在正在計算 S3 檔案中的單詞。要尋找字數，請參閱驅動程式的日誌檔案：

```
kubectl logs pod/word-count-driver -n spark-operator
```

kubectl 命令應傳回日誌檔案的內容及字數統計應用程式的結果。

```
INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
                Software: 1
```

如需有關如何透過 Spark Operator 將應用程式提交至 Spark 的詳細資訊，請參閱 GitHub 上 *Kubernetes Operator for Apache Spark (spark-on-k8s-operator)* 文件中的[使用 SparkApplication](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/)。