


# Setting up cluster access permissions with IAM roles for service accounts (IRSA)
<a name="spark-operator-security-irsa"></a>

This section uses an example to demonstrate how to configure a Kubernetes service account to assume an AWS Identity and Access Management (IAM) role. Pods that use the service account can then access any AWS service that the role has permissions to access.

The following example runs a Spark application that counts the words in a file in Amazon S3. To do this, you set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts.

**Note**  
This example uses the `spark-operator` namespace both for the Spark operator and for the namespace where you submit the Spark application.

## Prerequisites
<a name="spark-operator-security-irsa-prereqs"></a>

Before you try the example on this page, complete the following prerequisites:
+ [Set up the Spark operator]().
+ [Install the Spark operator](spark-operator-gs.md#spark-operator-install).
+ [Create an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).
+ Save your favorite poem in a text file named `poem.txt`, and upload the file to your S3 bucket. The Spark application that you create on this page reads the contents of the text file. For more information about uploading files to S3, see [Uploading objects to a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html) in the *Amazon Simple Storage Service User Guide*.
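If you want to try the steps with a placeholder file first, here is a minimal local sketch of the last prerequisite. The bucket name `my-pod-bucket` matches the example policy and manifests later on this page; substitute your own. The upload command requires AWS credentials, so it is shown commented.

```shell
# Create a small sample text file to stand in for your poem.
cat > poem.txt <<'EOF'
roses are red
violets are blue
EOF

# Upload it to your bucket (requires the AWS CLI and credentials;
# the bucket name is an example):
# aws s3 cp poem.txt s3://my-pod-bucket/poem.txt
```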

## Configure a Kubernetes service account to assume an IAM role
<a name="spark-operator-security-irsa-config"></a>

Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access AWS services that the role has permissions to access.

1. After you complete the [Prerequisites](#spark-operator-security-irsa-prereqs), use the AWS Command Line Interface (AWS CLI) to create an `example-policy.json` file that allows read-only access to the file that you uploaded to Amazon S3:

   ```
   cat >example-policy.json <<EOF
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject",
                   "s3:ListBucket"
               ],
               "Resource": [
                   "arn:aws:s3:::my-pod-bucket",
                   "arn:aws:s3:::my-pod-bucket/*"
               ]
           }
       ]
   }
   EOF
   ```

1. Then, create the `example-policy` IAM policy:

   ```
   aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
   ```
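   The next step references this policy's ARN. As a minimal sketch, you can construct it locally from your account ID; the account ID `111122223333` below is the example value used throughout this page, so substitute your own.

   ```shell
   # Example account ID; find yours with: aws sts get-caller-identity
   ACCOUNT_ID=111122223333
   POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/example-policy"
   echo "$POLICY_ARN"

   # Alternatively, capture the ARN directly from the create call
   # (commented here because it calls AWS):
   # POLICY_ARN=$(aws iam create-policy --policy-name example-policy \
   #   --policy-document file://example-policy.json \
   #   --query Policy.Arn --output text)
   ```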

1. Next, create the IAM role `example-role` and associate it with a Kubernetes service account for the Spark driver:

   ```
   eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
   --cluster my-cluster --role-name "example-role" \
   --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
   ```

1. Create a YAML file with the cluster role binding that the Spark driver service account requires:

   ```
   cat >spark-rbac.yaml <<EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: driver-account-sa
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: spark-role
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: edit
   subjects:
     - kind: ServiceAccount
       name: driver-account-sa
       namespace: spark-operator
   EOF
   ```

1. Apply the cluster role binding configuration:

   ```
   kubectl apply -f spark-rbac.yaml
   ```

The kubectl command should return confirmation that the account was created successfully:

```
serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
```
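Optionally, you can confirm that the service account eksctl created is linked to the IAM role through its `eks.amazonaws.com/role-arn` annotation. The kubectl call requires cluster access, so it is shown commented in this sketch; the ARN below assumes the example account ID `111122223333`.

```shell
# On a live cluster, read the IRSA annotation from the service account:
# kubectl get serviceaccount driver-account-sa -n spark-operator \
#   -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'

# The value should be the ARN of the role created earlier, shaped like:
ROLE_ARN="arn:aws:iam::111122223333:role/example-role"
echo "$ROLE_ARN"
```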

## Run an application from the Spark operator
<a name="spark-operator-security-irsa-run"></a>

After you [configure the Kubernetes service account](#spark-operator-security-irsa-config), you can run a Spark application that counts the number of words in the text file that you uploaded as part of the [Prerequisites](#spark-operator-security-irsa-prereqs).

1. Create a new file `word-count.yaml` with a `SparkApplication` definition for your word-count application, based on an Amazon EMR 6.x release:

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: driver-account-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

   If you use the Spark operator with an Amazon EMR 7.x release, adjust some of the configuration values:

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.7.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: driver-account-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

1. Submit the Spark application.

   ```
   kubectl apply -f word-count.yaml
   ```

   The kubectl command should return confirmation that you successfully created a `SparkApplication` object named `word-count`:

   ```
   sparkapplication.sparkoperator.k8s.io/word-count configured
   ```

1. To check the events for the `SparkApplication` object, run the following command:

   ```
   kubectl describe sparkapplication word-count -n spark-operator
   ```

   The kubectl command should return the description of the `SparkApplication` with events:

   ```
   Events:
     Type     Reason                               Age                    From            Message
     ----     ------                               ----                   ----            -------
     Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
     Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
     Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
     Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
     Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
     Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
     Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
     Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
     Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
   ```

The application now counts the words in your S3 file. To find the word counts, check the log files for your driver:

```
kubectl logs pod/word-count-driver -n spark-operator
```

The kubectl command should return the contents of the log file with the results of your word-count application.

```
INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
                Software: 1
```
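As a sanity check, the counting that `JavaWordCount` performs can be approximated locally with standard shell tools. This is a rough sketch, not the Spark job itself: it splits a sample file on whitespace and tallies each word, using a local `poem.txt` in place of the S3 object.

```shell
# Create a sample file standing in for s3://my-pod-bucket/poem.txt.
printf 'roses are red\nviolets are blue\n' > poem.txt

# Split on whitespace, drop empty lines, then count occurrences of each
# word, most frequent first (roughly the word: count pairs the driver logs).
tr -s '[:space:]' '\n' < poem.txt | sed '/^$/d' | sort | uniq -c | sort -rn
```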

For more information about how to submit applications to Spark through the Spark operator, see the [Using a SparkApplication](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) documentation for the *Kubernetes Operator for Apache Spark (spark-on-k8s-operator)* on GitHub.