# Security and the Spark operator with Amazon EMR on EKS There are a couple ways to set up cluster-access permissions when you use the Spark operator. The first is to use role-based access control, Role-based access control (RBAC) restricts access based on a person's role within an organization. It has become a primary way to handle access. The second access method is to assume an AWS Identity and Access Management role, which provides resource access by means of specific assigned permissions. **Topics** + [ # Setting up cluster access permissions with role-based access control (RBAC) ](spark-operator-security-rbac.md) + [ # Setting up cluster access permissions with IAM roles for service accounts (IRSA) ](spark-operator-security-irsa.md) # Setting up cluster access permissions with role-based access control (RBAC) To deploy the Spark operator, Amazon EMR on EKS creates two roles and service accounts for the Spark operator and the Spark apps. **Topics** + [ ## Operator service account and role ](#spark-operator-sa-oper) + [ ## Spark service account and role ](#spark-operator-sa-spark) ## Operator service account and role Amazon EMR on EKS creates the **operator service account and role** to manage `SparkApplications` for Spark jobs and for other resources such as services. The default name for this service account is `emr-containers-sa-spark-operator`. The following rules apply to this service role: ``` rules: - apiGroups: - "" resources: - pods verbs: - "*" - apiGroups: - "" resources: - services - configmaps - secrets verbs: - create - get - delete - update - apiGroups: - extensions - networking.k8s.io resources: - ingresses verbs: - create - get - delete - apiGroups: - "" resources: - nodes verbs: - get - apiGroups: - "" resources: - events verbs: - create - update - patch - apiGroups: - "" resources: - resourcequotas verbs: - get - list - watch - apiGroups: - apiextensions.k8s.io resources: - customresourcedefinitions verbs: - create - get - update - delete - apiGroups: - admissionregistration.k8s.io resources: - mutatingwebhookconfigurations - validatingwebhookconfigurations verbs: - create - get - update - delete - apiGroups: - sparkoperator.k8s.io resources: - sparkapplications - sparkapplications/status - scheduledsparkapplications - scheduledsparkapplications/status verbs: - "*" {{- if .Values.batchScheduler.enable }} # required for the `volcano` batch scheduler - apiGroups: - scheduling.incubator.k8s.io - scheduling.sigs.dev - scheduling.volcano.sh resources: - podgroups verbs: - "*" {{- end }} {{ if .Values.webhook.enable }} - apiGroups: - batch resources: - jobs verbs: - delete {{- end }} ``` ## Spark service account and role A Spark driver pod needs a Kubernetes service account in the same namespace as the pod. This service account needs permissions to create, get, list, patch and delete executor pods, and to create a Kubernetes headless service for the driver. The driver fails and exits without the service account unless the default service account in the pod's namespace has the required permissions. The default name for this service account is `emr-containers-sa-spark`. The following rules apply to this service role: ``` rules: - apiGroups: - "" resources: - pods verbs: - "*" - apiGroups: - "" resources: - services verbs: - "*" - apiGroups: - "" resources: - configmaps verbs: - "*" - apiGroups: - "" resources: - persistentvolumeclaims verbs: - "*" ``` # Setting up cluster access permissions with IAM roles for service accounts (IRSA) This section uses an example to demonstrate how to configure a Kubernetes service account to assume an AWS Identity and Access Management role. Pods that use the service account can then access any AWS service that the role has permissions to access. The following example runs a Spark application to count the words from a file in Amazon S3. To do this, you can set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts. **Note** This example uses the "spark-operator" namespace for the Spark operator and for the namespace where you submit the Spark application. ## Prerequisites Before you try the example on this page, complete the following prerequisites: + [Get set up for the Spark operator](). + [Install the Spark operator](spark-operator-gs.md#spark-operator-install). + [Create an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html). + Save your favorite poem in a text file named `poem.txt`, and upload the file to your S3 bucket. The Spark application that you create on this page will read the contents of the text file. For more information on uploading files to S3, see [Upload an object to your bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html) in the *Amazon Simple Storage Service User Guide*. ## Configure a Kubernetes service account to assume an IAM role Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access AWS services that the role has permissions to access. 1. After completing the [Prerequisites](#spark-operator-security-irsa-prereqs), use the AWS Command Line Interface to create an `example-policy.json` file that allows read-only access to the file that you uploaded to Amazon S3: ``` cat >example-policy.json <spark-rbac.yaml < After you [configure the Kubernetes service account](), you can run a Spark application that counts the number of words in the text file that you uploaded as part of the [Prerequisites](#spark-operator-security-irsa-prereqs). 1. Create a new file `word-count.yaml`, with a `SparkApplication` definition for your word-count application, based on Amazon EMR version 6. ``` cat >word-count.yaml <word-count.yaml <