

# Getting started with Amazon EMR Serverless
<a name="getting-started"></a>

This tutorial helps you get started with EMR Serverless when you deploy a sample Spark or Hive workload. You'll create, run, and debug your own application. We show default options in most parts of this tutorial.

Before you launch an EMR Serverless application, complete the following tasks.

**Topics**
+ [Grant permissions to use EMR Serverless](#gs-permissions)
+ [Prepare storage for EMR Serverless](#gs-prepare-storage)
+ [Create an EMR Studio to run interactive workloads](#gs-interactive)
+ [Create a job runtime role](#gs-runtime-role)
+ [Getting started with EMR Serverless from the console](gs-console.md)
+ [Getting started from the AWS CLI](gs-cli.md)

## Grant permissions to use EMR Serverless
<a name="gs-permissions"></a>

To use EMR Serverless, you need a user or IAM role with an attached policy that grants permissions for EMR Serverless. To create a user and attach the appropriate policy to that user, follow the instructions in [Grant permissions](setting-up.md#setting-up-iam).

## Prepare storage for EMR Serverless
<a name="gs-prepare-storage"></a>

In this tutorial, you'll use an S3 bucket to store output files and logs from the sample Spark or Hive workload that you'll run using an EMR Serverless application. To create a bucket, follow the instructions in [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html) in the *Amazon Simple Storage Service Console User Guide*. Replace any further reference to `amzn-s3-demo-bucket` with the name of the newly created bucket. 

## Create an EMR Studio to run interactive workloads
<a name="gs-interactive"></a>

If you want to use EMR Serverless to run interactive queries through notebooks that are hosted in EMR Studio, you must specify an S3 bucket and the [minimum service role for EMR Serverless](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-service-role.html#emr-studio-service-role-serverless) to create a Workspace. For setup steps, see [Set up an EMR Studio](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-set-up.html) in the *Amazon EMR Management Guide*. For more information on interactive workloads, see [Run interactive workloads with EMR Serverless through EMR Studio](interactive-workloads.md).

## Create a job runtime role
<a name="gs-runtime-role"></a>

Job runs in EMR Serverless use a runtime role that provides granular permissions to specific AWS services and resources at runtime. In this tutorial, a public S3 bucket hosts the data and scripts. The bucket `amzn-s3-demo-bucket` stores the output. 

To set up a job runtime role, first create a runtime role with a trust policy so that EMR Serverless can use the new role. Next, attach the required S3 access policy to that role. The following steps guide you through the process.
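If you script this setup, the trust policy document can be composed programmatically before you pass it to IAM. The following sketch builds the document as a Python dict and serializes it to JSON; the service principal shown is the standard one for EMR Serverless.

```python
import json

# Sketch: compose the trust policy that lets EMR Serverless assume the
# runtime role. "emr-serverless.amazonaws.com" is the EMR Serverless
# service principal.
def build_trust_policy(service_principal="emr-serverless.amazonaws.com"):
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EMRServerlessTrustPolicy",
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }

policy_json = json.dumps(build_trust_policy(), indent=2)
print(policy_json)
```

You could write `policy_json` to a file and pass it to `aws iam create-role --assume-role-policy-document file://...`, as the CLI steps below do with a hand-written file.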

------
#### [ Console ]

1. Navigate to the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane, choose **Policies**.

1. Choose **Create Policy**.

1. The **Create policy** page opens on a new tab. In the **Policy editor**, select **JSON** and paste the following policy.
**Important**  
Replace `amzn-s3-demo-bucket` in the policy below with the actual bucket name created in [Prepare storage for EMR Serverless](#gs-prepare-storage). This is a basic policy for S3 access. For more job runtime role examples, see [Job runtime roles for Amazon EMR Serverless](security-iam-runtime-role.md).

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "ReadAccessForEMRSamples",
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:ListBucket"
         ],
         "Resource": [
           "arn:aws:s3:::*.elasticmapreduce",
           "arn:aws:s3:::*.elasticmapreduce/*"
         ]
       },
       {
         "Sid": "FullAccessToOutputBucket",
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:ListBucket",
           "s3:DeleteObject"
         ],
         "Resource": [
           "arn:aws:s3:::amzn-s3-demo-bucket",
           "arn:aws:s3:::amzn-s3-demo-bucket/*"
         ]
       },
       {
         "Sid": "GlueCreateAndReadDataCatalog",
         "Effect": "Allow",
         "Action": [
           "glue:GetDatabase",
           "glue:CreateDatabase",
           "glue:GetDatabases",
           "glue:CreateTable",
           "glue:GetTable",
           "glue:UpdateTable",
           "glue:DeleteTable",
           "glue:GetTables",
           "glue:GetPartition",
           "glue:GetPartitions",
           "glue:CreatePartition",
           "glue:BatchCreatePartition",
           "glue:GetUserDefinedFunctions"
         ],
         "Resource": [
           "*"
         ]
       }
     ]
   }
   ```

------

1. Choose **Next**. Enter a name for your policy, such as `EMRServerlessS3AndGlueAccessPolicy`, then choose **Create policy**.

1. In the left navigation pane of the IAM console, choose **Roles**.

1. Choose **Create role**.

1. For role type, choose **Custom trust policy** and paste the following trust policy. This allows jobs submitted to your Amazon EMR Serverless applications to access other AWS services on your behalf.

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "EMRServerlessTrustPolicy",
         "Effect": "Allow",
         "Principal": {
           "Service": "emr-serverless.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
       }
     ]
   }
   ```

------

1. Choose **Next** to navigate to the **Add permissions** page, then select the **EMRServerlessS3AndGlueAccessPolicy** policy that you created earlier.

1. On the **Name, review, and create** page, for **Role name**, enter a name for your role, for example, `EMRServerlessS3RuntimeRole`. To create this IAM role, choose **Create role**.

------
#### [ CLI ]

1. Create a file named `emr-serverless-trust-policy.json` that contains the trust policy to use for the IAM role. The file should contain the following policy.

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "EMRServerlessTrustPolicy",
         "Effect": "Allow",
         "Principal": {
           "Service": "emr-serverless.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
       }
     ]
   }
   ```

------

1. Create an IAM role named `EMRServerlessS3RuntimeRole`. Use the trust policy that you created in the previous step.

   ```
   aws iam create-role \
       --role-name EMRServerlessS3RuntimeRole \
       --assume-role-policy-document file://emr-serverless-trust-policy.json
   ```

   Note the ARN in the output. You use the ARN of the new role during job submission, referred to after this as the `job-role-arn`.

1. Create a file named `emr-sample-access-policy.json` that defines the IAM policy for your workload. This provides read access to the script and data stored in public S3 buckets and read-write access to `amzn-s3-demo-bucket`. 
**Important**  
Replace `amzn-s3-demo-bucket` in the policy below with the actual bucket name created in [Prepare storage for EMR Serverless](#gs-prepare-storage). This is a basic policy for AWS Glue and S3 access. For more job runtime role examples, see [Job runtime roles for Amazon EMR Serverless](security-iam-runtime-role.md).

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "ReadAccessForEMRSamples",
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:ListBucket"
         ],
         "Resource": [
           "arn:aws:s3:::*.elasticmapreduce",
           "arn:aws:s3:::*.elasticmapreduce/*"
         ]
       },
       {
         "Sid": "FullAccessToOutputBucket",
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:ListBucket",
           "s3:DeleteObject"
         ],
         "Resource": [
           "arn:aws:s3:::amzn-s3-demo-bucket",
           "arn:aws:s3:::amzn-s3-demo-bucket/*"
         ]
       },
       {
         "Sid": "GlueCreateAndReadDataCatalog",
         "Effect": "Allow",
         "Action": [
           "glue:GetDatabase",
           "glue:CreateDatabase",
           "glue:GetDatabases",
           "glue:CreateTable",
           "glue:GetTable",
           "glue:UpdateTable",
           "glue:DeleteTable",
           "glue:GetTables",
           "glue:GetPartition",
           "glue:GetPartitions",
           "glue:CreatePartition",
           "glue:BatchCreatePartition",
           "glue:GetUserDefinedFunctions"
         ],
         "Resource": [
           "*"
         ]
       }
     ]
   }
   ```

------

1. Create an IAM policy named `EMRServerlessS3AndGlueAccessPolicy` with the policy file that you created in the previous step.

   ```
   aws iam create-policy \
       --policy-name EMRServerlessS3AndGlueAccessPolicy \
       --policy-document file://emr-sample-access-policy.json
   ```

   Note the new policy's ARN in the output. You'll substitute it for `policy-arn` in the next step.

1. Attach the IAM policy `EMRServerlessS3AndGlueAccessPolicy` to the job runtime role `EMRServerlessS3RuntimeRole`.

   ```
   aws iam attach-role-policy \
       --role-name EMRServerlessS3RuntimeRole \
       --policy-arn policy-arn
   ```
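IAM ARNs follow a fixed format, so you can predict the `policy-arn` and `job-role-arn` values from your account ID and the resource names rather than copying them from command output. A small sketch (the account ID below is an example):

```python
# Sketch: IAM role and customer-managed-policy ARNs follow a fixed format,
# "arn:aws:iam::<account-id>:role/<name>" and
# "arn:aws:iam::<account-id>:policy/<name>".
ACCOUNT_ID = "123456789012"  # example account ID; substitute your own

def role_arn(name: str) -> str:
    return f"arn:aws:iam::{ACCOUNT_ID}:role/{name}"

def policy_arn(name: str) -> str:
    return f"arn:aws:iam::{ACCOUNT_ID}:policy/{name}"

print(role_arn("EMRServerlessS3RuntimeRole"))
# arn:aws:iam::123456789012:role/EMRServerlessS3RuntimeRole
print(policy_arn("EMRServerlessS3AndGlueAccessPolicy"))
# arn:aws:iam::123456789012:policy/EMRServerlessS3AndGlueAccessPolicy
```

Copying the ARNs from the `create-role` and `create-policy` output remains the authoritative approach; this is just a convenience when scripting.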

------

# Getting started with EMR Serverless from the console
<a name="gs-console"></a>

This section describes working with EMR Serverless, including creating an EMR Studio. It also describes how to submit job runs and view logs.

**Topics**
+ [Step 1: Create an EMR Serverless application](#gs-application-console)
+ [Step 2: Submit a job run or interactive workload](#gs-job-run-console)
+ [Step 3: View application UI and logs](#gs-output-console)
+ [Step 4: Clean up](#gs-cleanup-console)

## Step 1: Create an EMR Serverless application
<a name="gs-application-console"></a>

Create a new application with EMR Serverless as follows.

1. Sign in to the AWS Management Console and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. In the left navigation pane, choose **EMR Serverless** to navigate to the EMR Serverless landing page.

1. To create or manage EMR Serverless applications, you need the EMR Studio UI.
   + If you already have an EMR Studio in the AWS Region where you want to create an application, then select **Manage applications** to navigate to your EMR Studio, or select the studio that you want to use. 
   + If you don't have an EMR Studio in the AWS Region where you want to create an application, choose **Get started**, and then choose **Create and launch Studio**. EMR Serverless creates an EMR Studio for you so that you can create and manage applications.

1. In the **Create studio** UI that opens in a new tab, enter the name, type, and release version for your application. If you only want to run batch jobs, select **Use default settings for batch jobs only**. For interactive workloads, select **Use default settings for interactive workloads**. You can also run batch jobs on interactive-enabled applications with this option. If you need to, you can change these settings later.

   For more information, see [Create a studio](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-studio.html).

1. Select **Create application** to create your first application. 

Continue to the next section [Step 2: Submit a job run or interactive workload](#gs-job-run-console) to submit a job run or interactive workload.

## Step 2: Submit a job run or interactive workload
<a name="gs-job-run-console"></a>

------
#### [ Spark job run ]

In this tutorial, we use a PySpark script to compute the number of occurrences of unique words across multiple text files. A public, read-only S3 bucket stores both the script and the dataset.

**To run a Spark job**

1. Upload the sample script `wordcount.py` into your new bucket with the following command.

   ```
   aws s3 cp s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py s3://amzn-s3-demo-bucket/scripts/
   ```

1. Completing [Step 1: Create an EMR Serverless application](#gs-application-console) takes you to the **Application details** page in EMR Studio. There, choose the **Submit job** option.

1. On the **Submit job** page, complete the following.
   + In the **Name** field, enter the name that you want to call your job run.
   + In the **Runtime role** field, enter the name of the role that you created in [Create a job runtime role](getting-started.md#gs-runtime-role).
   + In the **Script location** field, enter `s3://amzn-s3-demo-bucket/scripts/wordcount.py` as the S3 URI.
   + In the **Script arguments** field, enter `["s3://amzn-s3-demo-bucket/emr-serverless-spark/output"]`.
   + In the **Spark properties** section, choose **Edit as text** and enter the following configurations.

     ```
     --conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1
     ```

1. To start the job run, choose **Submit job**.

1. In the **Job runs** tab, you should see your new job run with a **Running** status.
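At its core, the sample computes a word count. A plain-Python sketch of the same computation follows; the actual `wordcount.py` does this with PySpark across the text files in the public bucket and writes its result to the output path you passed as a script argument.

```python
from collections import Counter

# Plain-Python sketch of the word-count logic the sample job performs;
# the real wordcount.py uses PySpark and reads/writes S3.
def count_words(lines):
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

print(count_words(["hello world", "hello emr"]))
# {'hello': 2, 'world': 1, 'emr': 1}
```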

------
#### [ Hive job run ]

In this part of the tutorial, we create a table, insert a few records, and run a count aggregation query. To run the Hive job, first create a file that contains all of the Hive queries to run as part of a single job, upload the file to S3, and specify this S3 path when you start the Hive job.

**To run a Hive job**

1. Create a file called `hive-query.ql` that contains all the queries that you want to run in your Hive job.

   ```
   create database if not exists emrserverless;
   use emrserverless;
   create table if not exists test_table(id int);
   drop table if exists Values__Tmp__Table__1;
   insert into test_table values (1),(2),(2),(3),(3),(3);
   select id, count(id) from test_table group by id order by id desc;
   ```

1. Upload `hive-query.ql` to your S3 bucket with the following command.

   ```
   aws s3 cp hive-query.ql s3://amzn-s3-demo-bucket/emr-serverless-hive/query/hive-query.ql
   ```

1. Completing [Step 1: Create an EMR Serverless application](#gs-application-console) takes you to the **Application details** page in EMR Studio. There, choose the **Submit job** option.

1. On the **Submit job** page, complete the following.
   + In the **Name** field, enter the name that you want to call your job run.
   + In the **Runtime role** field, enter the name of the role that you created in [Create a job runtime role](getting-started.md#gs-runtime-role).
   + In the **Script location** field, enter `s3://amzn-s3-demo-bucket/emr-serverless-hive/query/hive-query.ql` as the S3 URI.
   + In the **Hive properties** section, choose **Edit as text**, and enter the following configurations.

     ```
     --hiveconf hive.log.explain.output=false
     ```
   + In the **Job configuration** section, choose **Edit as JSON**, and enter the following JSON.

     ```
     {
        "applicationConfiguration": 
        [{
            "classification": "hive-site",
               "properties": {
                   "hive.exec.scratchdir": "s3://amzn-s3-demo-bucket/emr-serverless-hive/hive/scratch",
                   "hive.metastore.warehouse.dir": "s3://amzn-s3-demo-bucket/emr-serverless-hive/hive/warehouse",
                   "hive.driver.cores": "2",
                   "hive.driver.memory": "4g",
                   "hive.tez.container.size": "4096",
                   "hive.tez.cpu.vcores": "1"
                }
         }]
     }
     ```

1. To start the job run, choose **Submit job**.

1. In the **Job runs** tab, you should see your new job run with a **Running** status.
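The final `SELECT` in `hive-query.ql` counts each distinct `id` and orders the rows by `id` descending, so you can predict the result the job should produce. The expected rows can be reproduced locally:

```python
from collections import Counter

# Reproduce the Hive query's aggregation locally:
# count occurrences of each id, then order by id descending.
values = [1, 2, 2, 3, 3, 3]  # the rows inserted into test_table
counts = Counter(values)
result = sorted(counts.items(), key=lambda kv: kv[0], reverse=True)
print(result)
# [(3, 3), (2, 2), (1, 1)]
```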

------
#### [ Interactive workload ]

With Amazon EMR 6.14.0 and higher, you can use notebooks that are hosted in EMR Studio to run interactive workloads for Spark in EMR Serverless. For more information including permissions and prerequisites, see [Run interactive workloads with EMR Serverless through EMR Studio](interactive-workloads.md).

Once you've created your application and set up the required permissions, use the following steps to run an interactive notebook with EMR Studio:

1. Navigate to the **Workspaces** tab in EMR Studio. If you still need to configure an Amazon S3 storage location and [EMR Studio service role](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-service-role.html), select the **Configure studio** button in the banner at the top of the screen.

1. To access a notebook, select a Workspace or create a new Workspace. Use **Quick launch** to open your Workspace in a new tab.

1. Go to the newly opened tab. Select the **Compute** icon from the left navigation. Select EMR Serverless as the **Compute type**.

1. Select the interactive-enabled application that you created in the previous section.

1. In the **Runtime role** field, enter the name of the IAM role that your EMR Serverless application can assume for the job run. To learn more about runtime roles, see [Job runtime roles](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) in the *Amazon EMR Serverless User Guide*.

1. Select **Attach**. This may take up to a minute. The page will refresh when attached.

1. Pick a kernel and start a notebook. You can also browse example notebooks on EMR Serverless and copy them to your Workspace. To access the example notebooks, navigate to the **`{...}`** menu in the left navigation and browse through notebooks that have `serverless` in the notebook file name.

1. In the notebook, you can access the driver log link and a link to the Apache Spark UI, a real-time interface that provides metrics to monitor your job. For more information, see [Monitoring EMR Serverless applications and jobs](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/app-job-metrics.html) in the *Amazon EMR Serverless User Guide*.

When you attach an application to an EMR Studio Workspace, the application starts automatically if it isn't already running. You can also pre-start the application to keep it ready before you attach it to the Workspace.

------

## Step 3: View application UI and logs
<a name="gs-output-console"></a>

To view the application UI, first identify the job run. Based on the job type, an option for **Spark UI** or **Hive Tez UI** is available in the first row of options for that job run. Select the appropriate option.

If you chose the Spark UI, choose the **Executors** tab to view the driver and executor logs. If you chose the Hive Tez UI, choose the **All Tasks** tab to view the logs.

Once the job run status shows as **Success**, you can view the output of the job in your S3 bucket.

## Step 4: Clean up
<a name="gs-cleanup-console"></a>

While the application you created should auto-stop after 15 minutes of inactivity, we still recommend that you release resources that you don't intend to use again.

To delete the application, navigate to the **List applications** page. Select the application that you created and choose **Actions → Stop** to stop the application. After the application is in the `STOPPED` state, select the same application and choose **Actions → Delete**.

For more examples of running Spark and Hive jobs, see [Using Spark configurations when you run EMR Serverless jobs](jobs-spark.md) and [Using Hive configurations when you run EMR Serverless jobs](jobs-hive.md).

# Getting started from the AWS CLI
<a name="gs-cli"></a>

Get started with EMR Serverless from the AWS CLI with commands to create an application, run jobs, check job run output, and delete your resources.

## Step 1: Create an EMR Serverless application
<a name="gs-application-cli"></a>

Use the [https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_CreateApplication.html](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_CreateApplication.html) command to create your first EMR Serverless application. Specify the application type and the Amazon EMR release label associated with the application version that you want to use. The name of the application is optional.

------
#### [ Spark ]

To create a Spark application, run the following command.

```
aws emr-serverless create-application \
    --release-label emr-6.6.0 \
    --type "SPARK" \
    --name my-application
```

------
#### [ Hive ]

To create a Hive application, run the following command. 

```
aws emr-serverless create-application \
    --release-label emr-6.6.0 \
    --type "HIVE" \
    --name my-application
```

------

Note the application ID returned in the output. You'll use the ID to start the application and to submit jobs; this guide refers to it as the `application-id`.

Before you move on to [Step 2: Submit a job run to your EMR Serverless application](#gs-job-run-cli), make sure that your application has reached the `CREATED` state with the [https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_GetApplication.html](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_GetApplication.html) API.

```
aws emr-serverless get-application \
    --application-id application-id
```
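Rather than re-running `get-application` by hand, you can poll until the state settles. A generic sketch follows; `fetch_state` stands in for a real call (for example, a boto3 `get_application` call that returns the application state), so the example runs against a canned sequence of states.

```python
import time

# Generic poll-until-state helper. fetch_state is a stand-in for a real
# get-application (or get-job-run) call that returns the current state string.
def wait_for_state(fetch_state, targets, interval=1.0, timeout=60.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_state()
        if state in targets:
            return state
        time.sleep(interval)
    raise TimeoutError(f"state not in {targets} within {timeout}s")

# Example with a canned sequence of states (no AWS call is made):
states = iter(["CREATING", "CREATING", "CREATED"])
print(wait_for_state(lambda: next(states), {"CREATED"}, interval=0.0))
# CREATED
```

In a real script, you'd also treat terminal failure states (such as `TERMINATED`) as targets so the loop doesn't wait out the full timeout on a failed resource.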

EMR Serverless creates workers to accommodate your requested jobs. By default, these are created on demand, but you can also specify a pre-initialized capacity by setting the `initialCapacity` parameter when you create the application. You can also limit the total maximum capacity that an application can use with the `maximumCapacity` parameter. To learn more about these options, see [Configuring an application when working with EMR Serverless](application-capacity.md).

## Step 2: Submit a job run to your EMR Serverless application
<a name="gs-job-run-cli"></a>

Now your EMR Serverless application is ready to run jobs.

------
#### [ Spark ]

In this step, we use a PySpark script to compute the number of occurrences of unique words across multiple text files. A public, read-only S3 bucket stores both the script and the dataset. The application sends the output file and the log data from the Spark runtime to `/output` and `/logs` directories in the S3 bucket that you created. 

**To run a Spark job**

1. Use the following command to copy the sample script that we'll run into your new bucket.

   ```
   aws s3 cp s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py s3://amzn-s3-demo-bucket/scripts/
   ```

1. In the following command, substitute `application-id` with your application ID, `job-role-arn` with the runtime role ARN that you created in [Create a job runtime role](getting-started.md#gs-runtime-role), and `job-run-name` with the name that you want to call your job run. Replace all `amzn-s3-demo-bucket` strings with the Amazon S3 bucket that you created, and add `/output` to the path. This creates a new folder in your bucket where EMR Serverless can copy the output files of your application.

   ```
   aws emr-serverless start-job-run \
       --application-id application-id \
       --execution-role-arn job-role-arn \
       --name job-run-name \
       --job-driver '{
           "sparkSubmit": {
             "entryPoint": "s3://amzn-s3-demo-bucket/scripts/wordcount.py",
             "entryPointArguments": ["s3://amzn-s3-demo-bucket/emr-serverless-spark/output"],
             "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g --conf spark.driver.cores=1 --conf spark.driver.memory=4g --conf spark.executor.instances=1"
           }
       }' \
       --configuration-overrides '{
           "monitoringConfiguration": {
             "s3MonitoringConfiguration": {
               "logUri": "s3://amzn-s3-demo-bucket/emr-serverless-spark/logs"
             }
           }
       }'
   ```

1. Note the job run ID returned in the output. Replace `job-run-id` with this ID in the following steps.

------
#### [ Hive ]

In this tutorial, we create a table, insert a few records, and run a count aggregation query. To run the Hive job, first create a file that contains all of the Hive queries to run as part of a single job, upload the file to S3, and specify this S3 path when you start the Hive job.

**To run a Hive job**

1. Create a file called `hive-query.ql` that contains all the queries that you want to run in your Hive job.

   ```
   create database if not exists emrserverless;
   use emrserverless;
   create table if not exists test_table(id int);
   drop table if exists Values__Tmp__Table__1;
   insert into test_table values (1),(2),(2),(3),(3),(3);
   select id, count(id) from test_table group by id order by id desc;
   ```

1. Upload `hive-query.ql` to your S3 bucket with the following command.

   ```
   aws s3 cp hive-query.ql s3://amzn-s3-demo-bucket/emr-serverless-hive/query/hive-query.ql
   ```

1. In the following command, substitute `application-id` with your own application ID. Substitute `job-role-arn` with the runtime role ARN you created in [Create a job runtime role](getting-started.md#gs-runtime-role). Replace all `amzn-s3-demo-bucket` strings with the Amazon S3 bucket that you created, and add `/output` and `/logs` to the path. This creates new folders in your bucket, where EMR Serverless can copy the output and log files of your application.

   ```
   aws emr-serverless start-job-run \
       --application-id application-id \
       --execution-role-arn job-role-arn \
       --job-driver '{
           "hive": {
             "query": "s3://amzn-s3-demo-bucket/emr-serverless-hive/query/hive-query.ql",
             "parameters": "--hiveconf hive.log.explain.output=false"
           }
       }' \
       --configuration-overrides '{
         "applicationConfiguration": [{
           "classification": "hive-site",
             "properties": {
               "hive.exec.scratchdir": "s3://amzn-s3-demo-bucket/emr-serverless-hive/hive/scratch",
               "hive.metastore.warehouse.dir": "s3://amzn-s3-demo-bucket/emr-serverless-hive/hive/warehouse",
               "hive.driver.cores": "2",
               "hive.driver.memory": "4g",
               "hive.tez.container.size": "4096",
               "hive.tez.cpu.vcores": "1"
               }
           }],
           "monitoringConfiguration": {
             "s3MonitoringConfiguration": {
               "logUri": "s3://amzn-s3-demo-bucket/emr-serverless-hive/logs"
              }
           }
       }'
   ```

1. Note the job run ID returned in the output. Replace `job-run-id` with this ID in the following steps.

------

## Step 3: Review your job run's output
<a name="gs-output-cli"></a>

The job run should typically take 3-5 minutes to complete. 

------
#### [ Spark ]

You can check for the state of your Spark job with the following command.

```
aws emr-serverless get-job-run \
    --application-id application-id \
    --job-run-id job-run-id
```

With your log destination set to `s3://amzn-s3-demo-bucket/emr-serverless-spark/logs`, you can find the logs for this specific job run under `s3://amzn-s3-demo-bucket/emr-serverless-spark/logs/applications/application-id/jobs/job-run-id`. 

For Spark applications, EMR Serverless pushes event logs every 30 seconds to the `sparklogs` folder in your S3 log destination. When your job completes, Spark runtime logs for the driver and executors upload to folders named appropriately by the worker type, such as `driver` or `executor`. The output of the PySpark job uploads to `s3://amzn-s3-demo-bucket/emr-serverless-spark/output/`.

------
#### [ Hive ]

You can check for the state of your Hive job with the following command.

```
aws emr-serverless get-job-run \
    --application-id application-id \
    --job-run-id job-run-id
```

With your log destination set to `s3://amzn-s3-demo-bucket/emr-serverless-hive/logs`, you can find the logs for this specific job run under `s3://amzn-s3-demo-bucket/emr-serverless-hive/logs/applications/application-id/jobs/job-run-id`. 

For Hive applications, EMR Serverless continuously uploads Hive driver logs to the `HIVE_DRIVER` folder, and Tez task logs to the `TEZ_TASK` folder, of your S3 log destination. After the job run reaches the `SUCCEEDED` state, the output of your Hive query becomes available in the Amazon S3 location that you specified in the `monitoringConfiguration` field of `configurationOverrides`. 

------

## Step 4: Clean up
<a name="gs-cleanup-cli"></a>

When you're done working with this tutorial, we recommend that you delete the resources that you created and release any resources that you don't intend to use again.

### Delete your application
<a name="delete-application-cli"></a>

To delete an application, use the following command. 

```
aws emr-serverless delete-application \
    --application-id application-id
```

### Delete your S3 log bucket
<a name="delete-s3-bucket-cli"></a>

To delete your S3 logging and output bucket, use the following command. Replace `amzn-s3-demo-bucket` with the actual name of the S3 bucket created in [Prepare storage for EMR Serverless](getting-started.md#gs-prepare-storage).

```
aws s3 rm s3://amzn-s3-demo-bucket --recursive
aws s3api delete-bucket --bucket amzn-s3-demo-bucket
```

### Delete your job runtime role
<a name="delete-runtime-role-cli"></a>

To delete the runtime role, detach the policy from the role. You can then delete both the role and the policy.

```
aws iam detach-role-policy \
    --role-name EMRServerlessS3RuntimeRole \
    --policy-arn policy-arn
```

To delete the role, use the following command.

```
aws iam delete-role \
    --role-name EMRServerlessS3RuntimeRole
```

To delete the policy that was attached to the role, use the following command.

```
aws iam delete-policy \
    --policy-arn policy-arn
```

For more examples of running Spark and Hive jobs, see [Using Spark configurations when you run EMR Serverless jobs](jobs-spark.md) and [Using Hive configurations when you run EMR Serverless jobs](jobs-hive.md).