

# Troubleshoot Athena for Spark
<a name="notebooks-spark-troubleshooting"></a>

Use the following information to troubleshoot issues you may have when using notebooks and sessions on Athena.

**Topics**
+ [Learn about known issues in Athena for Spark](notebooks-spark-known-issues.md)
+ [Troubleshoot Spark-enabled workgroups](notebooks-spark-troubleshooting-workgroups.md)
+ [Use the Spark EXPLAIN statement to troubleshoot Spark SQL](notebooks-spark-troubleshooting-explain.md)
+ [Log Spark application events in Athena](notebooks-spark-logging.md)
+ [Use CloudTrail to troubleshoot Athena notebook API calls](notebooks-spark-troubleshooting-cloudtrail.md)
+ [Overcome the 68k code block size limit](notebooks-spark-troubleshooting-code-block-size-limit.md)
+ [Troubleshoot session errors](notebooks-spark-troubleshooting-sessions.md)
+ [Troubleshoot table errors](notebooks-spark-troubleshooting-tables.md)
+ [Get support](notebooks-spark-troubleshooting-support.md)

# Learn about known issues in Athena for Spark
<a name="notebooks-spark-known-issues"></a>

This page documents some of the known issues in Athena for Apache Spark.

## Illegal argument exception when creating a table
<a name="notebooks-spark-known-issues-illegal-argument-exception"></a>

Although Spark does not allow databases to be created with an empty location property, databases in AWS Glue can have an empty `LOCATION` property if they are created outside of Spark.

If you create a table and specify an AWS Glue database that has an empty `LOCATION` field, an exception like the following can occur: `IllegalArgumentException: Cannot create a path from an empty string`.

For example, the following command throws an exception if the default database in AWS Glue contains an empty `LOCATION` field:

```
spark.sql("create table testTable (firstName STRING)")
```

**Suggested solution A** – Use AWS Glue to add a location to the database that you are using.

**To add a location to an AWS Glue database**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **Databases**.

1. In the list of databases, choose the database that you want to edit.

1. On the details page for the database, choose **Edit**.

1. On the **Update a database** page, for **Location**, enter an Amazon S3 location.

1. Choose **Update Database**.

**Suggested solution B** – Use a different AWS Glue database that has an existing, valid location in Amazon S3. For example, if you have a database named `dbWithLocation`, use the command `spark.sql("use dbWithLocation")` to switch to that database.

**Suggested solution C** – When you use Spark SQL to create the table, specify a value for `location`, as in the following example.

```
spark.sql("create table testTable (firstName STRING) 
       location 's3://amzn-s3-demo-bucket/'")
```

**Suggested solution D** – If you specified a location when you created the table, but the issue still occurs, make sure the Amazon S3 path you provide has a trailing forward slash. For example, the following command throws an illegal argument exception:

```
spark.sql("create table testTable (firstName STRING) 
       location 's3://amzn-s3-demo-bucket'")
```

To correct this, add a trailing slash to the location (for example, `'s3://amzn-s3-demo-bucket/'`).
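As a guard against this error, you can normalize the location before building the DDL. The following sketch (the helper name is hypothetical, and the bucket is a placeholder) appends the trailing slash when it is missing:

```python
def ensure_trailing_slash(s3_location: str) -> str:
    """Return the Amazon S3 location with a trailing slash, which Spark
    requires when creating a table."""
    return s3_location if s3_location.endswith("/") else s3_location + "/"

# Build the CREATE TABLE statement with a normalized location.
location = ensure_trailing_slash("s3://amzn-s3-demo-bucket")
ddl = f"create table testTable (firstName STRING) location '{location}'"
# In a notebook session, you would then run: spark.sql(ddl)
```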

## Database created in a workgroup location
<a name="notebooks-spark-known-issues-database-created-in-a-workgroup-location"></a>

If you use a command like `spark.sql('create database db')` to create a database and do not specify a location for the database, Athena creates a subdirectory in your workgroup location and uses that location for the newly created database.

## Issues with Hive managed tables in the AWS Glue default database
<a name="notebooks-spark-known-issues-managed-tables"></a>

If the `Location` property of your default database in AWS Glue is nonempty and specifies a valid location in Amazon S3, and you use Athena for Spark to create a Hive managed table in your AWS Glue default database, data is written to the Amazon S3 location specified in your Athena Spark workgroup instead of to the location specified by the AWS Glue database.

This issue occurs because of how Apache Hive handles its default database. Apache Hive creates table data in the Hive warehouse root location, which can be different from the actual default database location.

When you use Athena for Spark to create a Hive managed table under the default database in AWS Glue, the AWS Glue table metadata can point to two different locations. This can cause unexpected behavior when you attempt an `INSERT` or `DROP TABLE` operation.

The steps to reproduce the issue are the following:

1. In Athena for Spark, you use one of the following methods to create or save a Hive managed table:
   + A SQL statement like `CREATE TABLE $tableName`
   + A PySpark command like `df.write.mode("overwrite").saveAsTable($tableName)` that does not specify the `path` option in the DataFrame API.

   At this point, the AWS Glue console may show an incorrect location in Amazon S3 for the table.

1. In Athena for Spark, you use the `DROP TABLE $table_name` statement to drop the table that you created.

1. After you run the `DROP TABLE` statement, you notice that the underlying files in Amazon S3 are still present.

To resolve this issue, do one of the following:

**Solution A** – Use a different AWS Glue database when you create Hive managed tables.

**Solution B** – Specify an empty location for the default database in AWS Glue. Then, create your managed tables in the default database.
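The reproduction steps note that the issue occurs when `saveAsTable` is called without the `path` option, so supplying an explicit path is another way to keep the table's data location unambiguous. The following is a minimal sketch; the path layout, database, and table names are all hypothetical.

```python
def table_path(bucket: str, database: str, table: str) -> str:
    """Build an explicit Amazon S3 path for a managed table so that
    saveAsTable() does not fall back to the Hive warehouse root."""
    return f"{bucket.rstrip('/')}/{database}/{table}/"

path = table_path("s3://amzn-s3-demo-bucket", "my_database", "my_table")
# In a notebook session (df is an existing DataFrame):
# df.write.mode("overwrite").option("path", path).saveAsTable("my_database.my_table")
```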

## CSV and JSON file format incompatibility between Athena for Spark and Athena SQL
<a name="notebooks-spark-known-issues-csv-and-json-file-format-incompatibility"></a>

Due to a known issue with open source Spark, when you create a table in Athena for Spark on CSV or JSON data, the table might not be readable from Athena SQL, and vice versa. 

For example, you might create a table in Athena for Spark in one of the following ways: 
+ With the following `USING csv` syntax: 

  ```
  spark.sql('''CREATE EXTERNAL TABLE $tableName ( 
  $colName1 $colType1, 
  $colName2 $colType2, 
  $colName3 $colType3) 
  USING csv 
  PARTITIONED BY ($colName1) 
  LOCATION $s3_location''')
  ```
+  With the following [DataFrame](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/dataframe.html) API syntax: 

  ```
  df.write.format('csv').saveAsTable($table_name)
  ```

Because of this known issue, queries from Athena SQL on the resulting tables might not succeed.

**Suggested solution** – Try creating the table in Athena for Spark using Apache Hive syntax. For more information, see [CREATE HIVEFORMAT TABLE](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html) in the Apache Spark documentation. 
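For illustration, the following sketch builds a `CREATE TABLE` statement in Hive format for CSV data; the table name, columns, and bucket are placeholders.

```python
# Hive-format DDL for CSV data. All names and the bucket are placeholders.
ddl = """
CREATE EXTERNAL TABLE my_table (
  first_name STRING,
  age INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://amzn-s3-demo-bucket/csv-data/'
"""
# In a notebook session, you would then run: spark.sql(ddl)
```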

# Troubleshoot Spark-enabled workgroups
<a name="notebooks-spark-troubleshooting-workgroups"></a>

Use the following information to troubleshoot Spark-enabled workgroups in Athena.

## Session stops responding when using an existing IAM role
<a name="notebooks-spark-troubleshooting-workgroups-existing-role"></a>

If you did not create a new `AWSAthenaSparkExecutionRole` for your Spark-enabled workgroup and instead updated or chose an existing IAM role, your session might stop responding. In this case, you may need to add the following trust and permissions policies to your Spark-enabled workgroup execution role.

Add the following example trust policy. The policy includes a confused deputy check for the execution role. Replace the values for `111122223333`, `us-east-1`, and `workgroup-name` with the AWS account ID, AWS Region, and workgroup that you are using.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "athena.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "111122223333"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:athena:us-east-1:111122223333:workgroup/workgroup-name"
                }
            }
        }
    ]
}
```

------

Add a permissions policy like the following default policy for notebook-enabled workgroups. Replace the values for `amzn-s3-demo-bucket`, `us-east-1`, `111122223333`, and `workgroup-name` with the Amazon S3 bucket, AWS Region, AWS account ID, and workgroup that you are using.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*",
                "arn:aws:s3:::amzn-s3-demo-bucket"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "athena:GetWorkGroup",
                "athena:CreatePresignedNotebookUrl",
                "athena:TerminateSession",
                "athena:GetSession",
                "athena:GetSessionStatus",
                "athena:ListSessions",
                "athena:StartCalculationExecution",
                "athena:GetCalculationExecutionCode",
                "athena:StopCalculationExecution",
                "athena:ListCalculationExecutions",
                "athena:GetCalculationExecution",
                "athena:GetCalculationExecutionStatus",
                "athena:ListExecutors",
                "athena:ExportNotebook",
                "athena:UpdateNotebook"
            ],
            "Resource": "arn:aws:athena:us-east-1:111122223333:workgroup/workgroup-name"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
                "logs:CreateLogGroup",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:111122223333:log-group:/aws-athena:*",
                "arn:aws:logs:us-east-1:111122223333:log-group:/aws-athena*:log-stream:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "logs:DescribeLogGroups",
            "Resource": "arn:aws:logs:us-east-1:111122223333:log-group:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "AmazonAthenaForApacheSpark"
                }
            }
        }
    ]
}
```

------

# Use the Spark EXPLAIN statement to troubleshoot Spark SQL
<a name="notebooks-spark-troubleshooting-explain"></a>

You can use the Spark `EXPLAIN` statement with Spark SQL to troubleshoot your Spark code. The following code and output examples show this usage.

**Example – Spark SELECT statement**  

```
spark.sql("select * from select_taxi_table").explain(True)
```
**Output**  

```
Calculation started (calculation_id=20c1ebd0-1ccf-ef14-db35-7c1844876a7e) in 
(session=24c1ebcb-57a8-861e-1023-736f5ae55386). 
Checking calculation status...

Calculation completed.
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [select_taxi_table], [], false

== Analyzed Logical Plan ==
VendorID: bigint, passenger_count: bigint, count: bigint
Project [VendorID#202L, passenger_count#203L, count#204L]
+- SubqueryAlias spark_catalog.spark_demo_database.select_taxi_table
   +- Relation spark_demo_database.select_taxi_table[VendorID#202L,
       passenger_count#203L,count#204L] csv

== Optimized Logical Plan ==
Relation spark_demo_database.select_taxi_table[VendorID#202L,
passenger_count#203L,count#204L] csv

== Physical Plan ==
FileScan csv spark_demo_database.select_taxi_table[VendorID#202L,
passenger_count#203L,count#204L] 
Batched: false, DataFilters: [], Format: CSV, 
Location: InMemoryFileIndex(1 paths)
[s3://amzn-s3-demo-bucket/select_taxi], 
PartitionFilters: [], PushedFilters: [], 
ReadSchema: struct<VendorID:bigint,passenger_count:bigint,count:bigint>
```

**Example – Spark data frame**  
The following example shows how to use `EXPLAIN` with a Spark data frame.  

```
taxi1_df=taxi_df.groupBy("VendorID", "passenger_count").count()
taxi1_df.explain("extended")
```
**Output**  

```
Calculation started (calculation_id=d2c1ebd1-f9f0-db25-8477-3effc001b309) in 
(session=24c1ebcb-57a8-861e-1023-736f5ae55386). 
Checking calculation status...

Calculation completed.
== Parsed Logical Plan ==
'Aggregate ['VendorID, 'passenger_count], 
['VendorID, 'passenger_count, count(1) AS count#321L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,
passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,
PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,
extra#60,mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,
total_amount#65,congestion_surcharge#66,airport_fee#67] parquet

== Analyzed Logical Plan ==
VendorID: bigint, passenger_count: bigint, count: bigint
Aggregate [VendorID#49L, passenger_count#52L], 
[VendorID#49L, passenger_count#52L, count(1) AS count#321L]
+- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,
passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,
PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,
mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,
total_amount#65,congestion_surcharge#66,airport_fee#67] parquet

== Optimized Logical Plan ==
Aggregate [VendorID#49L, passenger_count#52L], 
[VendorID#49L, passenger_count#52L, count(1) AS count#321L]
+- Project [VendorID#49L, passenger_count#52L]
   +- Relation [VendorID#49L,tpep_pickup_datetime#50,tpep_dropoff_datetime#51,
passenger_count#52L,trip_distance#53,RatecodeID#54L,store_and_fwd_flag#55,
PULocationID#56L,DOLocationID#57L,payment_type#58L,fare_amount#59,extra#60,
mta_tax#61,tip_amount#62,tolls_amount#63,improvement_surcharge#64,
total_amount#65,congestion_surcharge#66,airport_fee#67] parquet

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[VendorID#49L, passenger_count#52L], functions=[count(1)], 
output=[VendorID#49L, passenger_count#52L, count#321L])
   +- Exchange hashpartitioning(VendorID#49L, passenger_count#52L, 1000), 
      ENSURE_REQUIREMENTS, [id=#531]
      +- HashAggregate(keys=[VendorID#49L, passenger_count#52L], 
         functions=[partial_count(1)], output=[VendorID#49L, 
         passenger_count#52L, count#326L])
         +- FileScan parquet [VendorID#49L,passenger_count#52L] Batched: true, 
            DataFilters: [], Format: Parquet, 
            Location: InMemoryFileIndex(1 paths)[s3://amzn-s3-demo-bucket/
            notebooks/yellow_tripdata_2016-01.parquet], PartitionFilters: [], 
            PushedFilters: [], 
            ReadSchema: struct<VendorID:bigint,passenger_count:bigint>
```

# Log Spark application events in Athena
<a name="notebooks-spark-logging"></a>

The Athena notebook editor allows for standard Jupyter, Spark, and Python logging. You can use `df.show()` to display PySpark DataFrame contents or use `print("Output")` to display values in the cell output. The `stdout`, `stderr`, and `results` outputs for your calculations are written to your query results bucket location in Amazon S3.

## Log Spark application events to Amazon CloudWatch
<a name="notebooks-spark-logging-logging-spark-application-events-to-amazon-cloudwatch"></a>

Your Athena sessions can also write logs to [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) in the account that you are using.

### Understand log streams and log groups
<a name="notebooks-spark-logging-understanding-log-streams-and-log-groups"></a>

CloudWatch organizes log activity into log streams and log groups.

**Log streams** – A CloudWatch log stream is a sequence of log events that share the same source. Each separate source of logs in CloudWatch Logs makes up a separate log stream.

**Log groups** – In CloudWatch Logs, a log group is a group of log streams that share the same retention, monitoring, and access control settings.

There is no limit on the number of log streams that can belong to one log group.

In Athena, when you start a notebook session for the first time, Athena creates a log group in CloudWatch that uses the name of your Spark-enabled workgroup, as in the following example.

```
/aws-athena/workgroup-name
```

This log group receives one log stream for each executor in your session that produces at least one log event. An executor is the smallest unit of compute that a notebook session can request from Athena. In CloudWatch, the name of the log stream begins with the session ID and executor ID.

For more information about CloudWatch log groups and log streams, see [Working with log groups and log streams](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html) in the Amazon CloudWatch Logs User Guide.

### Use standard logger objects in Athena for Spark
<a name="notebooks-spark-logging-using-standard-logger-objects-in-athena-for-spark"></a>

In an Athena for Spark session, you can use the following two global standard logger objects to write logs to Amazon CloudWatch:
+ **athena\_user\_logger** – Sends logs to CloudWatch only. Use this object when you want to log information from your Spark applications directly to CloudWatch, as in the following example.

  ```
  athena_user_logger.info("CloudWatch log line.")
  ```

  The example writes a log event to CloudWatch like the following:

  ```
  AthenaForApacheSpark: 2022-01-01 12:00:00,000 INFO builtins: CloudWatch log line.
  ```
+ **athena\_shared\_logger** – Sends the same log both to CloudWatch and to AWS for support purposes. You can use this object to share logs with AWS service teams for troubleshooting, as in the following example.

  ```
  athena_shared_logger.info("Customer debug line.")
  var = [...some variable holding customer data...]
  athena_shared_logger.info(var)
  ```

  The example logs the `debug` line and the value of the `var` variable to CloudWatch Logs and sends a copy of each line to Support.
**Note**  
For your privacy, your calculation code and results are not shared with AWS. Make sure that your calls to `athena_shared_logger` write only the information that you want to make visible to Support.

The provided loggers write events through [Apache Log4j](https://logging.apache.org/log4j/) and inherit the logging levels of this interface. Possible log level values are `DEBUG`, `ERROR`, `FATAL`, `INFO`, and `WARN` or `WARNING`. You can call the correspondingly named function on the logger to log at each of these levels.
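To illustrate the CloudWatch line format shown in the examples above, the following local stand-in formats messages the same way. This is only a sketch for experimentation outside a session; the real `athena_user_logger` and `athena_shared_logger` objects are provided globally by Athena.

```python
import datetime

class FakeAthenaLogger:
    """Local stand-in that mimics the CloudWatch line format shown above.
    The real logger objects are globals in an Athena for Spark session."""

    def _write(self, level: str, message: object) -> str:
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S,000")
        line = f"AthenaForApacheSpark: {timestamp} {level} builtins: {message}"
        print(line)
        return line

    def info(self, message: object) -> str:
        return self._write("INFO", message)

    def warn(self, message: object) -> str:
        return self._write("WARN", message)

logger = FakeAthenaLogger()
logger.info("CloudWatch log line.")
```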

**Note**  
Do not rebind the names `athena_user_logger` or `athena_shared_logger`. Doing so makes the logging objects unable to write to CloudWatch for the remainder of the session.

### Example: Log notebook events to CloudWatch
<a name="notebooks-spark-logging-example-logging-notebook-events-to-cloudwatch"></a>

The following procedure shows how to log Athena notebook events to Amazon CloudWatch Logs.

**To log Athena notebook events to Amazon CloudWatch Logs**

1. Follow [Get started with Apache Spark on Amazon Athena](notebooks-spark-getting-started.md) to create a Spark-enabled workgroup in Athena with a unique name. This tutorial uses the workgroup name `athena-spark-example`.

1. Follow the steps in [Step 7: Create your own notebook](notebooks-spark-getting-started.md#notebooks-spark-getting-started-creating-your-own-notebook) to create a notebook and launch a new session.

1. In the Athena notebook editor, in a new notebook cell, enter the following command:

   ```
   athena_user_logger.info("Hello world.")
   ```

1. Run the cell.

1. Retrieve the current session ID by doing one of the following:
   + View the cell output (for example, `... session=72c24e73-2c24-8b22-14bd-443bdcd72de4`).
   + In a new cell, run the [magic](notebooks-spark-magics.md) command `%session_id`.

1. Save the session ID.

1. With the same AWS account that you are using to run the notebook session, open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the CloudWatch console navigation pane, choose **Log groups**.

1. In the list of log groups, choose the log group that has the name of your Spark-enabled Athena workgroup, as in the following example.

   ```
   /aws-athena/athena-spark-example
   ```

   The **Log streams** section contains a list of one or more log stream links for the workgroup. Each log stream name contains the session ID, executor ID, and a UUID, separated by forward slashes.

   For example, if the session ID is `5ac22d11-9fd8-ded7-6542-0412133d3177` and the executor ID is `f8c22d11-9fd8-ab13-8aba-c4100bfba7e2`, the name of the log stream resembles the following example.

   ```
   5ac22d11-9fd8-ded7-6542-0412133d3177/f8c22d11-9fd8-ab13-8aba-c4100bfba7e2/f012d7cb-cefd-40b1-90b9-67358f003d0b
   ```

1. Choose the log stream link for your session.

1. On the **Log events** page, view the **Message** column.

   The log event for the cell that you ran resembles the following:

   ```
   AthenaForApacheSpark: 2022-01-01 12:00:00,000 INFO builtins: Hello world.
   ```

1. Return to the Athena notebook editor.

1. In a new cell, enter the following code. The code logs a variable to CloudWatch:

   ```
   x = 6
   athena_user_logger.warn(x)
   ```

1. Run the cell.

1. Return to the CloudWatch console **Log events** page for the same log stream.

1. The log stream now contains a log event entry with a message like the following:

   ```
   AthenaForApacheSpark: 2022-01-01 12:00:00,000 WARN builtins: 6
   ```
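The log stream naming convention used in this procedure (session ID, executor ID, and a UUID separated by slashes) can be parsed programmatically, for example when correlating streams with sessions. A minimal sketch, using the stream name from the earlier step:

```python
def parse_log_stream_name(stream_name: str) -> dict:
    """Split a log stream name of the form
    <session-id>/<executor-id>/<uuid> into its components."""
    session_id, executor_id, uuid = stream_name.split("/")
    return {"session_id": session_id, "executor_id": executor_id, "uuid": uuid}

parts = parse_log_stream_name(
    "5ac22d11-9fd8-ded7-6542-0412133d3177/"
    "f8c22d11-9fd8-ab13-8aba-c4100bfba7e2/"
    "f012d7cb-cefd-40b1-90b9-67358f003d0b"
)
```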

# Use CloudTrail to troubleshoot Athena notebook API calls
<a name="notebooks-spark-troubleshooting-cloudtrail"></a>

To troubleshoot notebook API calls, you can examine Athena CloudTrail logs to investigate anomalies or discover actions initiated by users. For detailed information about using CloudTrail with Athena, see [Log Amazon Athena API calls with AWS CloudTrail](monitor-with-cloudtrail.md).

The following examples demonstrate CloudTrail log entries for Athena notebook APIs.
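When scanning many CloudTrail records for entries like the ones that follow, you can filter on the `eventName` field. The following is a minimal sketch; the record is abbreviated from the `StartSession` example below, and the filtering helper is hypothetical.

```python
import json

# Abbreviated CloudTrail record, based on the StartSession example below.
record = json.loads("""
{
    "eventName": "StartSession",
    "responseElements": {"sessionId": "a2c1ebba-ad01-865f-ed2d-a142b7451f7e"}
}
""")

# The Athena notebook API events documented in this section.
NOTEBOOK_EVENTS = {"StartSession", "TerminateSession", "ImportNotebook",
                   "UpdateNotebook", "StartCalculationExecution"}

def is_notebook_event(rec: dict) -> bool:
    """Return True if the record is one of the notebook API events above."""
    return rec.get("eventName") in NOTEBOOK_EVENTS

# Extract the session ID from a matching record.
session_id = record["responseElements"]["sessionId"] if is_notebook_event(record) else None
```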

## StartSession
<a name="notebooks-spark-troubleshooting-cloudtrail-startsession"></a>

The following example shows the CloudTrail log for a notebook [StartSession](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartSession.html) event.

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "EXAMPLE_PRINCIPAL_ID:alias",
        "arn": "arn:aws:sts::123456789012:assumed-role/Admin/alias",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "EXAMPLE_PRINCIPAL_ID",
                "arn": "arn:aws:iam::123456789012:role/Admin",
                "accountId": "123456789012",
                "userName": "Admin"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2022-10-14T16:41:51Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2022-10-14T17:05:36Z",
    "eventSource": "athena.amazonaws.com",
    "eventName": "StartSession",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.10",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "requestParameters": {
        "workGroup": "notebook-workgroup",
        "engineConfiguration": {
            "coordinatorDpuSize": 1,
            "maxConcurrentDpus": 20,
            "defaultExecutorDpuSize": 1,
            "additionalConfigs": {
                "NotebookId": "b8f5854b-1042-4b90-9d82-51d3c2fd5c04",
                "NotebookIframeParentUrl": "https://us-east-1.console.aws.amazon.com"
            }
        },
        "notebookVersion": "KeplerJupyter-1.x",
        "sessionIdleTimeoutInMinutes": 20,
        "clientRequestToken": "d646ff46-32d2-42f0-94d1-d060ec3e5d78"
    },
    "responseElements": {
        "sessionId": "a2c1ebba-ad01-865f-ed2d-a142b7451f7e",
        "state": "CREATED"
    },
    "requestID": "d646ff46-32d2-42f0-94d1-d060ec3e5d78",
    "eventID": "b58ce998-eb89-43e9-8d67-d3d8e30561c9",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "123456789012",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "athena.us-east-1.amazonaws.com"
    },
    "sessionCredentialFromConsole": "true"
}
```

## TerminateSession
<a name="notebooks-spark-troubleshooting-cloudtrail-terminatesession"></a>

The following example shows the CloudTrail log for a notebook [TerminateSession](https://docs.aws.amazon.com/athena/latest/APIReference/API_TerminateSession.html) event.

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "EXAMPLE_PRINCIPAL_ID:alias",
        "arn": "arn:aws:sts::123456789012:assumed-role/Admin/alias",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "EXAMPLE_PRINCIPAL_ID",
                "arn": "arn:aws:iam::123456789012:role/Admin",
                "accountId": "123456789012",
                "userName": "Admin"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2022-10-14T16:41:51Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2022-10-14T17:21:03Z",
    "eventSource": "athena.amazonaws.com",
    "eventName": "TerminateSession",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.11",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "requestParameters": {
        "sessionId": "a2c1ebba-ad01-865f-ed2d-a142b7451f7e"
    },
    "responseElements": {
        "state": "TERMINATING"
    },
    "requestID": "438ea37e-b704-4cb3-9a76-391997cf42ee",
    "eventID": "49026c5a-bf58-4cdb-86ca-978e711ad238",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "123456789012",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "athena.us-east-1.amazonaws.com"
    },
    "sessionCredentialFromConsole": "true"
}
```

## ImportNotebook
<a name="notebooks-spark-troubleshooting-cloudtrail-importnotebook"></a>

The following example shows the CloudTrail log for a notebook [ImportNotebook](https://docs.aws.amazon.com/athena/latest/APIReference/API_ImportNotebook.html) event. For security, some content is hidden.

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "EXAMPLE_PRINCIPAL_ID:alias",
        "arn": "arn:aws:sts::123456789012:assumed-role/Admin/alias",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "EXAMPLE_PRINCIPAL_ID",
                "arn": "arn:aws:iam::123456789012:role/Admin",
                "accountId": "123456789012",
                "userName": "Admin"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2022-10-14T16:41:51Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2022-10-14T17:08:54Z",
    "eventSource": "athena.amazonaws.com",
    "eventName": "ImportNotebook",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.12",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "requestParameters": {
        "workGroup": "notebook-workgroup",
        "name": "example-notebook-name",
        "payload": "HIDDEN_FOR_SECURITY_REASONS",
        "type": "IPYNB",
        "contentMD5": "HIDDEN_FOR_SECURITY_REASONS"
    },
    "responseElements": {
        "notebookId": "05f6225d-bdcc-4935-bc25-a8e19434652d"
    },
    "requestID": "813e777f-6dac-41f4-82a7-e99b7b33f319",
    "eventID": "4abec837-143b-4458-9c1f-fa9fb88ab69b",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "123456789012",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "athena.us-east-1.amazonaws.com"
    },
    "sessionCredentialFromConsole": "true"
}
```

## UpdateNotebook
<a name="notebooks-spark-troubleshooting-cloudtrail-updatenotebook"></a>

The following example shows the CloudTrail log for a notebook [UpdateNotebook](https://docs.aws.amazon.com/athena/latest/APIReference/API_UpdateNotebook.html) event. For security, some content is hidden.

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "EXAMPLE_PRINCIPAL_ID:AthenaExecutor-9cc1ebb2-aac5-b1ca-8247-5d827bd8232f",
        "arn": "arn:aws:sts::123456789012:assumed-role/AWSAthenaSparkExecutionRole-om0yj71w5l/AthenaExecutor-9cc1ebb2-aac5-b1ca-8247-5d827bd8232f",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "EXAMPLE_PRINCIPAL_ID",
                "arn": "arn:aws:iam::123456789012:role/service-role/AWSAthenaSparkExecutionRole-om0yj71w5l",
                "accountId": "123456789012",
                "userName": "AWSAthenaSparkExecutionRole-om0yj71w5l"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2022-10-14T16:48:06Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2022-10-14T16:52:22Z",
    "eventSource": "athena.amazonaws.com",
    "eventName": "UpdateNotebook",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.13",
    "userAgent": "Boto3/1.24.84 Python/3.8.14 Linux/4.14.225-175.364.amzn2.aarch64 Botocore/1.27.84",
    "requestParameters": {
        "notebookId": "c87553ff-e740-44b5-884f-a70e575e08b9",
        "payload": "HIDDEN_FOR_SECURITY_REASONS",
        "type": "IPYNB",
        "contentMD5": "HIDDEN_FOR_SECURITY_REASONS",
        "sessionId": "9cc1ebb2-aac5-b1ca-8247-5d827bd8232f"
    },
    "responseElements": null,
    "requestID": "baaba1d2-f73d-4df1-a82b-71501e7374f1",
    "eventID": "745cdd6f-645d-4250-8831-d0ffd2fe3847",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "123456789012",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "athena.us-east-1.amazonaws.com"
    }
}
```

## StartCalculationExecution
<a name="notebooks-spark-troubleshooting-cloudtrail-startcalculationexecution"></a>

The following example shows the CloudTrail log for a notebook [StartCalculationExecution](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartCalculationExecution.html) event. For security, some content is hidden.

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "EXAMPLE_PRINCIPAL_ID:AthenaExecutor-9cc1ebb2-aac5-b1ca-8247-5d827bd8232f",
        "arn": "arn:aws:sts::123456789012:assumed-role/AWSAthenaSparkExecutionRole-om0yj71w5l/AthenaExecutor-9cc1ebb2-aac5-b1ca-8247-5d827bd8232f",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "EXAMPLE_PRINCIPAL_ID",
                "arn": "arn:aws:iam::123456789012:role/service-role/AWSAthenaSparkExecutionRole-om0yj71w5l",
                "accountId": "123456789012",
                "userName": "AWSAthenaSparkExecutionRole-om0yj71w5l"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2022-10-14T16:48:06Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2022-10-14T16:52:37Z",
    "eventSource": "athena.amazonaws.com",
    "eventName": "StartCalculationExecution",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.14",
    "userAgent": "Boto3/1.24.84 Python/3.8.14 Linux/4.14.225-175.364.amzn2.aarch64 Botocore/1.27.84",
    "requestParameters": {
        "sessionId": "9cc1ebb2-aac5-b1ca-8247-5d827bd8232f",
        "description": "Calculation started via Jupyter notebook",
        "codeBlock": "HIDDEN_FOR_SECURITY_REASONS",
        "clientRequestToken": "0111cd63-4fd0-4ad8-a738-fd350115fc21"
    },
    "responseElements": {
        "calculationExecutionId": "82c1ebb4-bd08-e4c3-5631-a662fb2ff2c5",
        "state": "CREATING"
    },
    "requestID": "1a107461-3f1b-481e-b8a2-7fbd524e2373",
    "eventID": "b74dbd00-e839-4bd1-a1da-b75fbc70ab9a",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "123456789012",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "athena.us-east-1.amazonaws.com"
    }
}
```
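
To locate events like the ones above programmatically, you can query CloudTrail with the `LookupEvents` API. The following sketch assumes boto3 is available and credentials are configured; the helper function name is chosen for illustration.

```python
def lookup_athena_notebook_events(cloudtrail_client, event_name):
    """Return recent CloudTrail events that match the given event name."""
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {'AttributeKey': 'EventName', 'AttributeValue': event_name},
        ],
        MaxResults=10,
    )
    return response['Events']

# Usage (assumes boto3 and AWS credentials are configured):
# import boto3
# cloudtrail = boto3.client('cloudtrail')
# for event in lookup_athena_notebook_events(cloudtrail, 'StartCalculationExecution'):
#     print(event['EventId'], event['EventTime'])
```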

# Overcome the 68k code block size limit
<a name="notebooks-spark-troubleshooting-code-block-size-limit"></a>

Athena for Spark has a known calculation code block size limit of 68000 characters. When you run a calculation with a code block over this limit, you can receive the following error message:

 '...' at 'codeBlock' failed to satisfy constraint: Member must have length less than or equal to 68000

The following image shows this error in the Athena console notebook editor.

![\[Code block size error message in the Athena notebook editor\]](http://docs.aws.amazon.com/athena/latest/ug/images/notebooks-spark-troubleshooting-code-block-size-limit-1.png)


The same error can occur when you use the AWS CLI to run a calculation that has a large code block, as in the following example.

```
aws athena start-calculation-execution \
    --session-id "{SESSION_ID}" \
    --description "{SESSION_DESCRIPTION}" \
    --code-block "{LARGE_CODE_BLOCK}"
```

The command gives the following error message:

'{LARGE_CODE_BLOCK}' at 'codeBlock' failed to satisfy constraint: Member must have length less than or equal to 68000
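
If you generate calculations programmatically, a client-side length check can flag oversized code blocks before you call the API. The following is a minimal sketch; the constant and function names are chosen for illustration, and the limit value comes from the error message above.

```python
# Athena for Spark rejects calculation code blocks over 68,000 characters.
CODE_BLOCK_LIMIT = 68000

def exceeds_code_block_limit(code_block: str) -> bool:
    """Return True if the code block is too large to submit directly."""
    return len(code_block) > CODE_BLOCK_LIMIT

# A block of about 120,000 characters exceeds the limit
large_code_block = "x = 1\n" * 20000
print(exceeds_code_block_limit(large_code_block))  # True
```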

## Workaround
<a name="notebooks-spark-troubleshooting-code-block-size-limit-workaround"></a>

To work around this issue, upload the file that has your query or calculation code to Amazon S3. Then, use boto3 to read the file and run your SQL or code.

The following examples assume that you have already uploaded the file that has your SQL query or Python code to Amazon S3.

### SQL example
<a name="notebooks-spark-troubleshooting-code-block-size-limit-sql-example"></a>

The following example code reads the `large_sql_query.sql` file from an Amazon S3 bucket and then runs the large query that the file contains.

```
import boto3

s3 = boto3.resource('s3')

def read_s3_content(bucket_name, key):
    response = s3.Object(bucket_name, key).get()
    return response['Body'].read().decode('utf-8')

# Read the large SQL query from Amazon S3 and run it
sql = read_s3_content('bucket_name', 'large_sql_query.sql')
df = spark.sql(sql)
```

### PySpark example
<a name="notebooks-spark-troubleshooting-code-block-size-limit-pyspark-example"></a>

The following code example reads the `large_py_spark.py` file from Amazon S3 and then runs the large code block that is in the file.

```
import boto3

s3 = boto3.resource('s3')

def read_s3_content(bucket_name, key):
    response = s3.Object(bucket_name, key).get()
    return response['Body'].read()

# Read the large PySpark code block from Amazon S3 and run it
py_spark_code = read_s3_content('bucket_name', 'large_py_spark.py')
exec(py_spark_code)
```

# Troubleshoot session errors
<a name="notebooks-spark-troubleshooting-sessions"></a>

Use the information in this section to troubleshoot session issues.

When a custom configuration error occurs during a session start, the Athena for Spark console shows an error message banner. To troubleshoot session start errors, you can check session state change or logging information.

## View session state change information
<a name="notebooks-spark-troubleshooting-sessions-viewing-session-state-change"></a>

You can get details about a session state change from the Athena notebook editor or from the Athena API.

**To view session state information in the Athena console**

1. In the Athena notebook editor, from the **Session** menu on the upper right, choose **View details**.

1. View the **Current session** tab. The **Session information** section shows you information like session ID, workgroup, status, and state change reason.

   The following screen capture example shows information in the **State change reason** section of the **Session information** dialog box for a Spark session error in Athena.  
![\[Viewing session state change information in the Athena for Spark console.\]](http://docs.aws.amazon.com/athena/latest/ug/images/notebooks-spark-custom-jar-cfg-1.jpeg)

**To view session state information using the Athena API**
+ In the Athena API, you can find session state change information in the `StateChangeReason` field of the [SessionStatus](https://docs.aws.amazon.com/athena/latest/APIReference/API_SessionStatus.html) object.

**Note**  
After you manually stop a session, or if the session stops after an idle timeout (the default is 20 minutes), the value of **StateChangeReason** changes to Session was terminated per request.
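
As a sketch of the API route, the following helper (a hypothetical function name; the session ID in the usage comment is a placeholder) reads the reason from a `GetSessionStatus` response.

```python
def get_state_change_reason(athena_client, session_id):
    """Return the StateChangeReason from a session's status, if present."""
    response = athena_client.get_session_status(SessionId=session_id)
    return response['Status'].get('StateChangeReason')

# Usage (assumes boto3 and AWS credentials are configured):
# import boto3
# athena = boto3.client('athena')
# print(get_state_change_reason(athena, 'your-session-id'))
```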

## Use logging to troubleshoot session start errors
<a name="notebooks-spark-troubleshooting-sessions-using-logging"></a>

Custom configuration errors that occur during a session start are logged by [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html). In your CloudWatch Logs, search for error messages from `AthenaSparkSessionErrorLogger` to troubleshoot a failed session start.

For more information about Spark logging, see [Log Spark application events in Athena](notebooks-spark-logging.md).
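
The search can also be scripted with the CloudWatch Logs `FilterLogEvents` API. The following is a sketch; the helper function name is chosen for illustration, and the log group name in the usage comment is a placeholder for the log group your workgroup writes to.

```python
def find_session_start_errors(logs_client, log_group_name):
    """Return log messages emitted by AthenaSparkSessionErrorLogger."""
    response = logs_client.filter_log_events(
        logGroupName=log_group_name,
        filterPattern='AthenaSparkSessionErrorLogger',
    )
    return [event['message'] for event in response['events']]

# Usage (assumes boto3 and AWS credentials are configured):
# import boto3
# logs = boto3.client('logs')
# for message in find_session_start_errors(logs, 'your-log-group-name'):
#     print(message)
```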


## Specific session issues
<a name="notebooks-spark-troubleshooting-sessions-specific-error-messages"></a>

Use the information in this section to troubleshoot some specific session issues.

### Session in unhealthy state
<a name="notebooks-spark-troubleshooting-sessions-unhealthy"></a>

If you receive the error message *Session in unhealthy state. Please create a new session*, terminate your existing session and then create a new one.
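
In scripted workflows, the terminate-and-recreate step can be sketched with the Athena API. The function name, workgroup name, and DPU count below are placeholders, not prescribed values.

```python
def replace_session(athena_client, old_session_id, workgroup):
    """Terminate the unhealthy session and start a fresh one in its place."""
    athena_client.terminate_session(SessionId=old_session_id)
    response = athena_client.start_session(
        WorkGroup=workgroup,
        EngineConfiguration={'MaxConcurrentDpus': 4},  # placeholder capacity
    )
    return response['SessionId']

# Usage (assumes boto3 and AWS credentials are configured):
# import boto3
# athena = boto3.client('athena')
# new_session_id = replace_session(athena, 'old-session-id', 'my-spark-workgroup')
```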

### A connection to the notebook server could not be established
<a name="notebooks-spark-troubleshooting-sessions-wss-blocked"></a>

When you open a notebook, you may see the following error message:

```
A connection to the notebook server could not be established.  
The notebook will continue trying to reconnect.  
Check your network connection or notebook server configuration.
```

#### Cause
<a name="notebooks-spark-troubleshooting-sessions-wss-blocked-cause"></a>

When Athena opens a notebook, Athena creates a session and connects to the notebook using a pre-signed notebook URL. The connection to the notebook uses the WSS ([WebSocket Secure](https://en.wikipedia.org/wiki/WebSocket)) protocol.

The error can occur for the following reasons:
+ A local firewall (for example, a company-wide firewall) is blocking WSS traffic.
+ Proxy or anti-virus software on your local computer is blocking the WSS connection.

#### Solution
<a name="notebooks-spark-troubleshooting-sessions-wss-blocked-solution"></a>

Assume you have a WSS connection in the `us-east-1` Region like the following:

```
wss://94c2bcdf-66f9-4d17-9da6-7e7338060183.analytics-gateway.us-east-1.amazonaws.com/
api/kernels/33c78c82-b8d2-4631-bd22-1565dc6ec152/channels?session_id=
7f96a3a048ab4917b6376895ea8d7535
```

To resolve the error, use one of the following strategies.
+ Use wildcard pattern syntax to allowlist WSS traffic on port `443` across AWS Regions and AWS accounts.

  ```
  wss://*amazonaws.com
  ```
+ Use wildcard pattern syntax to allowlist WSS traffic on port `443` in one AWS Region and across AWS accounts in the AWS Region that you specify. The following example uses `us-east-1`.

  ```
  wss://*analytics-gateway.us-east-1.amazonaws.com
  ```

# Troubleshoot table errors
<a name="notebooks-spark-troubleshooting-tables"></a>

Use the information in this section to troubleshoot Athena for Spark table errors.

## Cannot create a path error when creating a table
<a name="notebooks-spark-troubleshooting-tables-illegal-argument-exception"></a>

**Error message**: IllegalArgumentException: Cannot create a path from an empty string.

**Cause**: This error can occur when you use Apache Spark in Athena to create a table in an AWS Glue database, and the database has an empty `LOCATION` property. 

**Suggested Solution**: For more information and solutions, see [Illegal argument exception when creating a table](notebooks-spark-known-issues.md#notebooks-spark-known-issues-illegal-argument-exception).

## AccessDeniedException when querying AWS Glue tables
<a name="notebooks-spark-troubleshooting-tables-glue-access-denied"></a>

**Error message**: pyspark.sql.utils.AnalysisException: Unable to verify existence of default database: com.amazonaws.services.glue.model.AccessDeniedException: User: arn:aws:sts::*aws-account-id*:assumed-role/AWSAthenaSparkExecutionRole-*unique-identifier*/AthenaExecutor-*unique-identifier* is not authorized to perform: glue:GetDatabase on resource: arn:aws:glue:*aws-region*:*aws-account-id*:catalog because no identity-based policy allows the glue:GetDatabase action (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: *request-id*; Proxy: null)

**Cause**: The execution role for your Spark-enabled workgroup is missing permissions to access AWS Glue resources.

**Suggested Solution**: To resolve this issue, grant your execution role access to AWS Glue resources, and then edit your Amazon S3 bucket policy to grant access to your execution role.

The following procedure describes these steps in greater detail.

**To grant your execution role access to AWS Glue resources**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. If the console navigation pane is not visible, choose the expansion menu on the left.  
![\[Choose the expansion menu.\]](http://docs.aws.amazon.com/athena/latest/ug/images/nav-pane-expansion.png)

1. In the Athena console navigation pane, choose **Workgroups**.

1. On the **Workgroups** page, choose the link of the workgroup that you want to view.

1. On the **Overview Details** page for the workgroup, choose the **Role ARN** link. The link opens the Spark execution role in the IAM console.

1. In the **Permissions policies** section, choose the linked role policy name.

1. Choose **Edit policy**, and then choose **JSON**.

1. Add AWS Glue access to the role. Typically, you add permissions for the `glue:GetDatabase` and `glue:GetTable` actions. For more information on configuring IAM roles, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) in the IAM User Guide. 

1. Choose **Review policy**, and then choose **Save changes**.

1. Edit your Amazon S3 bucket policy to grant access to the execution role. Note that you must grant the role access to both the bucket and the objects in the bucket. For steps, see [Adding a bucket policy using the Amazon S3 console](https://docs.aws.amazon.com/AmazonS3/latest/userguide/add-bucket-policy.html) in the Amazon Simple Storage Service User Guide.
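
For step 8, an identity-based policy statement along the following lines grants the AWS Glue actions mentioned. This is a sketch only; the actions shown are examples, and the Region, account ID, and resource ARNs are placeholders that you should scope to your own catalog, databases, and tables.

```
{
    "Effect": "Allow",
    "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetTables"
    ],
    "Resource": [
        "arn:aws:glue:aws-region:111122223333:catalog",
        "arn:aws:glue:aws-region:111122223333:database/default",
        "arn:aws:glue:aws-region:111122223333:table/default/*"
    ]
}
```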

# Get support
<a name="notebooks-spark-troubleshooting-support"></a>

For assistance from AWS, choose **Support**, **Support Center** from the AWS Management Console. To expedite your request, have the following information ready:
+ Athena query ID
+ Session ID
+ Calculation ID