

# Querying Cost and Usage Reports using Amazon Athena
<a name="cur-query-athena"></a>

Amazon Athena is a serverless query service that you can use to analyze the data from your AWS Cost and Usage Reports (AWS CUR) in Amazon Simple Storage Service (Amazon S3) using standard SQL. This helps you avoid having to create your own data warehouse solutions to query AWS CUR data.

We strongly recommend that you create both a new Amazon S3 bucket and a new AWS CUR report to use with Athena. AWS CUR supports only the Apache Parquet format for Athena, and it automatically overwrites previous reports that are stored in your S3 bucket.

This section outlines how to use Athena with Cost and Usage Reports. For a full description of the Athena service, see the [Amazon Athena User Guide](https://docs.aws.amazon.com/athena/latest/ug/).

**Topics**
+ [Setting up Athena using CloudFormation templates](use-athena-cf.md)
+ [Setting up Athena manually](cur-ate-manual.md)
+ [Running Amazon Athena queries](cur-ate-run.md)
+ [Loading report data to other resources](cur-query-other.md)

For a demonstration of querying reports using Athena, see the following video.

[![AWS Videos](http://img.youtube.com/vi/KEeJEZTYE8E/0.jpg)](http://www.youtube.com/watch?v=KEeJEZTYE8E)


# Setting up Athena using CloudFormation templates
<a name="use-athena-cf"></a>

**Important**  
CloudFormation doesn't support cross-Region resources. If you plan to use a CloudFormation template, you must create all resources in the same AWS Region. The Region must support the following services:
+ AWS Lambda
+ Amazon Simple Storage Service (Amazon S3)
+ AWS Glue
+ Amazon Athena

To streamline and automate integration of your Cost and Usage Reports with Athena, AWS provides a CloudFormation template that creates several key resources to use along with the reports that you set up for Athena integration. The CloudFormation template includes an AWS Glue crawler, an AWS Glue database, and an AWS Lambda event. 

The Athena integration setup process using CloudFormation removes any Amazon S3 events that your bucket might already have. This can negatively affect any existing event-based processes that you have for an existing AWS CUR report. We strongly recommend that you create both a new Amazon S3 bucket and a new AWS CUR report to use with Athena.

Before you can use a CloudFormation template to automate Athena integration, make sure that you do the following:
+ Create a new Amazon S3 bucket for your reports. For more information, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in the *Amazon S3 User Guide*. 
+ [Create a new report](cur-create.md) to use with Athena. During the setup process, for **Report data integration**, choose **Athena**.
+ Wait for the first report to be delivered to your Amazon S3 bucket. It can take up to 24 hours for AWS to deliver your first report.<a name="use-athena-cf-steps"></a>

**To use the Athena CloudFormation template**

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. From the list of buckets, choose the bucket where you chose to receive your AWS CUR report.

1. Choose your report path prefix (*your-report-path-prefix/*). Then, choose your report name (*your-report-name/*).

1. Choose the `.yml` template file.

1. Choose **Object actions**, and then choose **Download as**.

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

1. If you have never used CloudFormation before, choose **Create New Stack**. Otherwise, choose **Create Stack**.

1. Under **Prepare template**, choose **Template is ready**.

1. Under **Template source**, choose **Upload a template file**.

1. Choose **Choose file**.

1. Choose the downloaded `.yml` template, and then choose **Open**.

1. Choose **Next**.

1. For **Stack name**, enter a name for your template and choose **Next**.

1. Choose **Next**.

1. At the bottom of the page, select **I acknowledge that AWS CloudFormation might create IAM resources.** 

   This template creates the following resources:
   + Three IAM roles
   + An AWS Glue database
   + An AWS Glue crawler
   + Two Lambda functions
   + An Amazon S3 notification

1. Choose **Create stack**.

**To update the existing Athena CloudFormation template**

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. From the list of buckets, choose the bucket where you chose to receive your AWS CUR report.

1. Choose your report path prefix (*your-report-path-prefix/*). Then, choose your report name (*your-report-name/*).

1. Choose the `.yml` template file.

1. Choose **Object actions**, and then choose **Download as**.

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

1. Select the stack that was previously created, and then choose **Update**.

1. Under **Prepare template**, choose **Replace current template**.

1. Under **Template source**, choose **Upload a template file**.

1. Choose **Choose file**.

1. Choose the downloaded `.yml` template, and then choose **Open**.

1. Choose **Next**.

1. On the **Specify stack details** page, modify any details, and then choose **Next**.

1. Choose **Next**.

1. At the bottom of the page, select **I acknowledge that AWS CloudFormation might create IAM resources.** 

1. Choose **Update stack**.

# Setting up Athena manually
<a name="cur-ate-manual"></a>

We strongly recommend that you use the AWS CloudFormation template to create your table instead of creating it yourself. The provided SQL query creates a table that covers only a single month of data, but the AWS CloudFormation template creates a table that can include multiple months and that updates automatically. For more information on how to set up the AWS CloudFormation template, see [Setting up Athena using CloudFormation templates](use-athena-cf.md).

If you choose not to use the CloudFormation template to set up your Athena table, follow the steps below manually. You must create a table before you can run SQL queries on your AWS CUR data, and because the table includes only data from the current AWS CUR, you need to repeat this step at least once a month.

As part of the table creation process, AWS transforms the AWS CUR column names. For more information about the transformation process, see [Column names](cur-ate-run.md#column-transformations).
+ [Creating an Athena table](create-manual-table.md)
+ [Creating a Cost and Usage Reports status table](create-manual-cur-table.md)
+ [Uploading your report partitions](upload-report-partitions.md)

# Creating an Athena table
<a name="create-manual-table"></a>

AWS includes the SQL that you need to run to create this table in your AWS CUR bucket.<a name="create-manual-table-steps"></a>

**To create your Athena table**

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. From the list of buckets, choose the bucket where you chose to receive your Cost and Usage Reports.

1. Navigate to the path `your-report-prefix/your-report-name/path-to-report`.

   The exact path depends on whether your AWS CUR is set to overwrite previous versions. For more information, see [Cost and Usage Reports delivery timeline](understanding-report-versions.md#access-cur-s3-timeline).

1. Open the file `your-report-name-create-table.sql`.

1. Copy the SQL from the file, starting with `CREATE` and ending with `LOCATION 's3://your-report-prefix/your-report-name/the-rest-of-the-path'`. Note the first line of the SQL, because you need the database name from it to create the Athena database.

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. In the **New query 1** query pane, paste the following SQL. For *`<database name>`*, use the database name from the first line of the SQL that you copied.

   ```
   CREATE DATABASE <database name>
   ```

1. Choose **Run query**.

1. In the dropdown menu, choose the database that you just created.

1. In the **New query 1** query pane, paste the rest of the SQL from the SQL file.

1. Choose **Run query**.

After you create your table, you need to load your partitions before you can run a query. For more information, see [Uploading your report partitions](upload-report-partitions.md).
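
The DDL in the generated `create-table.sql` file follows a standard shape. The following sketch shows that shape only, with placeholder names and a heavily abbreviated column list; don't run it as-is, because your generated file contains the exact columns and `LOCATION` for your report.

```sql
-- Sketch only: placeholder database, table, column, and bucket names.
-- Your generated create-table.sql file contains the real, much longer DDL.
CREATE EXTERNAL TABLE my_cur_db.my_cur_table (
  identity_line_item_id STRING,
  line_item_usage_start_date TIMESTAMP,
  line_item_product_code STRING,
  line_item_blended_cost DOUBLE
)
PARTITIONED BY (
  year STRING,
  month STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://your-report-prefix/your-report-name/'
```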

# Creating a Cost and Usage Reports status table
<a name="create-manual-cur-table"></a>

AWS refreshes your AWS CUR multiple times a day. There isn't a way for Athena to tell when AWS is in the process of refreshing your report, which can lead to query results with a combination of old and new data. To mitigate this, create a table to track whether AWS is refreshing your Cost and Usage Reports and query that table to see if AWS is refreshing your data. You only need to create this table once. After that, AWS keeps the table up to date.<a name="create-refresh-table"></a>

**To create your refresh table**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. In the **New query 1** query pane, paste the following SQL. 

   ```
   CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage_data_status(
     status STRING)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   WITH SERDEPROPERTIES (
    'serialization.format' = '1'
   )
   LOCATION 's3://{S3_Bucket_Name}/{Report_Key}/cost_and_usage_data_status/'
   ```

1. Choose **Run query**.

To check whether AWS is refreshing your data, use the Athena console to run the following SQL query.

```
select status from cost_and_usage_data_status 
```

# Uploading your report partitions
<a name="upload-report-partitions"></a>

To query your Cost and Usage Reports data, you need to upload the data into your Athena table. You must do this for each new AWS CUR report that AWS delivers to you.<a name="upload-partitions"></a>

**To upload your latest partitions**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. Choose the three vertical dots next to your table name.

1. Choose **Load partitions**.

If you don't upload your partitions, Athena returns either no results or an error message that indicates missing data.
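
If you prefer SQL over the console action, Athena's `MSCK REPAIR TABLE` statement loads all partitions that exist under the table's `LOCATION`. The table name below is a placeholder; use the name from your generated DDL.

```sql
-- Loads any partitions found in S3 that the table doesn't know about yet.
-- Run after each monthly report delivery; equivalent to "Load partitions".
MSCK REPAIR TABLE my_cur_table;
```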

# Running Amazon Athena queries
<a name="cur-ate-run"></a>

To run Athena queries on your data, first use the Athena console to check whether AWS is refreshing your data and then run your query on the Athena console. When you run your SQL, make sure that the correct database is selected from the dropdown list. You can use the following SQL to check the status.

```
select status from cost_and_usage_data_status 
```

The two possible results are `READY` and `UPDATING`. If the status is `READY`, then you can query your Athena database. If the status is `UPDATING`, then Athena might return incomplete results.

After you've confirmed that the status is `READY`, you can run your own queries. For example, the following query shows 2018 year-to-date costs by service for each month in the example database called `mycostandusage_parquet`. Update the year to see the current year-to-date costs.

```
SELECT line_item_product_code,
sum(line_item_blended_cost) AS cost, month
FROM mycostandusage_parquet
WHERE year='2018'
GROUP BY  line_item_product_code, month
HAVING sum(line_item_blended_cost) > 0
ORDER BY  line_item_product_code;
```
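
You can vary the grouping columns in the same way. The following sketch, which assumes the same example database and the standard AWS CUR column `line_item_usage_account_id`, breaks down the 2018 costs by member account instead of by service.

```sql
-- Year-to-date cost per member account per month (example database).
SELECT line_item_usage_account_id,
       month,
       sum(line_item_blended_cost) AS cost
FROM mycostandusage_parquet
WHERE year = '2018'
GROUP BY line_item_usage_account_id, month
HAVING sum(line_item_blended_cost) > 0
ORDER BY cost DESC;
```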

## Column names
<a name="column-transformations"></a>

Athena column name restrictions are different from the Cost and Usage Reports column name restrictions. This means that when your AWS CUR data is uploaded into an Athena table, the column names change. AWS makes the following changes:
+ An underscore is added in front of uppercase letters
+ Uppercase letters are replaced with lowercase letters 
+ Any non-alphanumeric characters are replaced with an underscore
+ Duplicate underscores are removed
+ Any leading and trailing underscores are removed
+ If the column name is longer than the allowed length of column names, underscores are removed from left to right

**Note**  
After these rules are applied, some of the resource tag columns can have duplicate names. AWS merges columns when more than one column has the same name. 

As examples, the column name `ExampleColumnName` becomes `example_column_name`, and the column name `Example Column Name` becomes `example_column_name`.
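
If you want to see how an original column name maps, most of these rules (all except the length-based truncation) can be approximated directly in the Athena query editor with nested `regexp_replace` calls. This is an illustration only, not the exact code that AWS runs.

```sql
-- Approximates the transformation rules; returns example_column_name.
SELECT lower(
         regexp_replace(                      -- strip leading/trailing underscores
           regexp_replace(                    -- collapse consecutive underscores
             regexp_replace(                  -- non-alphanumerics to underscores
               regexp_replace('Example Column Name',
                              '([A-Z])', '_$1'),  -- underscore before uppercase
               '[^a-zA-Z0-9]', '_'),
             '_+', '_'),
           '^_|_$', '')
       ) AS transformed_name;
```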

# Loading report data to other resources
<a name="cur-query-other"></a>

You can upload Cost and Usage Reports to Amazon Redshift and Amazon QuickSight to analyze your AWS cost and usage.

**Topics**
+ [Loading report data to Amazon QuickSight](#cur-query-other-qs)
+ [Loading report data to Amazon Redshift](#cur-query-other-rs)

## Loading report data to Amazon QuickSight
<a name="cur-query-other-qs"></a>

You can upload your Cost and Usage Reports into Amazon QuickSight.

For more information about uploading to Amazon QuickSight, see [Creating a Data Set Using Amazon S3 Files](https://docs.aws.amazon.com/quicksight/latest/user/create-a-data-set-s3.html) in the *Amazon QuickSight User Guide*.

## Loading report data to Amazon Redshift
<a name="cur-query-other-rs"></a>

This section shows how you can upload AWS CUR to Amazon Redshift to analyze your AWS costs and usage.

**Important**  
Amazon Redshift column names aren't case sensitive and have stricter character limitations than user-defined tags. To prevent conflicts between Amazon Redshift and your user-defined tags, AWS replaces your tags with the tags `userTag0`, `userTag1`, `userTag2`, and so on. After you create an Amazon Redshift table and upload your report into it, you can create an Amazon Redshift table that maps the AWS-defined tags to your user-defined tags. The tag table allows you to look up your original tags.  
For example, if you have the tags `OWNER` and `Owner`, Amazon Redshift doesn't allow you to create a table with two columns named `owner`. Instead, you create a report table with the columns `userTag0` and `userTag1` in place of `OWNER` and `Owner`, and then create a tag table with the columns `remappedUserTag` and `userTag`. The `remappedUserTag` column stores the AWS-defined tags `userTag0` and `userTag1`, and the `userTag` column stores your original tags, `OWNER` and `Owner`. 

AWS provides the commands to create your Amazon Redshift table, upload your report, create your tag table, and insert all of the tag rows into your tag table. The commands are provided in the `RedshiftCommands.sql` file that is stored alongside your manifest file in Amazon S3, and in the **Redshift file** helper file in the Billing and Cost Management console. AWS also provides a RedshiftManifest file, which controls which report the commands in the RedshiftCommands file upload. Deleting or moving the RedshiftManifest file breaks the `copy` command in the RedshiftCommands file.

**To find the `RedshiftCommands.sql` file in the Billing and Cost Management console**

1. Open the Billing and Cost Management console at [https://console.aws.amazon.com/costmanagement/](https://console.aws.amazon.com/costmanagement/).

1. In the navigation pane, under **Legacy Pages**, choose **Cost and Usage Reports**.

1. Choose the report that you want to upload to Amazon Redshift.

1. Next to **You have enabled viewing reports in the following service(s):**, choose **Amazon Redshift**.

1. Copy the commands from the dialog box and paste them into your SQL client.

The following procedure assumes familiarity with databases and Amazon Redshift.

**To upload a Cost and Usage Report to Amazon Redshift**

1. Create an Amazon Redshift cluster.

   For more information, see [Creating a Cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-console.html#create-cluster) in the *Amazon Redshift Management Guide*.

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. Navigate to the Amazon S3 location where you store your AWS CUR.

1. Open the `RedshiftCommands.sql` file.

   The file contains customized commands to create an Amazon Redshift table, upload the AWS CUR from Amazon S3, and create a tag table that allows user-defined tags to be imported into Amazon Redshift.

1. In the `copy` command, replace *<AWS\_ROLE>* with the ARN of an IAM role that has permissions to access the Amazon S3 bucket where you store your AWS CUR.

1. Replace *<S3\_BUCKET\_REGION>* with the Region that your Amazon S3 bucket is in. For example, `us-east-1`.

1. Use a SQL client to connect to the cluster.

   For more information, see [Accessing Amazon Redshift Clusters and Databases](https://docs.aws.amazon.com/redshift/latest/mgmt/using-rs-tools.html) in the *Amazon Redshift Management Guide*.

1. Copy the SQL commands from the `RedshiftCommands.sql` file to your SQL client in the following order:
   + create table - This command creates an Amazon Redshift table with a schema customized to match your report.
   + copy - This command uses the provided IAM role to upload the AWS CUR files from S3 to Amazon Redshift.
   + create tag table - This command creates a table that allows you to map AWS-defined tags to your user-defined tags.
   + insert - These commands insert the user-defined tags into the tag table.

1. After you have copied all of the data from your AWS CUR into Amazon Redshift, you can query the data using SQL. For more information about querying data in Amazon Redshift, see [Amazon Redshift SQL](https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-sql.html) in the *Amazon Redshift Database Developer Guide*.
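
For orientation, the commands in a `RedshiftCommands.sql` file follow roughly the shape below. Everything here is a placeholder sketch: the real file contains your report's full column list, bucket path, IAM role, and options, so always use the generated file rather than this outline.

```sql
-- Sketch only: placeholder table, columns, bucket path, role, and Region.
-- Your RedshiftCommands.sql file contains the real, customized commands.
create table AWSBilling (
  identity_LineItemId varchar(512),
  lineItem_ProductCode varchar(512),
  lineItem_BlendedCost decimal(18, 8),
  userTag0 varchar(512)
);

copy AWSBilling
from 's3://your-report-prefix/your-report-name/RedshiftManifest'
credentials 'aws_iam_role=<AWS_ROLE>'
region '<S3_BUCKET_REGION>'
manifest
gzip csv ignoreheader 1;
```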

**Note**  
The number of columns in Cost and Usage Reports can change from month to month, such as when a new cost allocation tag is created or a service adds a new product attribute. We recommend that you copy the data from your AWS CUR into a new table every month, and then copy the columns that interest you into a separate month-by-month table.