

# Amazon Athena CloudWatch Metrics connector

The Amazon Athena CloudWatch Metrics connector enables Amazon Athena to query CloudWatch Metrics data with SQL.

This connector does not use Glue Connections to centralize configuration properties in Glue. Connection configuration is done through Lambda.

For information on publishing query metrics to CloudWatch from Athena itself, see [Use CloudWatch and EventBridge to monitor queries and control costs](workgroups-control-limits.md).

## Prerequisites

+ Deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more information, see [Create a data source connection](connect-to-a-data-source.md) or [Use the AWS Serverless Application Repository to deploy a data source connector](connect-data-source-serverless-app-repo.md).

## Parameters


Use the parameters in this section to configure the CloudWatch Metrics connector.

### Glue connections (recommended)


We recommend that you configure a CloudWatch Metrics connector by using a Glue connection object. To do this, set the `glue_connection` environment variable of the CloudWatch Metrics connector Lambda function to the name of the Glue connection to use.

**Glue connections properties**

Use the following command to get the schema for a Glue connection object. This schema contains all the parameters that you can use to control your connection.

```
aws glue describe-connection-type --connection-type CLOUDWATCHMETRICS
```

**Lambda environment properties**
+ **glue\_connection** – Specifies the name of the Glue connection associated with the federated connector.

**Note**  
+ All connectors that use Glue connections must use AWS Secrets Manager to store credentials.
+ The CloudWatch Metrics connector created using Glue connections does not support the use of a multiplexing handler.
+ The CloudWatch Metrics connector created using Glue connections only supports `ConnectionSchemaVersion` 2.
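
As a rough illustration of setting the `glue_connection` environment variable, the following Python sketch builds the `Environment` payload for a Lambda configuration update. The function name `merged_environment`, the existing variable, and the connection name are hypothetical placeholders. The sketch merges with the current variables because a Lambda configuration update replaces the whole environment map rather than patching one key.

```python
def merged_environment(existing_variables, glue_connection_name):
    """Return a Lambda Environment payload with glue_connection added.

    The existing variables are copied so that the caller's dict is not modified.
    """
    variables = dict(existing_variables)
    variables["glue_connection"] = glue_connection_name
    return {"Variables": variables}


# Hypothetical values, for illustration only.
current_variables = {"EXAMPLE_SETTING": "unchanged"}
environment = merged_environment(
    current_variables, "example-cloudwatch-metrics-connection"
)
print(environment)

# With boto3 installed and credentials configured, the payload could be passed
# to the Lambda client (not executed here; the function name is hypothetical):
#   boto3.client("lambda").update_function_configuration(
#       FunctionName="example-connector-function", Environment=environment)
```

In practice you would read the current variables from the function configuration first, so that unrelated settings survive the update.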

### Legacy connections

+ **spill\_bucket** – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.
+ **spill\_prefix** – (Optional) Defaults to a subfolder in the specified `spill_bucket` called `athena-federation-spill`. We recommend that you configure an Amazon S3 [storage lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) on this location to delete spills older than a predetermined number of days or hours.
+ **spill\_put\_request\_headers** – (Optional) A JSON-encoded map of request headers and values for the Amazon S3 `putObject` request that is used for spilling (for example, `{"x-amz-server-side-encryption" : "AES256"}`). For other possible headers, see [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html) in the *Amazon Simple Storage Service API Reference*.
+ **kms\_key\_id** – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS, you can specify a KMS key ID, such as `a7e63k4b-8loc-40db-a2a1-4d0en2cd8331`.
+ **disable\_spill\_encryption** – (Optional) When set to `True`, disables spill encryption. Defaults to `False` so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses [server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html).
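
The spill parameters above are plain string environment variables, and `spill_put_request_headers` in particular must hold a JSON-encoded map. The following Python sketch shows one way to assemble them. The helper name `legacy_spill_variables` and the bucket name are hypothetical; only the variable names and the example header come from the list above.

```python
import json


def legacy_spill_variables(spill_bucket, spill_prefix=None,
                           put_request_headers=None, kms_key_id=None,
                           disable_spill_encryption=False):
    """Build the spill-related environment variables as a str -> str map.

    Optional settings are left out when not supplied, so the connector
    falls back to its documented defaults.
    """
    variables = {"spill_bucket": spill_bucket}
    if spill_prefix is not None:
        variables["spill_prefix"] = spill_prefix
    if put_request_headers:
        # The headers map is stored as a single JSON string.
        variables["spill_put_request_headers"] = json.dumps(put_request_headers)
    if kms_key_id is not None:
        variables["kms_key_id"] = kms_key_id
    if disable_spill_encryption:
        variables["disable_spill_encryption"] = "True"
    return variables


# Hypothetical bucket name; the header is the example from the parameter list.
spill_variables = legacy_spill_variables(
    "example-spill-bucket",
    put_request_headers={"x-amz-server-side-encryption": "AES256"},
)
print(spill_variables)
```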

The connector also supports [AIMD congestion control](https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease) for handling throttling events from CloudWatch through the [Amazon Athena Query Federation SDK](https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-federation-sdk) `ThrottlingInvoker` construct. You can tweak the default throttling behavior by setting any of the following optional environment variables:
+ **throttle\_initial\_delay\_ms** – The initial call delay applied after the first congestion event. The default is 10 milliseconds.
+ **throttle\_max\_delay\_ms** – The maximum delay between calls. To derive the corresponding transactions per second (TPS), divide 1000 ms by this value. The default is 1000 milliseconds.
+ **throttle\_decrease\_factor** – The factor by which Athena reduces the call rate. The default is 0.5.
+ **throttle\_increase\_ms** – The rate at which Athena decreases the call delay. The default is 10 milliseconds.
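
To make the interaction of these four settings concrete, the following Python sketch models the delay between calls under AIMD. It is a simplified model written for this page, not the SDK's `ThrottlingInvoker` code. The class `AimdDelay` and its method names are hypothetical, and the sketch assumes that reducing the call rate by a factor of 0.5 is equivalent to dividing the delay by 0.5.

```python
from dataclasses import dataclass


@dataclass
class AimdDelay:
    """Simplified AIMD model of the delay between calls, in milliseconds."""
    initial_delay_ms: float = 10.0   # throttle_initial_delay_ms
    max_delay_ms: float = 1000.0     # throttle_max_delay_ms
    decrease_factor: float = 0.5     # throttle_decrease_factor
    increase_ms: float = 10.0        # throttle_increase_ms
    delay_ms: float = 0.0            # no delay until the first congestion event

    def on_throttle(self) -> float:
        """Multiplicative decrease of the call rate (the delay grows)."""
        if self.delay_ms == 0:
            self.delay_ms = self.initial_delay_ms
        else:
            self.delay_ms = min(self.max_delay_ms,
                                self.delay_ms / self.decrease_factor)
        return self.delay_ms

    def on_success(self) -> float:
        """Additive increase of the call rate (the delay shrinks)."""
        self.delay_ms = max(0.0, self.delay_ms - self.increase_ms)
        return self.delay_ms

    def calls_per_second(self):
        """Return 1000 ms divided by the delay, or None when there is no delay."""
        return None if self.delay_ms == 0 else 1000.0 / self.delay_ms


aimd = AimdDelay()
delays = [aimd.on_throttle() for _ in range(3)]  # three throttling events
after_success = aimd.on_success()                # one successful call
print(delays, after_success)
```

With the default values, repeated throttling events in this model push the delay up to the 1000 ms cap, which corresponds to one call per second.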

## Databases and tables


The Athena CloudWatch Metrics connector maps your namespaces, dimensions, metrics, and metric values into two tables in a single schema called `default`.

### The metrics table


The `metrics` table contains the available metrics as uniquely defined by a combination of namespace, dimension set, and name. The `metrics` table contains the following columns.
+ **namespace** – A `VARCHAR` containing the namespace.
+ **metric\_name** – A `VARCHAR` containing the metric name.
+ **dimensions** – A `LIST` of `STRUCT` objects composed of `dim_name (VARCHAR)` and `dim_value (VARCHAR)`.
+ **statistic** – A `LIST` of `VARCHAR` statistics (for example, `p90` or `AVERAGE`) available for the metric.
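
The following Python sketch illustrates what it means for a metric to be identified by its namespace, its set of dimensions, and its name. The row shape mirrors the columns above; the helper `metric_key` and the sample values are hypothetical and exist only for illustration.

```python
def metric_key(namespace, metric_name, dimensions):
    """Build an identity key for a metric.

    `dimensions` is a list of {"dim_name": ..., "dim_value": ...} entries,
    matching the LIST of STRUCT column. A frozenset makes the key independent
    of the order in which the dimensions are listed.
    """
    dimension_set = frozenset(
        (d["dim_name"], d["dim_value"]) for d in dimensions
    )
    return (namespace, dimension_set, metric_name)


# Hypothetical dimension values, for illustration only.
dims_a = [
    {"dim_name": "FunctionName", "dim_value": "example-function"},
    {"dim_name": "Resource", "dim_value": "example-function:prod"},
]
dims_b = list(reversed(dims_a))  # same dimensions, different order

key_a = metric_key("AWS/Lambda", "Invocations", dims_a)
key_b = metric_key("AWS/Lambda", "Invocations", dims_b)
key_c = metric_key("AWS/Lambda", "Invocations", dims_a[:1])  # fewer dimensions

print(key_a == key_b)  # same metric
print(key_a == key_c)  # different metric
```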

### The metric\_samples table


The `metric_samples` table contains the available metric samples for each metric in the `metrics` table. The `metric_samples` table contains the following columns.
+ **namespace** – A `VARCHAR` that contains the namespace.
+ **metric\_name** – A `VARCHAR` that contains the metric name.
+ **dimensions** – A `LIST` of `STRUCT` objects composed of `dim_name (VARCHAR)` and `dim_value (VARCHAR)`.
+ **dim\_name** – A `VARCHAR` convenience field that you can use to easily filter on a single dimension name.
+ **dim\_value** – A `VARCHAR` convenience field that you can use to easily filter on a single dimension value.
+ **period** – An `INT` field that represents the period of the metric in seconds (for example, `60` for a 60-second metric).
+ **timestamp** – A `BIGINT` field that represents the epoch time, in seconds, of the metric sample.
+ **value** – A `FLOAT8` field that contains the value of the sample.
+ **statistic** – A `VARCHAR` that contains the statistic type of the sample (for example, `AVERAGE` or `p90`).
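
Because `timestamp` holds epoch seconds rather than a SQL timestamp, time filters have to be expressed as integers. The following Python sketch builds a query string against `metric_samples` from timezone-aware datetimes. The helper names and the catalog name `example_catalog` are hypothetical; substitute the name under which your connector is registered.

```python
from datetime import datetime, timezone


def sql_literal(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_metric_samples_query(catalog, namespace, metric_name,
                               start, end, period_seconds=60):
    """Return a SELECT over metric_samples for one metric and time range."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware datetimes")
    start_epoch = int(start.timestamp())  # epoch seconds, as in the table
    end_epoch = int(end.timestamp())
    return (
        'SELECT "timestamp", value, statistic\n'
        f'FROM "{catalog}"."default".metric_samples\n'
        f"WHERE namespace = {sql_literal(namespace)}\n"
        f"  AND metric_name = {sql_literal(metric_name)}\n"
        f"  AND period = {int(period_seconds)}\n"
        f'  AND "timestamp" BETWEEN {start_epoch} AND {end_epoch}'
    )


query = build_metric_samples_query(
    "example_catalog",  # hypothetical catalog name
    "AWS/Lambda",
    "Invocations",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(query)
```

Filtering on namespace, metric name, period, and time range keeps the query within the kinds of predicates that the connector can act on, rather than scanning every available sample.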

## Required permissions


For full details on the IAM policies that this connector requires, review the `Policies` section of the [athena-cloudwatch-metrics.yaml](https://github.com/awslabs/aws-athena-query-federation/blob/master/athena-cloudwatch-metrics/athena-cloudwatch-metrics.yaml) file. The following list summarizes the required permissions.
+ **Amazon S3 write access** – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.
+ **Athena GetQueryExecution** – The connector uses this permission to fast-fail when the upstream Athena query has terminated.
+ **CloudWatch Metrics ReadOnly** – The connector uses this permission to query your metrics data.
+ **CloudWatch Logs Write** – The connector uses this access to write its diagnostic logs.
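
As a rough orientation only, the following Python sketch assembles an IAM policy document that covers the four permission areas summarized above. The specific actions and resource scopes are assumptions chosen for illustration, and the function name and bucket name are hypothetical. The `Policies` section of the linked YAML file remains the authoritative definition and may grant a different set of actions.

```python
import json


def example_connector_policy(spill_bucket, spill_prefix="athena-federation-spill"):
    """Return an illustrative IAM policy document as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Amazon S3 write access for spilled results
                "Sid": "SpillToS3",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{spill_bucket}/{spill_prefix}/*",
            },
            {   # Fast-fail when the upstream Athena query has terminated
                "Sid": "CheckQueryState",
                "Effect": "Allow",
                "Action": ["athena:GetQueryExecution"],
                "Resource": "*",
            },
            {   # Read-only access to metrics data (assumed subset of actions)
                "Sid": "ReadMetrics",
                "Effect": "Allow",
                "Action": ["cloudwatch:ListMetrics", "cloudwatch:GetMetricData"],
                "Resource": "*",
            },
            {   # Diagnostic logging from the Lambda function
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


policy = example_connector_policy("example-spill-bucket")  # hypothetical bucket
print(json.dumps(policy, indent=2))
```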

## Performance


The Athena CloudWatch Metrics connector attempts to optimize queries against CloudWatch Metrics by parallelizing scans of the metrics required for your query. For certain time period, metric, namespace, and dimension filters, predicate pushdown is performed both within the Lambda function and within CloudWatch Metrics.

## License information


The Amazon Athena CloudWatch Metrics connector project is licensed under the [Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0.html).

## Additional resources


For additional information about this connector, visit [the corresponding site](https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-cloudwatch-metrics) on GitHub.com.