# Monitoring with Amazon CloudWatch

 You can monitor DynamoDB Accelerator (DAX) using Amazon CloudWatch, which collects and processes raw data from DAX into readable, near real-time metrics. These statistics are recorded for a period of two weeks. You can then access historical information for a better perspective on how your web application or service is performing. By default, DAX metric data is sent to CloudWatch automatically. For more information, see [What Is Amazon CloudWatch?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatch.html) in the *Amazon CloudWatch User Guide*. 

**Topics**
+ [How do I use DAX metrics?](#dax-how-to-use-metrics)
+ [Viewing DAX metrics and dimensions](dax-metrics-dimensions-dax.md)
+ [Creating CloudWatch alarms to monitor DAX](dax-creating-alarms.md)
+ [Production monitoring](dax-production-monitoring.md)

## How do I use DAX metrics?

 The metrics reported by DAX provide information that you can analyze in different ways. The following list shows some common uses for the metrics. These are suggestions to get you started, and not a comprehensive list. 



|   How Can I?   |   Relevant Metrics   | 
| --- | --- | 
|  Determine if any system errors occurred  |   Monitor `FaultRequestCount` to determine if any requests resulted in an HTTP 500 (server error) code. This can indicate a DAX internal service error or an HTTP 500 in the underlying table's [SystemErrors metric](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html).   | 
|  Determine if any user errors occurred  |   Monitor `ErrorRequestCount` to determine if any requests resulted in an HTTP 400 (client error) code. If you see the error count growing, you might want to investigate and make sure you are sending correct client requests.   | 
|  Determine if any cache misses occurred  |   Monitor `ItemCacheMisses` to determine the number of times an item was not found in the cache, and `QueryCacheMisses` and `ScanCacheMisses` to determine the number of times a query or scan result was not found in the cache.   | 
|  Monitor cache hit rates  |   Use [CloudWatch Metric Math](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html) to define a cache hit rate metric using math expressions.   For example, for the item cache, you can use the expression `m1/SUM([m1, m2])*100`, where m1 is the `ItemCacheHits` metric and m2 is the `ItemCacheMisses` metric for your cluster. For the query and scan caches, you can follow the same pattern using the corresponding query and scan cache metrics.   | 
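
The hit rate expression in the table can be evaluated with the CloudWatch `get-metric-data` command. The following is a minimal sketch rather than a definitive implementation: the function names, the one-minute period, and the `hit_rate` ID are assumptions chosen for illustration, and the `ClusterId` dimension name follows the dimensions table in this guide.

```shell
# Print one input query for a DAX metric (an input only, not returned by itself).
dax_metric_query() {
  local id="$1" metric="$2" cluster_id="$3"
  printf '{"Id":"%s","ReturnData":false,"MetricStat":{"Metric":{"Namespace":"AWS/DAX","MetricName":"%s","Dimensions":[{"Name":"ClusterId","Value":"%s"}]},"Period":60,"Stat":"Sum"}}' \
    "$id" "$metric" "$cluster_id"
}

# Print the full query list: m1 = ItemCacheHits, m2 = ItemCacheMisses,
# and hit_rate = the metric math expression from the table.
item_cache_hit_rate_queries() {
  local cluster_id="$1"
  printf '[%s,%s,{"Id":"hit_rate","Expression":"m1/SUM([m1, m2])*100","Label":"ItemCacheHitRate"}]\n' \
    "$(dax_metric_query m1 ItemCacheHits "$cluster_id")" \
    "$(dax_metric_query m2 ItemCacheMisses "$cluster_id")"
}

# Retrieve the hit rate between two ISO 8601 timestamps.
# Requires the AWS CLI and credentials with CloudWatch read access.
get_item_cache_hit_rate() {
  local cluster_id="$1" start_time="$2" end_time="$3"
  aws cloudwatch get-metric-data \
      --metric-data-queries "$(item_cache_hit_rate_queries "$cluster_id")" \
      --start-time "$start_time" \
      --end-time "$end_time"
}
```

For example, `get_item_cache_hit_rate myCluster 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z` requests one hour of data for a placeholder cluster named `myCluster`. Because DAX publishes a metric only when its value is non-zero, periods without hits or without misses might return no data point for the expression.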

# Viewing DAX metrics and dimensions

 When you interact with DynamoDB Accelerator (DAX), it sends metrics and dimensions to Amazon CloudWatch. You can use the following procedures to view the metrics for DAX. 

**To view metrics (console)**

 Metrics are grouped first by the service namespace, and then by the various dimension combinations within each namespace. 

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1.  In the navigation pane, choose **Metrics**. 

1.  Select the **DAX** namespace. 

**To view metrics (AWS CLI)**
+  At a command prompt, use the following command. 

  ```
  aws cloudwatch list-metrics --namespace "AWS/DAX"
  ```

## DAX metrics and dimensions


 The following sections contain the metrics and dimensions that DAX sends to CloudWatch. 

### DAX Metrics


The following metrics are available from DAX. DAX sends metrics to CloudWatch only when they have a non-zero value.

**Note**  
CloudWatch aggregates the following DAX metrics at one-minute intervals:  
+ `CPUUtilization`
+ `CacheMemoryUtilization`
+ `NetworkBytesIn`
+ `NetworkBytesOut`
+ `BaselineNetworkBytesInUtilization`
+ `BaselineNetworkBytesOutUtilization`
+ `NetworkPacketsIn`
+ `NetworkPacketsOut`
+ `GetItemRequestCount`
+ `BatchGetItemRequestCount`
+ `BatchWriteItemRequestCount`
+ `DeleteItemRequestCount`
+ `PutItemRequestCount`
+ `UpdateItemRequestCount`
+ `TransactWriteItemsCount`
+ `TransactGetItemsCount`
+ `ItemCacheHits`
+ `ItemCacheMisses`
+ `QueryCacheHits`
+ `QueryCacheMisses`
+ `ScanCacheHits`
+ `ScanCacheMisses`
+ `TotalRequestCount`
+ `ErrorRequestCount`
+ `FaultRequestCount`
+ `FailedRequestCount`
+ `QueryRequestCount`
+ `ScanRequestCount`
+ `ClientConnections`
+ `EstimatedDbSize`
+ `EvictedSize`
+ `CPUCreditUsage`
+ `CPUCreditBalance`
+ `CPUSurplusCreditBalance`
+ `CPUSurplusCreditsCharged`

Not all statistics, such as `Average` or `Sum`, are applicable for every metric. However, all of these values are available through the DAX console, or by using the CloudWatch console, AWS CLI, or AWS SDKs for all metrics. In the following table, each metric has a list of valid statistics that are applicable to that metric.



| Metric | Description | 
| --- | --- | 
| CPUUtilization |  The percentage of CPU utilization of the node or cluster. Units: `Percent` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| CacheMemoryUtilization |  The percentage of available cache memory that is in use by the item cache and query cache on the node or cluster. Cached data starts to be evicted prior to memory utilization reaching 100% (see `EvictedSize` metric). If `CacheMemoryUtilization` reaches 100% on any node, write requests will be throttled and you should consider switching to a cluster with a larger node type. Units: `Percent` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| NetworkBytesIn |  The number of bytes received on all network interfaces by the node or cluster. Units: `Bytes` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| NetworkBytesOut |  The number of bytes sent out on all network interfaces by the node or cluster. This metric identifies the volume of outgoing traffic in terms of the number of bytes on a single node or cluster. Units: `Bytes` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| BaselineNetworkBytesInUtilization |  The percentage of the consumed baseline network bandwidth at a given time for ingress traffic. For reference, 50% means half of the available network bandwidth for ingress traffic is being used. Units: `Percent` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| BaselineNetworkBytesOutUtilization |  The percentage of the consumed baseline network bandwidth at a given time for egress traffic. For reference, 50% means half of the available network bandwidth for egress traffic is being used. Units: `Percent` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| NetworkPacketsIn |  The number of packets received on all network interfaces by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| NetworkPacketsOut |  The number of packets sent out on all network interfaces by the node or cluster. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single node or cluster. Units: `Count` Valid Statistics:  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| GetItemRequestCount |  The number of `GetItem` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| BatchGetItemRequestCount |  The number of `BatchGetItem` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| BatchWriteItemRequestCount |  The number of `BatchWriteItem` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| DeleteItemRequestCount |  The number of `DeleteItem` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| PutItemRequestCount |  The number of `PutItem` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| UpdateItemRequestCount |  The number of `UpdateItem` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| TransactWriteItemsCount |  The number of `TransactWriteItems` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| TransactGetItemsCount |  The number of `TransactGetItems` requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ItemCacheHits |  The number of times an item was returned from the cache by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ItemCacheMisses |  The number of times an item was not in the node or cluster cache, and had to be retrieved from DynamoDB. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| QueryCacheHits |  The number of times a query result was returned from the node or cluster cache. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| QueryCacheMisses |  The number of times a query result was not in the node or cluster cache, and had to be retrieved from DynamoDB. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ScanCacheHits |  The number of times a scan result was returned from the node or cluster cache. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ScanCacheMisses |  The number of times a scan result was not in the node or cluster cache, and had to be retrieved from DynamoDB. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| TotalRequestCount |  Total number of requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ErrorRequestCount |  Total number of requests that resulted in a user error reported by the node or cluster. Requests that were throttled by the node or cluster are included. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ThrottledRequestCount |  Total number of requests throttled by the node or cluster. Requests that were throttled by DynamoDB are not included, and can be monitored using [DynamoDB Metrics](metrics-dimensions.md). Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| FaultRequestCount |  Total number of requests that resulted in an internal error reported by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| FailedRequestCount |  Total number of requests that resulted in an error reported by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| QueryRequestCount |  The number of query requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ScanRequestCount |  The number of scan requests handled by the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| ClientConnections |  The number of simultaneous connections made by clients to the node or cluster. Units: `Count` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| EstimatedDbSize |  An approximation of the amount of data cached in the item cache and the query cache by the node or cluster. Units: `Bytes` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| EvictedSize |  The amount of data that was evicted by the node or cluster to make room for newly requested data. If the miss rate goes up, and you see this metric also growing, it probably means that your working set has increased. You should consider switching to a cluster with a larger node type. Units: `Bytes` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| CPUCreditUsage |  The number of CPU credits spent by the node for CPU utilization. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes). CPU credit metrics are available at a five-minute frequency only. If you specify a period greater than five minutes, use the `Sum` statistic instead of the `Average`. Units: `Credits (vCPU-minutes)` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| CPUCreditBalance |  The number of earned CPU credits that a node has accrued since it was launched or started. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. The credit balance has a maximum limit, determined by the DAX node size. After the limit is reached, any new credits that are earned are discarded. The credits in the `CPUCreditBalance` are available for the node to spend to burst beyond its baseline CPU utilization. Units: `Credits (vCPU-minutes)` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| CPUSurplusCreditBalance |  The number of surplus credits that have been spent by a DAX node when its `CPUCreditBalance` value is zero. The `CPUSurplusCreditBalance` value is paid down by earned CPU credits. If the number of surplus credits exceeds the maximum number of credits that the node can earn in a 24-hour period, the spent surplus credits above the maximum incur an additional charge. Units: `Credits (vCPU-minutes)` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 
| CPUSurplusCreditsCharged |  The number of spent surplus credits that are not paid down by earned CPU credits, and which thus incur an additional charge. Spent surplus credits are charged when the spent surplus credits exceed the maximum number of credits that the node can earn in a 24-hour period. Spent surplus credits above the maximum are charged at the end of the hour or when the node is terminated. Units: `Credits (vCPU-minutes)` Valid Statistics: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/dax-metrics-dimensions-dax.html)  | 

**Note**  
The `CPUCreditUsage`, `CPUCreditBalance`, `CPUSurplusCreditBalance`, and `CPUSurplusCreditsCharged` metrics are available only for T3 nodes.
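
The CPU credit definition in the `CPUCreditUsage` description is simple arithmetic: credits equal vCPUs multiplied by utilization (as a fraction) multiplied by minutes. The following sketch reproduces the three combinations given in the table. The function name `cpu_credits_spent` is hypothetical and is not part of any AWS tool.

```shell
# credits = vCPUs x (utilization percent / 100) x minutes
cpu_credits_spent() {
  LC_ALL=C awk -v vcpus="$1" -v util="$2" -v minutes="$3" \
    'BEGIN { printf "%.2f\n", vcpus * util / 100 * minutes }'
}

cpu_credits_spent 1 100 1   # one vCPU at 100% for one minute
cpu_credits_spent 1 50 2    # one vCPU at 50% for two minutes
cpu_credits_spent 2 25 2    # two vCPUs at 25% for two minutes
```

Each of the three calls prints `1.00`, which matches the statement that these combinations all consume one CPU credit.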

### Dimensions for DAX Metrics


The metrics for DAX are qualified by the values for the account, cluster ID, or cluster ID and node ID combination. You can use the CloudWatch console to retrieve DAX data along any of the dimensions in the following table.



|  Dimension  |  CloudWatch Metric Namespace  |  Description  | 
| --- | --- | --- | 
|  Account  |  DAX Metrics  |  Provides aggregated statistics across all nodes in an account.  | 
|  ClusterId  |  Cluster Metrics  |  Limits the data to a cluster.   | 
|  ClusterId, NodeId  |  ClusterId, NodeId  |  Limits the data to a node within a cluster.   | 
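
The same dimensions can be supplied to the AWS CLI to retrieve data for a whole cluster or for a single node. The following is a sketch under stated assumptions: the helper names are hypothetical, the one-minute period and `Sum` statistic are illustrative choices, and `myCluster` and `myCluster-a` are placeholder identifiers.

```shell
# Build the --dimensions value for a cluster, or for one node within a cluster.
dax_dimensions() {
  local cluster_id="$1" node_id="${2:-}"
  if [ -n "$node_id" ]; then
    printf 'Name=ClusterId,Value=%s Name=NodeId,Value=%s' "$cluster_id" "$node_id"
  else
    printf 'Name=ClusterId,Value=%s' "$cluster_id"
  fi
}

# Retrieve per-minute sums of a DAX metric between two ISO 8601 timestamps.
# The fifth argument (node ID) is optional.
dax_metric_sum() {
  local metric="$1" cluster_id="$2" start_time="$3" end_time="$4" node_id="${5:-}"
  # The dimension list is intentionally unquoted so that each
  # Name=...,Value=... pair is passed as a separate argument.
  aws cloudwatch get-metric-statistics \
      --namespace AWS/DAX \
      --metric-name "$metric" \
      --dimensions $(dax_dimensions "$cluster_id" "$node_id") \
      --start-time "$start_time" \
      --end-time "$end_time" \
      --period 60 \
      --statistics Sum
}
```

For example, `dax_metric_sum TotalRequestCount myCluster 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z` limits the data to the cluster, and adding `myCluster-a` as a fifth argument limits it to that node.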

# Creating CloudWatch alarms to monitor DAX

 You can create an Amazon CloudWatch alarm that sends an Amazon Simple Notification Service (Amazon SNS) message when the alarm changes state. An alarm watches a single metric over a time period that you specify. It performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods. The action is a notification that is sent to an Amazon SNS topic or Auto Scaling policy. Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state. The state must have changed and been maintained for a specified number of periods. 

## How can I be notified of query cache misses?


1. Create an Amazon SNS topic, `arn:aws:sns:us-west-2:522194210714:QueryMissAlarm`.

   For more information, see [Set Up Amazon Simple Notification Service](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SetupSNS.html) in the *Amazon CloudWatch User Guide*.

1. Create the alarm.

   ```
   aws cloudwatch put-metric-alarm \
       --alarm-name QueryCacheMissesAlarm  \
       --alarm-description "Alarm over query cache misses" \
       --namespace AWS/DAX \
       --metric-name QueryCacheMisses  \
       --dimensions Name=ClusterId,Value=myCluster \
       --statistic Sum \
       --threshold 8 \
       --comparison-operator GreaterThanOrEqualToThreshold \
       --period 60 \
       --evaluation-periods 1 \
       --alarm-actions arn:aws:sns:us-west-2:522194210714:QueryMissAlarm
   ```

1. Test the alarm.

   ```
   aws cloudwatch set-alarm-state --alarm-name QueryCacheMissesAlarm --state-reason "initializing" --state-value OK
   ```

   ```
   aws cloudwatch set-alarm-state --alarm-name QueryCacheMissesAlarm --state-reason "initializing" --state-value ALARM
   ```

**Note**  
 You can increase or decrease the threshold to one that makes sense for your application. You can also use [CloudWatch Metric Math](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html) to define a cache miss rate metric and set an alarm over that metric.
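
An alarm over a cache miss rate, as mentioned in the preceding note, can be sketched with the `--metrics` option of `put-metric-alarm`, which accepts metric math queries in place of a single metric. This is an illustrative sketch, not a recommended configuration: the function names, the alarm name, the query IDs, and the threshold passed by the caller are all assumptions.

```shell
# Print the metric math queries for the query cache miss rate (percent).
# Only miss_rate returns data; the alarm evaluates that expression.
query_cache_miss_rate_queries() {
  local cluster_id="$1"
  cat <<EOF
[
  {"Id": "hits", "ReturnData": false,
   "MetricStat": {"Metric": {"Namespace": "AWS/DAX", "MetricName": "QueryCacheHits",
     "Dimensions": [{"Name": "ClusterId", "Value": "${cluster_id}"}]},
     "Period": 60, "Stat": "Sum"}},
  {"Id": "misses", "ReturnData": false,
   "MetricStat": {"Metric": {"Namespace": "AWS/DAX", "MetricName": "QueryCacheMisses",
     "Dimensions": [{"Name": "ClusterId", "Value": "${cluster_id}"}]},
     "Period": 60, "Stat": "Sum"}},
  {"Id": "miss_rate", "ReturnData": true,
   "Expression": "misses/SUM([hits, misses])*100", "Label": "QueryCacheMissRate"}
]
EOF
}

# Create an alarm that fires when the miss rate reaches the given percentage.
create_miss_rate_alarm() {
  local cluster_id="$1" threshold_percent="$2" topic_arn="$3"
  aws cloudwatch put-metric-alarm \
      --alarm-name QueryCacheMissRateAlarm \
      --alarm-description "Alarm over query cache miss rate" \
      --metrics "$(query_cache_miss_rate_queries "$cluster_id")" \
      --threshold "$threshold_percent" \
      --comparison-operator GreaterThanOrEqualToThreshold \
      --evaluation-periods 1 \
      --alarm-actions "$topic_arn"
}
```

For example, `create_miss_rate_alarm myCluster 20 arn:aws:sns:us-west-2:522194210714:QueryMissAlarm` would alarm at a 20 percent miss rate. The value 20 is a placeholder, so choose a threshold that makes sense for your application.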

## How can I be notified if requests cause an internal error in the cluster?


1. Create an Amazon SNS topic, `arn:aws:sns:us-west-2:123456789012:notify-on-system-errors`.

   For more information, see [Set Up Amazon Simple Notification Service](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SetupSNS.html) in the *Amazon CloudWatch User Guide*.

1. Create the alarm.

   ```
   aws cloudwatch put-metric-alarm \
       --alarm-name FaultRequestCountAlarm \
       --alarm-description "Alarm when a request causes an internal error" \
       --namespace AWS/DAX \
       --metric-name FaultRequestCount \
       --dimensions Name=ClusterId,Value=myCluster \
       --statistic Sum \
       --threshold 0 \
       --comparison-operator GreaterThanThreshold \
       --period 60 \
       --unit Count \
       --evaluation-periods 1 \
       --alarm-actions arn:aws:sns:us-west-2:123456789012:notify-on-system-errors
   ```

1. Test the alarm.

   ```
   aws cloudwatch set-alarm-state --alarm-name FaultRequestCountAlarm --state-reason "initializing" --state-value OK
   ```

   ```
   aws cloudwatch set-alarm-state --alarm-name FaultRequestCountAlarm --state-reason "initializing" --state-value ALARM
   ```

# Production monitoring


 You should establish a baseline for normal DAX performance in your environment, by measuring performance at various times and under different load conditions. As you monitor DAX, you should consider storing historical monitoring data. This stored data gives you a baseline from which to compare current performance data, identify normal performance patterns and performance anomalies, and devise methods to address issues. 

 To establish a baseline, you should, at minimum, monitor the following items both during load testing and in production. 
+  CPU utilization and throttled requests, so that you can determine whether you might need to use a larger node type in your cluster. The CPU utilization of your cluster is available through the `CPUUtilization` CloudWatch metric. The `Average` statistic on this metric provides the average CPU utilization across all the nodes in your cluster. For cluster scaling decisions, we recommend that you use the `Maximum` statistic, which is the highest utilization across all the nodes. 
**Note**  
AWS has improved the `CPUUtilization` metric's granularity. You might observe changes to the metric starting from 2024-05-17 to 2024-06-22.
+  Operation latency (as measured on the client side), which should remain consistently within your application's latency requirements. 
+  Error rates should remain low, as seen from the `ErrorRequestCount`, `FaultRequestCount`, and `FailedRequestCount` CloudWatch metrics.
+  Network bytes consumption, so that you can determine if you should use more nodes or a larger node type in your cluster. To monitor consumption, you can set alarms on the `BaselineNetworkBytesInUtilization` and `BaselineNetworkBytesOutUtilization` metrics available in CloudWatch, which indicate the percentage of available network bandwidth consumed for your instance type, for ingress and egress traffic respectively. 
+ Cache memory utilization and evicted size, so that you can determine whether the cluster's node type has sufficient memory to hold your working set, and if not, switch to a larger node type.
**Note**  
 If there is a large number of cache misses and writes, cache memory utilization can increase up to 100% and might cause downtime. 
+  Client connections, so that you can monitor for any unexplained spikes in connections to the cluster.
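
The utilization items in the preceding list can be covered by a small set of alarms. The following sketch creates one alarm per percentage-based metric using the `Maximum` statistic. The function names, the alarm naming scheme, the five one-minute evaluation periods, and the use of `Maximum` for metrics other than `CPUUtilization` are assumptions for illustration, not recommendations from this guide.

```shell
# Create one high-utilization alarm for a percentage-based DAX metric.
create_utilization_alarm() {
  local cluster_id="$1" metric="$2" threshold_percent="$3" topic_arn="$4"
  aws cloudwatch put-metric-alarm \
      --alarm-name "DAX-${cluster_id}-${metric}-High" \
      --alarm-description "High ${metric} on DAX cluster ${cluster_id}" \
      --namespace AWS/DAX \
      --metric-name "$metric" \
      --dimensions "Name=ClusterId,Value=${cluster_id}" \
      --statistic Maximum \
      --threshold "$threshold_percent" \
      --comparison-operator GreaterThanThreshold \
      --period 60 \
      --evaluation-periods 5 \
      --alarm-actions "$topic_arn"
}

# Create alarms for CPU, cache memory, and baseline network utilization.
create_baseline_alarms() {
  local cluster_id="$1" threshold_percent="$2" topic_arn="$3" metric
  for metric in CPUUtilization CacheMemoryUtilization \
      BaselineNetworkBytesInUtilization BaselineNetworkBytesOutUtilization; do
    create_utilization_alarm "$cluster_id" "$metric" "$threshold_percent" "$topic_arn"
  done
}
```

For example, `create_baseline_alarms myCluster 80 arn:aws:sns:us-west-2:123456789012:notify-on-system-errors` would create four alarms with a placeholder threshold of 80 percent. Derive the actual thresholds from the baseline that you measure during load testing and in production.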