

# Concepts
<a name="alarm-concepts"></a>

CloudWatch alarms monitor metrics and trigger actions when thresholds are breached. Understanding how alarms evaluate data and respond to conditions is essential for effective monitoring.

**Topics**
+ [Alarm data queries](alarm-data-queries.md)
+ [Alarm evaluation](alarm-evaluation.md)
+ [PromQL alarms](alarm-promql.md)
+ [Composite alarms](alarm-combining.md)
+ [Alarm actions](alarm-actions.md)
+ [Alarm Mute Rules](alarm-mute-rules.md)
+ [Limits](alarm-limits.md)

# Alarm data queries
<a name="alarm-data-queries"></a>

CloudWatch alarms can monitor various data sources. Choose the appropriate query type based on your monitoring needs.

## Metrics
<a name="alarm-query-metrics"></a>

Monitor a single CloudWatch metric. This is the most common alarm type for tracking resource performance. For more information about metrics, see [CloudWatch Metrics concepts](cloudwatch_concepts.md).

For more information, see [Create a CloudWatch alarm based on a static threshold](ConsoleAlarms.md).

## Metric math
<a name="alarm-query-metric-math"></a>

You can set an alarm on the result of a math expression that is based on one or more CloudWatch metrics. A math expression used for an alarm can include as many as 10 metrics, and all of the metrics must use the same period.

For an alarm based on a math expression, you can specify how you want CloudWatch to treat missing data points. In this case, the data point is considered missing if the math expression doesn't return a value for that data point.

Alarms based on math expressions can't perform Amazon EC2 actions.

For more information about metric math expressions and syntax, see [Using math expressions with CloudWatch metrics](using-metric-math.md).

For more information, see [Create a CloudWatch alarm based on a metric math expression](Create-alarm-on-metric-math-expression.md).

## Metrics Insights
<a name="alarm-query-metrics-insights"></a>

A CloudWatch Metrics Insights query helps you query metrics at scale using SQL-like syntax. You can create an alarm on any Metrics Insights query, including queries that return multiple time series. This capability significantly expands your monitoring options.

When you create an alarm based on a Metrics Insights query, the alarm automatically adjusts as resources are added to or removed from your monitored group. Create the alarm once, and any resource that matches your query definition and filters joins the alarm monitoring scope when its corresponding metric becomes available. For multi-time series queries, each returned time series becomes a contributor to the alarm, allowing for more granular and dynamic monitoring.

Here are two primary use cases for CloudWatch Metrics Insights alarms:
+ Anomaly Detection and Aggregate Monitoring

  Create an alarm on a Metrics Insights query that returns a single aggregated time series. This approach works well for dynamic alarms that monitor aggregated metrics across your infrastructure or applications. For example, you can monitor the maximum CPU utilization across all your instances, with the alarm automatically adjusting as you scale your fleet.

  To create an aggregate monitoring alarm, use this query structure:

  ```
  SELECT FUNCTION(metricName)
  FROM SCHEMA(...)
  WHERE condition;
  ```
+ Per-Resource Fleet Monitoring

  Create an alarm that monitors multiple time series, where each time series functions as a contributor with its own state. The alarm activates when any contributor enters the ALARM state, triggering resource-specific actions. For example, monitor database connections across multiple RDS instances to prevent connection rejections.

  To monitor multiple time series, use this query structure:

  ```
  SELECT AVG(DatabaseConnections)
  FROM "AWS/RDS"
  WHERE condition
  GROUP BY DBInstanceIdentifier
  ORDER BY AVG() DESC;
  ```

  When creating multi-time series alarms, you must include two key clauses in your query:
  + A `GROUP BY` clause that defines how to structure the time series and determines how many time series the query will produce
  + An `ORDER BY` clause that establishes a deterministic sorting of your metrics, enabling the alarm to evaluate the most important signals first

  These clauses are essential for proper alarm evaluation. The `GROUP BY` clause splits your data into separate time series (for example, by instance ID), while the `ORDER BY` clause ensures consistent and prioritized processing of these time series during alarm evaluation.

For more information on how to create a multi-time series alarm, see [Create an alarm based on a Multi Time Series Metrics Insights query](multi-time-series-alarm.md).

## Log group-metric filters
<a name="alarm-query-log-metric-filter"></a>

You can create an alarm based on a log group-metric filter. With metric filters, you can look for terms and patterns in log data as the data is sent to CloudWatch. For more information, see [Create metrics from log events using filters](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/MonitoringLogData.html) in the *Amazon CloudWatch Logs User Guide*.

For more information on how to create an alarm based on a log group-metric filter, see [Alarming on logs](Alarm-On-Logs.md).

## PromQL
<a name="alarm-query-promql"></a>

You can create an alarm that uses a Prometheus Query Language (PromQL) instant query to monitor metrics ingested through the CloudWatch OTLP endpoint.

For more information about how PromQL alarms work, see [PromQL alarms](alarm-promql.md).

For more information on how to create a PromQL alarm, see [Create an alarm using a PromQL query](Create_PromQL_Alarm.md).

## External data source
<a name="alarm-query-external"></a>

You can create alarms that watch metrics from data sources that aren't in CloudWatch. For more information about creating connections to these other data sources, see [Query metrics from other data sources](MultiDataSourceQuerying.md).

For more information on how to create an alarm based on a connected data source, see [Create an alarm based on a connected data source](Create_MultiSource_Alarm.md).

# Alarm evaluation
<a name="alarm-evaluation"></a>

## Metric alarm states
<a name="alarm-states"></a>

A metric alarm has the following possible states:
+ `OK` – The metric or expression is within the defined threshold.
+ `ALARM` – The metric or expression is outside of the defined threshold.
+ `INSUFFICIENT_DATA` – The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.

## Alarm evaluation state
<a name="alarm-evaluation-state"></a>

In addition to the alarm state, each alarm has an evaluation state that provides information about the alarm evaluation process. The following states may occur:
+ `PARTIAL_DATA` – Indicates that not all the available data was able to be retrieved due to quota limitations. For more information, see [How partial data is handled](cloudwatch-metrics-insights-alarms-partial-data.md).
+ `EVALUATION_ERROR` – Indicates configuration errors in the alarm setup that require review and correction. Refer to the `StateReason` field of the alarm for more details.
+ `EVALUATION_FAILURE` – Indicates temporary CloudWatch issues. We recommend manual monitoring until the issue is resolved.

You can view the evaluation state in the alarm details in the console, or by using the `describe-alarms` CLI command or `DescribeAlarms` API.

## Alarm evaluation settings
<a name="alarm-evaluation-settings"></a>

When you create an alarm, you specify three settings to enable CloudWatch to evaluate when to change the alarm state:
+ **Period** is the length of time to use to evaluate the metric or expression to create each individual data point for an alarm. It is expressed in seconds.
+ **Evaluation Periods** is the number of the most recent periods, or data points, to evaluate when determining alarm state.
+ **Datapoints to Alarm** is the number of data points within the Evaluation Periods that must be breaching to cause the alarm to go to the `ALARM` state. The breaching data points don't have to be consecutive, but they must all be within the last number of data points equal to **Evaluation Periods**.

For any period of one minute or longer, an alarm is evaluated every minute and the evaluation is based on the window of time defined by the **Period** and **Evaluation Periods**. For example, if the **Period** is 5 minutes (300 seconds) and **Evaluation Periods** is 1, then at the end of minute 5 the alarm evaluates based on data from minutes 1 to 5. Then at the end of minute 6, the alarm is evaluated based on the data from minutes 2 to 6.
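The sliding window described above can be sketched in Python. This is a hypothetical illustration, not CloudWatch code; the function name and minute-based units are assumptions for readability (CloudWatch expresses **Period** in seconds):

```python
# Hypothetical sketch of the sliding evaluation window described above.
# Units are minutes for readability.

def evaluation_window(current_minute, period_minutes, evaluation_periods):
    """Return the (first, last) minute covered by the window evaluated
    at the end of current_minute."""
    window_length = period_minutes * evaluation_periods
    return (current_minute - window_length + 1, current_minute)

# Period = 5 minutes, Evaluation Periods = 1:
print(evaluation_window(5, 5, 1))  # (1, 5) -> minutes 1 to 5
print(evaluation_window(6, 5, 1))  # (2, 6) -> the window slides each minute
```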

If the alarm period is 10 seconds, 20 seconds, or 30 seconds, the alarm is evaluated every 10 seconds. For more information, see [High-resolution alarms](#high-resolution-alarms).

If the number of evaluation periods multiplied by the length of each evaluation period exceeds one day, the alarm is evaluated once per hour. For more details about how these multi-day alarms are evaluated, see [Example of evaluating a multi-day alarm](#evaluate-multiday-alarm).

In the following figure, the alarm threshold for a metric alarm is set to three units. Both **Evaluation Periods** and **Datapoints to Alarm** are 3. That is, when all existing data points in the most recent three consecutive periods are above the threshold, the alarm goes to `ALARM` state. In the figure, this happens in the third through fifth time periods. At period six, the value dips below the threshold, so one of the periods being evaluated is not breaching, and the alarm state changes back to `OK`. During the ninth time period, the threshold is breached again, but for only one period. Consequently, the alarm state remains `OK`.

![Alarm threshold trigger alarm](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/alarm_graph.png)


When you configure **Evaluation Periods** and **Datapoints to Alarm** as different values, you're setting an "M out of N" alarm, where **Datapoints to Alarm** is "M" and **Evaluation Periods** is "N". The evaluation interval is the number of evaluation periods multiplied by the period length. For example, if you configure 4 out of 5 data points with a period of 1 minute, the evaluation interval is 5 minutes. If you configure 3 out of 3 data points with a period of 10 minutes, the evaluation interval is 30 minutes.
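As a hedged sketch, the two evaluation-interval examples and the "M out of N" decision can be reproduced with a few lines of Python (the function names are illustrative, not part of any CloudWatch API):

```python
def evaluation_interval_minutes(period_minutes, evaluation_periods):
    """Evaluation interval = period length x number of evaluation periods."""
    return period_minutes * evaluation_periods

def m_out_of_n_alarm(breaching_flags, datapoints_to_alarm):
    """breaching_flags: one boolean per data point in the last N periods.
    The alarm fires when at least M points breach; the breaching points
    need not be consecutive."""
    return sum(breaching_flags) >= datapoints_to_alarm

print(evaluation_interval_minutes(1, 5))    # 5  (4 out of 5, 1-minute period)
print(evaluation_interval_minutes(10, 3))   # 30 (3 out of 3, 10-minute period)
print(m_out_of_n_alarm([True, False, True, True, True], 4))  # True
```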

**Note**  
If data points are missing soon after you create an alarm, and the metric was being reported to CloudWatch before you created the alarm, CloudWatch retrieves the most recent data points from before the alarm was created when evaluating the alarm.

## High-resolution alarms
<a name="high-resolution-alarms"></a>

If you set an alarm on a high-resolution metric, you can specify a high-resolution alarm with a period of 10 seconds, 20 seconds, or 30 seconds. There is a higher charge for high-resolution alarms. For more information about high-resolution metrics, see [Publish custom metrics](publishingMetrics.md).

## Example of evaluating a multi-day alarm
<a name="evaluate-multiday-alarm"></a>

An alarm is a multi-day alarm if the number of evaluation periods multiplied by the length of each evaluation period exceeds one day. Multi-day alarms are evaluated once per hour. When multi-day alarms are evaluated, CloudWatch takes into account only the metrics up to the current hour at the :00 minute when evaluating.

For example, consider an alarm that monitors a job that runs every 3 days at 10:00.

1. At 10:02, the job fails.

1. At 10:03, the alarm evaluates and stays in `OK` state, because the evaluation considers data only up to 10:00.

1. At 11:03, the alarm considers data up to 11:00 and goes into `ALARM` state.

1. At 11:43, you correct the error and the job now runs successfully.

1. At 12:03, the alarm evaluates again, sees the successful job, and returns to `OK` state.

# Configuring how CloudWatch alarms treat missing data
<a name="alarms-and-missing-data"></a>

Sometimes, not every expected data point for a metric gets reported to CloudWatch. For example, this can happen when a connection is lost, a server goes down, or when a metric reports data only intermittently by design.

CloudWatch enables you to specify how to treat missing data points when evaluating an alarm. This helps you to configure your alarm so that it goes to `ALARM` state only when appropriate for the type of data being monitored. You can avoid false positives when missing data doesn't indicate a problem.

Similar to how each alarm is always in one of three states, each specific data point reported to CloudWatch falls under one of three categories:
+ Not breaching (within the threshold)
+ Breaching (violating the threshold)
+ Missing

For each alarm, you can specify how CloudWatch treats missing data points, choosing from the following options:
+ `notBreaching` – Missing data points are treated as "good" and within the threshold
+ `breaching` – Missing data points are treated as "bad" and breaching the threshold
+ `ignore` – The current alarm state is maintained
+ `missing` – If all data points in the alarm evaluation range are missing, the alarm transitions to `INSUFFICIENT_DATA`.

The best choice depends on the type of metric and the purpose of the alarm. For example, if you are creating an application rollback alarm using a metric that continually reports data, you might want to treat missing data points as breaching, because it might indicate that something is wrong. But for a metric that generates data points only when an error occurs, such as `ThrottledRequests` in Amazon DynamoDB, you would want to treat missing data as `notBreaching`. The default behavior is `missing`.

**Important**  
Alarms configured on Amazon EC2 metrics can temporarily enter the INSUFFICIENT_DATA state if there are missing metric data points. This is rare, but can happen when the metric reporting is interrupted, even when the Amazon EC2 instance is healthy. For alarms on Amazon EC2 metrics that are configured to take stop, terminate, reboot, or recover actions, we recommend that you configure those alarms to treat missing data as `missing`, and to have these alarms trigger only when in the ALARM state.

Choosing the best option for your alarm prevents unnecessary and misleading alarm condition changes, and also more accurately indicates the health of your system.

**Important**  
Alarms that evaluate metrics in the `AWS/DynamoDB` namespace default to ignore missing data. You can override this if you choose a different option for how the alarm should treat missing data. When an `AWS/DynamoDB` metric has missing data, alarms that evaluate that metric remain in their current state.

## How alarm state is evaluated when data is missing
<a name="alarms-evaluating-missing-data"></a>

Whenever an alarm evaluates whether to change state, CloudWatch attempts to retrieve a higher number of data points than the number specified as **Evaluation Periods**. The exact number of data points it attempts to retrieve depends on the length of the alarm period and whether it is based on a metric with standard resolution or high resolution. The time frame of the data points that it attempts to retrieve is the *evaluation range*.

Once CloudWatch retrieves these data points, the following happens:
+ If no data points in the evaluation range are missing, CloudWatch evaluates the alarm based on the most recent data points collected. The number of data points evaluated is equal to the **Evaluation Periods** for the alarm. The extra data points from farther back in the evaluation range are not needed and are ignored.
+ If some data points in the evaluation range are missing, but the total number of existing data points that were successfully retrieved from the evaluation range is equal to or more than the alarm's **Evaluation Periods**, CloudWatch evaluates the alarm state based on the most recent real data points that were successfully retrieved, including the necessary extra data points from farther back in the evaluation range. In this case, the value you set for how to treat missing data is not needed and is ignored.
+ If some data points in the evaluation range are missing, and the number of actual data points that were retrieved is lower than the alarm's number of **Evaluation Periods**, CloudWatch fills in the missing data points with the result you specified for how to treat missing data, and then evaluates the alarm. However, all real data points in the evaluation range are included in the evaluation. CloudWatch fills in as few missing data points as possible.
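These rules can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the actual CloudWatch implementation: it does not model the premature-alarm-state special case covered later in this section, and it pads data treated as `missing` with non-breaching values when at least one real point exists.

```python
# '0' = non-breaching, 'X' = breaching, '-' = missing; most recent last.
def evaluate(points, evaluation_periods, datapoints_to_alarm,
             treat_missing, current_state="OK"):
    real = [p for p in points if p != "-"]      # real points, oldest first
    if len(real) >= evaluation_periods:
        # Enough real data: evaluate the most recent real points only.
        window = real[-evaluation_periods:]
    elif treat_missing == "ignore":
        return current_state                    # retain the current state
    elif treat_missing == "missing" and not real:
        return "INSUFFICIENT_DATA"              # nothing left to evaluate
    else:
        # Fill in only as many missing points as needed.
        fill = "X" if treat_missing == "breaching" else "0"
        window = real + [fill] * (evaluation_periods - len(real))
    breaching = window.count("X")
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

# '0 - - - -' stays OK even when missing data is treated as breaching,
# because the one real non-breaching point is part of the evaluation.
print(evaluate(list("0----"), 3, 3, "breaching"))  # OK
```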

**Note**  
A particular case of this behavior is that CloudWatch alarms might repeatedly re-evaluate the last set of data points for a period of time after the metric has stopped flowing. This re-evaluation might cause the alarm to change state and re-execute actions, if it had changed state immediately prior to the metric stream stopping. To mitigate this behavior, use shorter periods.

The following tables illustrate examples of the alarm evaluation behavior. In the first table, **Datapoints to Alarm** and **Evaluation Periods** are both 3. CloudWatch retrieves the 5 most recent data points when evaluating the alarm, in case some of the most recent 3 data points are missing. The evaluation range for the alarm is 5.

Column 1 shows the 5 most recent data points, because the evaluation range is 5. These data points are shown with the most recent data point on the right. 0 is a non-breaching data point, X is a breaching data point, and - is a missing data point.

Column 2 shows how many of the 3 necessary data points are missing. Even though the most recent 5 data points are evaluated, only 3 (the setting for **Evaluation Periods**) are necessary to evaluate the alarm state. The number of data points in Column 2 is the number of data points that must be "filled in", using the setting for how missing data is being treated. 

In columns 3-6, the column headers are the possible values for how to treat missing data. The rows in these columns show the alarm state that is set for each of these possible ways to treat missing data.


| Data points | Number of data points that must be filled | MISSING | IGNORE | BREACHING | NOT BREACHING | 
| --- | --- | --- | --- | --- | --- | 
|  0 - X - X  |  0  |  `OK`  |  `OK`  |  `OK`  |  `OK`  | 
|  0 - - - -  |  2  |  `OK`  |  `OK`  |  `OK`  |  `OK`  | 
|  - - - - -  |  3  |  `INSUFFICIENT_DATA`  |  Retain current state  |  `ALARM`  |  `OK`  | 
|  0 X X - X  |  0  |  `ALARM`  |  `ALARM`  |  `ALARM`  |  `ALARM`  | 
|  - - X - -   |  2  |  `ALARM`  |  Retain current state  |  `ALARM`  |  `OK`  | 

In the second row of the preceding table, the alarm stays `OK` even if missing data is treated as breaching, because the one existing data point is not breaching, and this is evaluated along with two missing data points which are treated as breaching. The next time this alarm is evaluated, if the data is still missing it will go to `ALARM`, as that non-breaching data point will no longer be in the evaluation range.

The third row, where all five of the most recent data points are missing, illustrates how the various settings for how to treat missing data affect the alarm state. If missing data points are considered breaching, the alarm goes into ALARM state, while if they are considered not breaching, then the alarm goes into OK state. If missing data points are ignored, the alarm retains the current state it had before the missing data points. And if missing data points are just considered as missing, then the alarm does not have enough recent real data to make an evaluation, and goes into `INSUFFICIENT_DATA`.

In the fourth row, the alarm goes to `ALARM` state in all cases because the three most recent data points are breaching, and the alarm's **Evaluation Periods** and **Datapoints to Alarm** are both set to 3. In this case, the missing data point is ignored and the setting for how to evaluate missing data is not needed, because there are 3 real data points to evaluate.

Row 5 represents a special case of alarm evaluation called *premature alarm state*. For more information, see [Avoiding premature transitions to alarm state](#CloudWatch-alarms-avoiding-premature-transition).

In the next table, the **Period** is again set to 5 minutes, and **Datapoints to Alarm** is only 2 while **Evaluation Periods** is 3. This is a 2 out of 3, M out of N alarm.

The evaluation range is 5. This is the maximum number of recent data points that are retrieved and can be used in case some data points are missing.


| Data points | Number of missing data points | MISSING | IGNORE | BREACHING | NOT BREACHING | 
| --- | --- | --- | --- | --- | --- | 
|  0 - X - X  |  0  |  `ALARM`  |  `ALARM`  |  `ALARM`  |  `ALARM`  | 
|  0 0 X 0 X  |  0  |  `ALARM`  |  `ALARM`  |  `ALARM`  |  `ALARM`  | 
|  0 - X - -  |  1  |  `OK`  |  `OK`  |  `ALARM`  |  `OK`  | 
|  - - - - 0  |  2  |  `OK`  |  `OK`  |  `ALARM`  |  `OK`  | 
|  - - - X -  |  2  |  `ALARM`  |  Retain current state  |  `ALARM`  |  `OK`  | 

In rows 1 and 2, the alarm always goes to ALARM state because 2 of the 3 most recent data points are breaching. In row 2, the two oldest data points in the evaluation range are not needed because none of the 3 most recent data points are missing, so these two older data points are ignored.

In rows 3 and 4, the alarm goes to ALARM state only if missing data is treated as breaching, in which case the two most recent missing data points are both treated as breaching. In row 4, these two missing data points that are treated as breaching provide the two necessary breaching data points to trigger the ALARM state.

Row 5 represents a special case of alarm evaluation called *premature alarm state*. For more information, see the following section.

### Avoiding premature transitions to alarm state
<a name="CloudWatch-alarms-avoiding-premature-transition"></a>

CloudWatch alarm evaluation includes logic to try to avoid false alarms, where the alarm goes into ALARM state prematurely when data is intermittent. The examples shown in row 5 of the tables in the previous section illustrate this logic. In those rows, and in the following examples, the **Evaluation Periods** is 3 and the evaluation range is 5 data points. **Datapoints to Alarm** is 3, except for the M out of N example, where **Datapoints to Alarm** is 2.

Suppose an alarm's most recent data is `- - - - X`, with four missing data points and then a breaching data point as the most recent data point. Because the next data point may be non-breaching, the alarm does not go immediately into ALARM state when the data is either `- - - - X` or `- - - X -` and **Datapoints to Alarm** is 3. This way, false positives are avoided when the next data point is non-breaching and causes the data to be `- - - X O` or `- - X - O`.

However, if the last few data points are `- - X - -`, the alarm goes into ALARM state even if missing data points are treated as missing. This is because alarms are designed to always go into ALARM state when the oldest available breaching datapoint during the **Evaluation Periods** number of data points is at least as old as the value of **Datapoints to Alarm**, and all other more recent data points are breaching or missing. In this case, the alarm goes into ALARM state even if the total number of datapoints available is lower than M (**Datapoints to Alarm**).

This alarm logic applies to M out of N alarms as well. If the oldest breaching data point during the evaluation range is at least as old as the value of **Datapoints to Alarm**, and all of the more recent data points are either breaching or missing, the alarm goes into ALARM state no matter the value of M (**Datapoints to Alarm**).
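The rule in the two preceding paragraphs can be expressed as a small Python sketch. This is a hedged illustration; the function name and the '0'/'X'/'-' notation borrowed from the earlier tables are assumptions, not CloudWatch internals:

```python
def goes_to_alarm_when_missing(points, evaluation_periods, datapoints_to_alarm):
    """True if the alarm enters ALARM state even when missing data is
    treated as missing.  '0' = non-breaching, 'X' = breaching,
    '-' = missing; the most recent point is last."""
    window = points[-evaluation_periods:]
    for age, point in enumerate(reversed(window), start=1):  # age 1 = newest
        if point == "0":
            return False  # a real non-breaching point blocks the transition
        if point == "X" and age >= datapoints_to_alarm:
            return True   # breach old enough, everything newer is X or -
    return False

print(goes_to_alarm_when_missing(list("--X--"), 3, 3))  # True  (row 5, table 1)
print(goes_to_alarm_when_missing(list("----X"), 3, 3))  # False (breach too new)
print(goes_to_alarm_when_missing(list("---X-"), 3, 2))  # True  (row 5, table 2)
```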

## Missing Data in CloudWatch Metrics Insights alarms
<a name="mi-missing-data-treatment"></a>

**Alarms based on Metrics Insights queries that aggregate to a single time series**

The missing data scenarios and their effects on alarm evaluation are the same as for a standard metric alarm, based on the configured missing data treatment. See [Configuring how CloudWatch alarms treat missing data](#alarms-and-missing-data).

**Alarms based on Metrics Insights queries that produce multiple time series**

Missing data scenarios for Metrics Insights alarms occur when:
+ Individual data points within a time series are not present.
+ One or more time series disappear when evaluating multiple time series.
+ No time series are retrieved by the query.

Missing data scenarios affect the alarm evaluation in the following manner:
+ For the evaluation of a time series, the missing data treatment is applied to individual data points within the time series. For example, if 3 data points were queried for the time series but only 1 was received, the 2 missing data points follow the configured missing data treatment.
+ If a time series is no longer retrieved by the query, it transitions to `OK` regardless of the missing data treatment. Alarm actions associated with the `OK` transition at the contributor level are executed, and the `StateReason` specifies that the contributor was not found with the message, "No data was returned for this contributor". The state of the alarm depends on the state of the other contributors that were retrieved by the query.
+ At the alarm level, if the query returns an empty result (no time series at all), the missing data treatment is applied. For example, if missing data is treated as `BREACHING`, the alarm transitions to `ALARM`.

# How partial data is handled
<a name="cloudwatch-metrics-insights-alarms-partial-data"></a>

## How partial data from a Metrics Insights query is evaluated
<a name="cloudwatch-metrics-insights-query-evaluation"></a>

If the Metrics Insights query used for the alarm matches more than 10,000 metrics, the alarm is evaluated based on the first 10,000 metrics that the query finds. This means that the alarm is being evaluated on partial data.

You can use the following methods to find whether a Metrics Insights alarm is currently evaluating its alarm state based on partial data: 
+ In the console, if you choose an alarm to see the **Details** page, the message **Evaluation warning: Not evaluating all data** appears on that page.
+ You see the value `PARTIAL_DATA` in the `EvaluationState` field when you use the [describe-alarms](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudwatch/describe-alarms.html) AWS CLI command or the [DescribeAlarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html) API.

Alarms also publish events to Amazon EventBridge when they go into the partial data state, so you can create an EventBridge rule to watch for these events. In these events, the `evaluationState` field has the value `PARTIAL_DATA`. The following is an example.

```
{
    "version": "0",
    "id": "12345678-3bf9-6a09-dc46-12345EXAMPLE",
    "detail-type": "CloudWatch Alarm State Change",
    "source": "aws.cloudwatch",
    "account": "123456789012",
    "time": "2022-11-08T11:26:05Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:cloudwatch:us-east-1:123456789012:alarm:my-alarm-name"
    ],
    "detail": {
        "alarmName": "my-alarm-name",
        "state": {
            "value": "ALARM",
            "reason": "Threshold Crossed: 3 out of the last 3 datapoints [20000.0 (08/11/22 11:25:00), 20000.0 (08/11/22 11:24:00), 20000.0 (08/11/22 11:23:00)] were greater than the threshold (0.0) (minimum 1 datapoint for OK -> ALARM transition).",
            "reasonData": "{\"version\":\"1.0\",\"queryDate\":\"2022-11-08T11:26:05.399+0000\",\"startDate\":\"2022-11-08T11:23:00.000+0000\",\"period\":60,\"recentDatapoints\":[20000.0,20000.0,20000.0],\"threshold\":0.0,\"evaluatedDatapoints\":[{\"timestamp\":\"2022-11-08T11:25:00.000+0000\",\"value\":20000.0}]}",
            "timestamp": "2022-11-08T11:26:05.401+0000",
            "evaluationState": "PARTIAL_DATA"
        },
        "previousState": {
            "value": "INSUFFICIENT_DATA",
            "reason": "Unchecked: Initial alarm creation",
            "timestamp": "2022-11-08T11:25:51.227+0000"
        },
        "configuration": {
            "metrics": [
                {
                    "id": "m2",
                    "expression": "SELECT SUM(PartialDataTestMetric) FROM partial_data_test",
                    "returnData": true,
                    "period": 60
                }
            ]
        }
    }
}
```

If the query for the alarm includes a `GROUP BY` statement that initially returns more than 500 time series, the alarm is evaluated based on the first 500 time series that the query finds. However, if you use an `ORDER BY` clause, then all the time series that the query finds are sorted, and the 500 that have the highest or lowest values according to your `ORDER BY` clause are used to evaluate the alarm.

## How partial data from a multi data source alarm is evaluated
<a name="multi-data-source-partial-data"></a>

If the Lambda function returns partial data:
+ The alarm continues to be evaluated on the data points that are returned.
+ You can use the following methods to find whether an alarm on a Lambda function is currently evaluating its alarm state based on partial data:
  + In the console, choose an alarm and view its **Details** page. If the message **Evaluation warning: Not evaluating all data** appears on that page, the alarm is evaluating on partial data.
  + If you see the value `PARTIAL_DATA` in the `EvaluationState` field when you use the `describe-alarms` AWS CLI command or the `DescribeAlarms` API, the alarm is evaluating on partial data.
+ An alarm also publishes events to Amazon EventBridge when it goes into the partial data state.

# Percentile-based alarms and low data samples
<a name="percentiles-with-low-samples"></a>

When you set a percentile as the statistic for an alarm, you can specify what to do when there is not enough data for a good statistical assessment. You can choose to have the alarm evaluate the statistic anyway and possibly change the alarm state. Or, you can have the alarm ignore the metric while the sample size is low, and wait to evaluate it until there is enough data to be statistically significant.

For percentiles between 0.5 (inclusive) and 1.00 (exclusive), this setting is used when there are fewer than 10/(1-percentile) data points during the evaluation period. For example, this setting would be used if there were fewer than 1000 samples for an alarm on a p99 percentile. For percentiles between 0 and 0.5 (exclusive), the setting is used when there are fewer than 10/percentile data points.
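The sample-size thresholds follow directly from the two formulas above; here is a quick Python check (the function name is illustrative, not a CloudWatch API):

```python
def low_sample_threshold(percentile):
    """Sample count below which the low-sample setting takes effect.
    percentile is expressed as a fraction, e.g. 0.99 for p99."""
    if 0.5 <= percentile < 1.0:
        return 10 / (1 - percentile)
    if 0 < percentile < 0.5:
        return 10 / percentile
    raise ValueError("percentile must be strictly between 0 and 1")

print(round(low_sample_threshold(0.99)))  # 1000 samples for a p99 alarm
print(round(low_sample_threshold(0.50)))  # 20
print(round(low_sample_threshold(0.10)))  # 100
```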

# PromQL alarms
<a name="alarm-promql"></a>

A PromQL alarm monitors metrics using a Prometheus Query Language (PromQL) instant query. The query selects metrics ingested through the CloudWatch OTLP endpoint, and all matching time series returned by the query are considered to be breaching. The alarm evaluates the query at a regular interval and tracks each breaching time series independently as a *contributor*.

For information about ingesting metrics using OpenTelemetry, see [OpenTelemetry](CloudWatch-OpenTelemetry-Sections.md).

## How PromQL alarms work
<a name="promql-alarm-how-it-works"></a>

A PromQL alarm evaluates a PromQL instant query on a recurring schedule defined by the `EvaluationInterval`. The query returns only the time series that satisfy the condition. Each returned time series is a *contributor*, identified by its unique set of attributes.

The alarm uses duration-based state transitions:
+ When a contributor is returned by the query, it is considered *breaching*. If the contributor continues breaching for the duration specified by `PendingPeriod`, the contributor transitions to `ALARM` state.
+ When a contributor stops being returned by the query, it is considered *recovering*. If the contributor remains absent for the duration specified by `RecoveryPeriod`, the contributor transitions back to `OK` state.

The alarm is in `ALARM` state when at least one contributor has been breaching for longer than the pending period. The alarm returns to `OK` state when all contributors have recovered.
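The duration-based transitions above can be sketched as a small per-contributor state machine. This is an illustrative model, not the CloudWatch implementation; the class and field names are invented for the example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contributor:
    """Tracks one PromQL alarm contributor (a unique time series).

    A contributor must breach continuously for `pending_period` seconds
    to enter ALARM, and be absent from the query results for
    `recovery_period` seconds to return to OK.
    """
    pending_period: int
    recovery_period: int
    state: str = "OK"
    breaching_since: Optional[float] = None
    recovering_since: Optional[float] = None

    def evaluate(self, now: float, returned_by_query: bool) -> str:
        if returned_by_query:
            # Breaching: start (or continue) the pending timer.
            self.recovering_since = None
            if self.breaching_since is None:
                self.breaching_since = now
            if now - self.breaching_since >= self.pending_period:
                self.state = "ALARM"
        else:
            # Absent from results: start (or continue) the recovery timer.
            self.breaching_since = None
            if self.state == "ALARM":
                if self.recovering_since is None:
                    self.recovering_since = now
                if now - self.recovering_since >= self.recovery_period:
                    self.state = "OK"
                    self.recovering_since = None
        return self.state

c = Contributor(pending_period=120, recovery_period=60)
print(c.evaluate(0, True))     # OK: just started breaching
print(c.evaluate(120, True))   # ALARM: breached for the full pending period
print(c.evaluate(180, False))  # ALARM: still within the recovery period
print(c.evaluate(240, False))  # OK: absent for the full recovery period
```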

## PromQL alarm configuration
<a name="promql-alarm-configuration"></a>

A PromQL alarm is configured with the following parameters:
+ **PendingPeriod** is the duration in seconds that a contributor must continuously breach before the contributor transitions to `ALARM` state. This is equivalent to the Prometheus alert rule's `for` duration.
+ **RecoveryPeriod** is the duration in seconds that a contributor must stop breaching before the contributor transitions back to `OK` state. This is equivalent to the Prometheus alert rule's `keep_firing_for` duration.
+ **EvaluationInterval** is how frequently, in seconds, the alarm evaluates the PromQL query.

To create a PromQL alarm, see [Create an alarm using a PromQL query](Create_PromQL_Alarm.md).

# Composite alarms
<a name="alarm-combining"></a>

With CloudWatch, you can combine several alarms into one *composite alarm* to create a summarized, aggregated health indicator over a whole application or group of resources. Composite alarms are alarms that determine their state by monitoring the states of other alarms. You define rules to combine the status of those monitored alarms using Boolean logic.

You can use composite alarms to reduce alarm noise by taking actions only at an aggregated level. For example, you can create a composite alarm to send a notification to your web server team if any alarm related to your web server triggers. When any of those alarms goes into the ALARM state, the composite alarm itself goes into the ALARM state and sends a notification to your team. If other alarms related to your web server also go into the ALARM state, your team does not get overloaded with new notifications, because the composite alarm has already notified them about the existing situation.

You can also use composite alarms to create complex alarming conditions and take actions only when many different conditions are met. For example, you can create a composite alarm that combines a CPU alarm and a memory alarm, and notifies your team only if both the CPU and memory alarms have triggered.

**Using composite alarms**

When you use composite alarms, you have two options:
+ Configure the actions you want to take only at the composite alarm level, and create the underlying monitored alarms without actions.
+ Configure a different set of actions at the composite alarm level. For example, the composite alarm actions could engage a different team in case of a widespread issue.

Composite alarms can take only the following actions:
+ Notify Amazon SNS topics
+ Invoke Lambda functions
+ Create OpsItems in Systems Manager OpsCenter
+ Create incidents in Systems Manager Incident Manager

**Note**  
All the underlying alarms in your composite alarm must be in the same account and the same Region as your composite alarm. However, if you set up a composite alarm in a CloudWatch cross-account observability monitoring account, the underlying alarms can watch metrics in different source accounts and in the monitoring account itself. For more information, see [CloudWatch cross-account observability](CloudWatch-Unified-Cross-Account.md).  
 A single composite alarm can monitor 100 underlying alarms, and 150 composite alarms can monitor a single underlying alarm.

**Rule expressions**

All composite alarms contain rule expressions. Rule expressions tell composite alarms which other alarms to monitor and determine their states from. Rule expressions can refer to metric alarms and composite alarms. When you reference an alarm in a rule expression, you apply a function to it that evaluates whether the alarm is in one of the following three states:
+ ALARM

  ALARM ("alarm-name or alarm-ARN") is TRUE if the alarm is in ALARM state.
+ OK

  OK ("alarm-name or alarm-ARN") is TRUE if the alarm is in OK state.
+ INSUFFICIENT_DATA

  INSUFFICIENT_DATA ("alarm-name or alarm-ARN") is TRUE if the named alarm is in INSUFFICIENT_DATA state.

**Note**  
TRUE always evaluates to TRUE, and FALSE always evaluates to FALSE.

**Alarm references**

When referencing an alarm, using either the alarm name or ARN, the rule syntax supports referencing the alarm with or without quotation marks (") around the alarm name or ARN.
+ If specified without quotes, alarm names or ARNs must not contain spaces, round brackets, or commas.
+ If specified within quotes, alarm names or ARNs that *include* double quotes (") must escape each " with a backslash (\") for correct interpretation of the reference.

**Syntax**

The syntax of the expression you use to combine several alarms into one composite alarm uses boolean logic and functions. The following table describes the operators and functions available in rule expressions:


| Operator/Function | Description | 
| --- | --- | 
| AND | Logical AND operator. Returns TRUE when all specified conditions are TRUE. | 
| OR | Logical OR operator. Returns TRUE when at least one of the specified conditions is TRUE. | 
| NOT | Logical NOT operator. Returns TRUE when the specified condition is FALSE. | 
| AT_LEAST | Function that returns TRUE when a minimum number or percentage of specified alarms are in the required state. Format: AT_LEAST(M, STATE_CONDITION, (alarm1, alarm2, ...alarmN)) where M can be an absolute number or percentage (for example, 50%), and STATE_CONDITION can be ALARM, OK, INSUFFICIENT_DATA, NOT ALARM, NOT OK, or NOT INSUFFICIENT_DATA. | 

You can use parentheses to group conditions and control the order of evaluation in complex expressions.

**Example expressions**

The request parameter `AlarmRule` supports the use of the logical operators `AND`, `OR`, and `NOT`, as well as the `AT_LEAST` function, so you can combine multiple functions into a single expression. The following example expressions show how you can configure the underlying alarms in your composite alarm: 
+ `ALARM(CPUUtilizationTooHigh) AND ALARM(DiskReadOpsTooHigh)`

  The expression specifies that the composite alarm goes into `ALARM` only if `CPUUtilizationTooHigh` and `DiskReadOpsTooHigh` are in `ALARM`.
+ `AT_LEAST(2, ALARM, (WebServer1CPU, WebServer2CPU, WebServer3CPU, WebServer4CPU))`

  The expression specifies that the composite alarm goes into `ALARM` when at least 2 out of the 4 web server CPU alarms are in `ALARM` state. This allows you to trigger alerts based on a threshold of affected resources rather than requiring all or just one to be in alarm state.
+ `AT_LEAST(50%, OK, (DatabaseConnection1, DatabaseConnection2, DatabaseConnection3, DatabaseConnection4))`

  The expression specifies that the composite alarm goes into `ALARM` when at least 50% of the database connection alarms are in `OK` state. Using percentages allows the rule to adapt dynamically as you add or remove monitored alarms.
+ `ALARM(CPUUtilizationTooHigh) AND NOT ALARM(DeploymentInProgress)`

  The expression specifies that the composite alarm goes into `ALARM` if `CPUUtilizationTooHigh` is in `ALARM` and `DeploymentInProgress` is not in `ALARM`. This is an example of a composite alarm that reduces alarm noise during a deployment window.
+ `AT_LEAST(2, ALARM, (AZ1Health, AZ2Health, AZ3Health)) AND NOT ALARM(MaintenanceWindow)`

  The expression specifies that the composite alarm goes into `ALARM` when at least 2 out of 3 availability zone health alarms are in `ALARM` state and the maintenance window alarm is not in `ALARM`. This combines the AT_LEAST function with other logical operators for more complex monitoring scenarios.
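The Boolean semantics of these expressions can be mimicked in plain Python as a rough sketch. The alarm names, the `states` dictionary, and the helper functions below are illustrative, not a CloudWatch API:

```python
# Hypothetical alarm states, as CloudWatch might report them.
states = {
    "CPUUtilizationTooHigh": "ALARM",
    "DeploymentInProgress": "OK",
    "WebServer1CPU": "ALARM",
    "WebServer2CPU": "ALARM",
    "WebServer3CPU": "OK",
    "WebServer4CPU": "OK",
}

def ALARM(name):
    return states[name] == "ALARM"

def AT_LEAST(m, state, alarms):
    """m is an absolute count (int) or a percentage string like '50%'."""
    matching = sum(1 for a in alarms if states[a] == state)
    if isinstance(m, str) and m.endswith("%"):
        return matching >= len(alarms) * float(m[:-1]) / 100
    return matching >= m

# ALARM(CPUUtilizationTooHigh) AND NOT ALARM(DeploymentInProgress)
print(ALARM("CPUUtilizationTooHigh") and not ALARM("DeploymentInProgress"))  # True

# AT_LEAST(2, ALARM, (WebServer1CPU, ..., WebServer4CPU)): 2 of 4 in ALARM
print(AT_LEAST(2, "ALARM", ["WebServer1CPU", "WebServer2CPU",
                            "WebServer3CPU", "WebServer4CPU"]))  # True
```

Note how the percentage form of `AT_LEAST` scales automatically with the number of alarms in the list, which is why it adapts as you add or remove monitored alarms.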

# Alarm suppression
<a name="alarm-suppression"></a>

Composite alarm action suppression allows you to temporarily disable alarm actions without deleting or modifying the alarm configuration. This is useful during planned maintenance, deployments, or when investigating known issues.

With composite alarm action suppression, you define alarms as suppressor alarms. Suppressor alarms prevent composite alarms from taking actions. For example, you can specify a suppressor alarm that represents the status of a supporting resource. If the supporting resource is down, the suppressor alarm prevents the composite alarm from sending notifications.

## When to use alarm suppression
<a name="alarm-suppression-use-cases"></a>

Common situations where alarm suppression is useful:
+ Maintenance windows of your application
+ Application deployments
+ Ongoing incident investigation
+ Testing and development activities

## How suppressor alarms work
<a name="alarm-suppression-how-it-works"></a>

You specify suppressor alarms when you configure composite alarms. Any alarm can function as a suppressor alarm. When a suppressor alarm changes states from `OK` to `ALARM`, its composite alarm stops taking actions. When a suppressor alarm changes states from `ALARM` to `OK`, its composite alarm resumes taking actions.

Because composite alarms allow you to get an aggregated view of your health across multiple alarms, there are common situations where those alarms are expected to trigger, such as during a maintenance window of your application or while you investigate an ongoing incident. In such situations, you might want to suppress the actions of your composite alarms to prevent unwanted notifications or the creation of new incident tickets.


### `WaitPeriod` and `ExtensionPeriod`
<a name="Create_Composite_Alarm_Suppression_Wait_Extension"></a>

 When you specify a suppressor alarm, you set the parameters `WaitPeriod` and `ExtensionPeriod`. These parameters prevent composite alarms from taking actions unexpectedly while suppressor alarms change states. Use `WaitPeriod` to compensate for any delays that can occur when a suppressor alarm changes from `OK` to `ALARM`. For example, if a suppressor alarm changes from `OK` to `ALARM` within 60 seconds, set `WaitPeriod` to 60 seconds. 

![\[Actions suppression within WaitPeriod\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/example1border.png)


 In the image, the composite alarm changes from `OK` to `ALARM` at t2. A `WaitPeriod` starts at t2 and ends at t8. This gives the suppressor alarm time to change states from `OK` to `ALARM` at t4 before it suppresses the composite alarm's actions when the `WaitPeriod` expires at t8. 

 Use `ExtensionPeriod` to compensate for any delays that can occur when a composite alarm changes to `OK` following a suppressor alarm changing to `OK`. For example, if a composite alarm changes to `OK` within 60 seconds of a suppressor alarm changing to `OK`, set `ExtensionPeriod` to 60 seconds. 

![\[Actions suppression within ExtensionPeriod\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/example2border.png)


 In the image, the suppressor alarm changes from `ALARM` to `OK` at t2. An `ExtensionPeriod` starts at t2 and ends at t8. This gives the composite alarm time to change from `ALARM` to `OK` before the `ExtensionPeriod` expires at t8. 

 Composite alarms don't take actions while a `WaitPeriod` or `ExtensionPeriod` is active. When the `WaitPeriod` or `ExtensionPeriod` ends, composite alarms take actions based on their current states. We recommend that you set the value of each parameter to 60 seconds, because CloudWatch evaluates metric alarms every minute. You can set the parameters to any integer number of seconds. 
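The suppression decision can be summarized in a short sketch (illustrative only, not the CloudWatch implementation): a composite alarm's actions are held while the suppressor alarm is in `ALARM`, or while a `WaitPeriod` or `ExtensionPeriod` is active:

```python
def actions_allowed(suppressor_state, in_wait_period, in_extension_period):
    """Sketch of when a composite alarm may take its actions.

    Actions are held while the suppressor is in ALARM, and while a
    WaitPeriod or ExtensionPeriod is still running (to give the
    suppressor or the composite alarm time to change state).
    """
    if suppressor_state == "ALARM":
        return False
    if in_wait_period or in_extension_period:
        return False
    return True

# WaitPeriod expired and the suppressor stayed OK: actions fire.
print(actions_allowed("OK", in_wait_period=False, in_extension_period=False))   # True
# Suppressor reached ALARM before the WaitPeriod expired: actions held.
print(actions_allowed("ALARM", in_wait_period=False, in_extension_period=False))  # False
```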

 The following examples describe in more detail how `WaitPeriod` and `ExtensionPeriod` prevent composite alarms from taking actions unexpectedly. 

**Note**  
 In the following examples, `WaitPeriod` is configured as 2 time units, and `ExtensionPeriod` is configured as 3 time units. 

#### Examples
<a name="example_scenarios"></a>

**Example 1: Actions are not suppressed after `WaitPeriod`**

![\[first example of action suppression\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/example3border.png)


 In the image, the composite alarm changes states from `OK` to `ALARM` at t2. A `WaitPeriod` starts at t2 and ends at t4, so it can prevent the composite alarm from taking actions. After the `WaitPeriod` expires at t4, the composite alarm takes its actions because the suppressor alarm is still in `OK`. 

**Example 2: Actions are suppressed by alarm before `WaitPeriod` expires**

![\[second example of action suppression\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/example4border.png)


 In the image, the composite alarm changes states from `OK` to `ALARM` at t2. A `WaitPeriod` starts at t2 and ends at t4. This gives the suppressor alarm time to change states from `OK` to `ALARM` at t3. Because the suppressor alarm changes states from `OK` to `ALARM` at t3, the `WaitPeriod` that started at t2 is discarded, and the suppressor alarm now stops the composite alarm from taking actions. 

**Example 3: State transition when actions are suppressed by `WaitPeriod`**

![\[third example of action suppression\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/example5border.png)


 In the image, the composite alarm changes states from `OK` to `ALARM` at t2. A `WaitPeriod` starts at t2 and ends at t4. This gives the suppressor alarm time to change states. The composite alarm changes back to `OK` at t3, so the `WaitPeriod` that started at t2 is discarded. A new `WaitPeriod` starts at t3 and ends at t5. After the new `WaitPeriod` expires at t5, the composite alarm takes its actions. 

**Example 4: State transition when actions are suppressed by alarm**

![\[fourth example of action suppression\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/cwasexamplefourborder.png)


 In the image, the composite alarm changes states from `OK` to `ALARM` at t2. The suppressor alarm is already in `ALARM`. The suppressor alarm stops the composite alarm from taking actions. 

**Example 5: Actions are not suppressed after `ExtensionPeriod`**

![\[fifth example of action suppression\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/example7border.png)


 In the image, the composite alarm changes states from `OK` to `ALARM` at t2. A `WaitPeriod` starts at t2 and ends at t4. This gives the suppressor alarm time to change states from `OK` to `ALARM` at t3 before it suppresses the composite alarm's actions until t6. Because the suppressor alarm changes states from `OK` to `ALARM` at t3, the `WaitPeriod` that started at t2 is discarded. At t6, the suppressor alarm changes to `OK`. An `ExtensionPeriod` starts at t6 and ends at t9. After the `ExtensionPeriod` expires, the composite alarm takes its actions. 

**Example 6: State transition when actions are suppressed by `ExtensionPeriod`**

![\[sixth example of action suppression\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/cwasexamplesixrborder.png)


 In the image, the composite alarm changes states from `OK` to `ALARM` at t2. A `WaitPeriod` starts at t2 and ends at t4. This gives the suppressor alarm time to change states from `OK` to `ALARM` at t3 before it suppresses the composite alarm's actions until t6. Because the suppressor alarm changes states from `OK` to `ALARM` at t3, the `WaitPeriod` that started at t2 is discarded. At t6, the suppressor alarm changes back to `OK`. An `ExtensionPeriod` starts at t6 and ends at t9. When the composite alarm changes back to `OK` at t7, the `ExtensionPeriod` is discarded, and a new `WaitPeriod` starts at t7 and ends at t9. 

**Tip**  
 If you replace the action suppressor alarm, any active `WaitPeriod` or `ExtensionPeriod` is discarded. 

## Action Suppression and Mute Rules
<a name="action-suppression-and-mute-rules"></a>

 When both action suppression and alarm mute rules are active for a composite alarm, mute rules take precedence and suppress all alarm actions. After the mute window ends, the composite alarm's action suppression configuration determines whether actions are executed based on the suppressor alarm state and configured wait or extension periods. For more information about alarm mute rules, see [Alarm Mute Rules](alarm-mute-rules.md). 

# Alarm actions
<a name="alarm-actions"></a>

You can specify what actions an alarm takes when it changes state between the OK, ALARM, and INSUFFICIENT_DATA states.

Most actions can be set for the transition into each of the three states. Except for Auto Scaling actions, the actions happen only on state transitions, and are not performed again if the condition persists for hours or days.

The following are supported as alarm actions:
+ Notify one or more subscribers by using an Amazon Simple Notification Service topic. Subscribers can be applications as well as persons.
+ Invoke a Lambda function. This is the easiest way for you to automate custom actions on alarm state changes.
+ Alarms based on EC2 metrics can also perform EC2 actions, such as stopping, terminating, rebooting, or recovering an EC2 instance.
+ Alarms can perform actions to scale an Auto Scaling group.
+ Alarms can create OpsItems in Systems Manager OpsCenter or create incidents in AWS Systems Manager Incident Manager. These actions are performed only when the alarm goes into ALARM state.
+ An alarm can start an investigation when it goes into ALARM state.

Alarms also emit events to Amazon EventBridge when they change state, and you can set up Amazon EventBridge to trigger other actions for these state changes.

## Alarm actions and notifications
<a name="alarm-actions-notifications"></a>

The following table shows the supported alarm actions, and whether each action is performed at the alarm level or for each contributor in multi-time series (Metrics Insights or PromQL) alarms:


| Action Type | Metrics Insights Multiple Time Series Alarm support | PromQL Alarm support | More Information | 
| --- | --- | --- | --- | 
| SNS notifications | Contributor Level | Contributor Level | [Amazon SNS event destinations](https://docs.aws.amazon.com/sns/latest/dg/sns-event-destinations.html) | 
| EC2 actions (stop, terminate, reboot, recover) | Not supported | Not supported | [Stop, terminate, reboot, or recover an EC2 instance](UsingAlarmActions.md) | 
| Auto Scaling actions | Not supported | Not supported | [Step and simple scaling policies for Amazon EC2 Auto Scaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-simple-step.html) | 
| Systems Manager OpsItem creation | Alarm Level | Not supported | [Configure CloudWatch alarms to create OpsItems](https://docs.aws.amazon.com/systems-manager/latest/userguide/OpsCenter-create-OpsItems-from-CloudWatch-Alarms.html) | 
| Systems Manager Incident Manager incidents | Alarm Level | Not supported | [Creating incidents automatically with CloudWatch alarms](https://docs.aws.amazon.com/incident-manager/latest/userguide/incident-creation.html#incident-tracking-auto-alarms) | 
| Lambda function invocation | Contributor Level | Contributor Level | [Invoke a Lambda function from an alarm](alarms-and-actions-Lambda.md) | 
| CloudWatch investigations | Alarm Level | Not supported | [Start a CloudWatch investigation from an alarm](Start-Investigation-Alarm.md) | 

The content of alarm notifications differs depending on the alarm type:
+ Single-metric alarms include both a state reason and detailed state reason data, showing the specific datapoints that caused the state change.
+ Multi-time series Metrics Insights alarms provide a simplified state reason for each contributor, without the detailed state reason data block.
+ PromQL alarms do not include a state reason or state reason data in their notifications.

**Example notification content**  
Single-metric alarm notification includes detailed data:  

```
{
  "stateReason": "Threshold Crossed: 3 out of the last 3 datapoints [32.6 (03/07/25 08:29:00), 33.8 (03/07/25 08:24:00), 41.0 (03/07/25 08:19:00)] were greater than the threshold (31.0)...",
  "stateReasonData": {
    "version": "1.0",
    "queryDate": "2025-07-03T08:34:06.300+0000",
    "startDate": "2025-07-03T08:19:00.000+0000",
    "statistic": "Average",
    "period": 300,
    "recentDatapoints": [41, 33.8, 32.6],
    "threshold": 31,
    "evaluatedDatapoints": [
      {
        "timestamp": "2025-07-03T08:29:00.000+0000",
        "sampleCount": 5,
        "value": 32.6
      }
      // Additional datapoints...
    ]
  }
}
```
Example SNS notification for a contributor of a multi-time series Metrics Insights alarm:  

```
{
  "AlarmName": "DynamoDBInsightsAlarm",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 1 datapoint was less than the threshold (1.0). The most recent datapoint which crossed the threshold: [0.0 (01/12/25 13:34:00)].",
  "StateChangeTime": "2025-12-01T13:42:04.919+0000",
  "OldStateValue": "OK",
  "AlarmContributorId": "6d442278dba546f6",
  "AlarmContributorAttributes": {
    "TableName": "example-dynamodb-table-name"
  }
  // Additional information...
}
```
Example SNS notification for a contributor of a PromQL alarm:  

```
{
  "AlarmName": "HighCPUUsageAlarm",
  "NewStateValue": "ALARM",
  "StateChangeTime": "2025-12-01T13:42:04.919+0000",
  "OldStateValue": "OK",
  "AlarmContributorId": "1d502278dcd546a1",
  "AlarmContributorAttributes": {
    "team": "example-team-name"
  }
  // Additional information...
}
```

## Muting Alarm Actions
<a name="mute-alarm-actions"></a>

 Alarm mute rules allow you to automatically mute alarm actions during predefined time windows, such as maintenance periods or operational events. CloudWatch continues monitoring alarm states while preventing unwanted notifications. For more information, see [Alarm Mute Rules](alarm-mute-rules.md). 

**Mute rules vs. disabling alarm actions**  
 Alarm mute rules temporarily mute actions during scheduled time windows and automatically unmute when the window ends. In contrast, the `DisableAlarmActions` API permanently disables alarm actions until you manually call `EnableAlarmActions`. The `EnableAlarmActions` API does not unmute alarms that are muted by active mute rules. 

**Note**  
 Muting an alarm does not stop CloudWatch from sending alarm events for alarm create, update, delete, and state changes to Amazon EventBridge. 

# Alarm Mute Rules
<a name="alarm-mute-rules"></a>

 Alarm mute rules provide a mechanism to automatically mute alarm actions during predefined time windows. When you create a mute rule, you define specific time periods and target alarms whose actions are muted. CloudWatch continues monitoring and evaluating alarm states while preventing unwanted notifications or automated alarm actions during expected operational events. 

 Alarm mute rules help you manage operational scenarios where alarm actions would be unnecessary or disruptive: 
+ **Planned maintenance windows** – Prevent automated alarm actions while your systems are intentionally offline or experiencing expected issues, so you can perform maintenance without interruptions.
+ **Non-business hours** – Mute non-critical alarm actions on weekends or holidays when immediate response is not required, reducing alarm noise and unnecessary notifications to your operations team.
+ **Testing environments** – Temporarily mute alarm actions during scenarios such as load testing, where high resource usage or error rates are expected and don't require immediate attention.
+ **Active troubleshooting** – Prevent duplicate alarm actions while your team investigates a known issue, so you can focus on resolution without redundant alarm notifications.

## Defining alarm mute rules
<a name="defining-alarm-mute-rules"></a>

 Alarm mute rules are defined using two components: **rules** and **targets**. 
+  **Rules** - define the time windows when alarm actions should be muted. Rules are composed of three attributes: 
  +  **Expression** – Defines when the mute period begins and how it repeats. You can use two types of expressions: 
    +  **Cron expressions** – Use standard cron syntax to create recurring mute windows. This approach is ideal for regular maintenance schedules, such as weekly system updates or daily backup operations. Cron expressions allow you to specify complex recurring patterns, including specific days of the week, months, or times. 

       *Syntax for cron expression* 

      ```
      ┌───────────── minute (0 - 59)
      │ ┌───────────── hour (0 - 23)
      │ │ ┌───────────── day of the month (1 - 31)
      │ │ │ ┌───────────── month (1 - 12) (or JAN-DEC)
      │ │ │ │ ┌───────────── day of the week (0 - 6) (0 or 7 is Sunday, or SUN-SAT)
      │ │ │ │ │
      │ │ │ │ │
      * * * * *
      ```
      +  The characters `*`, `,`, and `-` are supported in all fields. 
      +  English names can be used for the `month` (JAN-DEC) and `day of week` (SUN-SAT) fields. 
    +  **At expressions** – Use at expressions for one-time mute windows. This approach works well for planned operational events that occur once at a known time. 

       *Syntax for at expression* 

      ```
      at(yyyy-MM-ddThh:mm)
      ```
  +  **Duration** – Specifies how long the mute rule lasts once activated. Duration must be specified in ISO-8601 format with a minimum of 1 minute (PT1M) and maximum of 15 days (P15D). 
  +  **Timezone** – Specifies the timezone in which the mute window will be applied according to the expressions, using standard timezone identifiers such as "America/Los_Angeles" or "Europe/London". 
+  **Targets** - specify the list of alarm names whose actions will be muted during the defined time windows. You can include both metric alarms and composite alarms in your target list. 

 You can optionally include start and end timestamps to provide additional boundaries for your mute windows. Start timestamps ensure that mute rules don't activate before a specific date and time, while end timestamps prevent rules from being applied beyond the specified date and time. 
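For a one-time rule, the combination of an at expression and a duration pins down the mute window exactly. The following sketch (plain Python; `parse_duration` handles only a small illustrative subset of the ISO-8601 durations that mute rules accept) checks whether a timestamp falls inside such a window:

```python
from datetime import datetime, timedelta

def parse_duration(iso):
    """Parse a small subset of ISO-8601 durations (PTnM, PTnH, PnD).
    Illustrative only; real mute rules accept durations from PT1M
    (1 minute) up to P15D (15 days)."""
    if iso.startswith("PT") and iso.endswith("M"):
        return timedelta(minutes=int(iso[2:-1]))
    if iso.startswith("PT") and iso.endswith("H"):
        return timedelta(hours=int(iso[2:-1]))
    if iso.startswith("P") and iso.endswith("D"):
        return timedelta(days=int(iso[1:-1]))
    raise ValueError(f"unsupported duration: {iso}")

def in_mute_window(at_expression, duration, now):
    """True if `now` falls inside the one-time window that starts at
    the at() expression and lasts for `duration`."""
    start = datetime.strptime(at_expression, "at(%Y-%m-%dT%H:%M)")
    return start <= now < start + parse_duration(duration)

window = ("at(2025-07-03T08:00)", "PT2H")  # hypothetical 2-hour window
print(in_mute_window(*window, datetime(2025, 7, 3, 9, 0)))    # True
print(in_mute_window(*window, datetime(2025, 7, 3, 10, 30)))  # False
```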

 For more information about creating alarm mute rules programmatically, see [PutAlarmMuteRule](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutAlarmMuteRule.html). 

**Note**  
 The targeted alarms must be present in the same AWS account and same AWS Region in which the mute rule is created. 
 A single alarm mute rule can target up to 100 alarms by alarm names. 

 The CloudWatch console includes a dedicated "Alarm Mute Rules" tab that provides centralized management of all your mute rules within your AWS account. You can search for specific mute rules using the mute rule attributes such as rule name. 

## Mute Rule Status
<a name="mute-rule-status"></a>

 Once created, an alarm mute rule can be in one of the following three statuses: 
+  **SCHEDULED** – The mute rule will become active at some time in the future according to the configured time window expression. 
+  **ACTIVE** – The mute rule is currently active as per the configured time window expression and actively muting targeted alarm actions. 
+  **EXPIRED** – The mute rule will not be SCHEDULED/ACTIVE anymore in the future. This occurs for one-time mute rules after the mute window has ended, or for recurring mute rules when an end timestamp is configured and that time has passed. 
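The status logic above can be sketched for a single window as follows. This is an illustrative model; `has_future_windows` stands in for a recurring expression that still has occurrences ahead (subject to any configured end timestamp):

```python
from datetime import datetime

def mute_rule_status(window_start, window_end, now, has_future_windows=False):
    """Map a mute rule's current or next window to one of the three
    statuses described above (illustrative, not the CloudWatch API)."""
    if window_start <= now < window_end:
        return "ACTIVE"
    if now < window_start or has_future_windows:
        return "SCHEDULED"
    return "EXPIRED"

start, end = datetime(2025, 7, 3, 8, 0), datetime(2025, 7, 3, 10, 0)
print(mute_rule_status(start, end, datetime(2025, 7, 3, 7, 0)))   # SCHEDULED
print(mute_rule_status(start, end, datetime(2025, 7, 3, 9, 0)))   # ACTIVE
print(mute_rule_status(start, end, datetime(2025, 7, 3, 11, 0)))  # EXPIRED
```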

## Effects of mute rules on alarms
<a name="effects-of-mute-rules"></a>

 During an active mute window, when a targeted alarm changes state and has actions configured, CloudWatch mutes those actions from executing. Mutes are applied only to alarm actions, meaning that alarms continue to be evaluated and state changes are visible in the CloudWatch console, but configured actions such as Amazon Simple Notification Service notifications, Amazon EC2 Auto Scaling actions, or Amazon EC2 actions are prevented from executing. CloudWatch continues to evaluate alarm states normally throughout the mute period, and you can view this information through alarm history. 

 When a mute window ends, if a targeted alarm remains in the state (OK, ALARM, or INSUFFICIENT_DATA) in which its actions were muted, CloudWatch automatically re-triggers the alarm actions that were muted during the window. This ensures that your alarm actions are executed for ongoing issues once the planned mute period ends, maintaining the integrity of your monitoring system. 
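The end-of-window behavior can be sketched as follows (illustrative only; the action name is hypothetical): actions muted while the alarm was in a given state are re-triggered only if the alarm is still in that state when the window ends.

```python
def retriggered_actions(alarm_state_at_window_end, muted_actions):
    """muted_actions maps an alarm state to the actions that were muted
    while the alarm was in that state during the window. When the
    window ends, the actions muted for the state the alarm is still in
    are re-triggered; actions for states the alarm has since left are
    not."""
    return muted_actions.get(alarm_state_at_window_end, [])

muted = {"ALARM": ["notify-oncall-sns-topic"]}  # hypothetical action
print(retriggered_actions("ALARM", muted))  # ['notify-oncall-sns-topic']
print(retriggered_actions("OK", muted))     # []
```

This matches the scenarios later in this section: an alarm that returned to OK before the window ended produces no late ALARM notification, while an alarm still in ALARM does.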

**Note**  
 When you mute an alarm:   
+ All the actions associated with the targeted alarms are muted.
+ Actions associated with all alarm states (OK, ALARM, and INSUFFICIENT_DATA) are muted.

## Viewing and managing muted alarms
<a name="viewing-managing-muted-alarms-link"></a>

For information about viewing and managing muted alarms, see [Viewing and managing muted alarms](viewing-managing-muted-alarms.md).

# How alarm mute rules work
<a name="alarm-mute-rules-behaviour"></a>

The following scenarios illustrate how alarm mute rules affect the targeted alarms and how the alarm actions are muted or executed.

**Note**  
 Muting an alarm mutes actions associated with all alarm states, including OK, ALARM, and INSUFFICIENT_DATA. The behaviors illustrated below apply to actions associated with all alarm states. 
 When you mute a Metrics Insights alarm, all contributor metric series for that alarm are automatically muted as well. 

## Scenario: Alarm actions are muted when a mute rule is active
<a name="scenario-actions-muted-during-active-rule"></a>

Consider the following:
+ An Alarm has actions configured for its ALARM state
+ An alarm mute rule that targets the Alarm is scheduled to be active from t1 to t5

![\[alt text not found\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/alarm_mute_rules_scenario-1.png)

+ At **t0** - Alarm is in OK state, mute rule status is SCHEDULED
+ At **t1** - Mute rule status becomes ACTIVE
+ At **t2** - Alarm transitions to ALARM state; the action is muted because the mute rule is active
+ At **t4** - Alarm returns to OK state while the mute rule is still active
+ At **t5** - Mute rule becomes inactive, but no ALARM action is executed because the alarm is now in OK state

## Scenario: Alarm action muted when mute rule is active and re-triggered after mute window
<a name="scenario-action-retriggered-after-mute"></a>

Consider the following:
+ An Alarm has actions configured for its ALARM state
+ An alarm mute rule that targets the Alarm is scheduled to be active from t1 to t4

![\[alt text not found\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/alarm_mute_rules_scenario-2.png)

+ At **t0** - Alarm is in OK state, mute rule status is SCHEDULED
+ At **t1** - Mute rule status becomes ACTIVE
+ At **t2** - Alarm transitions to ALARM state; the action is muted because the mute rule is active
+ At **t4** - Mute window becomes inactive, alarm is still in ALARM state
+ At **t5** - Alarm action is executed because the mute window has ended and the alarm remains in the same state (ALARM) in which it was originally muted

## Scenario: Multiple overlapping alarm mute rules
<a name="scenario-multiple-overlapping-rules"></a>

Consider the following:
+ An alarm has actions configured for its ALARM state

There are two mute rules that target the alarm:
+ Alarm mute rule 1 mutes the alarm from t1 to t5
+ Alarm mute rule 2 mutes the alarm from t3 to t8

![\[alt text not found\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/alarm_mute_rules_scenario-3.png)

+ At **t0** - The alarm is in the OK state; both mute rules are SCHEDULED
+ At **t1** - The first mute rule becomes ACTIVE
+ At **t2** - The alarm transitions to the ALARM state; the action is muted
+ At **t3** - The second mute rule becomes ACTIVE
+ At **t5** - The first mute rule becomes inactive, but the alarm action remains muted because the second mute rule is still active
+ At **t8** - The alarm action is executed because the second mute window has ended and the alarm remains in the state (ALARM) in which it was originally muted
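
The overlap behavior above can be sketched with a small, illustrative model: each mute window is a half-open interval, and the alarm stays muted while any targeting rule's window is active. Integers stand in for t1 through t8, with rule 2's window taken to end at t8, when the muted action is executed in the scenario above:

```python
def is_muted(t, windows):
    """An alarm is muted at time t if any mute window [start, end) is active."""
    return any(start <= t < end for start, end in windows)

# Rule 1 is active from t1 to t5; rule 2 keeps the alarm muted until t8.
windows = [(1, 5), (3, 8)]
```

At t5 the first rule has ended but the alarm remains muted by the second rule; at t8 no rule is active, so the muted ALARM action is executed.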

## Scenario: Muted alarm actions are executed when mute rule update ends the mute window
<a name="scenario-rule-update-ends-mute"></a>

Consider the following:
+ An alarm has actions configured for its ALARM state
+ An alarm mute rule that targets the alarm is scheduled to be active from t1 to t8

![\[alt text not found\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/alarm_mute_rules_scenario-4.png)

+ At **t0** - The alarm is in the OK state; the mute rule is SCHEDULED
+ At **t1** - The mute rule becomes ACTIVE
+ At **t2** - The alarm transitions to the ALARM state; the actions are muted
+ At **t6** - The mute rule configuration is updated so that the mute window ends at t6. The alarm actions are executed immediately at t6 because the mute rule is no longer active.

**Note**  
The same behavior applies in the following cases:
+ If the mute rule is deleted at t6, the alarm is unmuted immediately.
+ If the alarm is removed from the mute rule's targets at t6, the alarm is unmuted immediately.

## Scenario: New actions are executed if alarm actions are updated during mute window
<a name="scenario-actions-updated-during-mute"></a>

Consider the following:
+ An alarm has actions configured for its ALARM state
+ An alarm mute rule that targets the alarm is scheduled to be active from t1 to t8

![\[alt text not found\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/alarm_mute_rules_scenario-5.png)

+ At **t0** - The alarm is in the OK state; the mute rule is SCHEDULED. An SNS action is configured for the ALARM state.
+ At **t1** - The mute rule becomes ACTIVE
+ At **t2** - The alarm transitions to the ALARM state; the configured SNS action is muted
+ At **t6** - The alarm configuration is updated to remove the SNS action and add a Lambda action
+ At **t8** - The Lambda action configured on the alarm is executed because the mute window has ended and the alarm remains in the state (ALARM) in which it was originally muted

**Note**  
If all of the alarm actions are removed during the mute window (at t6 in the above example), no actions are executed at the end of the mute window (at t8).

## Example schedules for common use cases
<a name="common-use-cases"></a>

 The following examples show how to configure time window expressions for common use cases. 

 **Scenario 1: Muting alarm actions during scheduled maintenance windows** – Regular maintenance activities that occur on a predictable schedule, such as system or database updates when services are intentionally unavailable or operating in degraded mode. 
+  Cron expression `0 2 * * SUN` with duration `PT4H` - Mutes alarms every Sunday from 2:00 AM to 6:00 AM for weekly system maintenance. 
+  Cron expression `0 1 1 * *` with duration `PT6H` - Mutes alarms on the first day of each month from 1:00 AM to 7:00 AM for monthly database maintenance. 

 **Scenario 2: Muting non-critical alarms during non-business hours** – Reducing alert fatigue during weekends or holidays when immediate attention is not required. 
+  Cron expression `0 18 * * FRI` with duration `P2DT12H` - Mutes alarms every weekend from Friday 6:00 PM to Monday 6:00 AM. 

 **Scenario 3: Muting performance alarms during daily backup operations** – Daily automated backup processes that temporarily increase resource utilization and may trigger performance-related alarms during predictable time windows. 
+  Cron expression `0 23 * * *` with duration `PT2H` - Mutes alarms every day from 11:00 PM to 1:00 AM during nightly backup operations that temporarily increase disk I/O and CPU utilization. 

 **Scenario 4: Muting duplicate alarms during active troubleshooting sessions** – Temporary muting of alarm actions while teams are actively investigating and resolving issues, preventing notification noise and allowing focused problem resolution. 
+  At expression `at(2024-05-10T14:00)` with duration `PT4H` - Mutes alarms on May 10, 2024 from 2:00 PM to 6:00 PM during an active incident response session. 

 **Scenario 5: Muting alarm actions during planned company shutdowns** – One-time extended maintenance periods or company-wide shutdowns where all systems are intentionally offline for extended periods. 
+  At expression `at(2024-12-23T00:00)` with duration `P7D` - Mutes alarms for the entire week of December 23-29, 2024 during annual company shutdown. 
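
To sanity-check a schedule, you can compute a mute window's end time from its start time and duration. The following is a minimal sketch that parses only the day/hour/minute duration designators used in the examples above; it is not a full ISO 8601 parser:

```python
import re
from datetime import datetime, timedelta

def parse_duration(iso: str) -> timedelta:
    """Parse a subset of ISO 8601 durations such as PT4H, P2DT12H, or P7D."""
    m = re.fullmatch(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?", iso)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported duration: {iso}")
    days, hours, minutes = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)

# A weekend window: Friday, May 10, 2024 at 6:00 PM plus P2DT12H
# ends Monday, May 13 at 6:00 AM.
start = datetime(2024, 5, 10, 18, 0)
end = start + parse_duration("P2DT12H")
```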

# Limits
<a name="alarm-limits"></a>

## General CloudWatch quotas
<a name="general-cloudwatch-quotas"></a>

For information about general CloudWatch service quotas that apply to alarms, see [CloudWatch service quotas](cloudwatch_limits.md).

## Limits that apply to alarms based on metric math expressions
<a name="metric-math-alarm-limits"></a>

Alarms based on metric math expressions can reference a maximum of 10 metrics. This is a hard limit that cannot be increased. If you need to monitor more than 10 metrics in a single alarm, consider one of the following approaches:
+ If the metrics are in the same namespace, use a Metrics Insights query in your alarm instead of a metric math expression. Metrics Insights can aggregate across many metrics with a single query.
+ Pre-aggregate metrics into custom metrics using a Lambda function, then reference the aggregated metrics in your alarm expression.
+ Split your logic across multiple alarms and combine them using a composite alarm.
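
As a sketch of the first approach, the following shows the shape of a `PutMetricAlarm` request that uses a single Metrics Insights query in place of a many-metric math expression (for example, as passed to boto3's CloudWatch client). The alarm name, namespace, and threshold values are illustrative:

```python
# The Metrics parameter holds one MetricDataQuery whose Expression is a
# Metrics Insights query; it aggregates across every matching instance,
# so no per-metric entries are needed.
metrics = [
    {
        "Id": "q1",
        "Expression": (
            "SELECT AVG(CPUUtilization) "
            'FROM SCHEMA("AWS/EC2", InstanceId) '
            "GROUP BY InstanceId"
        ),
        "Period": 300,
        "ReturnData": True,
    }
]

# Request body as it might be passed to PutMetricAlarm. Values here are
# illustrative, not recommendations.
alarm_request = {
    "AlarmName": "ec2-fleet-cpu-high",
    "EvaluationPeriods": 3,
    "DatapointsToAlarm": 3,
    "Threshold": 80.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "missing",
    "Metrics": metrics,
}
```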

## Limits that apply to alarms based on Metrics Insights queries
<a name="metrics-insights-alarm-limits"></a>

When working with CloudWatch Metrics Insights alarms, be aware of these functional limits:
+ A default limit of 200 Metrics Insights alarms per account per Region
+ Only the latest 3 hours of data can be used to evaluate the alarm's conditions. However, you can visualize up to two weeks of data on the alarm's detail page graph
+ Alarms that evaluate multiple time series limit the number of contributors in ALARM to 100
  + Assuming the query retrieves 150 time series:
    +  If there are fewer than 100 contributors in ALARM (for example, 95), the `StateReason` will be "95 out of 150 time series evaluated to ALARM" 
    +  If there are more than 100 contributors in ALARM (for example, 105), the `StateReason` will be "100 out of 150 time series evaluated to ALARM" 
  + Furthermore, if the volume of attributes is too large, the number of contributors in ALARM can be limited to fewer than 100.
+ Metrics Insights limits on the maximum number of time series analyzed or returned apply
+ During alarm evaluation, the `EvaluationState` will be set to `PARTIAL_DATA` for the following limits: 
  +  If the Metrics Insights query returns more than 500 time series. 
  +  If the Metrics Insights query matches more than 10,000 metrics. 
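
The contributor cap above can be illustrated with a small sketch; the `state_reason` helper and its message format follow the examples in this list and are illustrative, not CloudWatch source code:

```python
CONTRIBUTOR_CAP = 100

def state_reason(in_alarm: int, total: int) -> str:
    """Format a StateReason, reporting at most 100 contributors in ALARM."""
    return (f"{min(in_alarm, CONTRIBUTOR_CAP)} out of {total} "
            "time series evaluated to ALARM")
```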

For more information about CloudWatch service quotas and limits, see [CloudWatch Metrics Insights service quotas](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-metrics-insights-limits.html).

## Limits that apply to alarms based on PromQL queries
<a name="promql-limits"></a>

When working with CloudWatch PromQL alarms, be aware of these functional limits:
+ Alarms that evaluate multiple time series limit the number of contributors in ALARM to 100
  +  If there are fewer than 100 contributors in ALARM (for example, 95), the `StateReason` will be "95 time series evaluated to ALARM" 
  +  If there are more than 100 contributors in ALARM (for example, 105), the `StateReason` will be "100+ time series evaluated to ALARM" 
  + Furthermore, if the volume of attributes is too large, the number of contributors in ALARM can be limited to fewer than 100.
+ PromQL query limits on the maximum number of time series analyzed or returned apply
+ During alarm evaluation, the `EvaluationState` will be set to `PARTIAL_DATA` if the PromQL query returns more than 500 time series. 

## Limits that apply to alarms based on connected data sources
<a name="MultiSource_Alarm_Details"></a>
+ When CloudWatch evaluates an alarm, it does so every minute, even if the period for the alarm is longer than one minute. For the alarm to work, the Lambda function must be able to return a list of timestamps starting on any minute, not only on multiples of the period length. These timestamps must be spaced one period length apart.

  Therefore, if the data source queried by the Lambda function can only return timestamps that are multiples of the period length, the function should "re-sample" the fetched data to match the timestamps expected by the `GetMetricData` request.

  For example, an alarm with a five-minute period is evaluated every minute using five-minute windows that shift by one minute each time. In this case:
  + For the alarm evaluation at 12:15:00, CloudWatch expects data points with timestamps of `12:00:00`, `12:05:00`, and `12:10:00`. 
  + Then for the alarm evaluation at 12:16:00, CloudWatch expects data points with timestamps of `12:01:00`, `12:06:00`, and `12:11:00`. 
+ When CloudWatch evaluates an alarm, any data points returned by the Lambda function that don't align with the expected timestamps are dropped, and the alarm is evaluated using the remaining expected data points. For example, when the alarm is evaluated at `12:15:00` it expects data with timestamps of `12:00:00`, `12:05:00`, and `12:10:00`. If it receives data with timestamps of `12:00:00`, `12:05:00`, `12:06:00`, and `12:10:00`, the data from `12:06:00` is dropped and CloudWatch evaluates the alarm using the other timestamps.

  Then for the next evaluation at `12:16:00`, it expects data with timestamps of `12:01:00`, `12:06:00`, and `12:11:00`. If it only has the data with timestamps of `12:00:00`, `12:05:00`, and `12:10:00`, all of these data points are ignored at 12:16:00 and the alarm transitions into the state according to how you specified the alarm to treat missing data. For more information, see [Alarm evaluation](alarm-evaluation.md).
+ We recommend that you create these alarms to take actions when they transition to the `INSUFFICIENT_DATA` state, because several Lambda function failure use cases will transition the alarm to `INSUFFICIENT_DATA` regardless of the way that you set the alarm to treat missing data. 
+ If the Lambda function returns an error:
  + If there is a permission problem with calling the Lambda function, the alarm begins having missing data transitions according to how you specified the alarm to treat missing data when you created it.
  + Any other error coming from the Lambda function causes the alarm to transition to `INSUFFICIENT_DATA`.
+ If the metric requested by the Lambda function has some delay, so that the last data point is always missing, use a workaround such as creating an M out of N alarm or increasing the evaluation period of the alarm. For more information about M out of N alarms, see [Alarm evaluation](alarm-evaluation.md).
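
The timestamp alignment described above can be sketched as follows. This is an illustration of the expected-timestamp arithmetic only, not code from the CloudWatch data source interface, and the raw data points are hypothetical:

```python
from datetime import datetime, timedelta

def expected_timestamps(eval_time: datetime, period_s: int, n_points: int):
    """Timestamps CloudWatch expects for an evaluation at eval_time:
    n_points timestamps spaced one period apart, the newest being one
    period before eval_time."""
    return [eval_time - timedelta(seconds=period_s * i)
            for i in range(n_points, 0, -1)]

def resample(raw: dict, expected: list):
    """Keep only data points at expected timestamps; anything else
    would be dropped during alarm evaluation."""
    return {ts: raw[ts] for ts in expected if ts in raw}

# An alarm with a five-minute period evaluated at 12:15:00 expects
# 12:00:00, 12:05:00, and 12:10:00. A stray 12:06:00 point is dropped.
expected = expected_timestamps(datetime(2024, 1, 1, 12, 15), 300, 3)
raw = {
    datetime(2024, 1, 1, 12, 0): 1.0,
    datetime(2024, 1, 1, 12, 5): 2.0,
    datetime(2024, 1, 1, 12, 6): 9.0,   # misaligned; ignored
    datetime(2024, 1, 1, 12, 10): 3.0,
}
aligned = resample(raw, expected)
```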