

# Alerting
<a name="alerting"></a>

Alerts are one of the most important information sources when it comes to the security, availability, performance, and reliability of your IT infrastructure and IT services. They notify and inform your IT teams about ongoing security threats, outages, performance issues, or system failures.

The Information Technology Infrastructure Library (ITIL), and specifically its IT service management (ITSM) practices, places automated alerting at the center of its best practices for monitoring and event management and for incident management.

Incident alerting is when monitoring tools generate alerts to notify your team and automated tools (for items that are automatically actionable) about changes, high-risk actions, or failures in the IT environment. IT alerts are the first line of defense against system outages or changes that can turn into major incidents. By automatically monitoring systems and generating alerts for outages and risky changes, IT teams can minimize downtime and reduce the high cost that comes with it.

As best practices, the AWS Well-Architected Framework prescribes that you [use monitoring to generate alarm-based notifications](https://docs.aws.amazon.com/wellarchitected/latest/framework/perf_monitor_instances_post_launch_generate_alarms.html), and [monitor and alarm proactively](https://docs.aws.amazon.com/wellarchitected/latest/framework/perf_monitor_instances_post_launch_proactive.html). Use CloudWatch or a third-party monitoring service to set alarms that indicate when metrics are outside of expected boundaries.

The purpose of alert management is to establish efficient, standardized procedures for handling IT-related events and incidents through logging, classification, action definition and implementation, closure, and post-incident review activities.

**Sections**
+ [CloudWatch alarms](cloudwatch-alarms.md)
+ [EventBridge rules](eventbridge-rules.md)
+ [Specifying actions, enabling, and disabling alarms](enable-disable-alarms.md)

# CloudWatch alarms
<a name="cloudwatch-alarms"></a>

When you operate your Amazon RDS DB instances, you want to monitor and generate alerts on different kinds of metrics, events, and traces. For MySQL and MariaDB databases, the critical sources of information are [DB instance metrics](db-instance-monitoring.md), [OS metrics](os-monitoring.md), and [events, logs, and audit trails](events-logs-audit.md). We recommend that you use [CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) to watch a single metric over a time period that you specify.

The following example illustrates how you can set an alarm that watches the `CPUUtilization` metric (percentage of CPU utilization) on all your Amazon RDS DB instances. You configure the alarm to be triggered if the CPU utilization on any DB instance is greater than 80 percent for the evaluation period of 5 minutes.

![\[Setting an alarm for the CPUUtilization metric\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/setting-alarm.png)
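
The same alarm can also be defined programmatically. The following is a minimal sketch of the request parameters for the CloudWatch `PutMetricAlarm` API, not a definitive configuration. The alarm name, the SNS topic ARN, the `Average` statistic, and the choice to scope the alarm to one DB instance through the `DBInstanceIdentifier` dimension are assumptions made for illustration. A single 300-second period is one way to express the 5-minute evaluation window.

```python
def cpu_alarm_params(alarm_name, sns_topic_arn, db_instance_id):
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": alarm_name,                 # hypothetical name
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
        ],
        "Statistic": "Average",                  # assumption: average CPU
        "Period": 300,                           # one 5-minute period, in seconds
        "EvaluationPeriods": 1,
        "Threshold": 80.0,                       # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],         # hypothetical SNS topic
    }


def create_cpu_alarm(alarm_name, sns_topic_arn, db_instance_id):
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builder above has no dependencies

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **cpu_alarm_params(alarm_name, sns_topic_arn, db_instance_id)
    )
```

Keeping the parameter builder separate from the API call makes it possible to review or unit test the alarm definition without AWS access.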


This means that the alarm goes into the `ALARM` state if any of your databases experiences a high CPU utilization (over 80 percent) for 5 minutes or more. The alarm remains in the `OK` state if the CPU occasionally bursts to over 80 percent utilization for a short period of time, and then drops again below the threshold. The following graph illustrates this logic.

![\[Alarm states and thresholds\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/thresholds.png)
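
The threshold logic can be approximated in a few lines of Python. This is a simplified model for illustration only, not CloudWatch's actual evaluation algorithm: it assumes one datapoint per minute and five evaluation periods, and it ignores features such as "M out of N" datapoints and missing-data handling. The function name and sample values are hypothetical.

```python
def alarm_states(datapoints, threshold=80.0, evaluation_periods=5):
    """Return the modeled alarm state after each datapoint.

    The state is ALARM only when every datapoint in the most recent
    `evaluation_periods` datapoints is above the threshold.
    """
    states = []
    for i in range(len(datapoints)):
        window = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        if len(window) < evaluation_periods:
            # Not enough history yet to evaluate a full window.
            states.append("INSUFFICIENT_DATA")
        elif all(value > threshold for value in window):
            states.append("ALARM")
        else:
            states.append("OK")
    return states


# A short burst above 80 percent never fills the whole window.
short_burst = alarm_states([50, 85, 90, 60, 55, 50, 58])

# Five consecutive datapoints above 80 percent trigger the alarm.
sustained = alarm_states([50, 85, 88, 91, 86, 90, 70])
```

In this model, `short_burst` never contains `ALARM`, while `sustained` reaches `ALARM` at the sixth datapoint and returns to `OK` when utilization drops to 70 percent.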


CloudWatch alarms support metric and composite alarms.
+ A *metric alarm* watches a single CloudWatch metric or the result of a math expression based on CloudWatch metrics. Based on the value of the metric or expression relative to a given threshold over a number of time periods, the alarm can take one or more actions, such as sending a message to an Amazon SNS topic.
+ A *composite alarm* is based on a rule expression, which evaluates the states of multiple alarms and goes into the `ALARM` state only if all conditions of the rule are met. Composite alarms are typically used to reduce the number of unnecessary alerts. For example, you might have a composite alarm that contains several metric alarms that are configured never to take actions. The composite alarm would send an alert when all the individual metric alarms in the composite are already in the `ALARM` state.
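
As a sketch of the composite alarm described above, the following code builds a rule expression that is true only when every child alarm is in the `ALARM` state, together with request parameters for the CloudWatch `PutCompositeAlarm` API. The alarm names and the SNS topic ARN are hypothetical, and the child metric alarms are assumed to exist already with their own actions turned off.

```python
def composite_rule_all(alarm_names):
    """Build an AlarmRule that requires all named alarms to be in ALARM."""
    names = list(alarm_names)
    if not names:
        raise ValueError("at least one child alarm name is required")
    return " AND ".join(f'ALARM("{name}")' for name in names)


def composite_alarm_params(alarm_name, child_alarm_names, sns_topic_arn):
    """Build keyword arguments for the CloudWatch PutCompositeAlarm API."""
    return {
        "AlarmName": alarm_name,
        "AlarmRule": composite_rule_all(child_alarm_names),
        "ActionsEnabled": True,
        "AlarmActions": [sns_topic_arn],
    }


# With boto3 installed and credentials configured, the parameters could be
# passed as: boto3.client("cloudwatch").put_composite_alarm(**params)
params = composite_alarm_params(
    "rds-database-3-degraded",                          # hypothetical
    ["rds-high-cpu", "rds-low-memory"],                 # hypothetical
    "arn:aws:sns:eu-west-3:111122223333:rds-alerts",    # hypothetical
)
```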

CloudWatch alarms can watch only CloudWatch metrics. If you want to create an alarm based on the error, slow query, or general logs, you must create CloudWatch metrics from the logs. You can accomplish that as discussed earlier in the [OS monitoring](os-monitoring.md) and [Events, logs, and audit trails](events-logs-audit.md) sections, by using filters to [create metrics from log events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/MonitoringLogData.html). Similarly, to alert on Enhanced Monitoring metrics, you must create metric filters in CloudWatch from CloudWatch Logs.
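
For example, a metric filter that counts error log entries might be defined as in the following sketch, which builds request parameters for the CloudWatch Logs `PutMetricFilter` API. It assumes that the DB instance publishes its error log to CloudWatch Logs in a log group named `/aws/rds/instance/<instance>/error`. The filter name, metric name, custom namespace, and the simple `ERROR` filter pattern are hypothetical choices, so adapt them to your own log format.

```python
def error_log_metric_filter_params(db_instance_id):
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter API."""
    return {
        # Assumption: the error log is exported to this log group.
        "logGroupName": f"/aws/rds/instance/{db_instance_id}/error",
        "filterName": f"{db_instance_id}-error-count",   # hypothetical
        "filterPattern": "ERROR",                        # match the term ERROR
        "metricTransformations": [
            {
                "metricName": "ErrorLogEntries",         # hypothetical
                "metricNamespace": "Custom/RDS",         # hypothetical
                "metricValue": "1",                      # count each match once
                "defaultValue": 0.0,                     # report 0 when no match
            }
        ],
    }


# With boto3 installed and credentials configured, the parameters could be
# passed as: boto3.client("logs").put_metric_filter(**params)
params = error_log_metric_filter_params("database-3")
```

After the filter publishes the custom metric, a metric alarm can watch it in the same way as any other CloudWatch metric.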

# EventBridge rules
<a name="eventbridge-rules"></a>

[Amazon RDS events](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.Messages.html) are delivered to Amazon EventBridge, and you can use [EventBridge rules](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule.html) to react to those events. For example, you can create EventBridge rules that would notify you and take an action if one specific DB instance stops or starts up, as the following screen shows.

![\[EventBridge rules for DB instance stops and starts\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/eventbridge-rules.png)


The `The DB instance has been stopped` event has the Amazon RDS event ID `RDS-EVENT-0087`. To detect this event, you set the `Event Pattern` property of the rule to:

```
{
  "source": ["aws.rds"],
  "detail-type": ["RDS DB Instance Event"],
  "detail": {
    "SourceArn": ["arn:aws:rds:eu-west-3:111122223333:db:database-3"],
    "EventID": ["RDS-EVENT-0087"]
  }
}
```

This rule monitors the DB instance `database-3` only, and watches for the `RDS-EVENT-0087` event. When EventBridge detects the event, it sends the event to a resource or endpoint, known as a [target](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-targets.html). This is where you specify the action you want to take if the Amazon RDS DB instance shuts down. You can send the event to many possible targets, including an SNS topic, an Amazon Simple Queue Service (Amazon SQS) queue, an AWS Lambda function, AWS Systems Manager Automation, an AWS Batch job, Amazon API Gateway, and many others.

For example, you might create an SNS topic that sends a notification email and an SMS message, and assign that SNS topic as the target of the EventBridge rule. When the Amazon RDS DB instance `database-3` is stopped, Amazon RDS delivers the event `RDS-EVENT-0087` to EventBridge, where the rule detects it. EventBridge then calls the target, which is the SNS topic. The SNS topic is configured to send an email (as shown in the following illustration) and an SMS message.

![\[SNS topic configuration\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/sns-notification.png)
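
The rule and its SNS target can also be expressed in code. The following sketch builds request parameters for the EventBridge `PutRule` and `PutTargets` APIs from the event pattern shown earlier. It also includes a deliberately simplified local matcher for checking the pattern against a sample event. The matcher handles only exact-value lists and nested objects, so it is not a reimplementation of EventBridge matching. The rule name, target ID, topic ARN, and sample events are hypothetical.

```python
import json

STOP_EVENT_PATTERN = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Instance Event"],
    "detail": {
        "SourceArn": ["arn:aws:rds:eu-west-3:111122223333:db:database-3"],
        "EventID": ["RDS-EVENT-0087"],
    },
}


def matches(pattern, event):
    """Simplified check: every pattern field must be present in the event,
    and its value must be one of the values listed in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def rule_requests(rule_name, sns_topic_arn):
    """Build keyword arguments for EventBridge PutRule and PutTargets."""
    put_rule = {
        "Name": rule_name,
        "EventPattern": json.dumps(STOP_EVENT_PATTERN),  # must be a JSON string
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "Targets": [{"Id": "notify-sns", "Arn": sns_topic_arn}],
    }
    return put_rule, put_targets


# With boto3 installed and credentials configured:
#   events = boto3.client("events")
#   events.put_rule(**put_rule); events.put_targets(**put_targets)
put_rule, put_targets = rule_requests(
    "database-3-stopped",                               # hypothetical
    "arn:aws:sns:eu-west-3:111122223333:rds-alerts",    # hypothetical
)
```

For the rule to deliver events, the SNS topic's access policy must also allow EventBridge to publish to it.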


# Specifying actions, enabling, and disabling alarms
<a name="enable-disable-alarms"></a>

You can use a CloudWatch alarm to specify what actions the alarm should take when it changes between the `OK`, `ALARM`, and `INSUFFICIENT_DATA` states. CloudWatch has built-in integration with SNS topics and several additional action categories that are not applicable to Amazon RDS metrics, such as Amazon Elastic Compute Cloud (Amazon EC2) actions or Amazon EC2 Auto Scaling group actions. For alarms on Amazon RDS metrics, you generally use EventBridge to write rules and define targets that take action when the alarm is triggered. CloudWatch sends events to EventBridge every time a CloudWatch alarm changes its state. You can use these alarm state change events to trigger an event target in EventBridge. For more information, see [Alarm events and EventBridge](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-and-eventbridge.html) in the CloudWatch documentation.
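
As an illustration of reacting to alarm state changes, the following sketch builds an EventBridge event pattern that matches CloudWatch alarm state change events for specific alarms entering a given state. The helper function and the alarm name are hypothetical. The resulting pattern would be serialized to JSON and used as the event pattern of an EventBridge rule.

```python
import json

VALID_STATES = {"OK", "ALARM", "INSUFFICIENT_DATA"}


def alarm_state_change_pattern(alarm_names, state="ALARM"):
    """Build an event pattern for CloudWatch alarm state change events."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown alarm state: {state}")
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": list(alarm_names),
            "state": {"value": [state]},   # the state the alarm changed to
        },
    }


pattern = alarm_state_change_pattern(["rds-high-cpu"])  # hypothetical alarm
pattern_json = json.dumps(pattern, indent=2)
```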

You might also need to manage alarms; for example, to automatically disable an alarm during a planned configuration change or test, and then re-enable the alarm when the planned action is over. For example, suppose that you have a scheduled database software upgrade that requires downtime, and you have alarms that will be activated if the database becomes unavailable. You can disable and re-enable the actions of those alarms by using the API actions [DisableAlarmActions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DisableAlarmActions.html) and [EnableAlarmActions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_EnableAlarmActions.html), or the [disable-alarm-actions](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/disable-alarm-actions.html) and [enable-alarm-actions](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/enable-alarm-actions.html) commands in the AWS CLI. You can also view the alarm's history on the CloudWatch console or by using the [DescribeAlarmHistory](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarmHistory.html) API action or the [describe-alarm-history](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/describe-alarm-history.html) command in the AWS CLI. CloudWatch preserves alarm history for two weeks. On the CloudWatch console, you can choose the **Favorites and recents** menu in the navigation pane to set and access your favorite and most recently visited alarms.
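
One way to make sure that alarm actions are re-enabled after planned maintenance, even if the maintenance step fails, is to wrap the two API calls in a context manager. The following is a minimal sketch. It assumes that `cloudwatch` is a boto3 CloudWatch client (or any object that has the same two methods), and the alarm name and the `run_database_upgrade` function in the usage comment are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def alarm_actions_disabled(cloudwatch, alarm_names):
    """Disable actions for the given alarms for the duration of a block."""
    names = list(alarm_names)
    cloudwatch.disable_alarm_actions(AlarmNames=names)
    try:
        yield
    finally:
        # Runs even if the maintenance step raises an exception.
        cloudwatch.enable_alarm_actions(AlarmNames=names)


# Usage (requires boto3 and AWS credentials):
#
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   with alarm_actions_disabled(cloudwatch, ["rds-database-3-unavailable"]):
#       run_database_upgrade()   # hypothetical maintenance step
```

While actions are disabled, the alarm continues to evaluate its metric and change state. It only stops running its configured actions.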