

# DB instance monitoring
<a name="db-instance-monitoring"></a>

A [https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.html](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.html) is the basic building block of Amazon RDS. It is an isolated database environment that runs in the cloud. For MySQL and MariaDB databases, the DB instance is the [mysqld](https://dev.mysql.com/doc/refman/8.0/en/mysqld.html) program, also known as the MySQL server, which includes multiple threads and components such as the SQL parser, the query optimizer, thread/connection handler, system and status variables, and one or more pluggable storage engines. Each storage engine is designed to support a specialized use case. The default and recommended storage engine is [InnoDB](https://dev.mysql.com/doc/refman/8.0/en/innodb-architecture.html), which is a transactional, general-purpose, relational database engine that complies with the atomicity, consistency, isolation, durability (ACID) model. InnoDB features [in-memory structures](https://dev.mysql.com/doc/refman/8.0/en/innodb-in-memory-structures.html) (buffer pool, change buffer, adaptive hash index, log buffer) as well as [on-disk structures](https://dev.mysql.com/doc/refman/8.0/en/innodb-on-disk-structures.html) (tablespaces, tables, indexes, undo log, redo log, doublewrite buffer files). To ensure that your database adheres closely to the ACID model, the [InnoDB storage engine implements numerous capabilities](https://dev.mysql.com/doc/refman/8.0/en/mysql-acid.html) to protect your data, including transactions, commit, rollback, crash-recovery, row-level locking, and multiversion concurrency control (MVCC).

All these internal components of a DB instance work jointly to help maintain the availability, integrity, and security of your data at the expected and satisfactory performance level. Depending on your workload, each component and feature might impose resource demands on CPU, memory, network, and storage subsystems. When a surge in demand for a specific resource exceeds the provisioned capacity or the software limits for that resource (imposed either by configuration parameters or by software design), the DB instance can experience performance degradation or complete unavailability and corruption. Therefore, it is critical to measure and monitor these internal components, compare them against defined baseline values, and generate alerts if the monitored values deviate from the expected values.

As described previously, you can use different [tools](monitoring-tools.md) to monitor your MySQL and MariaDB instances. We recommend that you use the Amazon RDS Performance Insights and CloudWatch tools for monitoring and alerting, because these tools are integrated with Amazon RDS, gather high-resolution metrics, present the latest performance information in near real time, and generate alarms.

Regardless of your preferred monitoring tool, we recommend that you [turn on the Performance Schema](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.EnableMySQL.html) in your MySQL and MariaDB DB instances. The [Performance Schema](https://dev.mysql.com/doc/refman/8.0/en/performance-schema.html) is an optional feature for monitoring the operation of the MySQL server (the DB instance) at a low level, and is designed to have minimal impact on overall database performance. You can manage this feature by using the `performance_schema` parameter. Although this parameter is optional, you must use it to collect high-resolution (one second) per-SQL metrics, active session metrics, wait events, and other detailed, low-level monitoring information, which is collected by Amazon RDS Performance Insights.

**Sections**
+ [Performance Insights metrics for DB instances](db-instance-performance-insights.md)
+ [CloudWatch metrics for DB instances](db-instance-cloudwatch-metrics.md)
+ [Publishing Performance Insights metrics to CloudWatch](publishing-performance-insights-to-cloudwatch.md)

# Performance Insights metrics for DB instances
<a name="db-instance-performance-insights"></a>

Performance Insights monitors different types of metrics, as discussed in the following sections.

## Database load
<a name="dbload"></a>

Database load (`DBLoad`) is a key metric in Performance Insights that measures the level of activity in your database. It is collected every second and automatically published to Amazon CloudWatch. It represents the activity of the DB instance in average active sessions (AAS), which are the number of sessions that are concurrently running SQL queries. The `DBLoad` metric is different from other time-series metrics, because it can be interpreted by using any of these five dimensions: waits, SQL, hosts, users, and databases. These dimensions are subcategories of the `DBLoad` metric. You can use them as *slice by* categories to represent different characteristics of the database load. For a detailed description of how we compute the database load, see [Database load](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.ActiveSessions.html) in the Amazon RDS documentation.

The following screen illustration shows the Performance Insights tool.

![\[Database load in the Performance Insights tool\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/database-load.png)


## Dimensions
<a name="dimensions"></a>
+ *Wait events* are conditions that a database session waits for a resource or another operation to complete in order to continue its processing. If you run an SQL statement such as `SELECT * FROM big_table` and if this table is much bigger than the allocated InnoDB buffer pool, your session will most likely wait for `wait/io/file/innodb/innodb_data_file` wait events, which are caused by physical I/O operations on the data file. Wait events are an important dimension for database monitoring, because they indicate possible performance bottlenecks. Wait events indicate the resources and operations that the SQL statements you're running within sessions spend the most time waiting for. For example, the `wait/synch/mutex/innodb/trx_sys_mutex` event occurs when there is high database activity with a large number of transactions, and the `wait/synch/mutex/innodb/buf_pool_mutex` event occurs when a thread has acquired a lock on the InnoDB buffer pool to access a page in memory. For information about all MySQL and MariaDB wait events, see [Wait Event Summary Tables](https://dev.mysql.com/doc/refman/8.0/en/performance-schema-wait-summary-tables.html) in the MySQL documentation. To understand how to interpret instrument names, see [Performance Schema Instrument Naming Conventions](https://dev.mysql.com/doc/refman/8.0/en/performance-schema-instrument-naming.html) in the MySQL documentation.
+ *SQL* shows which SQL statements are contributing the most to the total database load. The **Top dimensions** table, which is located under the **Database load** chart in Amazon RDS Performance Insights, is interactive. You can obtain a detailed list of wait events associated with the SQL statement by clicking the bar in the **Load by waits (AAS)** column. When you select an SQL statement in the list, Performance Insights displays the associated wait events in the **Database load** chart and the SQL statement text in the **SQL text** section. SQL statistics are displayed on the right side of the **Top dimensions** table.
+ *Hosts* show the host names of the connected clients. This dimension helps you identify which client hosts are sending most of the load to the database.
+ *Users* group the DB load by users who are logged in to the database.
+ *Databases* group the DB load by the name of the database the client is connected to.

## Counter metrics
<a name="counter-metrics"></a>

Counter metrics are cumulative metrics whose values can only increase or reset to zero when the DB instance restarts. The value of a counter metric cannot be reduced to its previous value. These metrics represent a single, monotonically increasing counter.
+ [Native counters](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights_Counters.html#USER_PerfInsights_Counters.MySQL.Native) are metrics that are defined by the database engine and not by Amazon RDS. For example:
  + `SQL.Innodb_rows_inserted` represents the number of rows inserted into InnoDB tables.
  + `SQL.Select_scan` represents the number of joins that completed a full scan of the first table.
  + `Cache.Innodb_buffer_pool_reads` represents the number of logical reads that the InnoDB engine couldn't retrieve from the buffer pool and had to read directly from disk.
  + `Cache.Innodb_buffer_pool_read_requests` represents the number of logical read requests.

  For definitions of all native metrics, see [Server Status Variables](https://dev.mysql.com/doc/refman/8.0/en/server-status-variables.html) in the MySQL documentation.
+ [Non-native counters](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights_Counters.html#USER_PerfInsights_Counters.MySQL.NonNative) are defined by Amazon RDS. You can obtain these metrics either by using a specific query or derive them by using two or more native metrics in calculations. Non-native counter metrics can represent latencies, ratios, or hit rates. For example:
  + `Cache.innoDB_buffer_pool_hits` represents the number of read operations that InnoDB could retrieve from the buffer pool without utilizing the disk. It is calculated from the native counter metrics as follows:

    ```
    db.Cache.Innodb_buffer_pool_read_requests - db.Cache.Innodb_buffer_pool_reads
    ```
  + `IO.innoDB_datafile_writes_to_disk` represents the number of InnoDB data file write operations to disk. It captures only operations on data files―not doublewrite or redo logging write operations. It is calculated as follows:

    ```
    db.IO.Innodb_data_writes - db.IO.Innodb_log_writes - db.IO.Innodb_dblwr_writes
    ```

You can visualize DB instance metrics directly in the Performance Insights dashboard. Choose **Manage Metrics**, choose the **Database metrics** tab, and then select the metrics of interest, as shown in the following illustration.

![\[Selecting DB instance metrics in Performance Insights\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/selecting-metrics.png)


Choose the **Update graph** button to display the metrics you selected, as shown in the following illustration.

![\[Viewing DB instance metrics in Performance Insights\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/selecting-metrics-results.png)


## SQL statistics
<a name="sql-stats"></a>

Performance Insights gathers performance-related metrics about SQL queries for each second that a query is running and for each SQL call. In general, Performance Insights collects [SQL statistics](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.UsingDashboard.AnalyzeDBLoad.AdditionalMetrics.MySQL.html) at the statement and digest levels. However, for MariaDB and MySQL DB instances, statistics are collected only at the digest level.
+ Digest statistics is a composite metric of all queries that have the same pattern but eventually have different literal values. The digest replaces specific literal values with a variable; for example:

  ```
  SELECT department_id, department_name FROM departments WHERE location_id = ?
  ```
+ There are metrics that represent statistics *per second* for each digested SQL statement. For example, `sql_tokenized.stats.count_star_per_sec` represents calls per second (that is, how many times per second the SQL statement has been run).
+ Performance Insights also includes metrics that provide *per call* statistics for an SQL statement. For example, `sql_tokenized.stats.sum_timer_wait_per_call` shows the average latency of the SQL statement per call, in milliseconds.

SQL statistics are available in the Performance Insights dashboard, in the **Top SQL** tab of the **Top dimensions** table.

![\[SQL statistics\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/amazon-rds-monitoring-alerting/images/sql-stats.png)


# CloudWatch metrics for DB instances
<a name="db-instance-cloudwatch-metrics"></a>

Amazon CloudWatch also contains metrics that Amazon RDS publishes automatically. The metrics that reside in the `AWS/RDS` namespace are *instance-level metrics*, which refers to the Amazon RDS (service) instance (that is, the isolated database environment running in the cloud) rather than the DB instance in the strict sense of the [mysqld](https://dev.mysql.com/doc/refman/8.0/en/mysqld.html) process. Therefore, most of those [default metrics](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-metrics.html) fall under the category of OS metrics, in the strict definition of the term. Examples include: `CPUUtilization`, `WriteIOPS`, `SwapUsage`, and others. Nevertheless, there are some DB instance metrics that are applicable to MariaDB and MySQL:
+ `BinLogDiskUsage` – The amount of disk space occupied by binary logs.
+ `DatabaseConnections` – The number of client network connections to the DB instance.
+ `ReplicaLag` – The amount of time a read replica DB instance lags behind the source DB instance.

# Publishing Performance Insights metrics to CloudWatch
<a name="publishing-performance-insights-to-cloudwatch"></a>

Amazon RDS Performance Insights monitors most of the DB instance metrics and dimensions and makes them available through the [Performance Insights dashboard](https://console.aws.amazon.com/rds/home#performance-insights-v20206:) on the AWS Management Console. This dashboard is well-suited for database troubleshooting and root cause analysis. However, it is not possible to create alarms in Performance Insights for performance-related metrics. If you want to create alarms based on Performance Insights metrics, those metrics must be in CloudWatch.

Performance Insights [automatically publishes metrics to CloudWatch](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Cloudwatch.html). You can query the same data from Performance Insights, but having the metrics in CloudWatch makes it easy to add CloudWatch alarms and add the metrics to existing CloudWatch dashboards. [Counters](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights_Counters.html) are operating system and database performance metrics such as `os.memory.free` or `db.Locks.Innodb_row_lock_time`. The OS metrics collection depends on the Enhanced Monitoring setting. If Enhanced Monitoring is turned off, OS metrics are collected once every minute. If Enhanced Monitoring is turned on, OS metrics are collected for the selected time period. For more information, see [Turning Enhanced Monitoring on and off](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.Enabling.html#USER_Monitoring.OS.Enabling.Procedure) in the Amazon RDS documentation.

Performance Insights lets you [export the preconfigured or custom metrics dashboard](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PI_metrics_export_CW.html) for your DB instance to CloudWatch. You can export the metrics dashboard as a new dashboard or add it to an existing CloudWatch dashboard. Exporting the Performance Insights metrics dashboard to the CloudWatch dashboard gives you a unified, holistic view of your system’s health by providing an overview of metrics associated with various resources in your system, such as EC2 instances, Amazon Elastic File System (Amazon EFS) resources, and Elastic Load Balancing (ELB) resources, along with your DB instance metrics.

You can use the CloudWatch `DB_PERF_INSIGHTS` metric math function to query and create alarms and graphs based on Performance Insights metrics from CloudWatch. To create an alarm on a Performance Insights metric, follow the instructions in the [CloudWatch documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_alarm_database_performance_insights.html). For example, if you want to trigger an alarm when the total active transactions in your DB instance reaches a specific threshold, follow the instructions on that page, use the following: `DB_PERF_INSIGHTS` math expression, and then choose **Apply**:

```
DB_PERF_INSIGHTS('RDS', 'db-BQ2TPYY7HG2GDFC7APMB3BVB3M', 'db.Transactions.active_transactions.avg')
```

where `db-BQ2TPYY7HG2GDFC7APMB3BVB3M` is the resource ID of your DB instance. Specify the period (for example, 1 minute) and conditions (for example, greater than 1000). To finalize the creation of the alarm, configure alarm actions, add a name and description, and preview and create the alarm.