

# Monitoring Amazon FSx for NetApp ONTAP

You can use the following services and tools to monitor Amazon FSx for NetApp ONTAP usage and activity:
+ **Amazon CloudWatch** – You can monitor file systems using Amazon CloudWatch, which automatically collects and processes raw data from FSx for ONTAP into readable metrics. These statistics are retained for a period of 15 months so that you can access historical information and see how your file system is performing. You can also set alarms based on your metrics over a specified time period and perform one or more actions based on the value of the metrics relative to thresholds that you specify.
+ **ONTAP EMS events** – You can monitor your FSx for ONTAP file system by using events generated by ONTAP's Events Management System (EMS). EMS events are notifications of occurrences in your file system, such as iSCSI LUN creation or automatic sizing of volumes.
+ **NetApp Data Infrastructure Insights** – You can monitor configuration, capacity, and performance metrics for your FSx for ONTAP file systems using the NetApp Data Infrastructure Insights service. You can also create alerts based on metric conditions.
+ **NetApp Harvest and NetApp Grafana** – You can monitor your FSx for ONTAP file system by using NetApp Harvest and NetApp Grafana. NetApp Harvest monitors ONTAP file systems by collecting performance, capacity, and hardware metrics from FSx for ONTAP file systems. Grafana provides a dashboard where the collected Harvest metrics can be displayed.
+ **AWS CloudTrail** – You can use AWS CloudTrail to capture all API calls for Amazon FSx as events. These events provide a record of actions taken by a user, role, or AWS service in Amazon FSx.

**Topics**
+ [Monitoring with Amazon CloudWatch](monitoring-cloudwatch.md)
+ [Monitoring FSx for ONTAP EMS events](ems-events.md)
+ [Monitoring with Data Infrastructure Insights](monitoring-cloud-insights.md)
+ [Monitoring FSx for ONTAP file systems using Harvest and Grafana](monitoring-harvest-grafana.md)
+ [Monitoring FSx for ONTAP API Calls with AWS CloudTrail](logging-using-cloudtrail-win.md)

# Monitoring with Amazon CloudWatch

You can monitor file systems using Amazon CloudWatch, which collects and processes raw data from Amazon FSx for NetApp ONTAP into readable, near real-time metrics. These statistics are retained for a period of 15 months, so that you can access historical information to determine how your file system is performing. FSx for ONTAP metric data is automatically sent to CloudWatch at 1-minute periods by default. For more information about CloudWatch, see [What is Amazon CloudWatch?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) in the *Amazon CloudWatch User Guide*.

**Note**  
By default, FSx for ONTAP sends metric data to CloudWatch at 1-minute periods, except for the following metrics, which are sent at 5-minute periods:
+ `FileServerDiskThroughputBalance`
+ `FileServerDiskIopsBalance`

CloudWatch metrics for FSx for ONTAP are organized into the following categories, which are defined by the dimensions that are used to query each metric. For more information about dimensions, see [Dimensions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html#Dimension) in the *Amazon CloudWatch User Guide*.
+ **File system metrics**: File-system-level performance and storage capacity metrics.
+ **File server metrics**: File-server-level metrics.
+ **Detailed file system aggregate metrics**: Detailed file system metrics per aggregate.
+ **Detailed file system metrics**: File-system-level storage metrics per storage tier (SSD and capacity pool).
+ **Volume metrics**: Per-volume performance and storage capacity metrics.
+ **Detailed volume metrics**: Per-volume storage capacity metrics by storage tier or by the type of data (user, snapshot, or other).

All CloudWatch metrics for FSx for ONTAP are published to the `AWS/FSx` namespace in CloudWatch. 

**Topics**
+ [Accessing CloudWatch metrics](accessingmetrics.md)
+ [Monitoring in the Amazon FSx console](monitor-throughput-cloudwatch.md)
+ [File system metrics](file-system-metrics.md)
+ [Second-generation file system metrics](so-file-system-metrics.md)
+ [Volume metrics](volume-metrics.md)

# Accessing CloudWatch metrics


You can see Amazon CloudWatch metrics for Amazon FSx in the following ways:
+ The Amazon FSx console
+ The Amazon CloudWatch console
+ The AWS Command Line Interface (AWS CLI) for CloudWatch
+ The CloudWatch API

The following procedure explains how to view your file system's CloudWatch metrics with the Amazon FSx console. 

**To view CloudWatch metrics for your file system using the Amazon FSx console**

1. Open the Amazon FSx console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. In the left navigation pane, choose **File systems**, then choose the file system whose metrics you want to view.

1. On the **Summary** page, choose **Monitoring & performance** from the second panel to view graphs for your file system's metrics. 

There are four tabs on the **Monitoring & performance** panel. 
+ Choose **Summary** (the default tab) to display any active warnings, CloudWatch alarms, and graphs for **File system activity**. 
+ Choose **Storage** to view storage capacity and utilization metrics. 
+ Choose **Performance** to view file server and storage performance metrics. 
+ Choose **CloudWatch alarms** to view graphs of any alarms configured for your file system. 

The following procedure explains how to view your volume's CloudWatch metrics with the Amazon FSx console.

**To view CloudWatch metrics for your volume using the Amazon FSx console**

1. Open the Amazon FSx console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. In the left navigation pane, choose **Volumes**, then choose the volume whose metrics you want to view.

1. On the **Summary** page, choose **Monitoring** (the default tab) from the second panel to view graphs for your volume's metrics. 

The following procedure explains how to view your file system's CloudWatch metrics with the Amazon CloudWatch console. 

**To view metrics using the Amazon CloudWatch console**

1. On the **Summary** page of your file system, choose **Monitoring & performance** from the second panel to view graphs for your file system's metrics. 

1. Choose **View in metrics** from the actions menu in the upper right of the graph that you want to view in the Amazon CloudWatch console. This opens the **Metrics** page in the Amazon CloudWatch console. 

The following procedure explains how to add FSx for ONTAP file system metrics to a dashboard in the Amazon CloudWatch console. 

**To add metrics to an Amazon CloudWatch dashboard**

1. Choose the set of metrics (**Summary**, **Storage**, or **Performance**) in the **Monitoring & performance** panel of the Amazon FSx console. 

1. Choose **Add to dashboard** in the upper-right corner of the panel. This opens the Amazon CloudWatch console. 

1. Select an existing CloudWatch dashboard from the list, or create a new dashboard. For more information, see [Using Amazon CloudWatch dashboards](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) in the *Amazon CloudWatch User Guide*. 

The following procedure explains how to access your file system's metrics with the AWS CLI. 

**To access metrics from the AWS CLI**
+ Use the CloudWatch [list-metrics](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/list-metrics.html) CLI command with the `--namespace "AWS/FSx"` parameter. For more information, see the [AWS CLI Command Reference](https://docs.aws.amazon.com/cli/latest/reference/).
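
As a minimal sketch, the same query can also be issued from Python. The `list_fsx_metrics` helper below is illustrative (not part of any AWS SDK); it assumes a CloudWatch client such as one created with boto3.

```python
# A minimal sketch of listing FSx metrics from Python; "list_fsx_metrics" is an
# illustrative helper, not part of any AWS SDK.
PARAMS = {"Namespace": "AWS/FSx"}  # equivalent to --namespace "AWS/FSx"

def list_fsx_metrics(client):
    """Return (metric name, dimensions) pairs for every metric in AWS/FSx."""
    results = []
    # list_metrics is paginated; iterate all pages of results.
    for page in client.get_paginator("list_metrics").paginate(**PARAMS):
        for metric in page["Metrics"]:
            results.append((metric["MetricName"], metric["Dimensions"]))
    return results

# Usage (requires boto3 and configured AWS credentials):
# import boto3
# for name, dims in list_fsx_metrics(boto3.client("cloudwatch")):
#     print(name, dims)
```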

The following procedure explains how to access your file system's metrics with the CloudWatch API. 

**To access metrics from the CloudWatch API**
+ Call the [GetMetricStatistics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricStatistics.html) API operation. For more information, see the [Amazon CloudWatch API Reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/). 
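
The following is a hedged sketch of building a `GetMetricStatistics` request in Python; the file system ID and the helper name are hypothetical, and the request shape follows the CloudWatch API parameters named above.

```python
from datetime import datetime, timedelta, timezone

def storage_used_request(file_system_id, hours=24, period=3600):
    """Build a GetMetricStatistics request for the StorageUsed metric.

    Illustrative helper: the metric name, namespace, and dimension are as
    documented for FSx; the window and period values are example choices.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/FSx",
        "MetricName": "StorageUsed",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,          # seconds per data point
        "Statistics": ["Average"],
    }

# Usage (requires boto3 and configured AWS credentials; the ID is hypothetical):
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# response = cloudwatch.get_metric_statistics(**storage_used_request("fs-0123456789abcdef0"))
# for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
#     print(point["Timestamp"], point["Average"])
```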

# Monitoring in the Amazon FSx console


The CloudWatch metrics reported by Amazon FSx provide valuable information about your FSx for ONTAP file systems and volumes. 

**Topics**
+ [Monitoring file system metrics in the Amazon FSx console](#fsxn-howtomonitor-fs)
+ [Monitoring volume metrics in the Amazon FSx console](#fsxn-howtomonitor-vol)
+ [Performance warnings and recommendations](performance-insights-FSxN.md)
+ [Creating Amazon CloudWatch alarms to monitor Amazon FSx](creating_alarms.md)

## Monitoring file system metrics in the Amazon FSx console


You can use the **Monitoring & performance** panel on your file system's dashboard in the Amazon FSx console to view the metrics that are described in the following table. For more information, see [Accessing CloudWatch metrics](accessingmetrics.md). 

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/ONTAPGuide/monitor-throughput-cloudwatch.html)

**Note**  
We recommend that you keep the average utilization of performance-related dimensions, such as network utilization, CPU utilization, and SSD IOPS utilization, under 50%. This ensures that you have enough spare throughput capacity for unexpected spikes in your workload, as well as for any background storage operations (such as storage synchronization, data tiering, or backups).
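
As an illustration of this guidance, the following sketch checks a series of utilization data points (for example, `Average` values of `CPUUtilization`) against the recommended 50% headroom. The helper name and sample values are hypothetical.

```python
def exceeds_headroom(datapoints, threshold=50.0):
    """Return True if the average utilization (percent) over the data points
    exceeds the given threshold. Illustrative helper, not an AWS API."""
    if not datapoints:
        return False
    return sum(datapoints) / len(datapoints) > threshold

# Hypothetical CPUUtilization Average values (percent), one per period:
cpu_utilization = [42.0, 48.5, 61.0, 55.5]
print(exceeds_headroom(cpu_utilization))  # average is 51.75, above the 50% guideline
```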

## Monitoring volume metrics in the Amazon FSx console


You can view the **Monitoring** panel on your volume's dashboard in the Amazon FSx console to see additional performance metrics. For more information, see [Accessing CloudWatch metrics](accessingmetrics.md). 

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/ONTAPGuide/monitor-throughput-cloudwatch.html)

# Performance warnings and recommendations


FSx for ONTAP displays a warning for CloudWatch metrics whenever one of these metrics has approached or crossed a predetermined threshold for multiple consecutive data points. These warnings provide you with actionable recommendations that you can use to optimize your file system's performance.

Warnings are accessible in several areas of the **Monitoring & performance** dashboard. All active or recent Amazon FSx performance warnings and any CloudWatch alarms configured for the file system that are in an ALARM state appear in the **Monitoring & performance** panel in the **Summary** section. The warning also appears in the section of the dashboard where the metric graph is displayed.

You can create CloudWatch alarms for any of the Amazon FSx metrics. For more information, see [Creating Amazon CloudWatch alarms to monitor Amazon FSx](creating_alarms.md).

## Use performance warnings to improve file system performance

Amazon FSx provides actionable recommendations that you can use to optimize your file system's performance. These recommendations describe how you can address a potential performance bottleneck. You can take the recommended action if you expect the activity to continue, or if it's impacting your file system's performance. Depending on which metric has triggered a warning, you can resolve it by increasing either the file system's throughput capacity or storage capacity, as described in the following table.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/ONTAPGuide/performance-insights-FSxN.html)

**Note**  
During an SSD decrease operation, write-heavy workloads could experience a temporary performance degradation as the operation consumes disk and network resources. To minimize performance impact, maintain adequate headroom by ensuring ongoing workloads don't consistently consume more than 50% CPU, 50% disk throughput, or 50% SSD IOPS before initiating an SSD decrease operation.  
Brief I/O pauses of up to 60 seconds might occur for each volume as client access is redirected to the new set of disks. These pauses are expected and normal during the cutover phase of the operation.

For more information about file system performance, see [Amazon FSx for NetApp ONTAP performance](performance.md).

# Creating Amazon CloudWatch alarms to monitor Amazon FSx

You can create a CloudWatch alarm that sends an Amazon Simple Notification Service (Amazon SNS) message when the alarm changes state. An alarm watches a single metric over a time period that you specify. If needed, the alarm then performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods. The action is a notification sent to an Amazon SNS topic or an Auto Scaling policy.

Alarms invoke actions for sustained state changes only. CloudWatch alarms don't invoke actions simply because they are in a particular state; the state must have changed and been maintained for a specified number of periods. You can create an alarm from the Amazon FSx console or the Amazon CloudWatch console.

The following procedures describe how to create alarms using the Amazon FSx console, AWS Command Line Interface (AWS CLI), and API.

**To set alarms using the Amazon FSx console**

1. Open the Amazon FSx console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. In the left navigation pane, choose **File systems**, and then choose the file system that you want to create the alarm for.

1. On the **Summary** page, choose **Monitoring & performance** from the second panel. 

1. Choose the **CloudWatch alarms** tab. 

1. Choose **Create CloudWatch alarm**. You are redirected to the CloudWatch console.

1. Choose **Select metric**.

1. In the **Metrics** section, choose **FSx**.

1. Choose a metric category:
   + **File System Metrics**
   + **Detailed File System Metrics**
   + **Volume Metrics**
   + **Detailed Volume Metrics**

1. Choose the metric you want to set the alarm for, and then choose **Select metric**.

1. In the **Conditions** section, choose the conditions you want for the alarm, and then choose **Next**.
**Note**  
Metrics might not be published during file system maintenance. To prevent unnecessary and misleading alarm condition changes and to configure your alarms so that they are resilient to missing data points, see [Configuring how CloudWatch alarms treat missing data](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data) in the *Amazon CloudWatch User Guide*.

1. If you want CloudWatch to send you an email or Amazon SNS notification when the alarm state initiates the action, choose an alarm state for **Alarm state trigger**. 

   For **Send a notification to the following SNS topic**, choose an option. If you choose **Create topic**, you can set the name and email addresses for a new email subscription list. This list is saved and appears in the field for future alarms. Choose **Next**.
**Note**  
If you use **Create topic** to create a new Amazon SNS topic, the email addresses must be verified before they receive notifications. Emails are sent only when the alarm enters an alarm state. If this alarm state change happens before the email addresses are verified, they don't receive a notification.

1. Fill in the **Alarm name** and **Alarm description** fields, and then choose **Next**. 

1. On the **Preview and create** page, review the alarm that you're about to create, and then choose **Create alarm**. 

**To set alarms using the CloudWatch console**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Create Alarm** to start the **Create Alarm Wizard**. 

1. Follow the procedure in **To set alarms using the Amazon FSx console**, beginning with step 6. 

**To set an alarm using the AWS CLI**
+ Call the [put-metric-alarm](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-metric-alarm.html) CLI command. For more information, see the [AWS CLI Command Reference](https://docs.aws.amazon.com/cli/latest/reference/). 

**To set an alarm using the CloudWatch API**
+ Call the [PutMetricAlarm](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricAlarm.html) API operation. For more information, see the [Amazon CloudWatch API Reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/). 
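
As a sketch of such a call from Python, the following builds a hypothetical `PutMetricAlarm` request that fires when average `CPUUtilization` stays above 80% for three consecutive 5-minute periods. The alarm name, threshold, and SNS topic ARN are illustrative choices, not prescribed values.

```python
def cpu_alarm_request(file_system_id, topic_arn):
    """Build a PutMetricAlarm request for a file system's CPUUtilization metric.

    Illustrative helper: the namespace, metric, and dimension are as documented
    for FSx; the alarm name, threshold, and evaluation settings are examples.
    """
    return {
        "AlarmName": f"fsx-{file_system_id}-high-cpu",
        "Namespace": "AWS/FSx",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "Statistic": "Average",
        "Period": 300,               # 5-minute evaluation periods
        "EvaluationPeriods": 3,      # require a sustained state change
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",  # metrics may pause during maintenance
        "AlarmActions": [topic_arn],
    }

# Usage (requires boto3 and configured AWS credentials; IDs/ARN are hypothetical):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **cpu_alarm_request("fs-0123456789abcdef0",
#                         "arn:aws:sns:us-east-1:111122223333:fsx-alerts"))
```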

# File system metrics


Your Amazon FSx for NetApp ONTAP file system metrics are classified as either **File system metrics** or **Detailed file system metrics**.
+ **File system metrics** are aggregate performance and storage metrics for a single file system that take a single dimension, `FileSystemId`. These metrics measure network performance and storage capacity usage for your file system.
+ **Detailed file system metrics** measure your file system's storage capacity and used storage in each storage tier (for example, SSD storage and capacity pool storage). Each metric includes a `FileSystemId`, `StorageTier`, and `DataType` dimension.

Note the following about when Amazon FSx publishes data points for these metrics to CloudWatch:
+ For the utilization metrics (any metric whose name ends in *Utilization*, such as `NetworkThroughputUtilization`), a data point is emitted each period for every active file server or aggregate. For example, Amazon FSx emits one data point per minute for each active file server for `FileServerDiskIopsUtilization`, and one data point per minute for each aggregate for `DiskIopsUtilization`.
+ For all other metrics, there is a single data point emitted each period, corresponding to the total value of the metric across all of your active file servers (such as `DataReadBytes` for file server metrics) or all of your aggregates (such as `DiskReadBytes` for storage metrics).

**Topics**
+ [Network I/O metrics](#fsxn-network-IO-metrics)
+ [File server metrics](#fsxn-file-server-metrics)
+ [Disk I/O metrics](#fsxn-disk-IO-metrics)
+ [Storage capacity metrics](#fsxn-storage-volume-metrics)
+ [Detailed file system metrics](#detailed-fs-metrics)

## Network I/O metrics


All of these metrics take one dimension, `FileSystemId`.


| Metric | Description | 
| --- | --- | 
| NetworkThroughputUtilization |  The percent utilization of network throughput for the file system. This metric reflects whichever direction, inbound or outbound, has the higher traffic flow. For individual metrics for each direction, see the `NetworkReceivedBytes` and `NetworkSentBytes` metrics.  The `Average` statistic is the average network throughput utilization of the file system over a specified period.  The `Minimum` statistic is the lowest network throughput utilization of the file system over a specified period.  The `Maximum` statistic is the highest network throughput utilization of the file system over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| NetworkSentBytes |  The number of bytes (network I/O) sent by the file system.  The `Sum` statistic is the total number of bytes sent by the file system over a specified period.  To calculate sent throughput (bytes per second) for any statistic, divide the statistic by the seconds in the specified period.  Units: Bytes  Valid statistics: `Sum`  | 
| NetworkReceivedBytes |  The number of bytes (network I/O) received by the file system.  The `Sum` statistic is the total number of bytes received by the file system over a specified period.  To calculate received throughput (bytes per second) for any statistic, divide the statistic by the seconds in the specified period.  Units: Bytes  Valid statistics: `Sum`  | 
| DataReadBytes |  The number of bytes (network I/O) from reads by clients to the file system. The `Sum` statistic is the total number of bytes associated with read operations during the specified period. To calculate the average throughput (bytes per second) for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Bytes Valid statistics: `Sum`  | 
| DataWriteBytes |  The number of bytes (network I/O) from writes by clients to the file system. The `Sum` statistic is the total number of bytes associated with write operations during the specified period. To calculate the average throughput (bytes per second) for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Bytes Valid statistics: `Sum`  | 
| DataReadOperations |  The count of read operations (network I/O) from reads by clients to the file system. The `Sum` statistic is the total number of I/O operations that occurred over a specified period. To calculate the average read operations per second for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Count Valid statistics: `Sum`  | 
| DataWriteOperations |  The count of write operations (network I/O) from writes by clients to the file system. The `Sum` statistic is the total number of I/O operations that occurred over a specified period. To calculate the average write operations per second for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Count Valid statistics: `Sum`  | 
| MetadataOperations |  The count of metadata operations (network I/O) by clients to the file system. The `Sum` statistic is the total number of I/O operations that occurred over a specified period. To calculate the average metadata operations per second for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Count Valid statistics: `Sum`  | 
| DataReadOperationTime |  The sum of total time spent within the file system for read operations (network I/O) from clients accessing data in the file system. The `Sum` statistic is the total number of seconds spent by read operations during the specified period. To calculate the average read latency for a period, divide the `Sum` statistic by the `Sum` of the `DataReadOperations` metric over the same period. Units: Seconds Valid statistics: `Sum`  | 
| DataWriteOperationTime |  The sum of total time spent within the file system for fulfilling write operations (network I/O) from clients accessing data in the file system. The `Sum` statistic is the total number of seconds spent by write operations during the specified period. To calculate the average write latency for a period, divide the `Sum` statistic by the `Sum` of the `DataWriteOperations` metric over the same period. Units: Seconds Valid statistics: `Sum`  | 
| CapacityPoolReadBytes | The number of bytes read (network I/O) from the file system's capacity pool tier. To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of bytes read from the file system's capacity pool tier over a specified period. To calculate capacity pool bytes per second, divide the `Sum` statistic by the seconds in a specified period. Units: Bytes Valid statistics: `Sum` | 
| CapacityPoolReadOperations |  The number of read operations (network I/O) from the file system's capacity pool tier. This translates to a capacity pool read request.  To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of read operations from the file system's capacity pool tier over a specified period. To calculate capacity pool requests per second, divide the `Sum` statistic by the seconds in a specified period.  Units: Count Valid statistics: `Sum`  | 
| CapacityPoolWriteBytes | The number of bytes written (network I/O) to the file system's capacity pool tier. To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of bytes written to the file system's capacity pool tier over a specified period. To calculate capacity pool bytes per second, divide the `Sum` statistic by the seconds in a specified period. Units: Bytes Valid statistics: `Sum` | 
| CapacityPoolWriteOperations |  The number of write operations (network I/O) to the file system's capacity pool tier. This translates to a write request.  To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of write operations to the file system's capacity pool tier over a specified period. To calculate capacity pool requests per second, divide the `Sum` statistic by the seconds in a specified period.  Units: Count Valid statistics: `Sum`  | 
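
The per-second and per-operation calculations described in the table above can be sketched as follows. The helper names and sample `Sum` values are hypothetical; the formulas are the ones the table gives (divide a `Sum` of bytes by the period length, or a `Sum` of operation time by a `Sum` of operations).

```python
def throughput_bytes_per_second(sum_bytes, period_seconds):
    """Average throughput for a period: the Sum statistic (e.g. DataReadBytes)
    divided by the number of seconds in the period."""
    return sum_bytes / period_seconds

def average_latency_seconds(sum_operation_time, sum_operations):
    """Average latency: the Sum of DataReadOperationTime divided by the Sum of
    DataReadOperations over the same period (likewise for writes)."""
    return sum_operation_time / sum_operations if sum_operations else 0.0

# Hypothetical one-minute (60-second) period:
print(throughput_bytes_per_second(120_000_000, 60))  # 2,000,000 bytes per second
print(average_latency_seconds(1.5, 3000))            # 0.0005 seconds (0.5 ms) per read
```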

## File server metrics


All of these metrics take one dimension, `FileSystemId`. 


| Metric | Description | 
| --- | --- | 
| CPUUtilization |  The percent utilization of the file system's CPU resources.  The `Average` statistic is the average CPU utilization of the file system over a specified period.  The `Minimum` statistic is the lowest CPU utilization of the file system over a specified period.  The `Maximum` statistic is the highest CPU utilization of the file system over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerDiskThroughputUtilization |  The disk throughput between your file server and the primary tier, as a percentage of the provisioned limit determined by throughput capacity.  The `Average` statistic is the average percent utilization of the file servers' disk throughput over a specified period. The `Minimum` statistic is the lowest percent utilization of the file servers' disk throughput over a specified period.  The `Maximum` statistic is the highest utilization of the file servers' disk throughput over a specified period.  Units: Percent Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerDiskThroughputBalance |  The percentage of available burst credits for disk throughput between your file server and the primary tier. This is valid for file systems that are provisioned with a throughput capacity of less than 512 MBps. The `Average` statistic is the average burst balance available over a specified period.  The `Minimum` statistic is the minimum burst balance available over a specified period.  The `Maximum` statistic is the maximum burst balance available over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerDiskIopsBalance |  The percentage of available burst credits for disk IOPS between your file server and the primary tier. This is valid for file systems that are provisioned with a throughput capacity of less than 512 MBps. The `Average` statistic is the average burst balance available over a specified period.  The `Minimum` statistic is the minimum burst balance available over a specified period.  The `Maximum` statistic is the maximum burst balance available over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerDiskIopsUtilization |  The percentage of IOPS utilization of available disk IOPS capacity for your file server.  The `Average` statistic is the average disk IOPS utilization of the file system over a specified period.  The `Minimum` statistic is the minimum disk IOPS utilization of the file system over a specified period.  The `Maximum` statistic is the maximum disk IOPS utilization of the file system over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerCacheHitRatio |  The percentage of all read requests that are served by data in the file system's RAM and NVMe caches. A higher percentage means that more reads are served by the file system's read caches.  The `Average` statistic is the average cache hit percent for the file system over a specified period.  The `Minimum` statistic is the lowest cache hit percent for the file system over a specified period.  The `Maximum` statistic is the highest cache hit percent for the file system over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 

## Disk I/O metrics


All of these metrics take one dimension, `FileSystemId`. 


| Metric | Description | 
| --- | --- | 
| DiskReadBytes |  The number of bytes (disk I/O) from any disk reads to the file system's primary tier.  The `Sum` statistic is the total number of bytes read from the file system over a specified period.  To calculate read disk throughput (bytes per second) for any statistic, divide the `Sum` statistic by the seconds in the specified period.  Units: Bytes  Valid statistics: `Sum`  | 
| DiskWriteBytes |  The number of bytes (disk I/O) from any disk writes to the file system's primary tier.  The `Sum` statistic is the total number of bytes written to the file system over a specified period.  To calculate write disk throughput (bytes per second) for any statistic, divide the `Sum` statistic by the seconds in the specified period.  Units: Bytes  Valid statistics: `Sum`  | 
| DiskIopsUtilization |  The disk IOPS between your file server and storage volumes, as a percentage of the primary tier's provisioned disk IOPS limit.  The `Average` statistic is the average disk IOPS utilization of the file system over a specified period.  The `Minimum` statistic is the minimum disk IOPS utilization of the file system over a specified period.  The `Maximum` statistic is the maximum disk IOPS utilization of the file system over a specified period.  Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| DiskReadOperations |  The number of read operations (disk I/O) from the file system's primary tier.  The `Sum` statistic is the total number of read operations from the primary tier over a specified period.  Units: Count  Valid statistics: `Sum`  | 
| DiskWriteOperations |  The number of write operations (disk I/O) to the file system's primary tier.  The `Sum` statistic is the total number of write operations to the primary tier over a specified period.  Units: Count  Valid statistics: `Sum`  | 

## Storage capacity metrics


All of these metrics take one dimension, `FileSystemId`. 


| Metric | Description | 
| --- | --- | 
| StorageEfficiencySavings |  The bytes saved from storage efficiency features (compression, deduplication, and compaction). The `Average` statistic is the average storage efficiency savings over a specified period. To calculate storage efficiency savings as a percentage of all data stored, over a one minute period, divide `StorageEfficiencySavings` by the sum of `StorageEfficiencySavings` and the `StorageUsed` file system metric, using the `Sum` statistic for `StorageUsed`.  The `Minimum` statistic is the minimum storage efficiency savings over a specified period.  The `Maximum` statistic is the maximum storage efficiency savings over a specified period.  Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`   | 
| StorageUsed |  The total amount of physical data stored on the file system, on both the primary (SSD) tier and the capacity pool tier. This metric includes savings from storage-efficiency features, such as data compression and deduplication. Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| LogicalDataStored |  The total amount of logical data stored on the file system, considering both the SSD tier and the capacity pool tier. This metric includes the total logical size of snapshots and FlexClones, but does not include storage efficiency savings achieved through compression, compaction, and deduplication. To compute storage-efficiency savings in bytes, take the `Average` of `StorageUsed` over a given period and subtract it from the `Average` of `LogicalDataStored` over the same period.  To compute storage-efficiency savings as a percentage of total logical data size, take the `Average` of `StorageUsed` over a given period and subtract it from the `Average` of `LogicalDataStored` over the same period. Then divide the difference by the `Average` of `LogicalDataStored` over the same period. Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
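The savings calculation described for `LogicalDataStored` can be sketched as a small helper. This assumes you have already retrieved the `Average` of `StorageUsed` and `LogicalDataStored` over the same period; the example byte values are hypothetical.

```python
def efficiency_savings_pct(storage_used, logical_data_stored):
    """Storage-efficiency savings as a percentage of total logical data size.

    Per the guidance above: subtract the Average of StorageUsed from the
    Average of LogicalDataStored over the same period, then divide the
    difference by the Average of LogicalDataStored.
    """
    savings_bytes = logical_data_stored - storage_used
    return 100.0 * savings_bytes / logical_data_stored

# Hypothetical example: 600 GiB of logical data stored in 450 GiB of
# physical space works out to 25% savings.
pct = efficiency_savings_pct(450 * 2**30, 600 * 2**30)
```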

## Detailed file system metrics


Detailed file system metrics are detailed storage-utilization metrics for each of your storage tiers. Detailed file system metrics all have the dimensions `FileSystemId`, `StorageTier`, and `DataType`.
+ The `StorageTier` dimension indicates the storage tier that the metric measures, with possible values of `SSD` and `StandardCapacityPool`.
+ The `DataType` dimension indicates the type of data that the metric measures, with the possible value `All`.

There is a row for each unique combination of a given metric and dimensional key-value pairs, with a description of what that combination measures.


| Metric | Description | 
| --- | --- | 
| StorageCapacityUtilization |  The storage capacity utilization for each of your file system's aggregates. There is one metric emitted each minute for each of your file system's aggregates. The `Average` statistic is the average amount of storage capacity utilization for your file system's performance tier over the specified period. The `Minimum` statistic is the lowest amount of storage capacity utilization for your file system's performance tier over the specified period. The `Maximum` statistic is the highest amount of storage capacity utilization for your file system's performance tier over the specified period. Units: Percent Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| StorageCapacity |  The total storage capacity of the primary (SSD) tier. Units: Bytes Valid statistics: `Maximum`  | 
| StorageUsed |  The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. Valid dimension values for `StorageTier` are `SSD` and `StandardCapacityPool`, corresponding to the storage tier that this metric measures. This metric also requires the `DataType` dimension with the value `All`. The `Average`, `Minimum`, and `Maximum` statistics are per-tier storage consumption in bytes for the given period.  To calculate storage capacity utilization of your primary (SSD) storage tier, divide any of these statistics by the `Maximum` `StorageCapacity` over the same period, with the `StorageTier` dimension equal to `SSD`.  To calculate the free storage capacity of your primary (SSD) storage tier in bytes, subtract any of these statistics from the `Maximum` `StorageCapacity` over the same period, with the dimension `StorageTier` equal to `SSD`. Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
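The SSD-tier calculations described for `StorageUsed` can be expressed as two small helpers. Both assume you have retrieved a `StorageUsed` statistic (with `StorageTier` equal to `SSD` and `DataType` equal to `All`) and the `Maximum` of `StorageCapacity` over the same period; the example values are hypothetical.

```python
def ssd_utilization_pct(storage_used_ssd, storage_capacity_ssd):
    """SSD-tier utilization: StorageUsed (StorageTier=SSD, DataType=All)
    divided by the Maximum StorageCapacity over the same period."""
    return 100.0 * storage_used_ssd / storage_capacity_ssd

def ssd_free_bytes(storage_used_ssd, storage_capacity_ssd):
    """Free SSD capacity in bytes: StorageCapacity minus StorageUsed."""
    return storage_capacity_ssd - storage_used_ssd

# Hypothetical example: 250 GiB used of a 1,000 GiB SSD tier.
used = 250 * 2**30
capacity = 1000 * 2**30
utilization = ssd_utilization_pct(used, capacity)   # 25.0 percent
free = ssd_free_bytes(used, capacity)               # 750 GiB in bytes
```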

# Second-generation file system metrics


The following metrics are provided for FSx for ONTAP second-generation file systems. A data point is emitted for each HA pair and, for the storage utilization metrics, for each aggregate.

**Note**  
If you have a file system with multiple HA pairs, you can also use the [single-HA pair file system metrics](file-system-metrics.md) and the [volume metrics](volume-metrics.md).

**Topics**
+ [

## Network I/O metrics
](#so-network-IO-metrics)
+ [

## File server metrics
](#so-file-server-metrics)
+ [

## Disk I/O metrics
](#so-disk-IO-metrics)
+ [

## Detailed file system metrics
](#so-detailed-fs-metrics)

## Network I/O metrics


All of these metrics take two dimensions, `FileSystemId` and `FileServer`.
+ `FileSystemId` – Your file system's AWS resource ID.
+ `FileServer` – The name of a file server (or *node*) in ONTAP (for example, `FsxId01234567890abcdef-01`). Odd-numbered file servers are preferred file servers (that is, they service traffic unless the file system has failed over to the secondary file server), while even-numbered file servers are secondary file servers (that is, they serve traffic only when their partner is unavailable). Because of this, secondary file servers typically show less utilization than preferred file servers.


| Metric | Description | 
| --- | --- | 
| NetworkThroughputUtilization |  Network throughput utilization as a percentage of available network throughput for your file system. This metric is equivalent to the maximum of `NetworkSentBytes` and `NetworkReceivedBytes` as a percentage of the network throughput capacity of one HA pair for your file system. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's file servers. The `Average` statistic is the average network throughput utilization for the given file server over the specified period. The `Minimum` statistic is the lowest network throughput utilization for the given file server over one minute, for the specified period. The `Maximum` statistic is the highest network throughput utilization for the given file server over one minute, for the specified period. Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| NetworkSentBytes |  The number of bytes (network IO) sent by your file system. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's file servers. The `Sum` statistic is the total number of bytes sent over the network by the given file server over the specified period. The `Average` statistic is the average number of bytes sent over the network by the given file server over the specified period. The `Minimum` statistic is the lowest number of bytes sent over the network by the given file server over the specified period. The `Maximum` statistic is the highest number of bytes sent over the network by the given file server over the specified period. To calculate sent throughput (bytes per second) for any statistic, divide the statistic by the seconds in the specified period.  Units: Bytes  Valid statistics: `Sum`, `Average`, `Minimum`, and `Maximum`  | 
| NetworkReceivedBytes |  The number of bytes (network IO) received by your file system. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's file servers. The `Sum` statistic is the total number of bytes received over the network by the given file server over the specified period. The `Average` statistic is the average number of bytes received over the network by the given file server each minute over the specified period. The `Minimum` statistic is the lowest number of bytes received over the network by the given file server each minute over the specified period. The `Maximum` statistic is the highest number of bytes received over the network by the given file server each minute over the specified period. To calculate received throughput (bytes per second) for any statistic, divide the statistic by the seconds in the period. Units: Bytes  Valid statistics: `Sum`, `Average`, `Minimum`, and `Maximum`  | 
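The throughput conversion described for `NetworkSentBytes` and `NetworkReceivedBytes` is a single division; the helper and example values below are a hypothetical sketch.

```python
def throughput_bytes_per_sec(sum_bytes, period_seconds):
    """Average network throughput (bytes per second), computed by dividing
    the Sum of NetworkSentBytes or NetworkReceivedBytes by the number of
    seconds in the specified period, per the guidance above."""
    return sum_bytes / period_seconds

# Hypothetical example: 6 GiB sent over a 5-minute (300-second) period.
bps = throughput_bytes_per_sec(6 * 2**30, 300)
```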

## File server metrics


All of these metrics take two dimensions, `FileSystemId` and `FileServer`.


| Metric | Description | 
| --- | --- | 
| CPUUtilization |  The percent utilization of the file system's CPU resources. There is one metric emitted each minute for each of your file system's file servers. The `Average` statistic is the average CPU utilization of the file system over a specified period.  The `Minimum` statistic is the lowest CPU utilization for the given file server over the specified period. The `Maximum` statistic is the highest CPU utilization for the given file server over the specified period. Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerDiskThroughputUtilization |  The disk throughput between your file server and aggregate, as a percentage of the provisioned limit determined by throughput capacity. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). This metric is equivalent to the sum of `DiskReadBytes` and `DiskWriteBytes` as a percentage of the file server's disk throughput capacity of one HA pair for your file system. There is one metric emitted each minute for each of your file system's file servers. The `Average` statistic is the average file server disk throughput utilization for the given file server over the specified period. The `Minimum` statistic is the lowest file server disk throughput utilization for the given file server over the specified period. The `Maximum` statistic is the highest file server disk throughput utilization for the given file server over the specified period. Units: Percent Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerDiskIopsUtilization |  The IOPS utilization of available disk IOPS capacity for your file server, as a percentage of its disk IOPS limit. This differs from `DiskIopsUtilization` in that it measures the utilization of disk IOPS out of the maximum that your file server can handle, as opposed to your provisioned disk IOPS. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's file servers. The `Average` statistic is the average disk IOPS utilization for the given file server over the specified period. The `Minimum` statistic is the lowest disk IOPS utilization for the given file server over the specified period. The `Maximum` statistic is the highest disk IOPS utilization for the given file server over the specified period. Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| FileServerCacheHitRatio |  The percentage of all read requests that are served by data residing in your file system's RAM or NVMe caches for each of your HA pairs (for example, the active file server in an HA pair). A higher percentage indicates a higher ratio of cached reads to total reads. All I/O is considered, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's file servers.  The `Average` statistic is the average cache hit ratio for one of your file system's HA pairs over the specified period.  The `Minimum` statistic is the lowest cache hit ratio for one of your file system's HA pairs over the specified period.  The `Maximum` statistic is the highest cache hit ratio for one of your file system's HA pairs over the specified period. Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 

## Disk I/O metrics


All of these metrics take two dimensions, `FileSystemId` and `Aggregate`.
+ `FileSystemId` – Your file system's AWS resource ID.
+ `Aggregate` – Your file system's performance tier consists of multiple storage pools called *aggregates*. There is one aggregate for each HA pair. For example, aggregate `aggr1` maps to file server `FsxId01234567890abcdef-01` (the active file server) and file server `FsxId01234567890abcdef-02` (the secondary file server) in an HA pair.


| Metric | Description | 
| --- | --- | 
| DiskReadBytes |  The number of bytes (disk IO) from any disk reads from this aggregate. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's aggregates. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). The `Sum` statistic is the total number of bytes read each minute from the given aggregate over the specified period. The `Average` statistic is the average number of bytes read each minute from the given aggregate over the specified period. The `Minimum` statistic is the lowest number of bytes read each minute from the given aggregate over the specified period. The `Maximum` statistic is the highest number of bytes read each minute from the given aggregate over the specified period. To calculate read disk throughput (bytes per second) for any statistic, divide the statistic by the seconds in the period. Units: Bytes  Valid statistics: `Sum`, `Average`, `Minimum`, and `Maximum`  | 
| DiskWriteBytes |  The number of bytes (disk IO) from any disk writes to this aggregate. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's aggregates. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). The `Sum` statistic is the total number of bytes written to the given aggregate over the specified period. The `Average` statistic is the average number of bytes written to the given aggregate each minute over the specified period. The `Minimum` statistic is the lowest number of bytes written to the given aggregate each minute over the specified period. The `Maximum` statistic is the highest number of bytes written to the given aggregate each minute over the specified period. To calculate write disk throughput (bytes per second) for any statistic, divide the statistic by the seconds in the specified period.  Units: Bytes  Valid statistics: `Sum`, `Average`, `Minimum`, and `Maximum`  | 
| DiskIopsUtilization |  The disk IOPS utilization of one aggregate, as a percentage of the aggregate's disk IOPS limit (that is, the file system's total IOPS divided by the number of HA pairs for your file system). This differs from `FileServerDiskIopsUtilization` in that it is the utilization of provisioned disk IOPS against your provisioned IOPS limit, as opposed to the maximum disk IOPS supported by the file server (that is, dictated by your configured throughput capacity per HA pair). All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's aggregates. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). The `Average` statistic is the average disk IOPS utilization for the given aggregate over the specified period. The `Minimum` statistic is the lowest disk IOPS utilization for the given aggregate over the specified period. The `Maximum` statistic is the highest disk IOPS utilization for the given aggregate over the specified period. Units: Percent  Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| DiskReadOperations |  The number of read operations (disk IO) to this aggregate. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's aggregates. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). The `Sum` statistic is the total number of read operations performed by the given aggregate over the specified period. The `Average` statistic is the average number of read operations performed each minute by the given aggregate over the specified period. The `Minimum` statistic is the lowest number of read operations performed each minute by the given aggregate over the specified period. The `Maximum` statistic is the highest number of read operations performed each minute by the given aggregate over the specified period. To calculate average disk IOPS over the period, use the `Average` statistic and divide the result by 60 (seconds). Units: Count  Valid statistics: `Sum`, `Average`, `Minimum`, and `Maximum`  | 
| DiskWriteOperations |  The number of write operations (disk IO) to this aggregate. All traffic is considered in this metric, including background tasks (such as SnapMirror, tiering, and backups). There is one metric emitted each minute for each of your file system's aggregates. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). The `Sum` statistic is the total number of write operations performed by the given aggregate over the specified period. The `Average` statistic is the average number of write operations performed each minute by the given aggregate over the specified period. To calculate average disk IOPS over the period, use the `Average` statistic and divide the result by 60 (seconds). Units: Count  Valid statistics: `Sum` and `Average`  | 
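The average-IOPS guidance in the `DiskReadOperations` and `DiskWriteOperations` rows can be sketched as a one-line helper; the operation count below is hypothetical.

```python
def average_disk_iops(avg_per_minute_count):
    """Average disk IOPS from the Average statistic of DiskReadOperations
    or DiskWriteOperations. Each datapoint covers one minute, so dividing
    the per-minute Average by 60 yields operations per second."""
    return avg_per_minute_count / 60.0

# Hypothetical example: an Average of 120,000 operations per minute
# corresponds to 2,000 IOPS.
iops = average_disk_iops(120000)
```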

## Detailed file system metrics


Detailed file system metrics are detailed storage-utilization metrics for each of your storage tiers. Detailed file system metrics have either the `FileSystemId`, `StorageTier`, and `DataType` dimensions, or the `FileSystemId`, `StorageTier`, `DataType`, and `Aggregate` dimensions.
+ When the `Aggregate` dimension is not supplied, the metrics are for your entire file system. The `StorageUsed` and `StorageCapacity` metrics have a single data point each minute corresponding to the file system's total consumed storage (per storage tier) and total storage capacity (for the SSD tier). Meanwhile, the `StorageCapacityUtilization` metric emits one metric each minute for each aggregate.
+ When the `Aggregate` dimension is supplied, the metrics are for each aggregate.

The meanings of the dimensions are as follows:
+ `FileSystemId` – Your file system's AWS resource ID.
+ `Aggregate` – Your file system's performance tier consists of multiple storage pools called *aggregates*. There is one aggregate for each HA pair. For example, aggregate `aggr1` maps to file server `FsxId01234567890abcdef-01` (the active file server) and file server `FsxId01234567890abcdef-02` (the secondary file server) in an HA pair.
+ `StorageTier` – Indicates the storage tier that the metric measures, with possible values of `SSD` and `StandardCapacityPool`.
+ `DataType` – Indicates the type of data that the metric measures, with the possible value `All`.

There is a row for each unique combination of a given metric and dimensional key-value pairs, with a description of what that combination measures.


| Metric | Description | 
| --- | --- | 
| StorageCapacityUtilization |  The storage capacity utilization for a given file system aggregate. There is one metric emitted each minute for each of your file system's aggregates. The `Average` statistic is the average amount of storage capacity utilization for a given aggregate over the specified period. The `Minimum` statistic is the minimum amount of storage capacity utilization for a given aggregate over the specified period. The `Maximum` statistic is the maximum amount of storage capacity utilization for a given aggregate over the specified period. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). Units: Percent Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| StorageCapacity |  The storage capacity for a given file system aggregate. There is one metric emitted each minute for each of your file system's aggregates. The `Average` statistic is the average amount of storage capacity for a given aggregate over the specified period. The `Minimum` statistic is the minimum amount of storage capacity for a given aggregate over the specified period. The `Maximum` statistic is the maximum amount of storage capacity for a given aggregate over the specified period. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| StorageUsed |  The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. Valid dimension values for `StorageTier` are `SSD` and `StandardCapacityPool`, corresponding to the storage tier that this metric measures. There is one metric emitted each minute for each of your file system's aggregates. The `Average` statistic is the average amount of physical storage capacity consumed on the given storage tier by the given aggregate over the specified period. The `Minimum` statistic is the minimum amount of physical storage capacity consumed on the given storage tier by the given aggregate over the specified period. The `Maximum` statistic is the maximum amount of physical storage capacity consumed on the given storage tier by the given aggregate over the specified period. During SSD capacity decrease operations, this metric is reported for both the original aggregate (`aggr1_old`) and the new smaller aggregate (`aggr1`). Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`  | 

# Volume metrics


Your Amazon FSx for NetApp ONTAP file system can have one or more volumes that store your data. Each of these volumes has a set of CloudWatch metrics, classified as either **Volume metrics** or **Detailed volume metrics**.
+ **Volume metrics** are per-volume performance and storage metrics that take two dimensions, `FileSystemId` and `VolumeId`. `FileSystemId` maps to the file system that the volume belongs to.
+ **Detailed volume metrics** are per-storage-tier metrics that measure storage consumption per tier with the `StorageTier` dimension (with possible values of `SSD` and `StandardCapacityPool`) and per data type with the `DataType` dimension (with possible values of `User`, `Snapshot`, and `Other`). These metrics have the `FileSystemId`, `VolumeId`, `StorageTier`, and `DataType` dimensions.

**Topics**
+ [

## Network I/O metrics
](#fsxn-vol-network-IO-metrics)
+ [

## Storage capacity metrics
](#fsxn-vol-storage-volume-metrics)
+ [

## Detailed volume metrics
](#detailed-vol-metrics)

## Network I/O metrics


All of these metrics take two dimensions, `FileSystemId` and `VolumeId`. 


| Metric | Description | 
| --- | --- | 
| DataReadBytes |  The number of bytes (network I/O) read from the volume by clients. The `Sum` statistic is the total number of bytes associated with read operations during the specified period. To calculate the average throughput (bytes per second) for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Bytes Valid statistics: `Sum`  | 
| DataWriteBytes |  The number of bytes (network I/O) written to the volume by clients. The `Sum` statistic is the total number of bytes associated with write operations during the specified period. To calculate the average throughput (bytes per second) for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Bytes Valid statistics: `Sum`  | 
| DataReadOperations |  The number of read operations (network I/O) on the volume by clients. The `Sum` statistic is the total number of read operations during the specified period. To calculate the average read operations per second for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Count Valid statistics: `Sum`  | 
| DataWriteOperations |  The number of write operations (network I/O) on the volume by clients. The `Sum` statistic is the total number of write operations during the specified period. To calculate the average write operations per second for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Count Valid statistics: `Sum`  | 
| MetadataOperations |  The number of I/O operations (network I/O) from metadata activities by clients to the volume. The `Sum` statistic is the total number of metadata operations during the specified period. To calculate the average metadata operations per second for a period, divide the `Sum` statistic by the number of seconds in the specified period. Units: Count Valid statistics: `Sum`  | 
| DataReadOperationTime |  The sum of total time spent within the volume for read operations (network I/O) from clients accessing data in the volume. The `Sum` statistic is the total number of seconds spent by read operations during the specified period. To calculate the average read latency for a period, divide the `Sum` statistic by the `Sum` of the `DataReadOperations` metric over the same period. Units: Seconds Valid statistics: `Sum`  | 
| DataWriteOperationTime |  The sum of total time spent within the volume for fulfilling write operations (network I/O) from clients accessing data in the volume. The `Sum` statistic is the total number of seconds spent by write operations during the specified period. To calculate the average write latency for a period, divide the `Sum` statistic by the `Sum` of the `DataWriteOperations` metric over the same period. Units: Seconds Valid statistics: `Sum`  | 
| MetadataOperationTime |  The sum of total time spent within the volume for fulfilling metadata operations (network I/O) from clients that are accessing data in the volume. The `Sum` statistic is the total number of seconds spent by read operations during the specified period. To calculate the average latency for a period, divide the `Sum` statistic by the `Sum` of the `MetadataOperations` over the same period. Units: Seconds Valid statistics: `Sum`  | 
| CapacityPoolReadBytes | The number of bytes read (network I/O) from the volume's capacity pool tier.  To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of bytes read from the volume's capacity pool tier over a specified period. To calculate capacity pool bytes per second, divide the `Sum` statistic by the seconds in a specified period. Units: Bytes Valid statistics: `Sum` | 
| CapacityPoolReadOperations |  The number of read operations (network I/O) from the volume's capacity pool tier. This translates to a capacity pool read request.  To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of read operations from the volume's capacity pool tier over a specified period. To calculate capacity pool requests per second, divide the `Sum` statistic by the seconds in a specified period.  Units: Count Valid statistics: `Sum`  | 
| CapacityPoolWriteBytes | The number of bytes written (network I/O) to the volume's capacity pool tier. To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of bytes written to the volume's capacity pool tier over a specified period. To calculate capacity pool bytes per second, divide the `Sum` statistic by the seconds in a specified period.  Units: Bytes Valid statistics: `Sum` | 
| CapacityPoolWriteOperations |  The number of write operations (network I/O) to the volume's capacity pool tier. This translates to a write request.  To ensure data integrity, ONTAP performs a read operation on the capacity pool immediately after performing a write operation.  The `Sum` statistic is the total number of write operations to the volume's capacity pool tier over a specified period. To calculate capacity pool requests per second, divide the `Sum` statistic by the seconds in a specified period.  Units: Count Valid statistics: `Sum`  | 
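The latency calculation described for `DataReadOperationTime`, `DataWriteOperationTime`, and `MetadataOperationTime` divides total operation time by the operation count. The helper below is a minimal sketch; the example values are hypothetical.

```python
def average_latency_ms(operation_time_sum_seconds, operations_sum):
    """Average per-operation latency in milliseconds.

    Per the guidance above: divide the Sum of DataReadOperationTime (or
    DataWriteOperationTime / MetadataOperationTime) by the Sum of the
    matching operations metric over the same period, then convert
    seconds to milliseconds.
    """
    return 1000.0 * operation_time_sum_seconds / operations_sum

# Hypothetical example: 2.5 seconds of total read time across 5,000
# reads gives an average read latency of 0.5 ms.
latency_ms = average_latency_ms(2.5, 5000)
```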

## Storage capacity metrics


All of these metrics take two dimensions, `FileSystemId` and `VolumeId`. 


| Metric | Description | 
| --- | --- | 
| StorageCapacity |  The size of the volume in bytes. Units: Bytes Valid statistics: `Maximum`  | 
| StorageUsed |  The used logical storage capacity of the volume. Units: Bytes Valid statistics: `Average`  | 
| StorageCapacityUtilization |  The storage capacity utilization of the volume. Units: Percent Valid statistics: `Average`  | 
| FilesUsed |  The used files (number of files or inodes) on the volume. Units: Count Valid statistics: `Average`  | 
| FilesCapacity |  The total number of inodes that can be created on the volume. Units: Count Valid statistics: `Maximum`  | 
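`StorageCapacityUtilization` relates `StorageUsed` to `StorageCapacity`; as a rough sketch (the byte values are hypothetical), the same percentage can be derived directly from the other two metrics:

```python
def volume_utilization_pct(storage_used, storage_capacity):
    """Volume storage utilization as a percentage: used logical capacity
    (StorageUsed) divided by the volume size (StorageCapacity)."""
    return 100.0 * storage_used / storage_capacity

# Hypothetical example: 80 GiB used on a 100 GiB volume.
util = volume_utilization_pct(80 * 2**30, 100 * 2**30)
```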

## Detailed volume metrics


Detailed volume metrics take more dimensions than volume metrics, enabling more granular measurements of your data. All detailed volume metrics have the dimensions `FileSystemId`, `VolumeId`, `StorageTier`, and `DataType`.
+ The `StorageTier` dimension indicates the storage tier that the metric measures, with possible values of `All`, `SSD`, and `StandardCapacityPool`.
+ The `DataType` dimension indicates the type of data that the metric measures, with possible values of `All`, `User`, `Snapshot`, and `Other`.

The following table defines what the `StorageUsed` metric measures for the listed dimensions. 


| Metric | Description | 
| --- | --- | 
| StorageUsed |  The amount of logical space used, in bytes. This metric measures different types of space consumption depending on the dimensions used with this metric. When setting `StorageTier` to `SSD` or `StandardCapacityPool`, and setting `DataType` to `All`, this metric measures the logical space usage for this volume for your SSD and capacity pool tiers, respectively. When setting the `DataType` dimension to `User`, `Snapshot`, or `Other`, and setting `StorageTier` to `All`, this metric measures the logical space usage for each respective type of data. The `Snapshot` data consumption includes the snapshot reserve, which is 5% of the volume's size by default.  Units: Bytes Valid statistics: `Average`, `Minimum`, and `Maximum`  | 
| StorageCapacityUtilization |  The percentage of the volume's used physical disk space.  Units: Percent Valid statistics: `Maximum`  | 

# Monitoring FSx for ONTAP EMS events
Monitoring EMS events

You can monitor FSx for ONTAP file system events using NetApp ONTAP's native Events Management System (EMS). You can view these events using the NetApp ONTAP CLI.

**Topics**
+ [

## Overview of EMS events
](#ems-events-overview)
+ [

## Viewing EMS events
](#view-ems-events)
+ [

## EMS event forwarding to a Syslog server
](#ems-log-forwarding)

## Overview of EMS events


EMS events are automatically generated notifications that alert you when a predefined condition occurs in your FSx for ONTAP file system. These notifications keep you informed so that you can prevent or correct issues that can lead to larger problems, such as storage virtual machine (SVM) authentication issues or full volumes.

By default, events are logged in the Event Management System log. Using EMS, you can monitor events such as a user password change, a constituent within a FlexGroup approaching full capacity, a Logical Unit Number (LUN) being manually brought online or offline, or a volume automatically resizing.

For more information about ONTAP EMS events, see [ONTAP EMS Reference](https://docs.netapp.com/us-en/ontap-ems-9121/index.html) in the NetApp ONTAP Documentation Center. To display the event categories, use the document's left navigation pane.

**Note**  
Only some ONTAP EMS messages are available for FSx for ONTAP file systems. To view a list of the available ONTAP EMS messages, use the NetApp ONTAP CLI [event catalog show](https://docs.netapp.com/us-en/ontap-cli-9131/event-catalog-show.html) command.

EMS event descriptions contain event names, severity, possible causes, log messages, and corrective actions that can help you decide how to respond. For example, a [wafl.vol.autoSize.fail](https://docs.netapp.com/us-en/ontap-ems-9121/wafl-vol-events.html#wafl-vol-autosize-fail) event occurs when automatic sizing of a volume fails. According to the event description, the corrective action is to increase the volume's maximum autosize value.

## Viewing EMS events


Use the NetApp ONTAP CLI [event log show](https://docs.netapp.com/us-en/ontap-cli-9131/event-log-show.html) command to display the contents of the events log. This command is available if you have the `fsxadmin` role on your file system. The command syntax is as follows:

```
event log show [event_options]
```

The most recent events are listed first. By default, this command displays `EMERGENCY`, `ALERT`, and `ERROR` severity-level events with the following information:
+ **Time** – The time of the event.
+ **Node** – The node on which the event occurred.
+ **Severity** – The severity level of the event. To display `NOTICE`, `INFORMATIONAL`, or `DEBUG` severity-level events, use the `-severity` option.
+ **Event** – The event name and message.
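For example, to include `NOTICE` severity-level events along with the default output, you might run the following command (the `<=` operator selects the named severity level and all more severe levels):

```
::> event log show -severity <=NOTICE
```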

To display detailed information about events, use one or more of the event options listed in the following table.


| Event option | Description | 
| --- | --- | 
|  `-detail`  |  Displays additional event information.  | 
|  `-detailtime`  |  Displays detailed event information in reverse chronological order.  | 
|  `-instance`  |  Displays detailed information about all fields.  | 
|  `-node nodename\|local`  |  Displays a list of events for the node that you specify. Use this option with `-seqnum` to display detailed information.  | 
| `-seqnum sequence_number` | Selects the events that match this number in the sequence. Use with `-node` to display detailed information. | 
| `-time MM/DD/YYYY HH:MM:SS` | Selects the events that happened at this specific time. Use the format: MM/DD/YYYY HH:MM:SS [+- HH:MM]. You can specify a time range by using the `..` operator between two time statements. <pre>event log show -time "04/17/2023 05:55:00".."04/17/2023 06:10:00"</pre> Comparative time values are relative to the current time when you run the command. The following example shows how to display only events that occurred within the last minute: <pre>event log show -time >1m</pre> The month and date fields of this option are not zero-padded. These fields can be single digits; for example, `4/1/2023 06:45:00`.  | 
|  `-severity sev_level`  |  Selects the events that match the *sev_level* value, which must be one of the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/ONTAPGuide/ems-events.html) To display all events, specify severity as follows: <pre>event log show -severity <=DEBUG</pre>  | 
|  `-ems-severity ems_sev_level`  |  Selects the events that match the *ems_sev_level* value, which must be one of the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/ONTAPGuide/ems-events.html) To display all events, specify severity as follows: <pre>event log show -ems-severity <=DEBUG</pre>  | 
|  `-source text`  |  Selects the events that match the *text* value. The source is typically a software module.  | 
|  `-message-name message_name`  |  Selects the events that match the *message_name* value. Message names are descriptive, so filtering output by message name displays messages of a specific type.  | 
|  `-event text`  |  Selects the events that match the *text* value. The `event` field contains the full text of the event, including any parameters.  | 
|  `-kernel-generation-num integer`  |  Selects the events that match the *integer* value. Only events that come from the kernel have kernel generation numbers.  | 
|  `-kernel-sequence-num integer`  |  Selects the events that match the *integer* value. Only events that come from the kernel have kernel sequence numbers.  | 
|  `-action text`  |  Selects the events that match the *text* value. The `action` field describes what corrective action, if any, you must take to remedy the situation.  | 
|  `-description text`  |  Selects the events that match the *text* value. The `description` field describes why the event happened and what it means.  | 
|  `-filter-name filter_name`  |  Selects the events that match the *filter_name* value. Only events that are included by existing filters that match this value are displayed.  | 
|  `-fields fieldname,...`  |  Indicates that the command output also includes the specified field or fields. You can use `-fields ?` to choose the fields that you want to specify.  | 
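You can combine these options to narrow the output. For example, assuming events of this type exist in your log, the following command displays detailed information about volume autosize failures (the `wafl.vol.autoSize.fail` event described earlier):

```
::> event log show -message-name wafl.vol.autoSize.fail -detail
```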

**To view EMS events**

1. To SSH into the NetApp ONTAP CLI of your file system, follow the steps documented in the [Using the NetApp ONTAP CLI](managing-resources-ontap-apps.md#netapp-ontap-cli) section of the *Amazon FSx for NetApp ONTAP User Guide*.

   ```
   ssh fsxadmin@file-system-management-endpoint-ip-address
   ```

1. Use the `event log show` command to display the contents of the event log.

   ```
   ::> event log show
   Time                Node          Severity      Event
   ------------------- ------------- ------------- ------------------------
   6/30/2023 13:54:19  node1         NOTICE        vifmgr.portup: A link up event was received on node node1, port e0a.
   6/30/2023 13:54:19  node1         NOTICE        vifmgr.portup: A link up event was received on node node1, port e0d.
   ```

For information about the EMS events returned by the `event log show` command, refer to the [ONTAP EMS Reference](https://docs.netapp.com/us-en/ontap-ems-9121/index.html) in the NetApp ONTAP Documentation Center.

## EMS event forwarding to a Syslog server


You can configure EMS events to forward notifications to a Syslog server. EMS event forwarding is used for real-time monitoring of your file system to determine and isolate root causes for a wide range of issues. If your environment doesn't already contain a Syslog server for event notifications, you must first create one. DNS must be configured on the file system to resolve the Syslog server name.

**Note**  
Your Syslog destination must be located in the primary subnet that is used by your file system.

**To configure EMS events to forward notifications to a Syslog server**

1.  To SSH into the NetApp ONTAP CLI of your file system, follow the steps documented in the [Using the NetApp ONTAP CLI](managing-resources-ontap-apps.md#netapp-ontap-cli) section of the *Amazon FSx for NetApp ONTAP User Guide*.

   ```
   ssh fsxadmin@file-system-management-endpoint-ip-address
   ```

1. Use the [event notification destination create](https://docs.netapp.com/us-en/ontap-cli-9131/event-notification-destination-create.html) command to create an event notification destination of type `syslog`, specifying the following attributes:
+ `dest_name` – The name of the notification destination that is to be created (for example, `syslog-ems`). An event notification destination name must be 2 to 64 characters long. Valid characters are the following ASCII characters: A-Z, a-z, 0-9, "_", and "-". The name must start and end with: A-Z, a-z, or 0-9.
   + `syslog_name` – The Syslog server host name or IP address that Syslog messages are sent to.
   + `transport_protocol` – The protocol used to send the events:
     + `udp-unencrypted` – User Datagram Protocol with no security. This is the default protocol.
     + `tcp-unencrypted` – Transmission Control Protocol with no security.
     + `tcp-encrypted` – Transmission Control Protocol with Transport Layer Security (TLS). When this option is specified, FSx for ONTAP verifies the identity of the destination host by validating its certificate.
   + `port_number` – The Syslog server port that Syslog messages are sent to. The default value of the `syslog-port` parameter depends on the setting of the `syslog-transport` parameter. If `syslog-transport` is set to `tcp-encrypted`, the default value of `syslog-port` is `6514`. If `syslog-transport` is set to `tcp-unencrypted`, the default value of `syslog-port` is `601`. Otherwise, the default port is `514`.

   ```
   ::> event notification destination create -name dest_name -syslog syslog_name -syslog-transport transport_protocol -syslog-port port_number
   ```
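
   For example, the following command uses the hypothetical destination name `syslog-ems` and Syslog server host name `syslog.example.com` to create a TLS-encrypted destination on the default encrypted port:

   ```
   ::> event notification destination create -name syslog-ems -syslog syslog.example.com -syslog-transport tcp-encrypted -syslog-port 6514
   ```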

1. Use the [event notification create](https://docs.netapp.com/us-en/ontap-cli-9131/event-notification-create.html) command to create a new notification of a set of events defined by an event filter to the notification destination created in the previous step, specifying the following attributes:
   + `filter_name` – The name of the event filter. Events that are included in the event filter are forwarded to the destinations specified in the `-destinations` parameter.
   + `dest_name` – The name of the existing notification destination that the event notifications are sent to.

   ```
   ::> event notification create -filter-name filter_name -destinations dest_name
   ```
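
   For example, assuming that the built-in `important-events` filter is available on your file system and that you created a destination named `syslog-ems` in the previous step, the following command forwards those events to that destination:

   ```
   ::> event notification create -filter-name important-events -destinations syslog-ems
   ```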

1. If you selected TCP as the `transport_protocol`, you can use the `event notification destination check` command to generate a test message and verify your setup works. Specify the following attributes with the command:
   + `node_name` – The name of the node (for example, `FsxId07353f551e6b557b4-01`).
   + `dest_name` – The name of the existing notification destination that the event notifications are sent to.

   ```
   ::> set diag
   ::*> event notification destination check -node node_name -destination-name dest_name
   ```

# Monitoring with Data Infrastructure Insights


NetApp Data Infrastructure Insights (formerly Cloud Insights) is a NetApp service that you can use to monitor your Amazon FSx for NetApp ONTAP file systems alongside your other NetApp storage solutions. With Data Infrastructure Insights, you can monitor configuration, capacity, and performance metrics over time to understand your workload's trends and plan for future performance and storage capacity needs. You can also create alerts based on metric conditions that can integrate with your existing workflows and productivity tools.

**Note**  
Data Infrastructure Insights isn't supported for second-generation file systems with more than one HA pair. 

Data Infrastructure Insights provides:
+ **A breadth of metrics and logs** – Collect configuration, capacity, and performance metrics. Understand how your workload is trending with predefined dashboards, alerts, and reports.
+ **User analytics and ransomware protection** – With Cloud Secure and ONTAP snapshots you can audit, detect, stop, and repair incidents of user error and ransomware.
+ **SnapMirror reporting** – Understand your SnapMirror relationships and set alerts on replication issues.
+ **Capacity planning** – Understand the resource requirements of on-premises workloads to help you migrate your workload to a more efficient FSx for ONTAP configuration. You can also use these insights to plan for when more performance or capacity will be needed for your FSx for ONTAP deployment.

For more information, see [Data Infrastructure Insights documentation](https://docs.netapp.com/us-en/data-infrastructure-insights/index.html) in the NetApp ONTAP Product Documentation. 

# Monitoring FSx for ONTAP file systems using Harvest and Grafana
Monitoring with Harvest and Grafana

NetApp Harvest is an open source tool for gathering performance and capacity metrics from ONTAP systems, and is compatible with FSx for ONTAP. You can use Harvest with Grafana for an open source monitoring solution.

## Getting started with Harvest and Grafana


The following section details how you can set up and configure Harvest and Grafana to measure your FSx for ONTAP file system’s performance and storage capacity utilization. 

You can monitor your Amazon FSx for NetApp ONTAP file system by using Harvest and Grafana. NetApp Harvest monitors ONTAP data centers by collecting performance, capacity, and hardware metrics from FSx for ONTAP file systems. Grafana provides a dashboard where the collected Harvest metrics can be displayed.

## Supported Harvest dashboards


Amazon FSx for NetApp ONTAP exposes a different set of metrics than on-premises NetApp ONTAP does. Therefore, only the following out-of-the-box Harvest dashboards tagged with `fsx` are currently supported for use with FSx for ONTAP. Some panels in these dashboards may be missing information that FSx for ONTAP doesn't expose.
+ Harvest: Metadata
+ ONTAP: Aggregate
+ ONTAP: cDOT
+ ONTAP: Cluster
+ ONTAP: Compliance
+ ONTAP: Datacenter
+ ONTAP: Data Protection
+ ONTAP: LUN
+ ONTAP: Network
+ ONTAP: Node
+ ONTAP: Qtree
+ ONTAP: Security
+ ONTAP: SnapMirror
+ ONTAP: SnapMirror Destinations
+ ONTAP: SnapMirror Sources
+ ONTAP: SVM
+ ONTAP: Volume
+ ONTAP: Volume by SVM
+ ONTAP: Volume Deep Dive

The following Harvest dashboards are supported by FSx for ONTAP, but are not enabled by default in Harvest.
+ ONTAP: FlexCache
+ ONTAP: FlexGroup
+ ONTAP: NFS Clients
+ ONTAP: NFSv4 Storepool Monitors
+ ONTAP: NFS Troubleshooting
+ ONTAP: NVMe Namespaces
+ ONTAP: SMB
+ ONTAP: Workload

## Unsupported Harvest dashboards


The following Harvest dashboards are *not* supported by FSx for ONTAP.
+ ONTAP: Disk
+ ONTAP: External Service Operation
+ ONTAP: File Systems Analytics (FSA)
+ ONTAP: Headroom
+ ONTAP: Health
+ ONTAP: MAV Request
+ ONTAP: MetroCluster
+ ONTAP: Power
+ ONTAP: Shelf
+ ONTAP: S3 Object Stores

## CloudFormation template


To get started, you can deploy a CloudFormation template that automatically launches an Amazon EC2 instance running Harvest and Grafana. As input to the CloudFormation template, you specify the `fsxadmin` user and the Amazon FSx management endpoint of the file system that will be added as part of this deployment. After the deployment is completed, you can log in to the Grafana dashboard to monitor your file system.

This solution uses CloudFormation to automate the deployment of the Harvest and Grafana solution. The template creates an Amazon EC2 Linux instance and installs Harvest and Grafana software. To use this solution, download the [fsx-ontap-harvest-grafana.template](https://solution-references.s3.amazonaws.com/fsx/harvest-grafana/harvest-grafana.yaml) CloudFormation template.

**Note**  
Implementing this solution incurs billing for the associated AWS services. For more information, see the pricing details pages for those services.

## Amazon EC2 instance types


When configuring the template, you provide the Amazon EC2 instance type. NetApp's recommended instance size depends on how many file systems you monitor and the number of metrics you choose to collect. With the default configuration, NetApp recommends the following for every 10 file systems that you monitor:
+ CPU: 2 cores
+ Memory: 1 GB
+ Disk: 500 MB (mostly used by log files)

Following are some sample configurations and the `t3` instance type you might choose.


****  

| File systems | CPU | Disk | Instance type | 
| --- | --- | --- | --- | 
|  Under 10  |  2 cores  |  500 MB  |  `t3.micro`  | 
|  10–40  |  4 cores  |  1000 MB  |  `t3.xlarge`  | 
|  40+  |  8 cores  |  2000 MB  |  `t3.2xlarge`  | 

For more information on Amazon EC2 instance types, see [General purpose instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose-instances.html) in the *Amazon EC2 User Guide*.

### Instance port rules


When you set up your Amazon EC2 instance, make sure that ports 3000 and 9090 are open for inbound traffic in the security group that the Amazon EC2 Harvest and Grafana instance is in. Because the instance connects to an endpoint over HTTPS, it needs to resolve the endpoint, which requires port 53 TCP/UDP for DNS. To reach the endpoint, it also needs port 443 TCP for HTTPS and internet access.

## Deployment procedure


The following procedure configures and deploys the Harvest/Grafana solution. It takes about five minutes to deploy. Before you start, you must have an FSx for ONTAP file system running in an Amazon Virtual Private Cloud (Amazon VPC) in your AWS account, and the parameter information for the template listed below. For more information on creating a file system, see [Creating file systems](creating-file-systems.md).

**To launch the Harvest/Grafana solution stack**

1. Download the [fsx-ontap-harvest-grafana.template](https://solution-references.s3.amazonaws.com/fsx/harvest-grafana/harvest-grafana.yaml) CloudFormation template. For more information on creating a CloudFormation stack, see [Creating a stack on the AWS CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html) in the *AWS CloudFormation User Guide*.
**Note**  
By default, this template launches in the US East (N. Virginia) AWS Region. You must launch this solution in an AWS Region where Amazon FSx is available. For more information, see [Amazon FSx endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/fsxn.html) in the *AWS General Reference*.

1. For **Parameters**, review the parameters for the template and modify them for the needs of your file system. This solution uses the following default values.  
****    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/ONTAPGuide/monitoring-harvest-grafana.html)

1. Choose **Next**.

1. For **Options**, choose **Next**.

1. For **Review**, review and confirm the settings. You must select the check box acknowledging that the template creates IAM resources.

1. Choose **Create** to deploy the stack.

You can view the status of the stack in the CloudFormation console in the **Status** column. You should see a status of **CREATE_COMPLETE** in about five minutes.

## Logging in to Grafana


After the deployment has finished, use your browser to log in to the Grafana dashboard at the IP and port 3000 of the Amazon EC2 instance:

```
http://EC2_instance_IP:3000
```

When prompted, use the Grafana default user name (`admin`) and password (`pass`). We recommend that you change your password as soon as you log in.

For more information, see the [NetApp Harvest](https://github.com/NetApp/harvest) page on GitHub.

## Troubleshooting Harvest and Grafana


If data is missing from your Harvest and Grafana dashboards, or if you are having trouble setting up Harvest and Grafana with FSx for ONTAP, check the following topics for a potential solution.

**Topics**
+ [

### SVM and volume dashboards are blank
](#svm-volume-blank-dashboards)
+ [

### CloudFormation stack rolled back after timeout
](#cfn-stack-rolled-back)

### SVM and volume dashboards are blank


If the CloudFormation stack deployed successfully and you can connect to Grafana, but the SVM and volume dashboards are blank, use the following procedure to troubleshoot your environment. You will need SSH access to the Amazon EC2 instance that Harvest and Grafana are deployed on.

1. SSH into the Amazon EC2 instance that your Harvest and Grafana clients are running on.

   ```
   [~]$ ssh ec2-user@ec2_ip_address
   ```

1. Use the following command to view the `harvest.yml` file, and then:
   + Verify that an entry was created for your FSx for ONTAP instance as `Cluster-2`.
   + Verify that the entries for username and password match your `fsxadmin` credentials.

   ```
   [ec2-user@ip-ec2_ip_address ~]$ sudo cat /home/ec2-user/harvest_install/harvest/harvest.yml
   ```

1. If the password field is blank, open the file in an editor and update it with the `fsxadmin` password, as follows:

   ```
   [ec2-user@ip-ec2_ip_address ~]$ sudo vi /home/ec2-user/harvest_install/harvest/harvest.yml
   ```

1. Ensure the `fsxadmin` user credentials are stored in Secrets Manager in the following format for any future deployments, replacing `fsxadmin_password` with your password.

   ```
   {"username" : "fsxadmin", "password" : "fsxadmin_password"}
   ```

### CloudFormation stack rolled back after timeout


If you are unable to deploy the CloudFormation stack successfully and it is rolling back with errors, use the following procedure to resolve the issue. You will need SSH access to the EC2 instance deployed by the CloudFormation stack.

1. Redeploy the CloudFormation stack, making sure that automatic rollback is disabled.

1. SSH into the Amazon EC2 instance that your Harvest and Grafana clients are running on.

   ```
   [~]$ ssh ec2-user@ec2_ip_address
   ```

1. Verify that the Docker containers started successfully by using the following command.

   ```
   [ec2-user@ip-ec2_ip_address ~]$ sudo docker ps
   ```

   In the response you should see five containers as follows:

   ```
   CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS                          PORTS                    NAMES
   6b9b3f2085ef   rahulguptajss/harvest   "bin/poller --config…"   8 minutes ago   Restarting (1) 20 seconds ago                            harvest_cluster-2
   3cf3e3623fde   rahulguptajss/harvest   "bin/poller --config…"   8 minutes ago   Up About a minute                                        harvest_cluster-1
   708f3b7ef6f8   grafana/grafana         "/run.sh"                8 minutes ago   Up 8 minutes                    0.0.0.0:3000->3000/tcp   harvest_grafana
   0febee61cab7   prom/alertmanager       "/bin/alertmanager -…"   8 minutes ago   Up 8 minutes                    0.0.0.0:9093->9093/tcp   harvest_prometheus_alertmanager
   1706d8cd5a0c   prom/prometheus         "/bin/prometheus --c…"   8 minutes ago   Up 8 minutes                    0.0.0.0:9090->9090/tcp   harvest_prometheus
   ```

1. If the Docker containers are not running, check for failures in the `/var/log/cloud-init-output.log` file as follows.

   ```
   [ec2-user@ip-ec2_ip_address ~]$ sudo cat /var/log/cloud-init-output.log
   PLAY [Manage Harvest] **********************************************************
    
   TASK [Gathering Facts] *********************************************************
   ok: [localhost]
    
   TASK [Verify images] ***********************************************************
   failed: [localhost] (item=prom/prometheus) => {"ansible_loop_var": "item", "changed": false, "item": "prom/prometheus",
   "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Co
   nnection reset by peer'))"}
   failed: [localhost] (item=prom/alertmanager) => {"ansible_loop_var": "item", "changed": false, "item": "prom/alertmanage
   r", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104,
   'Connection reset by peer'))"}
   failed: [localhost] (item=rahulguptajss/harvest) => {"ansible_loop_var": "item", "changed": false, "item": "rahulguptajs
   s/harvest", "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetEr
   ror(104, 'Connection reset by peer'))"}
   failed: [localhost] (item=grafana/grafana) => {"ansible_loop_var": "item", "changed": false, "item": "grafana/grafana",
   "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', ConnectionResetError(104, 'Co
   nnection reset by peer'))"}
    
   PLAY RECAP *********************************************************************
   localhost                  : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
   ```

1. If there are failures, execute the following commands to deploy the Harvest and Grafana containers.

   ```
   [ec2-user@ip-ec2_ip_address ~]$ sudo su
   [ec2-user@ip-ec2_ip_address ~]$ cd /home/ec2-user/harvest_install
   [ec2-user@ip-ec2_ip_address ~]$ /usr/local/bin/ansible-playbook manage_harvest.yml
   [ec2-user@ip-ec2_ip_address ~]$ /usr/local/bin/ansible-playbook manage_harvest.yml --tags api
   ```

1. Validate that the containers started successfully by running **sudo docker ps** and connecting to your Harvest and Grafana URL.

# Monitoring FSx for ONTAP API Calls with AWS CloudTrail
Monitoring with AWS CloudTrail

Amazon FSx is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon FSx. CloudTrail captures all Amazon FSx API calls for Amazon FSx for NetApp ONTAP as events. Captured calls include calls from the Amazon FSx console and from code calls to Amazon FSx API operations.

If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket, including events for Amazon FSx. If you don't configure a trail, you can still view the most recent events in the CloudTrail console in **Event history**. Using the information collected by CloudTrail, you can determine the request that was made to Amazon FSx. You can also determine the IP address from which the request was made, who made the request, when it was made, and additional details. 

To learn more about CloudTrail, see the [AWS CloudTrail User Guide](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/).

## Amazon FSx Information in CloudTrail


CloudTrail is enabled on your AWS account when you create the account. When API activity occurs in Amazon FSx, that activity is recorded in a CloudTrail event along with other AWS service events in **Event history**. You can view, search, and download recent events in your AWS account. For more information, see [Viewing events with CloudTrail Event history](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html). 

For an ongoing record of events in your AWS account, including events for Amazon FSx, create a trail. A *trail* enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when you create a trail in the console, the trail applies to all AWS Regions. The trail logs events from all AWS Regions in the AWS partition and delivers the log files to the Amazon S3 bucket that you specify. Additionally, you can configure other AWS services to further analyze and act upon the event data collected in CloudTrail logs. For more information, see the following topics in the *AWS CloudTrail User Guide:* 
+ [Creating a trail for your AWS account](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html)
+ [AWS service integrations with CloudTrail Logs](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-aws-service-specific-topics.html#cloudtrail-aws-service-specific-topics-integrations)
+ [Configuring Amazon SNS notifications for CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/getting_notifications_top_level.html)
+ [Receiving CloudTrail log files from multiple regions](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/receive-cloudtrail-log-files-from-multiple-regions.html) and [Receiving CloudTrail log files from multiple accounts](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-receive-logs-from-multiple-accounts.html)

All Amazon FSx [API calls](https://docs.aws.amazon.com/fsx/latest/APIReference/Welcome.html) are logged by CloudTrail. For example, calls to the `CreateFileSystem` and `TagResource` operations generate entries in the CloudTrail log files. 

Every event or log entry contains information about who generated the request. The identity information helps you determine the following: 
+ Whether the request was made with root or AWS Identity and Access Management (IAM) user credentials.
+ Whether the request was made with temporary security credentials for a role or federated user.
+ Whether the request was made by another AWS service.

For more information, see the [CloudTrail userIdentity element](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-user-identity.html) in the *AWS CloudTrail User Guide.*

## Understanding Amazon FSx Log File Entries


A *trail* is a configuration that enables delivery of events as log files to an Amazon S3 bucket that you specify. CloudTrail log files contain one or more log entries. An *event* represents a single request from any source and includes information about the requested action, the date and time of the action, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the public API calls, so they don't appear in any specific order. 

The following example shows a CloudTrail log entry that demonstrates the `TagResource` operation when a tag for a file system is created from the console.

```
{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "Root",
        "principalId": "111122223333",
        "arn": "arn:aws:sts::111122223333:root",
        "accountId": "111122223333",
        "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
        "sessionContext": {
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2018-11-14T22:36:07Z"
            }
        }
    },
    "eventTime": "2018-11-14T22:36:07Z",
    "eventSource": "fsx.amazonaws.com",
    "eventName": "TagResource",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.0",
    "userAgent": "console.amazonaws.com",
    "requestParameters": {
        "resourceARN": "arn:aws:fsx:us-east-1:111122223333:file-system/fs-ab12cd34ef56gh789"
    },
    "responseElements": null,
    "requestID": "aEXAMPLE-abcd-1234-56ef-b4cEXAMPLE51",
    "eventID": "bEXAMPLE-gl12-3f5h-3sh4-ab6EXAMPLE9p",
    "eventType": "AwsApiCall",
    "apiVersion": "2018-03-01",
    "recipientAccountId": "111122223333"
}
```
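As an illustration of how you might consume such an entry programmatically, the following Python sketch (a hypothetical helper, not part of any AWS tooling) parses a trimmed-down copy of the `TagResource` entry above and extracts the fields that answer who made the request, when, from where, and against which resource:

```python
import json

# A trimmed-down copy of the TagResource log entry shown above.
raw_entry = """
{
    "eventTime": "2018-11-14T22:36:07Z",
    "eventSource": "fsx.amazonaws.com",
    "eventName": "TagResource",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.0",
    "userIdentity": {"type": "Root", "accountId": "111122223333"},
    "requestParameters": {"resourceARN": "arn:aws:fsx:us-east-1:111122223333:file-system/fs-ab12cd34ef56gh789"}
}
"""

entry = json.loads(raw_entry)

# Pull out the fields most useful for auditing: who made the request,
# when and from where it was made, and which resource it touched.
summary = {
    "who": entry["userIdentity"]["type"],
    "account": entry["userIdentity"]["accountId"],
    "when": entry["eventTime"],
    "action": entry["eventName"],
    "source_ip": entry["sourceIPAddress"],
    "resource": entry["requestParameters"]["resourceARN"],
}
print(summary["action"], "by", summary["who"], "at", summary["when"])
```

The same field names appear in every CloudTrail log entry, so a helper like this can summarize entries for any Amazon FSx API call, not just `TagResource`.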

The following example shows a CloudTrail log entry that demonstrates the `UntagResource` action when a tag for a file system is deleted from the console.

```
{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": “Root”,
        "principalId": "111122223333",
        "arn": "arn:aws:sts::111122223333:root",
        "accountId": "111122223333",
        "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
        "sessionContext": {
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2018-11-14T23:40:54Z"
            }
        }
    },
    "eventTime": "2018-11-14T23:40:54Z",
    "eventSource": "fsx.amazonaws.com",
    "eventName": "UntagResource",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.0",
    "userAgent": "console.amazonaws.com",
    "requestParameters": {
        "resourceARN": "arn:aws:fsx:us-east-1:111122223333:file-system/fs-ab12cd34ef56gh789"
    },
    "responseElements": null,
    "requestID": "aEXAMPLE-abcd-1234-56ef-b4cEXAMPLE51",
    "eventID": "bEXAMPLE-gl12-3f5h-3sh4-ab6EXAMPLE9p",
    "eventType": "AwsApiCall",
    "apiVersion": "2018-03-01",
    "recipientAccountId": "111122223333"
}
```