

# Anomaly detection in Amazon OpenSearch Service

Anomaly detection in Amazon OpenSearch Service automatically detects anomalies in your OpenSearch data in near real time by using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream. The algorithm computes an anomaly grade and a confidence score for each incoming data point. Anomaly detection uses these values to distinguish anomalies from normal variations in your data.

You can pair the anomaly detection plugin with the [Alerting plugin](alerting.md) to notify you as soon as an anomaly is detected. 

Anomaly detection is available on domains running any OpenSearch version or Elasticsearch 7.4 or later. All instance types support anomaly detection except for `t2.micro` and `t2.small`. 

**Note**  
This documentation provides a brief overview of anomaly detection in the context of Amazon OpenSearch Service. For comprehensive documentation, including detailed steps, an API reference, a reference of all available settings, and steps to create visualizations and dashboards, see [Anomaly detection](https://opensearch.org/docs/latest/monitoring-plugins/ad/index/) in the open source OpenSearch documentation.

## Prerequisites


Anomaly detection has the following prerequisites:
+ Anomaly detection requires OpenSearch or Elasticsearch 7.4 or later. 
+ Anomaly detection supports [fine-grained access control](fgac.md) on Elasticsearch 7.9 and later and on all versions of OpenSearch. On earlier Elasticsearch versions, only admin users can create, view, and manage detectors. 
+ If your domain uses fine-grained access control, non-admin users must be [mapped](fgac.md#fgac-mapping) to the `anomaly_read_access` role in OpenSearch Dashboards in order to view detectors, or `anomaly_full_access` in order to create and manage detectors.

## Getting started with anomaly detection


To get started, choose **Anomaly Detection** in OpenSearch Dashboards.

### Step 1: Create a detector


A detector is an individual anomaly detection task. You can create multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.
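
You can also create a detector programmatically through the plugin's REST API. The following request body is a minimal sketch; the detector name, the index name `server-metrics`, and the timestamp field `timestamp` are placeholders for your own data, not values from this guide:

```json
POST _plugins/_anomaly_detection/detectors
{
  "name": "cpu-detector",
  "description": "Detects anomalies in server metrics",
  "time_field": "timestamp",
  "indices": ["server-metrics"],
  "detection_interval": {
    "period": { "interval": 10, "unit": "Minutes" }
  }
}
```

On older Elasticsearch-based domains, the same API is exposed under the `_opendistro/_anomaly_detection` path instead of `_plugins/_anomaly_detection`.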

### Step 2: Add features to your detector


A feature is the field in your index that you check for anomalies. A detector can discover anomalies across one or more features. You must choose one of the following aggregations for each feature: `average()`, `sum()`, `count()`, `min()`, or `max()`. 

**Note**  
The `count()` aggregation method is only available in OpenSearch and Elasticsearch 7.7 or later. For Elasticsearch 7.4, use a custom expression like the following:  

```json
{
  "aggregation_name": {
    "value_count": {
      "field": "field_name"
    }
  }
}
```

The aggregation method determines what constitutes an anomaly. For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. You can add a maximum of five features per detector.
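
In the detector's request body, each feature pairs a field with one of these aggregations under `feature_attributes`. The following sketch assumes a hypothetical `cpu_usage_percentage` field tracked with the `max()` aggregation:

```json
"feature_attributes": [
  {
    "feature_name": "max_cpu_usage",
    "feature_enabled": true,
    "aggregation_query": {
      "max_cpu_usage": {
        "max": { "field": "cpu_usage_percentage" }
      }
    }
  }
]
```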

You can configure the following optional settings (available in Elasticsearch 7.7 and later):
+ **Category field** - Categorize or slice your data with a dimension like IP address, product ID, country code, and so on.
+ **Window size** - Set the number of aggregation intervals from your data stream to consider in a detection window.
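
In the detector's request body, these two settings correspond to the `category_field` and `shingle_size` parameters. A minimal sketch, assuming a hypothetical `ip_address` keyword field as the category dimension:

```json
{
  "category_field": ["ip_address"],
  "shingle_size": 8
}
```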

After you set up your features, preview sample anomalies and adjust the feature settings if necessary.

### Step 3: Observe the results


The following visualizations are available on the anomaly detection dashboard:

![Anomaly detection dashboard in OpenSearch Dashboards.](http://docs.aws.amazon.com/opensearch-service/latest/developerguide/images/ad.png)

+ **Live anomalies** - displays the live anomaly results for the last 60 intervals. For example, if the interval is set to 10 minutes, it shows the results for the last 600 minutes. This chart refreshes every 30 seconds.
+ **Anomaly history** - plots the anomaly grade with the corresponding measure of confidence.
+ **Feature breakdown** - plots the features based on the aggregation method. You can vary the date-time range of the detector.
+ **Anomaly occurrence** - shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each anomaly detected.

  If you set the category field, you see an additional **Heat map** chart that correlates results for anomalous entities. Choose a filled rectangle to see a more detailed view of the anomaly.
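
The values shown in the **Anomaly occurrence** table are stored as anomaly result documents, which you can also query through the plugin's result search API. The following is a sketch that returns only results with a nonzero anomaly grade:

```json
GET _plugins/_anomaly_detection/detectors/results/_search
{
  "query": {
    "range": { "anomaly_grade": { "gt": 0 } }
  }
}
```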

### Step 4: Set up alerts


To create a monitor to send you notifications when any anomalies are detected, choose **Set up alerts**. The plugin redirects you to the [Add monitor](https://docs.opensearch.org/latest/observing-your-data/alerting/monitors/) page where you can configure an alert.

# Tutorial: Detect high CPU usage with anomaly detection

This tutorial demonstrates how to create an anomaly detector in Amazon OpenSearch Service to detect high CPU usage. You'll use OpenSearch Dashboards to configure a detector to monitor CPU usage, and generate an alert when your CPU usage rises above a specified threshold. 

**Note**  
These steps apply to the latest version of OpenSearch and might differ slightly for past versions.

## Prerequisites

+ You must have an OpenSearch Service domain running Elasticsearch 7.4 or later, or any OpenSearch version.
+ You must be ingesting application log files into your cluster that contain CPU usage data.

## Step 1: Create a detector


First, create a detector that identifies anomalies in your CPU usage data. 

1. Open the left panel menu in OpenSearch Dashboards and choose **Anomaly Detection**, then choose **Create detector**.

1. Name the detector **high-cpu-usage**. 

1. For your data source, choose your index that contains CPU usage log files where you want to identify anomalies.

1. Choose the **Timestamp field** from your data. Optionally, you can add a data filter. This data filter analyzes only a subset of the data source and reduces the noise from data that's not relevant.

1. Set the **Detector interval** to **2** minutes. The detector collects and analyzes your data once during each interval.

1. In **Window delay**, add a **1-minute** delay. This delay gives the detector extra processing time to ensure that all data within the window is present. 

1. Choose **Next**. On the anomaly detection dashboard, under the detector name, choose **Configure model**.

1. For **Feature name**, enter **max_cpu_usage**. For **Feature state**, select **Enable feature**. 

1. For **Find anomalies based on**, choose **Field value**.

1. For **Aggregation method**, choose **`max()`**.

1. For **Field**, select the field in your data to check for anomalies. For example, it might be called `cpu_usage_percentage`.

1. Keep all other settings as their defaults and choose **Next**.

1. Ignore the detector jobs setup and choose **Next**.

1. In the pop-up window, choose when to start the detector (automatically or manually), and then choose **Confirm**.

After the detector initializes, you can see real-time results of the CPU usage in the **Real-time results** section of your detector panel. The **Live anomalies** section displays any anomalies that occur as data is being ingested in real time. 
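
Taken together, the settings above correspond roughly to the following detector definition in the plugin's REST API. This is a sketch under assumptions: the index name `application-logs`, the timestamp field `timestamp`, and the field `cpu_usage_percentage` are placeholders for your own data:

```json
POST _plugins/_anomaly_detection/detectors
{
  "name": "high-cpu-usage",
  "time_field": "timestamp",
  "indices": ["application-logs"],
  "detection_interval": {
    "period": { "interval": 2, "unit": "Minutes" }
  },
  "window_delay": {
    "period": { "interval": 1, "unit": "Minutes" }
  },
  "feature_attributes": [
    {
      "feature_name": "max_cpu_usage",
      "feature_enabled": true,
      "aggregation_query": {
        "max_cpu_usage": {
          "max": { "field": "cpu_usage_percentage" }
        }
      }
    }
  ]
}
```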

## Step 2: Configure an alert


Now that you've created a detector, create a monitor that invokes an alert to send a message to Slack when it detects CPU usage that meets the conditions specified in the detector settings. You'll receive Slack notifications when data from one or more indexes meets the conditions that invoke the alert. 

1. Open the left panel menu in OpenSearch Dashboards and choose **Alerting**, then choose **Create monitor**.

1. Provide a name for the monitor.

1. For **Monitor type**, choose **Per-query monitor**. A per-query monitor runs a specified query and defines the triggers.

1. For **Monitor defining method**, choose **Anomaly detector**, then select the detector that you created in the previous section from the **Detector** dropdown menu.

1. For **Schedule**, choose how often the monitor collects data and how often you receive alerts. For the purposes of this tutorial, set the schedule to run every **7** minutes.

1. In the **Triggers** section, choose **Add trigger**. For **Trigger name**, enter **High CPU usage**. For this tutorial, for **Severity level**, choose **1**, which is the highest level of severity.

1. For **Anomaly grade threshold**, choose **IS ABOVE**, and enter the grade threshold to apply. For this tutorial, set the **Anomaly grade** to **0.7**. The anomaly grade is a number between 0 and 1 that indicates how severe an anomaly is.

1. For **Anomaly confidence threshold**, choose **IS ABOVE**, and enter the same number as your anomaly grade. For this tutorial, set the **Anomaly confidence threshold** to **0.7**.

1. In the **Actions** section, choose **Destination**. In the **Name** field, choose the name of the destination. On the **Type** menu, choose **Slack**. In the **Webhook URL** field, enter the webhook URL where you want to receive alerts. For more information, see [Sending messages using incoming webhooks](https://api.slack.com/messaging/webhooks).

1. Choose **Create**.
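
Behind the scenes, a monitor created from an anomaly detector evaluates a trigger condition against the detector's output. The thresholds you set above translate to a Painless script of roughly the following shape; this is a sketch, and the exact script that OpenSearch Dashboards generates may differ:

```json
{
  "condition": {
    "script": {
      "source": "return ctx.results[0].aggregations.max_anomaly_grade.value != null && ctx.results[0].aggregations.max_anomaly_grade.value > 0.7 && ctx.results[0].hits.hits[0]._source.confidence > 0.7",
      "lang": "painless"
    }
  }
}
```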

## Related resources

+  [Configuring alerts in Amazon OpenSearch Service](alerting.md)
+  [Anomaly detection in Amazon OpenSearch Service](ad.md) 
+  [Anomaly detection API](https://opensearch.org/docs/latest/monitoring-plugins/ad/api/) 