

# CloudWatch observability solutions
<a name="Monitoring-Solutions"></a>

CloudWatch observability solutions offer a catalog of readily available configurations to help you quickly implement monitoring for various AWS services and common workloads, such as Java Virtual Machines (JVM), Apache Kafka, Apache Tomcat, and NGINX. These solutions provide focused guidance on key monitoring tasks, including the installation and configuration of the CloudWatch agent, deployment of pre-defined custom dashboards, and setup of metric alarms. They are designed to assist developers and operations teams in leveraging AWS monitoring and observability tools more effectively.

The solutions include guidance on when to use specific observability features like Detailed Monitoring metrics for infrastructure, Container Insights for container monitoring, and Application Signals for application monitoring. By providing working examples and practical configurations, these solutions aim to simplify the initial setup process, allowing you to establish functional monitoring more quickly and customize it as needed for your specific requirements.

To get started with observability solutions, visit the [observability solutions page](https://console.aws.amazon.com/cloudwatch/home?#settings:/observability-solutions) in the CloudWatch console.

For open-source solutions that work with Amazon Managed Grafana, see [Amazon Managed Grafana solutions](https://docs.aws.amazon.com/grafana/latest/userguide/AMG_solutions.html).

Solutions that require the CloudWatch agent are described in the following topics:

**Topics**
+ [CloudWatch solution: JVM workload on Amazon EC2](Solution-JVM-On-EC2.md)
+ [CloudWatch solution: NGINX workload on Amazon EC2](Solution-NGINX-On-EC2.md)
+ [CloudWatch solution: NVIDIA GPU workload on Amazon EC2](Solution-NVIDIA-GPU-On-EC2.md)
+ [CloudWatch solution: Kafka workload on Amazon EC2](Solution-Kafka-On-EC2.md)
+ [CloudWatch solution: Tomcat workload on Amazon EC2](Solution-Tomcat-On-EC2.md)
+ [CloudWatch solution: Amazon EC2 health](Solution-EC2-Health.md)

**How do solution dashboards work?**  
The dashboards for CloudWatch solutions use search-powered variables (dropdowns) that allow you to explore and visualize different aspects of your workloads dynamically.  
By combining the flexibility of search-powered variables with the pre-configured [metric widgets](create-and-work-with-widgets.md), the dashboard provides deep insights into your workloads, enabling proactive monitoring, troubleshooting, and optimization. This dynamic approach ensures that you can quickly adapt the dashboard to your specific monitoring needs, without the need for extensive customization or configuration.

**Do solutions support cross-Region observability?**  
CloudWatch solution dashboards display metrics of the Region where the solution dashboard is created. However, the solution dashboard doesn’t display metrics across multiple Regions. If you have a use case to view data from multiple Regions in a single dashboard, you'll need to customize the dashboard JSON to add the Regions that you want to view. To do this, use the `region` attribute of the metric format to query the metrics from different Regions. For more information about modifying dashboard JSON, see [Metric Widget: Format for Each Metric in the Array](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html#CloudWatch-Dashboard-Properties-Metrics-Array-Format).
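
A minimal sketch of such a customization, assuming you want one graph that plots the same metric from two Regions: the Python below builds a metric widget whose `metrics` entries each end with an options object carrying `region`, following the metric array format linked above. The namespace, metric, and instance IDs are illustrative placeholders rather than values from a solution dashboard, and the widget size, statistic, and period are arbitrary choices.

```python
import json


def metric_entry(namespace, metric_name, dimensions, region):
    """One entry of a metric widget's "metrics" array: namespace, metric name,
    dimension name/value pairs, then an options object that sets the Region."""
    entry = [namespace, metric_name]
    for name, value in dimensions.items():
        entry.extend([name, value])
    entry.append({"region": region})
    return entry


def multi_region_widget(namespace, metric_name, dimensions_by_region, home_region):
    """Graph widget that plots the same metric from several Regions."""
    return {
        "type": "metric",
        "width": 12,
        "height": 6,
        "properties": {
            "view": "timeSeries",
            "region": home_region,  # default Region for entries without their own
            "stat": "Average",
            "period": 300,
            "metrics": [
                metric_entry(namespace, metric_name, dims, region)
                for region, dims in dimensions_by_region.items()
            ],
        },
    }


# Hypothetical instance IDs in two Regions.
widget = multi_region_widget(
    "AWS/EC2",
    "CPUUtilization",
    {
        "us-east-1": {"InstanceId": "i-0123456789abcdef0"},
        "eu-west-1": {"InstanceId": "i-0fedcba9876543210"},
    },
    home_region="us-east-1",
)
print(json.dumps({"widgets": [widget]}, indent=2))
```

You can paste the printed `widgets` entry into the dashboard JSON in the console, next to the existing solution widgets.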

**Do solution dashboards support [CloudWatch cross-account observability](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Unified-Cross-Account.html)?**  
When using CloudWatch cross-account observability, solution dashboards in the central monitoring account display metrics from source accounts in the same Region. To differentiate metrics for similar workloads across accounts, provide unique grouping dimension values in the agent configurations. For example, for a Kafka workload, assign distinct `ClusterName` values to the Kafka brokers in different accounts so that you can select the exact cluster and view its metrics in the dashboard.

**Do solution dashboards support [Cross-account cross-Region CloudWatch console](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Cross-Account-Cross-Region.html)?**  
If you have enabled cross-account access using the Cross-account cross-Region CloudWatch console, you can't use a solution dashboard created in the monitoring account to view metrics from source accounts. Instead, create the dashboard in each source account. You can then view it from the monitoring account by switching the account ID setting in the console.

**What are the limitations for a solution dashboard?**  
Solution dashboards use search expressions to filter and analyze metrics for the workloads, which enables dynamic views based on the dropdown selections. These search expressions might return more than 500 time series, but each dashboard widget can't display more than 500 time series. If a metric search in the solution dashboard returns more than 500 time series across all Amazon EC2 instances, the graph that displays the top contributors might show inaccurate results. For more information about search expressions, see [CloudWatch search expression syntax](search-expression-syntax.md).  
CloudWatch displays a metric's description on the dashboard when you choose the `i` icon on a dashboard widget. However, this currently doesn't work for widgets that use search expressions. Because solution dashboards use search expressions, you can't see metric descriptions in them.

**Can I customize the agent configuration or the dashboard provided by a solution?**  
You can customize the agent configuration and the dashboard. Be aware that if you customize the agent configuration, you must update the dashboard accordingly or it will display empty metric widgets. Also be aware that if CloudWatch releases a new version of a solution, you might have to repeat your customizations if you apply the newer version of the solution. 

**How are solutions versioned?**  
Each solution provides the most up-to-date instructions and resources. We always recommend using the latest version available. While the solutions themselves are not versioned, the associated artifacts (such as CloudFormation templates for dashboards and agent installations) are versioned.  
You can identify the version of a previously deployed artifact by checking the CloudFormation template's description field or the filename of the template you downloaded. To determine if you're using the latest version, compare your deployed version with the one currently referenced in the solution documentation.
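
As a sketch of that comparison, the following Python snippet extracts the version from a template filename and compares it with the one referenced in the documentation. It assumes filenames end in `-<major>.<minor>.<patch>.json`, like the template URLs used in these solutions (for example, `dashboard-template-1.0.0.json`); it does not read the CloudFormation description field.

```python
import re

# Assumed filename pattern: "<name>-<major>.<minor>.<patch>.json".
VERSION_RE = re.compile(r"-(\d+)\.(\d+)\.(\d+)\.json$")


def artifact_version(filename):
    """Return the version of a downloaded template as a tuple of ints."""
    match = VERSION_RE.search(filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return tuple(int(part) for part in match.groups())


def is_outdated(deployed, latest):
    """True if the deployed artifact is older than the documented one."""
    # Tuples compare numerically, so 1.10.0 is newer than 1.9.0.
    return artifact_version(deployed) < artifact_version(latest)


print(is_outdated("dashboard-template-1.0.0.json", "dashboard-template-1.0.0.json"))  # False
```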

# CloudWatch solution: JVM workload on Amazon EC2
<a name="Solution-JVM-On-EC2"></a>

This solution helps you configure out-of-the-box metric collection using the CloudWatch agent for JVM applications running on EC2 instances. Additionally, it helps you set up a pre-configured CloudWatch dashboard. For general information about all CloudWatch observability solutions, see [CloudWatch observability solutions](Monitoring-Solutions.md).

**Topics**
+ [Requirements](#Solution-JVM-On-EC2-Requirements)
+ [Benefits](#Solution-JVM-On-EC2-Benefits)
+ [Costs](#Solution-JVM-On-EC2-Costs)
+ [CloudWatch agent configuration for this solution](#Solution-JVM-CloudWatch-Agent)
+ [Deploy the agent for your solution](#Solution-JVM-Agent-Deploy)
+ [Create the JVM solution dashboard](#Solution-JVM-Dashboard)

## Requirements
<a name="Solution-JVM-On-EC2-Requirements"></a>

This solution is relevant for the following conditions:
+ Supported versions: Java LTS versions 8, 11, 17, and 21
+ Compute: Amazon EC2
+ Supports up to 500 EC2 instances across all JVM workloads in a given AWS Region
+ Latest version of the CloudWatch agent
+ SSM Agent installed on the EC2 instances

**Note**  
AWS Systems Manager Agent (SSM Agent) is pre-installed on some [Amazon Machine Images (AMIs)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ami-preinstalled-agent.html) provided by AWS and trusted third parties. If the agent isn't installed, you can install it manually using the procedure for your operating system type:
+ [Manually installing and uninstalling SSM Agent on EC2 instances for Linux](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html)
+ [Manually installing and uninstalling SSM Agent on EC2 instances for macOS](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-macos.html)
+ [Manually installing and uninstalling SSM Agent on EC2 instances for Windows Server](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-windows.html)

## Benefits
<a name="Solution-JVM-On-EC2-Benefits"></a>

The solution delivers JVM monitoring, providing valuable insights for the following use cases:
+ Monitor JVM heap and non-heap memory usage.
+ Analyze thread and class loading for concurrency issues.
+ Track garbage collection to identify memory leaks.
+ Switch between different JVM applications configured via the solution under the same account.

Below are the key advantages of the solution:
+ Automates metric collection for JVM using CloudWatch agent configuration, eliminating manual instrumentation.
+ Provides a pre-configured, consolidated CloudWatch dashboard for JVM metrics. The dashboard will automatically handle metrics from new JVM EC2 instances configured using the solution, even if those metrics don't exist when you first create the dashboard. It also allows you to group the metrics into logical applications for easier focus and management.

The following image is an example of the dashboard for this solution.

![\[Example of JVM dashboard\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/JvmDashboard.png)


## Costs
<a name="Solution-JVM-On-EC2-Costs"></a>

This solution creates and uses resources in your account. You are charged for standard usage, including the following:
+ All metrics collected by the CloudWatch agent are charged as custom metrics. The number of metrics used by this solution depends on the number of EC2 hosts.
  + Each JVM host configured for the solution publishes a total of 18 metrics plus one metric (`disk_used_percent`) for which the metric count depends on the number of paths for the host.
+ One custom dashboard.
+ API operations requested by the CloudWatch agent to publish the metrics. With the default configuration for this solution, the CloudWatch agent calls the **PutMetricData** API once every minute for each EC2 host. This means that the **PutMetricData** API is called 30 × 24 × 60 = 43,200 times in a 30-day month for each EC2 host.

For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

The pricing calculator can help you estimate approximate monthly costs for using this solution. 

**To use the pricing calculator to estimate your monthly solution costs**

1. Open the [Amazon CloudWatch pricing calculator](https://calculator.aws/#/createCalculator/CloudWatch).

1. For **Choose a Region**, select the Region where you would like to deploy the solution.

1. In the **Metrics** section, for **Number of metrics**, enter **(18 + average number of disk paths per EC2 host) × number of EC2 instances configured for this solution**.

1. In the **APIs** section, for **Number of API requests**, enter **43200 × number of EC2 instances configured for this solution**.

   By default, the CloudWatch agent performs one **PutMetricData** operation each minute for each EC2 host.

1. In the **Dashboards and Alarms** section, for **Number of Dashboards**, enter **1**.

1. You can see your monthly estimated costs at the bottom of the pricing calculator.
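
The calculator inputs above are simple arithmetic. The following Python sketch computes them for a hypothetical fleet, using the 18 fixed metrics per host, the per-path `disk_used_percent` metric, and the one **PutMetricData** call per minute per host described in this section. The function name is illustrative only.

```python
def calculator_inputs(instances, avg_disk_paths, days=30):
    """Return (number of metrics, number of API requests) for the calculator.

    Each host publishes 18 metrics plus one disk_used_percent metric per disk
    path, and the agent calls PutMetricData once per minute per host.
    """
    metrics = (18 + avg_disk_paths) * instances
    api_requests = days * 24 * 60 * instances
    return metrics, api_requests


# Hypothetical fleet: 10 instances with 2 disk paths each.
print(calculator_inputs(instances=10, avg_disk_paths=2))  # (200, 432000)
```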

## CloudWatch agent configuration for this solution
<a name="Solution-JVM-CloudWatch-Agent"></a>

The CloudWatch agent is software that runs continuously and autonomously on your servers and in containerized environments. It collects metrics, logs, and traces from your infrastructure and applications and sends them to CloudWatch and X-Ray.

For more information about the CloudWatch agent, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).

The agent configuration in this solution collects the foundational metrics for the solution. The CloudWatch agent can be configured to collect more JVM metrics than the dashboard displays by default. For a list of all JVM metrics that you can collect, see [Collect JVM metrics](CloudWatch-Agent-JMX-metrics.md#CloudWatch-Agent-JVM-metrics). For general information about CloudWatch agent configuration, see [Metrics collected by the CloudWatch agent](metrics-collected-by-CloudWatch-agent.md).

**Expose JMX ports for the JVM application**

The CloudWatch agent relies on JMX to collect the metrics related to the JVM process. To make this possible, you must expose the JMX port from your JVM application. Instructions for exposing the JMX port depend on the workload type you are using for your JVM application. See the documentation for your application to find these instructions.

In general, to enable a JMX port for monitoring and management, you set the following system properties for your JVM application. Be sure to specify an unused port number. The following example sets up unauthenticated JMX. If your security requirements call for JMX with password authentication or SSL for remote access, refer to the [JMX documentation](https://docs.oracle.com/en/java/javase/17/management/monitoring-and-management-using-jmx-technology.html) to set the required properties.

```
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=port-number
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false
```

Review the starting scripts and configuration files of your application to find the best place to add these arguments. When you run a `.jar` file from the command line, this command could look like the following, where *pet-search.jar* is the name of the application jar.

```
$ java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -jar pet-search.jar
```
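
Because the JMX port must be unused, you might want to check a candidate port on the host before adding it to your startup script. The following Python sketch tries to bind a TCP socket to the port on the IPv4 wildcard address; a failed bind means something already holds the port. The function name is hypothetical, and the result is only a point-in-time check.

```python
import socket


def is_port_free(port, host=""):
    """Return True if a TCP socket can bind to `port` right now.

    host="" tries the IPv4 wildcard address. Another process can still
    claim the port between this check and the start of your JVM.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


print(is_port_free(9999))
```

Without `SO_REUSEADDR`, a port that was recently closed can also report as busy, which errs on the safe side when choosing a JMX port.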

**Agent configuration for this solution**

The metrics collected by the agent are defined in the agent configuration. The solution provides agent configurations to collect the recommended metrics with suitable dimensions for the solution’s dashboard. 

The steps for deploying the solution are described later in [Deploy the agent for your solution](#Solution-JVM-Agent-Deploy). The following information is intended to help you understand how to customize the agent configuration for your environment.

You must customize some parts of the following agent configuration for your environment:
+ The JMX port number is the port number that you configured in the previous section of this documentation. It is in the `endpoint` line in the configuration.
+ `ProcessGroupName`: Provide meaningful names for the `ProcessGroupName` dimension. These names should represent the cluster, application, or service grouping for EC2 instances running the same application or process. This groups metrics from instances that belong to the same JVM process group, providing a unified view of cluster, application, and service performance in the solution dashboard.

  For example, if you have two Java applications running in the same account, one for the `order-processing` application and another for the `inventory-management` application, set the `ProcessGroupName` dimension accordingly in the agent configuration of each instance:
  + For the `order-processing` application instances, set `ProcessGroupName=order-processing`.
  + For the `inventory-management` application instances, set `ProcessGroupName=inventory-management`.

When you follow these guidelines, the solution dashboard will automatically group the metrics based on the `ProcessGroupName` dimension. The dashboard will include dropdown options to select and view metrics for a specific process group, allowing you to monitor the performance of individual process groups separately.
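
To show how the `ProcessGroupName` dimension makes this grouping possible, the following Python sketch builds a CloudWatch search expression that returns one JVM metric for a single process group. It assumes that the metric carries exactly the `InstanceId` and `ProcessGroupName` dimensions appended by the agent configuration in this solution. It illustrates the search expression syntax and is not the expression used by the solution dashboard.

```python
def jvm_search_expression(metric_name, process_group, stat="Average", period=60):
    """Build a CloudWatch SEARCH expression for one JVM metric in one process group.

    Assumes the metric schema {CWAgent,InstanceId,ProcessGroupName}; the search
    returns one time series per matching InstanceId.
    """
    schema = "{CWAgent,InstanceId,ProcessGroupName}"
    terms = f'MetricName="{metric_name}" ProcessGroupName="{process_group}"'
    return f"SEARCH('{schema} {terms}', '{stat}', {period})"


for group in ("order-processing", "inventory-management"):
    print(jvm_search_expression("jvm.memory.heap.used", group))
```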

### Agent configuration for JVM hosts
<a name="Solution-JVM-Agent-Config"></a>

Use the following CloudWatch agent configuration on the EC2 instances where your Java applications are deployed. You will store this configuration as a parameter in Systems Manager Parameter Store, as detailed later in [Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store](#Solution-JVM-Agent-Step2).

Replace *ProcessGroupName* with the name of your process group. Replace *port-number* with the JMX port of your Java application. If JMX was enabled with password authentication or SSL for remote access, see [Collect Java Management Extensions (JMX) metrics](CloudWatch-Agent-JMX-metrics.md) for information about setting up TLS or authorization in agent configuration as required.

The EC2 metrics in this configuration (the ones defined outside the `jmx` block) work only on Linux and macOS instances. If you are using Windows instances, you can omit these metrics from the configuration. For information about metrics collected on Windows instances, see [Metrics collected by the CloudWatch agent on Windows Server instances](metrics-collected-by-CloudWatch-agent.md#windows-metrics-enabled-by-CloudWatch-agent).

```
{
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:port-number",
          "jvm": {
            "measurement": [
              "jvm.classes.loaded",
              "jvm.gc.collections.count",
              "jvm.gc.collections.elapsed",
              "jvm.memory.heap.committed",
              "jvm.memory.heap.max",
              "jvm.memory.heap.used",
              "jvm.memory.nonheap.committed",
              "jvm.memory.nonheap.max",
              "jvm.memory.nonheap.used",
              "jvm.threads.count"
            ]
          },
          "append_dimensions": {
            "ProcessGroupName": "ProcessGroupName"
          }
        }
      ],
      "disk": {
        "measurement": [
          "used_percent"
        ]
      },
      "mem": {
        "measurement": [
          "used_percent"
        ]
      },
      "swap": {
        "measurement": [
          "used_percent"
        ]
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ]
      }
    }
  }
}
```
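
If you manage several applications, you might prefer to generate the configuration above rather than edit it by hand. The following Python sketch rebuilds the same JSON for a given JMX port and `ProcessGroupName`, with an option to drop the Linux and macOS host metrics for Windows instances. The function name and its parameters are hypothetical.

```python
import json

# The ten JVM measurements from the configuration above.
JVM_MEASUREMENTS = [
    "jvm.classes.loaded",
    "jvm.gc.collections.count",
    "jvm.gc.collections.elapsed",
    "jvm.memory.heap.committed",
    "jvm.memory.heap.max",
    "jvm.memory.heap.used",
    "jvm.memory.nonheap.committed",
    "jvm.memory.nonheap.max",
    "jvm.memory.nonheap.used",
    "jvm.threads.count",
]


def jvm_agent_config(port, process_group, include_host_metrics=True):
    """Return the solution's agent configuration for one JMX port and process group."""
    collected = {
        "jmx": [{
            "endpoint": f"localhost:{port}",
            "jvm": {"measurement": list(JVM_MEASUREMENTS)},
            "append_dimensions": {"ProcessGroupName": process_group},
        }]
    }
    if include_host_metrics:
        # These host metrics apply to Linux and macOS instances only.
        collected.update({
            "disk": {"measurement": ["used_percent"]},
            "mem": {"measurement": ["used_percent"]},
            "swap": {"measurement": ["used_percent"]},
            "netstat": {"measurement": ["tcp_established", "tcp_time_wait"]},
        })
    return {
        "metrics": {
            "namespace": "CWAgent",
            # Resolved by the agent at runtime, so it stays a literal string here.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": collected,
        }
    }


print(json.dumps(jvm_agent_config(9999, "order-processing"), indent=2))
```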

## Deploy the agent for your solution
<a name="Solution-JVM-Agent-Deploy"></a>

There are several approaches for installing the CloudWatch agent, depending on the use case. We recommend using Systems Manager for this solution. It provides a console experience and makes it simpler to manage a fleet of managed servers within a single AWS account. The instructions in this section use Systems Manager and are intended for when you don’t have the CloudWatch agent running with existing configurations. You can check whether the CloudWatch agent is running by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running).

If you are already running the CloudWatch agent on the EC2 hosts where the workload is deployed and managing the agent configurations, you can skip the instructions in this section and follow your existing deployment mechanism to update the configuration. Be sure to merge the JVM agent configuration with your existing agent configuration, and then deploy the merged configuration. If you are using Systems Manager to store and manage the configuration for the CloudWatch agent, you can merge the configuration into the existing parameter value. For more information, see [Managing CloudWatch agent configuration files](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/create-store-cloudwatch-configurations.html).
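
Merging by hand is error-prone when both files define `metrics_collected`. The following Python sketch shows one simple way to merge two parsed agent configurations: nested objects are merged key by key, lists are combined without duplicates, and for any other conflict the existing value is kept. These merge rules are an assumption of this example, not behavior of the CloudWatch agent, so review conflicts such as a different `namespace` yourself; a namespace other than `CWAgent` also requires changing the dashboard.

```python
import json


def merge_agent_configs(existing, addition):
    """Recursively merge two CloudWatch agent configuration dicts.

    - Nested dicts are merged key by key.
    - Lists keep existing items first and append new items not already present
      (so both the jmx list and measurement lists keep entries from each side).
    - For any other conflicting value, the existing value wins.
    """
    merged = dict(existing)
    for key, new_value in addition.items():
        if key not in merged:
            merged[key] = new_value
            continue
        old_value = merged[key]
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            merged[key] = merge_agent_configs(old_value, new_value)
        elif isinstance(old_value, list) and isinstance(new_value, list):
            merged[key] = old_value + [v for v in new_value if v not in old_value]
        # Otherwise keep the existing scalar value.
    return merged


existing = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {"metrics_collected": {"mem": {"measurement": ["used_percent"]}}},
}
jvm = {
    "metrics": {
        "namespace": "CWAgent",
        "metrics_collected": {
            "mem": {"measurement": ["used_percent"]},
            "jmx": [{"endpoint": "localhost:9999",
                     "jvm": {"measurement": ["jvm.threads.count"]}}],
        },
    }
}
merged = merge_agent_configs(existing, jvm)
print(json.dumps(merged, indent=2))
```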

**Note**  
Using Systems Manager to deploy the following CloudWatch agent configuration replaces any existing CloudWatch agent configuration on your EC2 instances. You can modify this configuration to suit your unique environment or use case. The metrics defined in this solution are the minimum required for the recommended dashboard.

The deployment process includes the following steps:
+ Step 1: Ensure that the target EC2 instances have the required IAM permissions.
+ Step 2: Store the recommended agent configuration file in the Systems Manager Parameter Store.
+ Step 3: Install the CloudWatch agent on one or more EC2 instances using a CloudFormation stack.
+ Step 4: Verify the agent setup is configured properly.

### Step 1: Ensure the target EC2 instances have the required IAM permissions
<a name="Solution-JVM-Agent-Step1"></a>

You must grant permission for Systems Manager to install and configure the CloudWatch agent. You must also grant permission for the CloudWatch agent to publish telemetry from your EC2 instances to CloudWatch. Make sure that the IAM role attached to each instance has the **CloudWatchAgentServerPolicy** and **AmazonSSMManagedInstanceCore** IAM policies attached.

If you create a new role for this purpose, attach it to your EC2 instances after you create it. To attach a role while launching a new EC2 instance, follow the steps in [Launch an instance with an IAM role](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#launch-instance-with-role). To attach a role to an existing EC2 instance, follow the steps in [Attach an IAM role to an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#attach-iam-role).
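
If you prefer to script this step, the following Python sketch lists the IAM and EC2 API calls involved: create a role that EC2 can assume, attach the two managed policies named above, wrap the role in an instance profile, and associate that profile with an instance. It plans the calls as data so that you can review them before running them with boto3 clients. The role name and instance ID are hypothetical, and the sketch assumes the instance has no instance profile attached yet.

```python
import json

# AWS managed policies required by this step.
AGENT_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
]

# Trust policy that lets EC2 instances assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def role_setup_calls(role_name, instance_id):
    """Return ordered (service, operation, parameters) tuples for the setup."""
    calls = [("iam", "create_role", {
        "RoleName": role_name,
        "AssumeRolePolicyDocument": json.dumps(EC2_TRUST_POLICY),
    })]
    calls += [("iam", "attach_role_policy", {"RoleName": role_name, "PolicyArn": arn})
              for arn in AGENT_POLICY_ARNS]
    calls += [
        ("iam", "create_instance_profile", {"InstanceProfileName": role_name}),
        ("iam", "add_role_to_instance_profile",
         {"InstanceProfileName": role_name, "RoleName": role_name}),
        ("ec2", "associate_iam_instance_profile",
         {"IamInstanceProfile": {"Name": role_name}, "InstanceId": instance_id}),
    ]
    return calls


def run_calls(clients, calls):
    """Execute planned calls, for example with
    clients = {"iam": boto3.client("iam"), "ec2": boto3.client("ec2")}."""
    for service, operation, params in calls:
        getattr(clients[service], operation)(**params)


for service, operation, _ in role_setup_calls("CWAgentJvmRole", "i-0123456789abcdef0"):
    print(service, operation)
```

IAM changes are eventually consistent, so the final association call can fail if it runs immediately after the instance profile is created; waiting briefly and retrying usually resolves this.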

### Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store
<a name="Solution-JVM-Agent-Step2"></a>

Parameter Store simplifies the installation of the CloudWatch agent on an EC2 instance by securely storing and managing configuration parameters, eliminating the need for hard-coded values. This ensures a more secure and flexible deployment process, enabling centralized management and easier updates to configurations across multiple instances.

Use the following steps to store the recommended CloudWatch agent configuration file as a parameter in Parameter Store.

**To create the CloudWatch agent configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. In the **Name** box, enter a name that you'll use to reference the CloudWatch agent configuration file in later steps. For example, **AmazonCloudWatch-JVM-Configuration**.

   1. (Optional) In the **Description** box, type a description for the parameter.

   1. For **Parameter tier**, choose **Standard**. 

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the corresponding JSON block that was listed in [Agent configuration for JVM hosts](#Solution-JVM-Agent-Config). Be sure to customize the grouping dimension value and port number as described.

   1. Choose **Create parameter**. 
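
The console steps above can also be expressed with the Systems Manager `PutParameter` API. The following Python sketch builds the request arguments with the same choices (Standard tier, String type, text data type) from a configuration dictionary, such as the one in [Agent configuration for JVM hosts](#Solution-JVM-Agent-Config). The helper name is hypothetical, the 4 KB Standard-tier limit is assumed here to mean 4,096 bytes, and the boto3 call is shown only as a commented usage line.

```python
import json

# Standard-tier parameter values are limited to 4 KB (assumed 4,096 bytes).
STANDARD_TIER_MAX_BYTES = 4096


def put_parameter_request(name, agent_config, description=None):
    """Build PutParameter arguments for a CloudWatch agent configuration."""
    value = json.dumps(agent_config, separators=(",", ":"))  # compact JSON
    if len(value.encode("utf-8")) > STANDARD_TIER_MAX_BYTES:
        raise ValueError("configuration is too large for a Standard-tier parameter")
    request = {
        "Name": name,
        "Value": value,
        "Type": "String",
        "Tier": "Standard",
        "DataType": "text",
    }
    if description:
        request["Description"] = description
    return request


request = put_parameter_request(
    "AmazonCloudWatch-JVM-Configuration",
    {"metrics": {"namespace": "CWAgent"}},  # replace with the full configuration
)
# import boto3
# boto3.client("ssm").put_parameter(**request)
print(request["Name"], len(request["Value"]))
```

To update a parameter that already exists, add `"Overwrite": True` to the request.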

### Step 3: Install the CloudWatch agent and apply the configuration using a CloudFormation template
<a name="Solution-JVM-Agent-Step3"></a>

You can use AWS CloudFormation to install the agent and configure it to use the CloudWatch agent configuration that you created in the previous steps.

**To install and configure the CloudWatch agent for this solution**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the JVM workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **CWAgentInstallationStack**.

1. In the **Parameters** section, specify the following:

   1. For **CloudWatchAgentConfigSSM**, enter the name of the Systems Manager parameter for the agent configuration that you created earlier, such as **AmazonCloudWatch-JVM-Configuration**.

   1. To select the target instances, you have two options.

      1. For **InstanceIds**, specify a comma-delimited list of instance IDs where you want to install the CloudWatch agent with this configuration. You can list a single instance or several instances.

      1. If you are deploying at scale, you can specify the **TagKey** and the corresponding **TagValue** to target all EC2 instances with this tag and value. If you specify a **TagKey**, you must specify a corresponding **TagValue**. (For an Auto Scaling group, specify **aws:autoscaling:groupName** for the **TagKey** and specify the Auto Scaling group name for the **TagValue** to deploy to all instances within the Auto Scaling group.)

         If you specify both the **InstanceIds** and the **TagKey** parameters, **InstanceIds** takes precedence and the tags are ignored.

1. Review the settings, then choose **Create stack**. 

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [ https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json ](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json). 
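
For automation, the same stack can be created through the CloudFormation `CreateStack` API. The following Python sketch builds the request from the parameter names shown in the console (**CloudWatchAgentConfigSSM**, **InstanceIds**, **TagKey**, **TagValue**) and mirrors the precedence rule described above. It assumes the template's other parameters, if any, can keep their defaults and that no capabilities need to be acknowledged; if CloudFormation reports otherwise, adjust the request. The helper name and the Auto Scaling group name are hypothetical.

```python
# Template URL copied from the Quick create link in this step.
AGENT_TEMPLATE_URL = (
    "https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com"
    "/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json"
)


def agent_stack_request(stack_name, config_parameter, instance_ids=None,
                        tag_key=None, tag_value=None):
    """Build CreateStack arguments for the agent installation template.

    InstanceIds take precedence over tags, and a TagKey needs a TagValue.
    """
    params = {"CloudWatchAgentConfigSSM": config_parameter}
    if instance_ids:
        params["InstanceIds"] = ",".join(instance_ids)
    elif tag_key and tag_value:
        params["TagKey"] = tag_key
        params["TagValue"] = tag_value
    else:
        raise ValueError("pass instance_ids, or both tag_key and tag_value")
    return {
        "StackName": stack_name,
        "TemplateURL": AGENT_TEMPLATE_URL,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in params.items()
        ],
    }


request = agent_stack_request(
    "CWAgentInstallationStack",
    "AmazonCloudWatch-JVM-Configuration",
    tag_key="aws:autoscaling:groupName",
    tag_value="my-jvm-asg",  # hypothetical Auto Scaling group name
)
# import boto3
# boto3.client("cloudformation").create_stack(**request)
print(request["Parameters"])
```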

**Note**  
After this step is completed, this Systems Manager parameter is associated with the CloudWatch agents running on the targeted instances. This means that:
+ If the Systems Manager parameter is deleted, the agent stops.
+ If the Systems Manager parameter is edited, the configuration changes automatically apply to the agent at the scheduled frequency, which is 30 days by default.
+ If you want to apply changes to this Systems Manager parameter immediately, you must run this step again. For more information about associations, see [Working with associations in Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/state-manager-associations.html).

### Step 4: Verify the agent setup is configured properly
<a name="Solution-JVM-Agent-Step4"></a>

You can verify whether the CloudWatch agent is installed by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running). If the CloudWatch agent is not installed and running, make sure you have set up everything correctly.
+ Be sure you have attached a role with correct permissions for the EC2 instance as described in [Step 1: Ensure the target EC2 instances have the required IAM permissions](#Solution-JVM-Agent-Step1).
+ Be sure you have correctly configured the JSON for the Systems Manager parameter. Follow the steps in [Troubleshooting installation of the CloudWatch agent with CloudFormation](Install-CloudWatch-Agent-New-Instances-CloudFormation.md#CloudWatch-Agent-CloudFormation-troubleshooting).

If everything is set up correctly, then you should see the JVM metrics being published to CloudWatch. You can check the CloudWatch console to verify they are being published.

**To verify that JVM metrics are being published to CloudWatch**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Metrics**, **All metrics**.

1. Make sure you've selected the Region where you deployed the solution, and choose **Custom namespaces**, **CWAgent**.

1. Search for the metrics mentioned in [Agent configuration for JVM hosts](#Solution-JVM-Agent-Config), such as `jvm.memory.heap.used`. If you see results for these metrics, then the metrics are being published to CloudWatch.
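
To check many instances at once, you can query the `ListMetrics` API instead of the console. The following Python sketch takes ListMetrics response pages and reports which of your configured instances have not yet published `jvm.memory.heap.used` in the `CWAgent` namespace. The sample response and instance IDs are hypothetical, and newly published metrics can take some time to appear in ListMetrics results.

```python
def published_instances(pages):
    """Collect the InstanceId dimension values found in ListMetrics response pages."""
    instances = set()
    for page in pages:
        for metric in page.get("Metrics", []):
            for dimension in metric.get("Dimensions", []):
                if dimension["Name"] == "InstanceId":
                    instances.add(dimension["Value"])
    return instances


def missing_instances(expected_ids, pages):
    """Instance IDs you configured that have not published the metric yet."""
    return set(expected_ids) - published_instances(pages)


# With boto3 (assumed installed), the pages would come from:
#   paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
#   pages = paginator.paginate(Namespace="CWAgent", MetricName="jvm.memory.heap.used")
sample_pages = [{
    "Metrics": [{
        "Namespace": "CWAgent",
        "MetricName": "jvm.memory.heap.used",
        "Dimensions": [
            {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
            {"Name": "ProcessGroupName", "Value": "order-processing"},
        ],
    }]
}]
print(missing_instances({"i-0123456789abcdef0", "i-0fedcba9876543210"}, sample_pages))
```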

## Create the JVM solution dashboard
<a name="Solution-JVM-Dashboard"></a>

The dashboard provided by this solution presents metrics for the underlying Java Virtual Machine (JVM) for the server. It offers an overview of the JVM by aggregating and presenting metrics across all instances, providing a high-level summary of the overall health and operational state. Additionally, the dashboard shows a breakdown of the top contributors (top 10 per metric widget) for each metric. This helps you to quickly identify outliers or instances that significantly contribute to the observed metrics.

The solution dashboard doesn't display EC2 metrics. To view EC2 metrics, you'll need to use the EC2 automatic dashboard to see EC2 vended metrics and use the EC2 console dashboard to see EC2 metrics that are collected by the CloudWatch agent. For more information about automatic dashboards for AWS services, see [Viewing a CloudWatch dashboard for a single AWS service](CloudWatch_Automatic_Dashboards_Focus_Service.md).

To create the dashboard, you can use the following options:
+ Use CloudWatch console to create the dashboard.
+ Use AWS CloudFormation console to deploy the dashboard.
+ Download the AWS CloudFormation infrastructure as code and integrate it as part of your continuous integration (CI) automation.

By using the CloudWatch console to create a dashboard, you can preview the dashboard before you create it and incur charges.

**Note**  
 The dashboard created with CloudFormation in this solution displays metrics from the Region where the solution is deployed. Be sure to create the CloudFormation stack in the Region where your JVM metrics are published.   
If CloudWatch agent metrics are getting published to a different namespace than `CWAgent` (for example, if you've provided a customized namespace), you'll have to change the CloudFormation configuration to replace `CWAgent` with the customized namespace you are using.

**Note**  
Solution dashboards currently display garbage collection-related metrics only for the G1 Garbage Collector, which is the default collector in recent Java versions. If you are using a different garbage collection algorithm, the widgets pertaining to garbage collection are empty. However, you can customize these widgets by changing the dashboard CloudFormation template and applying the appropriate garbage collection type to the name dimension of the garbage collection-related metrics. For example, if you are using parallel garbage collection, change `name="G1 Young Generation"` to `name="Parallel GC"` for the garbage collection count metric `jvm.gc.collections.count`.

**To create the dashboard via CloudWatch Console**

1. Open the CloudWatch Console **Create Dashboard** using this link: [https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=JvmOnEc2&referrer=os-catalog](https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=JvmOnEc2&referrer=os-catalog).

1. Verify that the selected Region on the console is the Region where the JVM workload is running.

1. Enter the name of the dashboard, then choose **Create Dashboard**.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **JVMDashboard-us-east-1**.

1. Preview the dashboard and choose **Save** to create the dashboard.

**To create the dashboard via CloudFormation**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/JVM_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/JVM_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the JVM workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **JVMDashboardStack**.

1. In the **Parameters** section, specify the name of the dashboard under the **DashboardName** parameter.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **JVMDashboard-us-east-1**.

1. Acknowledge access capabilities for transforms under **Capabilities and transforms**. Note that CloudFormation doesn't add any IAM resources.

1. Review the settings, then choose **Create stack**. 

1. After the stack status is **CREATE_COMPLETE**, choose the **Resources** tab under the created stack and then choose the link under **Physical ID** to go to the dashboard. You can also access the dashboard in the CloudWatch console by choosing **Dashboards** in the left navigation pane of the console, and finding the dashboard name under **Custom Dashboards**.

If you want to edit the template file to customize it for any purpose, you can use the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use this link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/JVM_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/JVM_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

### Get started with the JVM dashboard
<a name="Solution-JVM-Dashboard-GetStarted"></a>

Here are a few tasks that you can try out with the new JVM dashboard. These tasks let you validate that the dashboard is working correctly and give you hands-on experience using it to monitor a JVM process group. As you try them, you'll become familiar with navigating the dashboard and interpreting the visualized metrics.

**Select a process group**

Use the **JVM Process Group Name** dropdown list to select the process group that you want to monitor. The dashboard automatically updates to display metrics for the selected process group. If you have multiple Java applications or environments, each might be represented as a separate process group. Selecting the appropriate process group ensures that you're viewing metrics specific to the application or environment that you want to analyze.

**Review memory usage**

From the dashboard overview section, find the **Heap Memory Usage Percentage** and **Non-Heap Memory Usage Percentage** widgets. These show the percentage of heap and non-heap memory being used across all JVMs in the selected process group. A high percentage indicates potential memory pressure that could lead to performance issues or `OutOfMemoryError` exceptions. You can also drill down to heap usage by host under **Memory usage by host** to check the hosts with high usage.
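
As a rough illustration of what a usage percentage means (the dashboard's own metric math isn't shown on this page), the sketch below computes used heap as a percentage of maximum heap for each host and flags hosts at or above a threshold. The `HeapSample` type, host names, byte values, and the 80 percent threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeapSample:
    host: str
    used_bytes: int
    max_bytes: int


def heap_usage_percent(sample: HeapSample) -> float:
    """Percentage of the maximum heap currently in use."""
    if sample.max_bytes <= 0:
        raise ValueError(f"{sample.host}: max_bytes must be positive")
    return 100.0 * sample.used_bytes / sample.max_bytes


def hosts_under_pressure(samples, threshold_percent=80.0):
    """Return (host, percent) pairs at or above the threshold, highest first."""
    percents = [(s.host, heap_usage_percent(s)) for s in samples]
    flagged = [pair for pair in percents if pair[1] >= threshold_percent]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

A host that stays near the top of this list over time is a candidate for a closer look at its heap sizing or for a possible memory leak.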

**Analyze threads and classes loaded**

In the **Threads and Classes Loaded by Host** section, find the **Top 10 Threads Count** and **Top 10 Classes Loaded** widgets. Look for any JVMs with an abnormally high number of threads or classes compared to others. Too many threads can indicate thread leaks or excessive concurrency, while a large number of loaded classes could point to potential class loader leaks or inefficient dynamic class generation.

**Identify garbage collection issues**

In the **Garbage Collection** section, find the **Top 10 Garbage Collections Invocations Per Minute** and **Top 10 Garbage Collection Duration** widgets for the different garbage collector types: **Young**, **Concurrent**, and **Mixed**. Look for any JVMs that have an unusually high number of collections or long collection durations compared to others. This could indicate configuration issues or memory leaks.
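
An invocations-per-minute value is a rate. If you have raw cumulative collection counts for a single JVM (for example, values of `jvm.gc.collections.count`, assuming they are exported as a cumulative counter), you can derive the rate from successive samples as sketched below. This only illustrates the arithmetic; it is not how the dashboard computes its widgets, and the sample values in the test are made up.

```python
def collections_per_minute(samples):
    """Convert cumulative (timestamp_seconds, count) samples to per-minute rates.

    Returns one (timestamp_seconds, rate) pair per interval, stamped with the
    end of the interval. A drop in the count (for example, after a JVM restart
    resets the counter) is treated as a reset, and the new count is used as
    the increase for that interval.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        increase = c1 - c0 if c1 >= c0 else c1
        rates.append((t1, increase * 60.0 / elapsed))
    return rates
```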

# CloudWatch solution: NGINX workload on Amazon EC2
<a name="Solution-NGINX-On-EC2"></a>

This solution helps you configure out-of-the-box metric collection using the CloudWatch agent for NGINX applications running on EC2 instances. For general information about all CloudWatch observability solutions, see [CloudWatch observability solutions](Monitoring-Solutions.md).

**Topics**
+ [Requirements](#Solution-NGINX-On-EC2-Requirements)
+ [Benefits](#Solution-NGINX-On-EC2-Benefits)
+ [Costs](#Solution-NGINX-On-EC2-Costs)
+ [CloudWatch agent configuration for this solution](#Solution-NGINX-CloudWatch-Agent)
+ [Deploy the agent for your solution](#Solution-NGINX-Agent-Deploy)
+ [Create the NGINX solution dashboard](#Solution-NGINX-Dashboard)

## Requirements
<a name="Solution-NGINX-On-EC2-Requirements"></a>

This solution is relevant for the following conditions:
+ Supported versions: NGINX version 1.24
+ Compute: Amazon EC2
+ Supports up to 500 EC2 instances across all NGINX workloads in a given AWS Region
+ Latest version of CloudWatch agent
+ Prometheus Exporter: nginxinc/nginx-prometheus-exporter (Apache 2.0 license)
+ SSM agent installed on EC2 instance
**Note**  
AWS Systems Manager Agent (SSM Agent) is preinstalled on some [Amazon Machine Images (AMIs)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ami-preinstalled-agent.html) provided by AWS and trusted third parties. If the agent isn't installed, you can install it manually using the procedure for your operating system type.  
 [Manually installing and uninstalling SSM Agent on EC2 instances for Linux](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html) 
 [Manually installing and uninstalling SSM Agent on EC2 instances for macOS](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-macos.html) 
 [Manually installing and uninstalling SSM Agent on EC2 instances for Windows Server](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-windows.html) 

## Benefits
<a name="Solution-NGINX-On-EC2-Benefits"></a>

The solution delivers NGINX monitoring, providing valuable insights for the following use cases:
+ Review connection metrics to identify potential bottlenecks, connection issues, or unexpected usage.
+ Analyze HTTP request volume to understand the overall traffic load on NGINX.

Below are the key advantages of the solution:
+ Automates metric collection for NGINX using CloudWatch agent configuration, eliminating manual instrumentation.
+ Provides a pre-configured, consolidated CloudWatch dashboard for NGINX metrics. The dashboard will automatically handle metrics from new NGINX EC2 instances configured using the solution, even if those metrics don't exist when you first create the dashboard.

The following image is an example of the dashboard for this solution.

![\[Example dashboard for NGINX solution.\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/NGINXDashboard.png)


## Costs
<a name="Solution-NGINX-On-EC2-Costs"></a>

This solution creates and uses resources in your account. You are charged for standard usage, including the following:
+ All metrics collected by the CloudWatch agent for this solution are published to CloudWatch Logs using the Embedded Metric Format (EMF). These CloudWatch logs are charged based on their volume and retention period. Because the metrics are published as logs, you aren't billed for any **PutMetricData** API calls for this solution. The metrics extracted and ingested from your logs are charged as custom metrics. The number of metrics used by this solution depends on the number of EC2 hosts.
  + Each NGINX EC2 host configured for the solution publishes a total of eight metrics.
+ One custom dashboard.
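
The eight metrics per host appear to correspond to the series kept by the solution's metric filter: `nginx_up`, `nginx_http_requests_total`, and the six `nginx_connections_*` metrics that the NGINX Prometheus exporter documents for NGINX OSS. The sketch below lists those names and computes the custom-metric count to enter in the pricing calculator; the tuple name and helper function are illustrative.

```python
# Metric names kept by the solution's filter, assuming the metrics that
# nginx-prometheus-exporter documents for NGINX OSS (stub_status).
SOLUTION_METRICS = (
    "nginx_up",
    "nginx_http_requests_total",
    "nginx_connections_accepted",
    "nginx_connections_active",
    "nginx_connections_handled",
    "nginx_connections_reading",
    "nginx_connections_waiting",
    "nginx_connections_writing",
)


def custom_metric_count(instance_count: int) -> int:
    """Each host publishes every metric once, with an InstanceId dimension."""
    if instance_count < 0:
        raise ValueError("instance_count cannot be negative")
    return len(SOLUTION_METRICS) * instance_count
```

For example, five configured instances give `custom_metric_count(5)`, that is, 40 custom metrics.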

For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

The pricing calculator can help you estimate approximate monthly costs for using this solution.

**To use the pricing calculator to estimate your monthly solution costs**

1. Open the [Amazon CloudWatch pricing calculator](https://calculator.aws/#/createCalculator/CloudWatch).

1. For **Choose a Region**, select the AWS Region where you would like to deploy the solution.

1. In the **Metrics** section, for **Number of metrics**, enter **8** multiplied by the number of EC2 instances configured for this solution.

1. In the **Logs** section, for **Standard Logs: Data Ingested**, enter the estimated daily log volume generated by the CloudWatch agent across all EC2 hosts. For example, five EC2 instances produce less than 1,000 bytes per day. After setup, you can check your byte usage with the `IncomingBytes` metric, which is vended by CloudWatch Logs. Be sure to select the appropriate log group.

1. In the **Logs** section, for **Log Storage/Archival (Standard and Vended Logs)**, select **Yes to Store Logs: Assuming 1 month retention**. Modify this value if you decide to make custom changes to the retention period.

1. In the **Dashboards and Alarms** section, for **Number of Dashboards**, enter **1**.

1. You can see your monthly estimated costs at the bottom of the pricing calculator.

## CloudWatch agent configuration for this solution
<a name="Solution-NGINX-CloudWatch-Agent"></a>

The CloudWatch agent is software that runs continuously and autonomously on your servers and in containerized environments. It collects metrics, logs, and traces from your infrastructure and applications and sends them to CloudWatch and X-Ray.

For more information about the CloudWatch agent, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).

The agent configuration in this solution collects a set of metrics to help you get started monitoring and observing your NGINX workload. The CloudWatch agent can be configured to collect more NGINX metrics than the dashboard displays by default. For a list of all NGINX metrics that you can collect, see [Metrics for NGINX OSS](https://github.com/nginxinc/nginx-prometheus-exporter?tab=readme-ov-file#exported-metrics).

Before configuring the CloudWatch agent, you must first configure NGINX to expose its metrics. Then, install and configure the third-party Prometheus metric exporter.

### Expose NGINX metrics
<a name="Solution-NGINX-Expose-Metrics"></a>

**Note**  
 The following commands are for Linux. Check [NGINX for Windows page](https://nginx.org/en/docs/windows.html) for equivalent commands in Windows Server. 

First, enable NGINX's `stub_status` module by adding a new `location` block to your NGINX configuration. Add the following lines inside the `server` block of your `nginx.conf` file:

```
location /nginx_status {
    stub_status on;
    allow 127.0.0.1; # Allow only localhost to access
    deny all;        # Deny all other IPs
}
```

Before reloading NGINX, validate your NGINX configuration:

```
sudo nginx -t
```

Validating first helps prevent configuration errors that could take your website down. The following example demonstrates a successful response:

```
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
```

Once you've successfully validated the updated configuration, reload NGINX (no output is expected):

```
sudo systemctl reload nginx
```

This command instructs the NGINX process to reload its configuration. A reload is more graceful than a full restart: NGINX starts new worker processes with the new configuration and gracefully shuts down the old worker processes.

Test the NGINX status endpoint:

```
curl http://127.0.0.1/nginx_status
```

The following example demonstrates a successful response:

```
Active connections: 1
server accepts handled requests
6 6 6
Reading: 0 Writing: 1 Waiting: 0
```
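
The exporter turns this plain-text page into Prometheus metrics. To make the format concrete, the following Python sketch parses a `stub_status` response into the values it contains. The function name and the keys of the returned dictionary are illustrative; this is not the exporter's own code.

```python
def parse_stub_status(text: str) -> dict:
    """Parse the output of the NGINX stub_status page into integers."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Line 1: "Active connections: N"
    active = int(lines[0].split(":", 1)[1])
    # Line 2 is the header "server accepts handled requests";
    # line 3 holds the three matching counters.
    accepts, handled, requests = (int(v) for v in lines[2].split())
    # Line 4: "Reading: R Writing: W Waiting: X" as label/value pairs.
    fields = lines[3].split()
    states = {fields[i].rstrip(":").lower(): int(fields[i + 1])
              for i in range(0, len(fields), 2)}
    return {
        "active": active,
        "accepted": accepts,
        "handled": handled,
        "requests": requests,
        **states,
    }
```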

The following example demonstrates a failure response (review the previous steps before proceeding):

```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
      <head>
          <title>The page is not found</title>
          ...
```

### Configure Prometheus metric exporter
<a name="Solution-NGINX-Configure-Prometheus"></a>

Download the latest NGINX Prometheus exporter release from the [official GitHub repository](https://github.com/nginxinc/nginx-prometheus-exporter/releases). Be sure to download the binary that matches your platform.

The following example demonstrates commands for AMD64:

```
cd /tmp
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v1.3.0/nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
tar -xzvf nginx-prometheus-exporter_1.3.0_linux_amd64.tar.gz
sudo cp nginx-prometheus-exporter /usr/local/bin/
rm /tmp/nginx-prometheus-exporter*
```

Run the Prometheus exporter and point it to the NGINX stub status page:

```
nohup /usr/local/bin/nginx-prometheus-exporter --nginx.scrape-uri http://127.0.0.1/nginx_status &>/dev/null &
```

The following example demonstrates a response (background job ID and PID):

```
[1] 74699
```

### Test the NGINX Prometheus endpoint
<a name="Solution-NGINX-Test-Prometheus"></a>

Validate that the NGINX Prometheus exporter has started to expose the relevant metrics. Replace *port-number* with the port that the exporter listens on (9113 by default):

```
curl http://localhost:port-number/metrics
```
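
If you want to check the output programmatically, the following Python sketch fetches the exporter's `/metrics` page with the standard library and keeps only the `nginx_` samples. It assumes the exporter listens on port 9113 and that sample lines carry no timestamps, as in the example output on this page; `fetch_metrics` and `nginx_samples` are hypothetical helper names, and the parser is not a full implementation of the Prometheus text format.

```python
import urllib.request


def fetch_metrics(port: int = 9113, host: str = "localhost") -> str:
    """Return the raw text served by the exporter's /metrics endpoint."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


def nginx_samples(metrics_text: str) -> dict:
    """Map each nginx_* sample (name plus any labels) to its float value."""
    samples = {}
    for line in metrics_text.splitlines():
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.startswith("nginx_"):
            samples[name_and_labels] = float(value)
    return samples
```

On the instance, `print(nginx_samples(fetch_metrics()))` should list values such as `nginx_connections_active` if the exporter is working.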

The following example demonstrates a successful response:

```
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
...
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 14
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
...
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
```

### Agent configuration for this solution
<a name="Solution-NGINX-Agent-Config-Intro"></a>

The metrics collected by the agent are defined in the agent configuration. The solution provides agent configurations to collect the recommended metrics with suitable dimensions for the solution's dashboard.

The steps for deploying the solution are described later in [Deploy the agent for your solution](#Solution-NGINX-Agent-Deploy). The following information is intended to help you understand how to customize the agent configuration for your environment.

You must customize some parts of the agent and Prometheus configurations for your environment, such as the port number used by the Prometheus exporter.

The port used by the Prometheus exporter can be verified using the following command:

```
sudo netstat -antp | grep nginx-prom
```

The following example demonstrates a response (see port value 9113):

```
tcp6 0 0 :::9113 :::* LISTEN 76398/nginx-prometh
```
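
Before you put the port into the configurations, you can also confirm from a script that something accepts TCP connections on it. The following sketch uses Python's `socket` module over IPv4. It is only a reachability check and doesn't confirm that the listener is the NGINX exporter; the function name is hypothetical.

```python
import socket


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_is_listening(9113)` returns `True` on a host where the exporter from the previous steps is running.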

### Agent configuration for NGINX hosts
<a name="Solution-NGINX-Agent-Config"></a>

The CloudWatch agent with Prometheus monitoring needs two configurations to scrape the Prometheus metrics. Each configuration is stored as a separate parameter in Systems Manager Parameter Store, as detailed later in [Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store](#Solution-NGINX-Agent-Step2).

The first configuration is for the Prometheus exporter, as documented in Prometheus' [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) documentation. The second configuration is for the CloudWatch agent.

 **Prometheus configuration** 

Replace *port-number* with the port that the Prometheus exporter listens on.

```
global:
  scrape_interval: 30s
  scrape_timeout: 10s
  
scrape_configs:
- job_name: 'nginx'
  metrics_path: /metrics
  static_configs:
    - targets: ['localhost:port-number']
  ec2_sd_configs:
    - port: port-number
  relabel_configs:
    - source_labels: ['__meta_ec2_instance_id']
      target_label: InstanceId
  metric_relabel_configs:
    - source_labels: ['__name__']
      regex: 'nginx_up|nginx_http_requests_total|nginx_connections_.*'
      action: keep
```
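
The `metric_relabel_configs` rule keeps only series whose `__name__` matches the regex. Prometheus anchors relabeling regexes at both ends, so the pattern behaves like Python's `re.fullmatch`. The sketch below approximates that filtering so you can check which metric names survive before you change the pattern; it is not Prometheus code, and the helper name is illustrative.

```python
import re

# Same pattern as the metric_relabel_configs rule above.
KEEP_PATTERN = re.compile(r"nginx_up|nginx_http_requests_total|nginx_connections_.*")


def kept_metric_names(names):
    """Return the names that an anchored 'keep' action would retain."""
    # fullmatch mirrors Prometheus, which anchors relabel regexes as ^(?:...)$.
    return [name for name in names if KEEP_PATTERN.fullmatch(name)]
```

Because of the anchoring, a name such as `nginx_upstream_x` is dropped even though it starts with `nginx_up`.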

 **CloudWatch agent configuration** 

With the following CloudWatch agent configuration, these metrics are published via CloudWatch Logs using the [embedded metric format (EMF)](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format_Specification.html). These logs are sent to the log group `nginx`. You can change the `log_group_name` value to a different name that represents the CloudWatch logs.

If you are using Windows Server, set `prometheus_config_path` in the following configuration to `C:\\ProgramData\\Amazon\\AmazonCloudWatchAgent\\prometheus.yaml`.

```
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "log_group_name": "nginx",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent",
          "metric_declaration": [
            {
              "source_labels": ["InstanceId"],
              "metric_selectors": ["nginx_up", "nginx_http_requests_total", "nginx_connections*"],
              "dimensions": [["InstanceId"]]
            }
          ]
        }
      }
    }
  }
}
```
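
If you generate this file per environment, a small script can fill in the log group name, the namespace, and the platform-specific Prometheus path. The sketch below builds the same structure as the JSON above; `build_agent_config` and `agent_config_json` are hypothetical helpers, and the two paths are the ones given on this page.

```python
import json

LINUX_PROMETHEUS_PATH = "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml"
WINDOWS_PROMETHEUS_PATH = "C:\\ProgramData\\Amazon\\AmazonCloudWatchAgent\\prometheus.yaml"


def build_agent_config(log_group_name="nginx", windows=False, namespace="CWAgent"):
    """Return the CloudWatch agent configuration shown above as a dict."""
    config_path = WINDOWS_PROMETHEUS_PATH if windows else LINUX_PROMETHEUS_PATH
    return {
        "agent": {"metrics_collection_interval": 60},
        "logs": {
            "metrics_collected": {
                "prometheus": {
                    "log_group_name": log_group_name,
                    "prometheus_config_path": config_path,
                    "emf_processor": {
                        "metric_declaration_dedup": True,
                        "metric_namespace": namespace,
                        "metric_declaration": [
                            {
                                "source_labels": ["InstanceId"],
                                "metric_selectors": ["nginx_up", "nginx_http_requests_total", "nginx_connections*"],
                                "dimensions": [["InstanceId"]],
                            }
                        ],
                    },
                }
            }
        },
    }


def agent_config_json(**kwargs) -> str:
    """Serialize the configuration for pasting into Parameter Store."""
    # json.dumps escapes the Windows backslashes, matching the JSON form above.
    return json.dumps(build_agent_config(**kwargs), indent=2)
```

If you change `namespace`, remember to make the same change in the dashboard template, as described in the dashboard section.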

## Deploy the agent for your solution
<a name="Solution-NGINX-Agent-Deploy"></a>

There are several approaches for installing the CloudWatch agent, depending on the use case. We recommend using Systems Manager for this solution. It provides a console experience and makes it simpler to manage a fleet of managed servers within a single AWS account. The instructions in this section use Systems Manager and are intended for when you don't have the CloudWatch agent running with existing configurations. You can check whether the CloudWatch agent is running by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running).

If you are already running the CloudWatch agent on the EC2 hosts where the workload is deployed and managing agent configurations, you can skip the instructions in this section and follow your existing deployment mechanism to update the configuration. Be sure to merge the new CloudWatch agent and Prometheus configurations with your existing configurations, and then deploy the merged configurations. If you are using Systems Manager to store and manage the configuration for the CloudWatch agent, you can merge the configuration into the existing parameter value. For more information, see [Managing CloudWatch agent configuration files](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/create-store-cloudwatch-configurations.html).

**Note**  
Using Systems Manager to deploy the following CloudWatch agent configurations replaces or overwrites any existing CloudWatch agent configuration on your EC2 instances. You can modify this configuration to suit your unique environment or use case. The metrics defined in this configuration are the minimum required for the dashboard provided by this solution.

The deployment process includes the following steps:
+ Step 1: Ensure that the target EC2 instances have the required IAM permissions.
+ Step 2: Store the recommended agent configuration file in the Systems Manager Parameter Store.
+ Step 3: Install the CloudWatch agent on one or more EC2 instances using a CloudFormation stack.
+ Step 4: Verify the agent setup is configured properly.

### Step 1: Ensure the target EC2 instances have the required IAM permissions
<a name="Solution-NGINX-Agent-Step1"></a>

You must grant permission for Systems Manager to install and configure the CloudWatch agent, and for the CloudWatch agent to publish telemetry from your EC2 instance to CloudWatch. You must also grant the CloudWatch agent EC2 read access, which is required to add the EC2 instance ID as the `InstanceId` metric dimension. This additional requirement comes from the `prometheus.yaml` configuration shown earlier, which uses `__meta_ec2_instance_id` through EC2 service discovery.

Make sure that the IAM role attached to the instance has the **CloudWatchAgentServerPolicy**, **AmazonSSMManagedInstanceCore**, and **AmazonEC2ReadOnlyAccess** IAM policies attached.
+ After the role is created, attach the role to your EC2 instances. To attach a role to an EC2 instance, follow the steps in [Attach an IAM role to an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/attach-iam-role.html).
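
If you manage the role with a script instead of the console, the three AWS managed policies can be attached by ARN. The following sketch takes a boto3 IAM client (or any object with the same `attach_role_policy` method) from the caller, so it can be exercised without AWS credentials. The helper name is hypothetical, and the role must already exist.

```python
# ARNs of the AWS managed policies named above.
REQUIRED_POLICY_ARNS = (
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess",
)


def attach_solution_policies(iam_client, role_name: str) -> list:
    """Attach each required managed policy to role_name; return the ARNs attached."""
    attached = []
    for arn in REQUIRED_POLICY_ARNS:
        # Calls the IAM AttachRolePolicy API once per policy.
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
        attached.append(arn)
    return attached
```

For example, `attach_solution_policies(boto3.client("iam"), "MyNginxInstanceRole")`, where the role name is a placeholder for your own.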

### Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store
<a name="Solution-NGINX-Agent-Step2"></a>

Parameter Store simplifies the installation of the CloudWatch agent on an EC2 instance by securely storing and managing configuration parameters, eliminating the need for hard-coded values. This ensures a more secure and flexible deployment process, enabling centralized management and easier updates to configurations across multiple instances.

Use the following steps to store the recommended CloudWatch agent configuration file as a parameter in Parameter Store.

**To create the CloudWatch agent configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. Verify that the selected Region on the console is the Region where NGINX is running.

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. In the **Name** box, enter a name that you'll use to reference the CloudWatch agent configuration file in later steps. For example, **AmazonCloudWatch-NGINX-CloudWatchAgent-Configuration**.

   1. (Optional) In the **Description** box, type a description for the parameter.

   1. For **Parameter tier**, choose **Standard**.

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the JSON block listed in [Agent configuration for NGINX hosts](#Solution-NGINX-Agent-Config). Customize values as required, such as `log_group_name`.

   1. Choose **Create parameter**.

**To create the Prometheus configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. In the **Name** box, enter a name that you'll use to reference the configuration file in later steps. For example, **AmazonCloudWatch-NGINX-Prometheus-Configuration**.

   1. (Optional) In the **Description** box, type a description for the parameter.

   1. For **Parameter tier**, choose **Standard**.

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the YAML block listed in [Agent configuration for NGINX hosts](#Solution-NGINX-Agent-Config). Customize values as required, such as the port number in `targets` and `ec2_sd_configs`.

   1. Choose **Create parameter**.
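
The two console procedures above can also be scripted. This sketch calls the Systems Manager `PutParameter` API through a boto3 SSM client passed in by the caller, using the same settings chosen in the console (Standard tier, String type, text data type). The parameter names are the examples from this page; the helper name is hypothetical, and the client should be created in the Region where NGINX is running.

```python
AGENT_PARAMETER = "AmazonCloudWatch-NGINX-CloudWatchAgent-Configuration"
PROMETHEUS_PARAMETER = "AmazonCloudWatch-NGINX-Prometheus-Configuration"


def store_configurations(ssm_client, agent_json: str, prometheus_yaml: str,
                         overwrite: bool = False) -> None:
    """Create (or, with overwrite=True, update) both configuration parameters."""
    for name, value in ((AGENT_PARAMETER, agent_json),
                        (PROMETHEUS_PARAMETER, prometheus_yaml)):
        ssm_client.put_parameter(
            Name=name,
            Value=value,
            Type="String",
            Tier="Standard",
            DataType="text",
            Overwrite=overwrite,
        )
```

For example, read your customized JSON and YAML files and call `store_configurations(boto3.client("ssm"), agent_text, prometheus_text)`.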

### Step 3: Install the CloudWatch agent and apply the configuration using a CloudFormation template
<a name="Solution-NGINX-Agent-Step3"></a>

You can use AWS CloudFormation to install the agent and configure it to use the CloudWatch agent configuration that you created in the previous steps.

**To install and configure the CloudWatch agent for this solution**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-with-prometheus-config-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-with-prometheus-config-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the NGINX workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **CWAgentInstallationStack**.

1. In the **Parameters** section, specify the following:

   1. For **CloudWatchAgentConfigSSM**, enter the name of the AWS Systems Manager parameter for the agent configuration that you created earlier, such as **AmazonCloudWatch-NGINX-CloudWatchAgent-Configuration**.

   1. For **PrometheusConfigSSM**, enter the name of the AWS Systems Manager parameter for the Prometheus configuration that you created earlier, such as **AmazonCloudWatch-NGINX-Prometheus-Configuration**.

   1. To select the target instances, you have two options.

      1. For **InstanceIds**, specify a comma-delimited list of the instance IDs where you want to install the CloudWatch agent with this configuration. You can list a single instance or several instances.

      1. If you are deploying at scale, you can specify the **TagKey** and the corresponding **TagValue** to target all EC2 instances with this tag and value. If you specify a **TagKey**, you must specify a corresponding **TagValue**. (For an Auto Scaling group, specify **aws:autoscaling:groupName** for the **TagKey** and specify the Auto Scaling group name for the **TagValue** to deploy to all instances within the Auto Scaling group.)

1. Review the settings, then choose **Create stack**.

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-with-prometheus-config-1.0.0.json]( https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-with-prometheus-config-1.0.0.json). 
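
For the CI automation option, the stack can also be created through the CloudFormation `CreateStack` API. The sketch below builds the parameter list from the parameter names in the preceding procedure and passes it to a boto3 CloudFormation client that you create in the target Region. It assumes the template needs no other parameters and no `Capabilities` acknowledgment, which this page doesn't state, so review the template first. The helper names are hypothetical.

```python
TEMPLATE_URL = (
    "https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/"
    "CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-with-prometheus-config-1.0.0.json"
)


def agent_stack_parameters(agent_param, prometheus_param, instance_ids=None,
                           tag_key=None, tag_value=None):
    """Build the CreateStack Parameters list for one of the two targeting options."""
    params = [
        {"ParameterKey": "CloudWatchAgentConfigSSM", "ParameterValue": agent_param},
        {"ParameterKey": "PrometheusConfigSSM", "ParameterValue": prometheus_param},
    ]
    if instance_ids:
        # InstanceIds takes a comma-delimited list of instance IDs.
        params.append({"ParameterKey": "InstanceIds", "ParameterValue": ",".join(instance_ids)})
    elif tag_key and tag_value:
        params.append({"ParameterKey": "TagKey", "ParameterValue": tag_key})
        params.append({"ParameterKey": "TagValue", "ParameterValue": tag_value})
    else:
        raise ValueError("specify instance_ids, or both tag_key and tag_value")
    return params


def create_agent_stack(cfn_client, stack_name, parameters):
    """Start stack creation and return the new stack's ID."""
    response = cfn_client.create_stack(
        StackName=stack_name, TemplateURL=TEMPLATE_URL, Parameters=parameters
    )
    return response["StackId"]
```

For an Auto Scaling group, pass `tag_key="aws:autoscaling:groupName"` and the group name as `tag_value`, mirroring the console option.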

**Note**  
After this step is completed, these Systems Manager parameters are associated with the CloudWatch agents running on the targeted instances. This means that:
+ If a Systems Manager parameter is deleted, the agent stops.
+ If a Systems Manager parameter is edited, the configuration changes are automatically applied to the agent at the scheduled frequency, which is 30 days by default.
+ If you want to apply changes to a Systems Manager parameter immediately, you must run this step again. For more information about associations, see [Working with associations in Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/state-manager-associations.html).

### Step 4: Verify the agent setup is configured properly
<a name="Solution-NGINX-Agent-Step4"></a>

You can verify whether the CloudWatch agent is installed by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running). If the CloudWatch agent is not installed and running, make sure you have set up everything correctly.
+ Be sure you have attached a role with correct permissions for the EC2 instance as described in [Step 1: Ensure the target EC2 instances have the required IAM permissions](#Solution-NGINX-Agent-Step1).
+ Be sure you have correctly configured the JSON and YAML values of the Systems Manager parameters. Follow the steps in [Troubleshooting installation of the CloudWatch agent with CloudFormation](Install-CloudWatch-Agent-New-Instances-CloudFormation.md#CloudWatch-Agent-CloudFormation-troubleshooting).

If everything is set up correctly, then you should see the NGINX metrics being published to CloudWatch. You can check the CloudWatch console to verify they are being published.

**To verify that NGINX metrics are being published to CloudWatch**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Metrics**, **All metrics**.

1. Make sure you've selected the Region where you deployed the solution, and choose **Custom namespaces**, **CWAgent**.

1. Search for metrics such as `nginx_http_requests_total`. If you see results for these metrics, then the metrics are being published to CloudWatch.
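
You can run the same check with the CloudWatch `ListMetrics` API. This sketch takes a boto3 CloudWatch client, created in the Region where you deployed the solution, and collects the `InstanceId` dimension values that report `nginx_http_requests_total` in the `CWAgent` namespace. Change the namespace argument if you customized it; the helper name is hypothetical.

```python
def instances_reporting(cloudwatch_client, metric_name="nginx_http_requests_total",
                        namespace="CWAgent"):
    """Return the sorted InstanceId values that have published metric_name."""
    # ListMetrics results are paginated, so iterate over every page.
    paginator = cloudwatch_client.get_paginator("list_metrics")
    instance_ids = set()
    for page in paginator.paginate(Namespace=namespace, MetricName=metric_name):
        for metric in page.get("Metrics", []):
            for dimension in metric.get("Dimensions", []):
                if dimension["Name"] == "InstanceId":
                    instance_ids.add(dimension["Value"])
    return sorted(instance_ids)
```

Comparing the result of `instances_reporting(boto3.client("cloudwatch"))` with the instances you targeted in Step 3 shows which hosts, if any, are not yet publishing.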

## Create the NGINX solution dashboard
<a name="Solution-NGINX-Dashboard"></a>

The dashboard provided by this solution presents NGINX workload metrics by aggregating and presenting metrics across all instances. The dashboard shows a breakdown of the top contributors (top 10 per metric widget) for each metric. This helps you to quickly identify outliers or instances that significantly contribute to the observed metrics.
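
To illustrate what a top-10 breakdown selects, the following sketch picks the ten largest values for one metric from a mapping of instance ID to value. It only illustrates the idea; the dashboard computes its widgets with CloudWatch queries whose details aren't shown here, and the function name and sample data are made up.

```python
import heapq


def top_contributors(values_by_instance: dict, n: int = 10) -> list:
    """Return up to n (instance_id, value) pairs with the largest values, largest first."""
    # nlargest avoids sorting the whole fleet when only the top n are needed.
    return heapq.nlargest(n, values_by_instance.items(), key=lambda item: item[1])
```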

To create the dashboard, you can use the following options:
+ Use the CloudWatch console to create the dashboard.
+ Use the AWS CloudFormation console to deploy the dashboard.
+ Download the AWS CloudFormation infrastructure as code and integrate it as part of your continuous integration (CI) automation.

By using the CloudWatch console to create a dashboard, you can preview the dashboard before you create it and incur charges.

**Note**  
The dashboard created with CloudFormation in this solution displays metrics from the Region where the solution is deployed. Be sure to create the CloudFormation stack in the Region where your NGINX metrics are published.  
If you've specified a custom namespace other than `CWAgent` in the CloudWatch agent configuration, you'll have to change the CloudFormation template for the dashboard to replace `CWAgent` with the customized namespace you are using.
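
If you do need to swap the namespace, a small script can make the substitution in the downloaded template before you upload it. The following Python sketch assumes that the namespace appears in the template as the literal string `CWAgent` (possibly embedded in the dashboard body string) and replaces every occurrence in every string value. The file names and the `MyCompany/NGINX` namespace in the usage example are hypothetical. Review the result with a diff, because any unrelated text that contains `CWAgent` is changed too.

```python
import json
from typing import Any


def replace_namespace(node: Any, old: str, new: str) -> Any:
    """Recursively replace `old` with `new` in every string of a parsed JSON document."""
    if isinstance(node, str):
        return node.replace(old, new)
    if isinstance(node, list):
        return [replace_namespace(item, old, new) for item in node]
    if isinstance(node, dict):
        # Only values are rewritten; keys are left unchanged.
        return {key: replace_namespace(value, old, new) for key, value in node.items()}
    return node  # numbers, booleans, and null are returned unchanged


def rewrite_template(src_path: str, dst_path: str, new_namespace: str) -> None:
    """Write a copy of the dashboard template that uses `new_namespace` instead of CWAgent."""
    with open(src_path, encoding="utf-8") as src:
        template = json.load(src)
    updated = replace_namespace(template, "CWAgent", new_namespace)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(updated, dst, indent=2)
```

For example, `rewrite_template("dashboard-template-1.0.0.json", "dashboard-custom.json", "MyCompany/NGINX")` writes an edited copy that you can then upload with the **Upload a template file** option.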

**To create the dashboard via CloudWatch Console**

1. Open the CloudWatch Console **Create Dashboard** using this link: [https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=NginxOnEc2&referrer=os-catalog](https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=NginxOnEc2&referrer=os-catalog).

1. Verify that the selected Region on the console is the Region where the NGINX workload is running.

1. Enter the name of the dashboard, then choose **Create Dashboard**.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **NGINXDashboard-us-east-1**.

1. Preview the dashboard and choose **Save** to create the dashboard.

**To create the dashboard via CloudFormation**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NGINX_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NGINX_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the NGINX workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **NGINXDashboardStack**.

1. In the **Parameters** section, specify the name of the dashboard under the **DashboardName** parameter.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **NGINXDashboard-us-east-1**.

1. Acknowledge access capabilities for transforms under **Capabilities and transforms**. Note that CloudFormation doesn't add any IAM resources.

1. Review the settings, then choose **Create stack**.

1. After the stack status is **CREATE_COMPLETE**, choose the **Resources** tab under the created stack and then choose the link under **Physical ID** to go to the dashboard. You can also access the dashboard in the CloudWatch console by choosing **Dashboards** in the left navigation pane of the console, and finding the dashboard name under **Custom Dashboards**.

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NGINX_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NGINX_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

### Get started with the NGINX dashboard
<a name="Solution-NGINX-Dashboard-GetStarted"></a>

Here are a few tasks that you can try out with the new NGINX dashboard. These tasks allow you to validate that the dashboard is working correctly and provide you some hands-on experience using it to monitor an NGINX workload. As you try these out, you'll get familiar with navigating the dashboard and interpreting the visualized metrics.

 **Review connection metrics** 

In the **Connections** section, you can find several key metrics that provide insights into the client connection handling of your NGINX server. Monitoring these connection metrics can help you identify potential bottlenecks, connection issues, or unexpected connection patterns.
+ Accepted client connections
+ Active client connections
+ Handled client connections
+ Connections reading requests
+ Idle client connections
+ Connections writing responses
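
Two relationships between these values are useful sanity checks. The NGINX `stub_status` module documents that the handled count is generally the same as the accepted count unless a resource limit (for example, `worker_connections`) has been reached, and that active connections include the waiting (idle) ones, so active should roughly equal reading + writing + idle. The following Python sketch assumes that the dashboard widgets reflect these `stub_status` values and that you compare numbers taken from the same period for one instance; the `ConnectionSnapshot` type and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConnectionSnapshot:
    """Values read from the Connections widgets for one instance and one period."""
    accepted: int
    handled: int
    active: int
    reading: int
    writing: int
    idle: int  # reported by stub_status as "Waiting"


def dropped_connections(s: ConnectionSnapshot) -> int:
    """Connections NGINX accepted but did not handle (normally 0)."""
    return max(s.accepted - s.handled, 0)


def active_mismatch(s: ConnectionSnapshot) -> int:
    """Difference between active and reading + writing + idle.

    A small non-zero value usually means the samples were not taken at the
    same instant; a large, persistent gap is worth investigating.
    """
    return s.active - (s.reading + s.writing + s.idle)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    snap = ConnectionSnapshot(accepted=1000, handled=990, active=25,
                              reading=2, writing=5, idle=18)
    print("dropped:", dropped_connections(snap))
    print("active mismatch:", active_mismatch(snap))
```

A sustained non-zero result from `dropped_connections` is a hint to review NGINX connection limits before looking elsewhere.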

 **Analyze HTTP request volume** 

The `request` metric in the **HTTP Requests** section shows the total number of HTTP requests handled by the NGINX server. Tracking this metric over time can help you understand the overall traffic load on your NGINX infrastructure and plan for resource allocation and scaling accordingly.

# CloudWatch solution: NVIDIA GPU workload on Amazon EC2
<a name="Solution-NVIDIA-GPU-On-EC2"></a>

This solution helps you configure out-of-the-box metric collection using CloudWatch agents for NVIDIA GPU workloads running on EC2 instances. Additionally, it helps you set up a pre-configured CloudWatch dashboard. For general information about all CloudWatch observability solutions, see [CloudWatch observability solutions](Monitoring-Solutions.md). 

**Topics**
+ [Requirements](#Solution-NVIDIA-GPU-On-EC2-Requirements)
+ [Benefits](#Solution-NVIDIA-GPU-On-EC2-Benefits)
+ [CloudWatch agent configuration for this solution](#Solution-NVIDIA-GPU-CloudWatch-Agent)
+ [Deploy the agent for your solution](#Solution-NVIDIA-GPU-Agent-Deploy)
+ [Create the NVIDIA GPU solution dashboard](#Solution-NVIDIA-GPU-Dashboard)

## Requirements
<a name="Solution-NVIDIA-GPU-On-EC2-Requirements"></a>

This solution is relevant for the following conditions:
+ Compute: Amazon EC2
+ Supports up to 500 GPUs across all EC2 instances in a given AWS Region
+ Latest version of CloudWatch agent
+ SSM agent installed on EC2 instance
+ The EC2 instance must have an NVIDIA driver installed. NVIDIA drivers are pre-installed on some Amazon Machine Images (AMIs). Otherwise, you can manually install the driver. For more information, see [Install NVIDIA drivers on Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html).

**Note**  
AWS Systems Manager Agent (SSM Agent) is pre-installed on some [Amazon Machine Images (AMIs)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ami-preinstalled-agent.html) provided by AWS and trusted third parties. If the agent isn't installed, you can install it manually using the procedure for your operating system type.  
[Manually installing and uninstalling SSM Agent on EC2 instances for Linux](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html)
[Manually installing and uninstalling SSM Agent on EC2 instances for macOS](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-macos.html)
[Manually installing and uninstalling SSM Agent on EC2 instances for Windows Server](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-windows.html)

## Benefits
<a name="Solution-NVIDIA-GPU-On-EC2-Benefits"></a>

The solution delivers NVIDIA monitoring, providing valuable insights for the following use cases:
+ Analyze GPU and memory usage for performance bottlenecks or the need for additional resources.
+ Monitor temperature and power draw to ensure GPUs operate within safe limits.
+ Evaluate encoder performance for GPU video workloads.
+ Verify PCIe connectivity for expected generation and width.
+ Monitor GPU clock speeds to detect scaling and throttling issues.

Below are the key advantages of the solution:
+ Automates metric collection for NVIDIA using CloudWatch agent configuration, eliminating manual instrumentation.
+ Provides a pre-configured, consolidated CloudWatch dashboard for NVIDIA metrics. The dashboard will automatically handle metrics from new NVIDIA EC2 instances configured using the solution, even if those metrics don't exist when you first create the dashboard.

The following image is an example of the dashboard for this solution.

![\[Example dashboard for NVIDIA GPU solution.\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/NVIDIADashboard.png)


### Costs
<a name="Solution-NVIDIA-GPU-On-EC2-Costs"></a>

This solution creates and uses resources in your account. You are charged for standard usage, including the following:
+ All metrics collected by the CloudWatch agent are charged as custom metrics. The number of metrics used by this solution depends on the number of EC2 hosts and the number of GPUs on each host.
  + Each EC2 host configured for the solution publishes 17 metrics per GPU.
+ One custom dashboard.
+ API operations requested by the CloudWatch agent to publish the metrics. With the default configuration for this solution, the CloudWatch agent calls the **PutMetricData** API once every minute for each EC2 host. This means the **PutMetricData** API will be called `30*24*60=43,200` times in a 30-day month for each EC2 host.

For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

The pricing calculator can help you estimate approximate monthly costs for using this solution.

**To use the pricing calculator to estimate your monthly solution costs**

1. Open the [Amazon CloudWatch pricing calculator](https://calculator.aws/#/createCalculator/CloudWatch).

1. For **Choose a Region**, select the Region where you would like to deploy the solution.

1. In the **Metrics** section, for **Number of metrics**, enter **17 × average number of GPUs per EC2 host × number of EC2 instances configured for this solution**.

1. In the **APIs** section, for **Number of API requests**, enter **43200 × number of EC2 instances configured for this solution**.

   By default, the CloudWatch agent performs one **PutMetricData** operation each minute for each EC2 host.

1. In the **Dashboards and Alarms** section, for **Number of Dashboards**, enter **1**.

1. You can see your monthly estimated costs at the bottom of the pricing calculator.
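
The arithmetic behind the calculator inputs above can be sketched in a few lines of Python. This is an illustrative helper only (`calculator_inputs` is a hypothetical name), assuming the default one-minute publishing interval and the 17 measurements in this solution's agent configuration; it does not compute prices, which depend on your Region and the current pricing page.

```python
import math

PUT_METRIC_DATA_CALLS_PER_HOST = 30 * 24 * 60  # one call per minute for a 30-day month
METRICS_PER_GPU = 17  # one metric per measurement in this solution's agent configuration


def calculator_inputs(avg_gpus_per_host: float, instance_count: int) -> dict:
    """Values to enter in the CloudWatch pricing calculator for this solution."""
    metrics = METRICS_PER_GPU * avg_gpus_per_host * instance_count
    return {
        # Rounded up because the calculator expects a whole number of metrics.
        "number_of_metrics": math.ceil(metrics),
        "number_of_api_requests": PUT_METRIC_DATA_CALLS_PER_HOST * instance_count,
        "number_of_dashboards": 1,
    }


if __name__ == "__main__":
    # Hypothetical fleet: 10 instances with 4 GPUs each.
    print(calculator_inputs(avg_gpus_per_host=4, instance_count=10))
```

For the hypothetical fleet of 10 instances with 4 GPUs each, this yields 680 metrics and 432,000 API requests per month.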

## CloudWatch agent configuration for this solution
<a name="Solution-NVIDIA-GPU-CloudWatch-Agent"></a>

The CloudWatch agent is software that runs continuously and autonomously on your servers and in containerized environments. It collects metrics, logs, and traces from your infrastructure and applications and sends them to CloudWatch and X-Ray.

For more information about the CloudWatch agent, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).

The agent configuration in this solution collects a set of metrics to help you get started monitoring and observing your NVIDIA GPU. The CloudWatch agent can be configured to collect more NVIDIA GPU metrics than the dashboard displays by default. For a list of all NVIDIA GPU metrics that you can collect, see [Collect NVIDIA GPU metrics](CloudWatch-Agent-NVIDIA-GPU.md).

### Agent configuration for this solution
<a name="Solution-NVIDIA-GPU-Agent-Config"></a>

The metrics collected by the agent are defined in the agent configuration. The solution provides agent configurations to collect the recommended metrics with suitable dimensions for the solution's dashboard.

Use the following CloudWatch agent configuration on EC2 instances with NVIDIA GPUs. The configuration is stored as a parameter in Systems Manager Parameter Store, as described later in [Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store](#Solution-NVIDIA-GPU-Agent-Step2).

```
{
    "metrics": {
        "namespace": "CWAgent",
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected": {
            "nvidia_gpu": {
                "measurement": [
                    "utilization_gpu",
                    "temperature_gpu",
                    "power_draw",
                    "utilization_memory",
                    "fan_speed",
                    "memory_total",
                    "memory_used",
                    "memory_free",
                    "pcie_link_gen_current",
                    "pcie_link_width_current",
                    "encoder_stats_session_count",
                    "encoder_stats_average_fps",
                    "encoder_stats_average_latency",
                    "clocks_current_graphics",
                    "clocks_current_sm",
                    "clocks_current_memory",
                    "clocks_current_video"
                ],
                "metrics_collection_interval": 60
            }
        }
    },
    "force_flush_interval": 60
}
```

## Deploy the agent for your solution
<a name="Solution-NVIDIA-GPU-Agent-Deploy"></a>

There are several approaches for installing the CloudWatch agent, depending on the use case. We recommend using Systems Manager for this solution. It provides a console experience and makes it simpler to manage a fleet of managed servers within a single AWS account. The instructions in this section use Systems Manager and are intended for when you don't have the CloudWatch agent running with existing configurations. You can check whether the CloudWatch agent is running by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running).

If you are already running the CloudWatch agent on the EC2 hosts where the workload is deployed and managing agent configurations, you can skip the instructions in this section and follow your existing deployment mechanism to update the configuration. Be sure to merge the NVIDIA GPU agent configuration with your existing agent configuration, and then deploy the merged configuration. If you are using Systems Manager to store and manage the configuration for the CloudWatch agent, you can merge the configuration into the existing parameter value. For more information, see [Managing CloudWatch agent configuration files](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/create-store-cloudwatch-configurations.html).
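
The following Python sketch shows one way to merge the solution's configuration into an existing one before you update your parameter. It is a minimal sketch, not an official merge tool: it keeps your existing top-level settings and `namespace`, adds any missing `append_dimensions`, and adds new `metrics_collected` sections. If a section such as `nvidia_gpu` already exists with different settings, it raises an error so that you can resolve the conflict by hand. The `merge_agent_configs` name is hypothetical.

```python
import copy


def merge_agent_configs(existing: dict, addition: dict) -> dict:
    """Return a new agent configuration that adds `addition` to `existing`."""
    merged = copy.deepcopy(existing)  # never modify the caller's dictionary

    # Copy top-level keys (for example force_flush_interval) that the existing config lacks.
    for key, value in addition.items():
        if key != "metrics" and key not in merged:
            merged[key] = copy.deepcopy(value)

    new_metrics = addition.get("metrics", {})
    metrics = merged.setdefault("metrics", {})

    # Keep the existing namespace; adopt the new one only if none is set.
    if "namespace" in new_metrics:
        metrics.setdefault("namespace", new_metrics["namespace"])

    # Add dimensions that are missing; an existing dimension with the same name wins.
    new_dims = new_metrics.get("append_dimensions")
    if new_dims:
        dims = metrics.setdefault("append_dimensions", {})
        for name, value in new_dims.items():
            dims.setdefault(name, value)

    # Add new collection sections; refuse to silently overwrite a differing one.
    collected = metrics.setdefault("metrics_collected", {})
    for section, settings in new_metrics.get("metrics_collected", {}).items():
        if section in collected and collected[section] != settings:
            raise ValueError(f"metrics_collected.{section} differs; merge it by hand")
        collected[section] = copy.deepcopy(settings)

    return merged
```

You could then serialize the result with `json.dumps(merged, indent=4)` and use it as the new parameter value. If your existing configuration uses a namespace other than `CWAgent`, the merged metrics are published there, and you'll need to edit the dashboard template accordingly.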

**Note**  
Using Systems Manager to deploy the following CloudWatch agent configuration replaces or overwrites any existing CloudWatch agent configuration on your EC2 instances. You can modify this configuration to suit your unique environment or use case. The metrics defined in this configuration are the minimum required for the dashboard provided by the solution.

The deployment process includes the following steps:
+ Step 1: Ensure that the target EC2 instances have the required IAM permissions.
+ Step 2: Store the recommended agent configuration file in the Systems Manager Parameter Store.
+ Step 3: Install the CloudWatch agent on one or more EC2 instances using a CloudFormation stack.
+ Step 4: Verify the agent setup is configured properly.

### Step 1: Ensure the target EC2 instances have the required IAM permissions
<a name="Solution-NVIDIA-GPU-Agent-Step1"></a>

You must grant permission for Systems Manager to install and configure the CloudWatch agent. You must also grant permission for the CloudWatch agent to publish telemetry from your EC2 instance to CloudWatch. Make sure that the IAM role attached to the instance has the **CloudWatchAgentServerPolicy** and **AmazonSSMManagedInstanceCore** IAM policies attached.
+ After the role is created, attach the role to your EC2 instances. To attach a role to an EC2 instance, follow the steps in [Attach an IAM role to an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/attach-iam-role.html).

### Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store
<a name="Solution-NVIDIA-GPU-Agent-Step2"></a>

Parameter Store simplifies the installation of the CloudWatch agent on an EC2 instance by securely storing and managing configuration parameters, eliminating the need for hard-coded values. This ensures a more secure and flexible deployment process, enabling centralized management and easier updates to configurations across multiple instances.

Use the following steps to store the recommended CloudWatch agent configuration file as a parameter in Parameter Store.

**To create the CloudWatch agent configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. Verify that the selected Region on the console is the Region where the NVIDIA GPU workload is running.

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. In the **Name** box, enter a name that you'll use to reference the CloudWatch agent configuration file in later steps. For example, **AmazonCloudWatch-NVIDIA-GPU-Configuration**.

   1. (Optional) In the **Description** box, type a description for the parameter.

   1. For **Parameter tier**, choose **Standard**.

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the corresponding JSON block that was listed in [Agent configuration for this solution](#Solution-NVIDIA-GPU-Agent-Config).

   1. Choose **Create parameter**.
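
Before you paste the configuration, you may want to confirm that it is valid JSON and that it fits in a Standard-tier parameter, whose values are limited to 4 KB. The following Python sketch (the `prepare_parameter_value` name is hypothetical) parses the text, checks that the `nvidia_gpu` section is present, and returns a compact version whose UTF-8 size is within that limit. Pasting the indented JSON also works; compacting just leaves more headroom.

```python
import json

STANDARD_TIER_MAX_BYTES = 4 * 1024  # Standard-tier parameter values are limited to 4 KB


def prepare_parameter_value(config_text: str) -> str:
    """Validate an agent configuration and return a compact string for Parameter Store."""
    config = json.loads(config_text)  # raises json.JSONDecodeError (a ValueError) if malformed
    collected = config.get("metrics", {}).get("metrics_collected", {})
    if "nvidia_gpu" not in collected:
        raise ValueError("metrics.metrics_collected.nvidia_gpu is missing")
    compact = json.dumps(config, separators=(",", ":"))
    size = len(compact.encode("utf-8"))
    if size > STANDARD_TIER_MAX_BYTES:
        raise ValueError(f"configuration is {size} bytes; too large for the Standard tier")
    return compact
```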

### Step 3: Install the CloudWatch agent and apply the configuration using a CloudFormation template
<a name="Solution-NVIDIA-GPU-Agent-Step3"></a>

You can use AWS CloudFormation to install the agent and configure it to use the CloudWatch agent configuration that you created in the previous steps.

**To install and configure the CloudWatch agent for this solution**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the NVIDIA GPU workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **CWAgentInstallationStack**.

1. In the **Parameters** section, specify the following:

   1. For **CloudWatchAgentConfigSSM**, enter the name of the Systems Manager parameter for the agent configuration that you created earlier, such as **AmazonCloudWatch-NVIDIA-GPU-Configuration**.

   1. To select the target instances, you have two options.

      1. For **InstanceIds**, specify a comma-delimited list of the IDs of the instances where you want to install the CloudWatch agent with this configuration. You can list a single instance or several instances.

      1. If you are deploying at scale, you can specify the **TagKey** and the corresponding **TagValue** to target all EC2 instances with this tag and value. If you specify a **TagKey**, you must specify a corresponding **TagValue**. (For an Auto Scaling group, specify **aws:autoscaling:groupName** for the **TagKey** and specify the Auto Scaling group name for the **TagValue** to deploy to all instances within the Auto Scaling group.)

1. Review the settings, then choose **Create stack**.
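
If you create this stack from a script instead of the console, you need the same parameters in the CloudFormation `ParameterKey`/`ParameterValue` form. The following Python sketch is a hypothetical helper (`stack_parameters` is not part of any AWS SDK) that builds that list from the parameter names used above (`CloudWatchAgentConfigSSM`, `InstanceIds`, `TagKey`, `TagValue`) and enforces the targeting rules described in the steps. Requiring exactly one targeting option is a simplifying assumption of this sketch, and the instance IDs and Auto Scaling group name are placeholders.

```python
from typing import Optional, Sequence

ASG_TAG_KEY = "aws:autoscaling:groupName"  # targets every instance in an Auto Scaling group


def stack_parameters(config_parameter: str,
                     instance_ids: Optional[Sequence[str]] = None,
                     tag_key: Optional[str] = None,
                     tag_value: Optional[str] = None) -> list:
    """Build CloudFormation parameters for the agent installation template."""
    by_ids = bool(instance_ids)
    by_tag = tag_key is not None or tag_value is not None
    if by_ids == by_tag:
        raise ValueError("specify either instance IDs or a tag (this sketch allows only one)")
    if by_tag and not (tag_key and tag_value):
        raise ValueError("TagKey and TagValue must be specified together")

    params = {"CloudWatchAgentConfigSSM": config_parameter}
    if by_ids:
        params["InstanceIds"] = ",".join(instance_ids)  # comma-delimited, as the template expects
    else:
        params["TagKey"] = tag_key
        params["TagValue"] = tag_value
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in params.items()]


if __name__ == "__main__":
    # Placeholder Auto Scaling group name.
    print(stack_parameters("AmazonCloudWatch-NVIDIA-GPU-Configuration",
                           tag_key=ASG_TAG_KEY, tag_value="my-gpu-asg"))
```

The resulting list could be passed as the `Parameters` argument of a CloudFormation `CreateStack` call made through an AWS SDK.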

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html).

**Note**  
After this step is completed, this Systems Manager parameter is associated with the CloudWatch agents running in the targeted instances. This means that:  
+ If the Systems Manager parameter is deleted, the agent will stop.
+ If the Systems Manager parameter is edited, the configuration changes are automatically applied to the agent at the scheduled frequency, which is 30 days by default.
+ If you want to immediately apply changes to this Systems Manager parameter, you must run this step again. For more information about associations, see [Working with associations in Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/state-manager-associations.html).

### Step 4: Verify the agent setup is configured properly
<a name="Solution-NVIDIA-GPU-Agent-Step4"></a>

You can verify whether the CloudWatch agent is installed by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running). If the CloudWatch agent is not installed and running, make sure you have set up everything correctly.
+ Be sure you have attached a role with correct permissions for the EC2 instance as described in [Step 1: Ensure the target EC2 instances have the required IAM permissions](#Solution-NVIDIA-GPU-Agent-Step1).
+ Be sure you have correctly configured the JSON for the Systems Manager parameter. Follow the steps in [Troubleshooting installation of the CloudWatch agent with CloudFormation](Install-CloudWatch-Agent-New-Instances-CloudFormation.md#CloudWatch-Agent-CloudFormation-troubleshooting).

If everything is set up correctly, then you should see the NVIDIA GPU metrics being published to CloudWatch. You can check the CloudWatch console to verify they are being published.

**To verify that NVIDIA GPU metrics are being published to CloudWatch**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Metrics**, **All metrics**.

1. Make sure you've selected the Region where you deployed the solution, and choose **Custom namespaces**, **CWAgent**.

1. Search for the metrics mentioned in [Agent configuration for this solution](#Solution-NVIDIA-GPU-Agent-Config), such as `nvidia_smi_utilization_gpu`. If you see results for these metrics, then the metrics are being published to CloudWatch.
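
To check all 17 metrics rather than one, you can derive the names to search for from the agent configuration. Based on the example above (`utilization_gpu` is published as `nvidia_smi_utilization_gpu`), the following Python sketch assumes that every `nvidia_gpu` measurement is published with the `nvidia_smi_` prefix and that measurements are listed as plain strings, as in this solution's configuration. The configuration excerpt and the set of names you've seen in the console are illustrative.

```python
import json

# Illustrative excerpt of the agent configuration shown in this section.
CONFIG_EXCERPT = """
{"metrics": {"metrics_collected": {"nvidia_gpu": {
    "measurement": ["utilization_gpu", "temperature_gpu", "power_draw"]}}}}
"""


def expected_metric_names(agent_config: dict, prefix: str = "nvidia_smi_") -> list:
    """Map each nvidia_gpu measurement to the metric name to search for in CloudWatch."""
    measurements = agent_config["metrics"]["metrics_collected"]["nvidia_gpu"]["measurement"]
    return [prefix + name for name in measurements]


def missing_metrics(agent_config: dict, names_seen: set) -> list:
    """Return the expected names that did not appear in your console search results."""
    return [name for name in expected_metric_names(agent_config) if name not in names_seen]


if __name__ == "__main__":
    config = json.loads(CONFIG_EXCERPT)
    # Names you found under Custom namespaces > CWAgent (illustrative).
    print(missing_metrics(config, {"nvidia_smi_utilization_gpu"}))
```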

## Create the NVIDIA GPU solution dashboard
<a name="Solution-NVIDIA-GPU-Dashboard"></a>

The dashboard provided by this solution presents NVIDIA GPU metrics aggregated across all instances. The dashboard shows a breakdown of the top contributors (top 10 per metric widget) for each metric. This helps you to quickly identify outliers or instances that significantly contribute to the observed metrics.

To create the dashboard, you can use the following options:
+ Use CloudWatch console to create the dashboard.
+ Use AWS CloudFormation console to deploy the dashboard.
+ Download the AWS CloudFormation infrastructure as code and integrate it as part of your continuous integration (CI) automation.

By using the CloudWatch console to create a dashboard, you can preview the dashboard before you actually create it and incur charges.

**Note**  
The dashboard created with CloudFormation in this solution displays metrics from the Region where the solution is deployed. Be sure to create the CloudFormation stack in the Region where your NVIDIA GPU metrics are published.  
If you've specified a custom namespace other than `CWAgent` in the CloudWatch agent configuration, you'll have to change the CloudFormation template for the dashboard to replace `CWAgent` with the customized namespace you are using.

**To create the dashboard via CloudWatch Console**

1. Open the CloudWatch Console **Create Dashboard** using this link: [https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=NvidiaGpuOnEc2&referrer=os-catalog](https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=NvidiaGpuOnEc2&referrer=os-catalog).

1. Verify that the selected Region on the console is the Region where the NVIDIA GPU workload is running.

1. Enter the name of the dashboard, then choose **Create Dashboard**.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **NVIDIA-GPU-Dashboard-us-east-1**.

1. Preview the dashboard and choose **Save** to create the dashboard.

**To create the dashboard via CloudFormation**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NVIDIA_GPU_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NVIDIA_GPU_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the NVIDIA GPU workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **NVIDIA-GPU-DashboardStack**.

1. In the **Parameters** section, specify the name of the dashboard under the **DashboardName** parameter.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **NVIDIA-GPU-Dashboard-us-east-1**.

1. Acknowledge access capabilities for transforms under **Capabilities and transforms**. Note that CloudFormation doesn't add any IAM resources.

1. Review the settings, then choose **Create stack**.

1. After the stack status is **CREATE_COMPLETE**, choose the **Resources** tab under the created stack and then choose the link under **Physical ID** to go to the dashboard. You can also access the dashboard in the CloudWatch console by choosing **Dashboards** in the left navigation pane of the console, and finding the dashboard name under **Custom Dashboards**.

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NVIDIA_GPU_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/NVIDIA_GPU_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

### Get started with the NVIDIA GPU dashboard
<a name="Solution-NVIDIA-GPU-Dashboard-GetStarted"></a>

Here are a few tasks that you can try out with the new NVIDIA GPU dashboard. These tasks allow you to validate that the dashboard is working correctly and provide you some hands-on experience using it to monitor your NVIDIA GPUs. As you try these out, you'll get familiar with navigating the dashboard and interpreting the visualized metrics.

**Review GPU utilization**

From the **Utilization** section, find the **GPU Utilization** and **Memory Utilization** widgets. These show the percentage of time the GPU is being actively used for computations and the percentage of global memory being read or written, respectively. High utilization could indicate potential performance bottlenecks or the need for additional GPU resources.

**Analyze GPU memory usage**

In the **Memory** section, find the **Total Memory**, **Used Memory**, and **Free Memory** widgets. These provide insights into the overall memory capacity of the GPUs and how much memory is currently being consumed or available. Memory pressure could lead to performance issues or out-of-memory errors, so it's important to monitor these metrics and ensure sufficient memory is available for your workloads.
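
Note that **Memory Utilization** (from the **Utilization** section) measures how busy the memory is, not how full it is. To see how much of the capacity is in use, divide **Used Memory** by **Total Memory**. The following Python sketch does that calculation; both values only need to be in the same unit, and the 10 percent free-memory threshold in `low_headroom` is a hypothetical default for you to tune, not a recommendation from this solution.

```python
def memory_used_percent(used: float, total: float) -> float:
    """Share of GPU memory capacity in use, from the Used Memory and Total Memory widgets."""
    if total <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * used / total


def low_headroom(used: float, total: float, min_free_percent: float = 10.0) -> bool:
    """True when less than `min_free_percent` of GPU memory remains (hypothetical threshold)."""
    return 100.0 - memory_used_percent(used, total) < min_free_percent


if __name__ == "__main__":
    # Hypothetical reading: 6144 of 16384 units used.
    print(memory_used_percent(6144, 16384), low_headroom(6144, 16384))
```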

**Monitor temperature and power draw**

In the **Temperature / Power** section, find the **GPU Temperature** and **Power Draw** widgets. These metrics are essential for ensuring that your GPUs are operating within safe thermal and power limits.

**Identify encoder performance**

In the **Encoder** section, find the **Encoder Session Count**, **Average FPS**, and **Average Latency** widgets. These metrics are relevant if you're running video encoding workloads on your GPUs. Monitor these metrics to ensure that your encoders are performing optimally and identify any potential bottlenecks or performance issues.

**Check PCIe link status**

In the **PCIe** section, find the **PCIe Link Generation** and **PCIe Link Width** widgets. These metrics provide information about the PCIe link connecting the GPU to the host system. Ensure that the link is operating at the expected generation and width to avoid potential performance limitations due to PCIe bottlenecks.
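
The following Python sketch compares the values shown in these widgets with the link generation and width you expect for your instance type. The expected values are inputs you supply from your hardware documentation, and the Gen 4 x16 example is hypothetical. Some GPUs may lower the link generation when idle to save power, so compare readings taken while the GPU is under load.

```python
from typing import NamedTuple


class PcieLink(NamedTuple):
    generation: int  # value from the PCIe Link Generation widget
    width: int       # lane count from the PCIe Link Width widget


def pcie_shortfalls(observed: PcieLink, expected: PcieLink) -> list:
    """Describe every way the observed link falls short of the expected one."""
    issues = []
    if observed.generation < expected.generation:
        issues.append(f"link generation {observed.generation} is below expected {expected.generation}")
    if observed.width < expected.width:
        issues.append(f"link width x{observed.width} is below expected x{expected.width}")
    return issues


if __name__ == "__main__":
    # Hypothetical expectation: Gen 4, x16.
    print(pcie_shortfalls(PcieLink(3, 16), PcieLink(4, 16)))
```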

**Review GPU clocks**

In the **Clock** section, find the **Graphics Clock**, **SM Clock**, **Memory Clock**, and **Video Clock** widgets. These metrics show the current operating frequencies of various GPU components. Monitoring these clocks can help identify potential issues with GPU clock scaling or frequency throttling, which could impact performance.

# CloudWatch solution: Kafka workload on Amazon EC2
<a name="Solution-Kafka-On-EC2"></a>

This solution helps you configure out-of-the-box metric collection using CloudWatch agents for Kafka workloads (brokers, producers, and consumers) running on EC2 instances. Additionally, it helps you set up a pre-configured CloudWatch dashboard. For general information about all CloudWatch observability solutions, see [CloudWatch observability solutions](Monitoring-Solutions.md).

**Topics**
+ [Requirements](#Solution-Kafka-On-EC2-Requirements)
+ [Benefits](#Solution-Kafka-On-EC2-Benefits)
+ [Costs](#Solution-Kafka-On-EC2-Costs)
+ [CloudWatch agent configuration for this solution](#Solution-Kafka-CloudWatch-Agent)
+ [Deploy the agent for your solution](#Solution-Kafka-Agent-Deploy)
+ [Create the Kafka solution dashboard](#Solution-Kafka-Dashboard)
+ [Configure the agent for multiple Kafka roles on the same instance](#Kafka-Multiple-Roles)

## Requirements
<a name="Solution-Kafka-On-EC2-Requirements"></a>

This solution is relevant for the following conditions:
+ Workload: Kafka v0.8.2.x and later
+ Compute: Amazon EC2
+ Supports up to 500 EC2 instances across all Kafka workloads in a given AWS Region
+ Latest version of CloudWatch agent
+ SSM agent installed on EC2 instance

**Note**  
AWS Systems Manager Agent (SSM Agent) is pre-installed on some [Amazon Machine Images (AMIs)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ami-preinstalled-agent.html) provided by AWS and trusted third parties. If the agent isn't installed, you can install it manually using the procedure for your operating system type.  
[Manually installing and uninstalling SSM Agent on EC2 instances for Linux](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html)
[Manually installing and uninstalling SSM Agent on EC2 instances for macOS](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-macos.html)
[Manually installing and uninstalling SSM Agent on EC2 instances for Windows Server](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-windows.html)

## Benefits
<a name="Solution-Kafka-On-EC2-Benefits"></a>

The solution delivers Kafka server monitoring, providing valuable insights for the following use cases:
+ Monitor Kafka cluster health via replication and sync metrics.
+ Track broker performance through request failures and latencies along with network traffic.
+ Monitor producer/consumer errors, latencies, and consumer lag.
+ Analyze underlying JVM performance for Kafka clusters.
+ Switch between multiple Kafka clusters, producers, and consumers configured via the solution under the same account.

Below are the key advantages of the solution:
+ Automates metric collection for Kafka and the underlying JVM using CloudWatch agent configuration, eliminating manual instrumentation.
+ Provides a pre-configured, consolidated CloudWatch dashboard for Kafka and JVM metrics. The dashboard will automatically handle metrics from new Kafka EC2 instances configured using the solution, even if those metrics don't exist when you first create the dashboard. It also allows you to group the metrics into logical applications for easier focus and management.

The following image is an example of the dashboard for this solution.

![\[Kafka cluster dashboard showing metrics for partitions, producer/consumer performance, and broker status.\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/KafkaDashboard.png)


## Costs
<a name="Solution-Kafka-On-EC2-Costs"></a>

This solution creates and uses resources in your account. You are charged for standard usage, including the following:
+ All metrics collected by the CloudWatch agent are charged as custom metrics. The number of metrics used by this solution depends on the number of EC2 hosts.
  + Each broker host configured for the solution publishes 33 metrics, plus the `disk_used_percent` metric, which counts once for each disk path on that host.
  + Each producer host configured for the solution publishes three metrics with the `topic` dimension and three metrics without the `topic` dimension. For the metrics with the `topic` dimension, each topic counts as a separate metric.
  + Each consumer host configured for the solution publishes two metrics with the `topic` dimension and three metrics without the `topic` dimension. For the metrics with the `topic` dimension, each topic counts as a separate metric.
+ One custom dashboard.
+ API operations requested by the CloudWatch agent to publish the metrics. With the default configuration for this solution, the CloudWatch agent calls the **PutMetricData** operation once every minute for each EC2 host. This means that **PutMetricData** is called `30*24*60 = 43,200` times in a 30-day month for each EC2 host.

For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

The pricing calculator can help you estimate approximate monthly costs for using this solution. 

**To use the pricing calculator to estimate your monthly solution costs**

1. Open the [Amazon CloudWatch pricing calculator](https://calculator.aws/#/createCalculator/CloudWatch).

1. In the **Metrics** section, for **Number of metrics**, enter **broker_metrics_count + producer_metrics_count + consumer_metrics_count**. Calculate these as follows:
   + `broker_metrics_count = (33 + average_number_of_disk_paths_per_ec2_host) * number_of_ec2_broker_hosts`
   + `producer_metrics_count = (3 * average_number_of_topics_per_producer_host + 3) * number_of_ec2_producer_hosts`
   + `consumer_metrics_count = (2 * average_number_of_topics_per_consumer_host + 3) * number_of_ec2_consumer_hosts`

1. In the **APIs** section, for **Number of API requests**, enter **43200** multiplied by the number of EC2 instances configured for this solution.

   By default, the CloudWatch agent performs one **PutMetricData** operation each minute for each EC2 host.

1. In the **Dashboards and Alarms** section, for **Number of Dashboards**, enter **1**.

1. You can see your monthly estimated costs at the bottom of the pricing calculator.
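If you prefer to compute the calculator inputs before opening the calculator, the following Python sketch applies the metric formulas above and the 43,200 monthly **PutMetricData** calls per host. The fleet sizes in the example are hypothetical; substitute your own counts. The sketch assumes that each Kafka role runs on separate EC2 instances, so every host is counted once for API requests.

```python
# Sketch: estimate the pricing calculator inputs for this solution.
# The per-host formulas come from this section; the fleet sizes in the
# example are hypothetical placeholders for your own numbers.

# One PutMetricData call per minute per EC2 host over a 30-day month.
PUT_METRIC_DATA_CALLS_PER_HOST = 30 * 24 * 60


def broker_metrics_count(avg_disk_paths: float, broker_hosts: int) -> float:
    # 33 broker metrics plus one disk_used_percent metric per disk path.
    return (33 + avg_disk_paths) * broker_hosts


def producer_metrics_count(avg_topics: float, producer_hosts: int) -> float:
    # 3 metrics per topic (topic dimension) plus 3 metrics without it.
    return (3 * avg_topics + 3) * producer_hosts


def consumer_metrics_count(avg_topics: float, consumer_hosts: int) -> float:
    # 2 metrics per topic (topic dimension) plus 3 metrics without it.
    return (2 * avg_topics + 3) * consumer_hosts


def api_requests(ec2_instances: int) -> int:
    # Number of PutMetricData requests in a 30-day month.
    return PUT_METRIC_DATA_CALLS_PER_HOST * ec2_instances


if __name__ == "__main__":
    # Hypothetical fleet: 3 brokers with 2 disk paths each, plus 4 producers
    # and 5 consumers that each handle 10 topics, all on separate instances.
    total_metrics = (
        broker_metrics_count(2, 3)
        + producer_metrics_count(10, 4)
        + consumer_metrics_count(10, 5)
    )
    print("Number of metrics:", total_metrics)
    print("Number of API requests:", api_requests(3 + 4 + 5))
```

For the example fleet, the sketch reports 352 metrics and 518,400 API requests. If several roles share an instance, count that instance only once when you compute API requests.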

## CloudWatch agent configuration for this solution
<a name="Solution-Kafka-CloudWatch-Agent"></a>

The CloudWatch agent is software that runs continuously and autonomously on your servers and in containerized environments. It collects metrics, logs, and traces from your infrastructure and applications and sends them to CloudWatch and X-Ray.

For more information about the CloudWatch agent, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).

The agent configuration in this solution collects the foundational metrics for Kafka, JVM, and EC2. The CloudWatch agent can be configured to collect more Kafka and JVM metrics than the dashboard displays by default. For a list of all Kafka metrics that you can collect, see [Collect Kafka metrics](CloudWatch-Agent-JMX-metrics.md#CloudWatch-Agent-Kafka-metrics). For a list of all JVM metrics that you can collect, see [Collect JVM metrics](CloudWatch-Agent-JMX-metrics.md#CloudWatch-Agent-JVM-metrics). For a list of EC2 metrics, see [Metrics collected by the CloudWatch agent on Linux and macOS instances](metrics-collected-by-CloudWatch-agent.md#linux-metrics-enabled-by-CloudWatch-agent).

**Expose JMX ports for the Kafka broker, producer, and consumer roles**

The CloudWatch agent relies on JMX to collect the metrics related to the Kafka brokers, producers, and consumers. To make this possible, you must expose the JMX port on your servers and applications.

For Kafka brokers, you must use the `JMX_PORT` environment variable to set the port, and then restart the brokers so that the setting takes effect. Review the startup scripts and configuration files of your application to find the best place to set this variable.

For example, for Linux and macOS systems, you can use the following command to set the JMX port. Be sure to specify an unused port number.

```
export JMX_PORT=port-number
```
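To check whether a candidate port is unused before you export `JMX_PORT`, you can try to bind to it. The following Python sketch does that for IPv4 TCP on the local host. Port `9999` is only a hypothetical example, and the result can change if another process takes the port before the broker starts.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Already in use, or not allowed (for example, a privileged port).
            return False
    return True


if __name__ == "__main__":
    candidate = 9999  # hypothetical JMX port; choose one that fits your environment
    if is_port_free(candidate):
        print(f"Port {candidate} is free; for example: export JMX_PORT={candidate}")
    else:
        print(f"Port {candidate} is in use or not bindable; choose another")
```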

For Kafka producers and consumers, instructions for exposing the JMX port depend on the workload type you are using for your producer or consumer JVM application. See the documentation for your application to find these instructions.

In general, to enable a JMX port for monitoring and management, you set the following system properties for your JVM application. The following example sets up unauthenticated JMX. If your security requirements call for password authentication or SSL for remote access, see the [JMX documentation](https://docs.oracle.com/en/java/javase/17/management/monitoring-and-management-using-jmx-technology.html) to set the required properties.

```
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=port-number
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false
```

To verify the JMX port, run `ps aux | grep jmxremote.port`. The results should show that the JMX port was set on the JVM processes.
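The `grep` check above can also be scripted. The following Python sketch extracts the ports from `ps aux` output by matching the `-Dcom.sun.management.jmxremote.port` property shown earlier. It assumes that the port is passed as that system property; Kafka's start scripts typically translate `JMX_PORT` into this property for brokers, but verify that for your installation.

```python
from __future__ import annotations

import re

# Matches the system property from the JVM arguments above, for example
# -Dcom.sun.management.jmxremote.port=9999 (the rmi.port property is not matched).
JMX_PORT_PATTERN = re.compile(r"-Dcom\.sun\.management\.jmxremote\.port=(\d+)")


def jmx_ports(ps_output: str) -> list[int]:
    """Return the distinct JMX ports found in `ps aux`-style output, sorted."""
    return sorted({int(port) for port in JMX_PORT_PATTERN.findall(ps_output)})


# Example usage on Linux or macOS:
#   import subprocess
#   ps_output = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
#   print(jmx_ports(ps_output))
```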

**Agent configuration for this solution**

The metrics collected by the agent are defined in the agent configuration. The solution provides agent configurations to collect the recommended metrics with suitable dimensions for the solution’s dashboard. Each Kafka role, such as broker, producer, or consumer, has its own agent configuration that enables the collection of Kafka metrics and underlying JVM and EC2 metrics.

The steps for deploying the solution are described later in [Deploy the agent for your solution](#Solution-Kafka-Agent-Deploy). The following information is intended to help you understand how to customize the agent configuration for your environment.

You must customize some parts of the following agent configuration for your environment:
+ JMX port number: the port number that you configured in the previous section of this documentation. The port number appears in the `endpoint` line in the configuration.
+ `ClusterName`: used as a dimension for the broker metrics collected. Provide a meaningful name that represents the cluster grouping for the instances that run the Kafka broker.
+ `ProcessGroupName`: used as a dimension for the JVM metrics collected for brokers. Provide the same value as you provide for `ClusterName`. This lets the solution dashboard show the JVM metrics of a Kafka broker group alongside the broker metrics of the same group.
+ `ProducerGroupName`: used as a dimension for the producer metrics collected. Provide a meaningful name that represents the group of producer instances. For this value, you can specify the producer application or service that you want to use for a combined view of producer metrics in the solution dashboard.
+ `ConsumerGroupName`: used as a dimension for the consumer metrics collected. Provide a meaningful name that represents the group of consumer instances. This is not the same as the consumer group concept in Kafka; it is only a grouping dimension where you can specify the consumer application or service that you want to use for a combined view of consumer metrics in the solution dashboard.

For example, if you have two Kafka clusters running in the same account, one for the `order-processing` application and another for the `inventory-management` application, you should set the `ClusterName` and `ProcessGroupName` dimensions accordingly in the agent configuration of the broker instance.
+ For the `order-processing` cluster broker instances, set `ClusterName=order-processing` and `ProcessGroupName=order-processing`.
+ For the `inventory-management` cluster broker instances, set `ClusterName=inventory-management` and `ProcessGroupName=inventory-management`. 
+ Similarly, set the `ProducerGroupName` for producer instances and `ConsumerGroupName` for consumer instances based on their respective applications.

When you correctly set the above dimensions, the solution dashboard will automatically group the metrics based on the `ClusterName`, `ProducerGroupName`, and `ConsumerGroupName` dimensions. The dashboard will include dropdown options to select and view metrics for specific clusters and groups, allowing you to monitor the performance of individual clusters and groups separately.

Be sure to deploy each agent configuration to the correct EC2 instances. Each configuration is stored as a separate parameter in Systems Manager Parameter Store, as detailed later in [Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store](#Solution-Kafka-Agent-Step2).

The following instructions describe the situation where the producer, consumer, and broker roles are deployed to separate EC2 instances, without any overlap. If you are running multiple Kafka roles on the same EC2 instances, see [Configure the agent for multiple Kafka roles on the same instance](#Kafka-Multiple-Roles) for more information.

### Agent configuration for Kafka broker agents
<a name="Solution-Kafka-Agent-Broker"></a>

Use the following CloudWatch agent configuration on EC2 instances where Kafka brokers are deployed. Replace *ClusterName* with the name of the cluster to use to group these metrics for a unified view. The value you specify for *ClusterName* is used as both the `ClusterName` dimension and the `ProcessGroupName` dimension. Replace *port-number* with the JMX port of your Kafka server. If JMX was enabled with password authentication or SSL for remote access, see [Collect Java Management Extensions (JMX) metrics](CloudWatch-Agent-JMX-metrics.md) for information about setting up TLS or authorization as required.

The EC2 metrics in this configuration (the `disk`, `mem`, `swap`, and `netstat` entries outside the `jmx` block) work only on Linux and macOS instances. If you are using Windows instances, you can omit these metrics from the configuration. For information about metrics collected on Windows instances, see [Metrics collected by the CloudWatch agent on Windows Server instances](metrics-collected-by-CloudWatch-agent.md#windows-metrics-enabled-by-CloudWatch-agent).

```
{
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:port-number",
          "kafka": {
            "measurement": [
              "kafka.request.time.avg",
              "kafka.request.failed",
              "kafka.request.count",
              "kafka.purgatory.size",
              "kafka.partition.under_replicated",
              "kafka.partition.offline",
              "kafka.network.io",
              "kafka.leader.election.rate",
              "kafka.isr.operation.count"
            ]
          },
          "append_dimensions": {
            "ClusterName": "ClusterName"
          }
        },
        {
          "endpoint": "localhost:port-number",
          "jvm": {
            "measurement": [
              "jvm.classes.loaded",
              "jvm.gc.collections.count",
              "jvm.gc.collections.elapsed",
              "jvm.memory.heap.committed",
              "jvm.memory.heap.max",
              "jvm.memory.heap.used",
              "jvm.memory.nonheap.committed",
              "jvm.memory.nonheap.max",
              "jvm.memory.nonheap.used",
              "jvm.threads.count"
            ]
          },
          "append_dimensions": {
            "ProcessGroupName": "ClusterName"
          }
        }
      ],
      "disk": {
        "measurement": [
          "used_percent"
        ]
      },
      "mem": {
        "measurement": [
          "used_percent"
        ]
      },
      "swap": {
        "measurement": [
          "used_percent"
        ]
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ]
      }
    }
  }
}
```
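Before you store the broker configuration, you need to replace *port-number* and *ClusterName*. The following Python sketch shows one way to do that on the parsed JSON: it replaces placeholder text in string values only, so the `ClusterName` dimension key is kept while both the `ClusterName` and `ProcessGroupName` values receive the cluster name. `BROKER_EXCERPT`, the port `9999`, and the cluster name `order-processing` are illustrative assumptions; in practice, load the full configuration above. The substring replacement assumes that no other value contains the placeholder text, which holds for the configurations in this section. The same function can fill *ProducerGroupName* or *ConsumerGroupName* in the producer and consumer configurations.

```python
from __future__ import annotations

import json
from typing import Any


def fill_placeholders(node: Any, replacements: dict[str, str]) -> Any:
    """Return a copy of node with placeholder substrings replaced in string values.

    Dictionary keys are left untouched, so "ClusterName" as a dimension key
    survives while "ClusterName" as a dimension value is replaced.
    """
    if isinstance(node, dict):
        return {key: fill_placeholders(value, replacements) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(item, replacements) for item in node]
    if isinstance(node, str):
        for placeholder, value in replacements.items():
            node = node.replace(placeholder, value)
    return node


# Trimmed excerpt of the broker configuration above, for illustration only.
BROKER_EXCERPT = {
    "metrics": {
        "metrics_collected": {
            "jmx": [
                {
                    "endpoint": "localhost:port-number",
                    "kafka": {"measurement": ["kafka.partition.offline"]},
                    "append_dimensions": {"ClusterName": "ClusterName"},
                },
                {
                    "endpoint": "localhost:port-number",
                    "jvm": {"measurement": ["jvm.threads.count"]},
                    "append_dimensions": {"ProcessGroupName": "ClusterName"},
                },
            ]
        }
    }
}

if __name__ == "__main__":
    filled = fill_placeholders(
        BROKER_EXCERPT, {"port-number": "9999", "ClusterName": "order-processing"}
    )
    print(json.dumps(filled, indent=2))
```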

### Agent configuration for Kafka producers
<a name="Solution-Kafka-Agent-Producer"></a>

Use the following CloudWatch agent configuration on Amazon EC2 instances where Kafka producers are deployed. Replace *ProducerGroupName* with the name of the application or group that you want to use to group your metrics for a unified view. Replace *port-number* with the JMX port of your Kafka producer application.

The solution doesn’t enable JVM metrics for Kafka producers because the solution dashboard doesn’t display JVM metrics for producers. You can customize the agent configuration to emit JVM metrics as well; however, JVM metrics for producers are not visible on the solution dashboard.

```
{
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:port-number",
          "kafka-producer": {
            "measurement": [
              "kafka.producer.request-rate",
              "kafka.producer.byte-rate",
              "kafka.producer.request-latency-avg",
              "kafka.producer.response-rate",
              "kafka.producer.record-error-rate",
              "kafka.producer.record-send-rate"
            ]
          },
          "append_dimensions": {
            "ProducerGroupName": "ProducerGroupName"
          }
        }
      ]
    }
  }
}
```

### Agent configuration for Kafka consumers
<a name="Solution-Kafka-Agent-Consumer"></a>

Use the following CloudWatch agent configuration on EC2 instances where Kafka consumers are running. Replace *ConsumerGroupName* with the name of the application or group to use to group these metrics for a unified view. Replace *port-number* with the JMX port of your Kafka consumer application.

The solution doesn’t enable JVM metrics for Kafka consumers because the solution dashboard doesn’t display JVM metrics for consumers. You can customize the agent configuration to emit JVM metrics as well; however, JVM metrics for consumers are not visible on the solution dashboard.

```
{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:port-number",
          "kafka-consumer": {
            "measurement": [
              "kafka.consumer.fetch-rate",
              "kafka.consumer.total.bytes-consumed-rate",
              "kafka.consumer.records-consumed-rate",
              "kafka.consumer.bytes-consumed-rate",
              "kafka.consumer.records-lag-max"
            ]
          },
          "append_dimensions": {
            "ConsumerGroupName": "ConsumerGroupName"
          }
        }
      ]
    }
  }
}
```

## Deploy the agent for your solution
<a name="Solution-Kafka-Agent-Deploy"></a>

There are several approaches for installing the CloudWatch agent, depending on the use case. We recommend using Systems Manager for this solution. It provides a console experience and makes it simpler to manage a fleet of managed servers within a single AWS account. The instructions in this section use Systems Manager and are intended for when you don’t have the CloudWatch agent running with existing configurations. You can check whether the CloudWatch agent is running by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running).

If you are already running the CloudWatch agent on the EC2 hosts where the workload is deployed and managing the agent configurations, you can skip the instructions in this section and follow your existing deployment mechanism to update the configuration. Be sure to merge the agent configuration according to the role (broker, producer, or consumer) with your existing agent configuration, and then deploy the merged configuration. If you are using Systems Manager to store and manage the configuration for the CloudWatch agent, you can merge the configuration into the existing parameter value. For more information, see [Managing CloudWatch agent configuration files](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/create-store-cloudwatch-configurations.html).
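If you merge the role configuration into an existing agent configuration yourself, the following Python sketch illustrates one possible merge policy. The policy (merge nested objects key by key, combine lists without duplicating entries, and let the new value win on any other conflict) is an assumption of this sketch rather than behavior of the CloudWatch agent, so review the merged result, especially conflicting scalar values such as `namespace`, before you deploy it.

```python
import copy


def merge_agent_configs(existing: dict, addition: dict) -> dict:
    """Return a new configuration that combines two CloudWatch agent configs.

    Merge policy used by this sketch (an assumption, not agent behavior):
      * nested objects are merged key by key,
      * lists are combined, skipping items already present (so two "jmx"
        blocks are both kept, but "used_percent" is not duplicated),
      * for any other conflicting value, the value from `addition` wins.
    """
    merged = copy.deepcopy(existing)
    for key, value in addition.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_agent_configs(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            merged[key] = current + [copy.deepcopy(item) for item in value if item not in current]
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, merging a producer configuration into a broker configuration on the same instance keeps both `jmx` blocks, each with its own `endpoint`.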

**Note**  
Using Systems Manager to deploy the following CloudWatch agent configurations will replace or overwrite any existing CloudWatch agent configuration on your EC2 instances. You can modify this configuration to suit your unique environment or use case. The metrics defined in this solution are the minimum required for the recommended dashboard. 

The deployment process includes the following steps:
+ Step 1: Ensure that the target EC2 instances have the required IAM permissions.
+ Step 2: Store the recommended agent configuration file in the Systems Manager Parameter Store.
+ Step 3: Install the CloudWatch agent on one or more EC2 instances using a CloudFormation stack.
+ Step 4: Verify the agent setup is configured properly.

How many times you repeat these steps depends on whether your broker, producer, and consumer are deployed on the same EC2 instance or on different instances. For example, if the Kafka broker, producer, and consumer are deployed on separate instances without overlap, you must repeat these steps three times, with the appropriate agent configuration for the broker, producer, and consumer EC2 instances.

### Step 1: Ensure the target EC2 instances have the required IAM permissions
<a name="Solution-Kafka-Agent-Step1"></a>

You must grant permission for Systems Manager to install and configure the CloudWatch agent. You must also grant permission for the CloudWatch agent to publish telemetry from your EC2 instance to CloudWatch. Make sure that the IAM role attached to the instance has the **CloudWatchAgentServerPolicy** and **AmazonSSMManagedInstanceCore** IAM policies attached.
+ After the role is created, attach the role to your EC2 instances. Follow the steps in [Launch an instance with an IAM role](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#launch-instance-with-role) to attach a role while launching a new EC2 instance. To attach a role to an existing EC2 instance, follow the steps in [Attach an IAM role to an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#attach-iam-role).
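If you manage IAM with the AWS SDK for Python (Boto3) instead of the console, the following sketch builds the **AttachRolePolicy** requests for the two AWS managed policies named above. It assumes that the role already exists; the role name `KafkaCloudWatchAgentRole` is hypothetical, and attaching the role to your instances still follows the EC2 steps linked above.

```python
from __future__ import annotations

# The two AWS managed policies named in this step.
SOLUTION_POLICY_ARNS = (
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
)


def policy_attachment_requests(role_name: str) -> list[dict[str, str]]:
    """Build one AttachRolePolicy request per managed policy for the role."""
    return [{"RoleName": role_name, "PolicyArn": arn} for arn in SOLUTION_POLICY_ARNS]


# Example usage (needs boto3 and credentials allowed to call iam:AttachRolePolicy;
# "KafkaCloudWatchAgentRole" is a hypothetical role name):
#   import boto3
#   iam = boto3.client("iam")
#   for request in policy_attachment_requests("KafkaCloudWatchAgentRole"):
#       iam.attach_role_policy(**request)
```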

### Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store
<a name="Solution-Kafka-Agent-Step2"></a>

Parameter Store simplifies the installation of the CloudWatch agent on an EC2 instance by securely storing and managing configuration parameters, eliminating the need for hard-coded values. This ensures a more secure and flexible deployment process, enabling centralized management and easier updates to configurations across multiple instances.

Use the following steps to store the recommended CloudWatch agent configuration file as a parameter in Parameter Store.

**To create the CloudWatch agent configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. Provide a name for the parameter that will store your CloudWatch agent configuration, such as **AmazonCloudWatch-Kafka-Producer-Configuration** for producers, **AmazonCloudWatch-Kafka-Consumer-Configuration** for consumers, or **AmazonCloudWatch-Kafka-Broker-Configuration** for brokers. If you have multiple Kafka roles on a single EC2 instance, name the parameter accordingly for easier identification. This value will later be used to distribute this configuration to the agent running on your EC2 instance.

   1. For **Parameter tier**, choose **Standard**. 

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the full text of the CloudWatch agent configuration. Be sure to select the JSON block for the Kafka role that this instance is hosting. Refer to the configuration provided in [Agent configuration for Kafka broker agents](#Solution-Kafka-Agent-Broker), [Agent configuration for Kafka producers](#Solution-Kafka-Agent-Producer), and [Agent configuration for Kafka consumers](#Solution-Kafka-Agent-Consumer) when storing the configuration for broker, producer, and consumer respectively. If you are running multiple Kafka roles on the same EC2 instance, be sure to merge the configurations if required, as described in [Configure the agent for multiple Kafka roles on the same instance](#Kafka-Multiple-Roles).

   1. Choose **Create parameter**. 
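The console steps above can also be scripted. The following Python sketch validates the configuration JSON that you would paste into **Value** and builds the arguments for the Systems Manager **PutParameter** operation with the same tier, type, and data type. The 4 KB size check reflects the limit for Standard tier parameters, and the local file name in the usage comment is hypothetical. To update an existing parameter, **PutParameter** also needs `Overwrite=True`.

```python
import json

# Parameter names suggested in this step, one per Kafka role.
PARAMETER_NAMES = {
    "broker": "AmazonCloudWatch-Kafka-Broker-Configuration",
    "producer": "AmazonCloudWatch-Kafka-Producer-Configuration",
    "consumer": "AmazonCloudWatch-Kafka-Consumer-Configuration",
}

STANDARD_TIER_MAX_BYTES = 4096  # Standard parameters hold values up to 4 KB.


def put_parameter_request(role: str, config_text: str) -> dict:
    """Validate the agent JSON and build PutParameter arguments.

    Mirrors the console choices: Standard tier, String type, text data type.
    """
    config = json.loads(config_text)  # raises ValueError if the JSON is malformed
    value = json.dumps(config, indent=2)
    if len(value.encode("utf-8")) > STANDARD_TIER_MAX_BYTES:
        raise ValueError("configuration exceeds the Standard tier size limit")
    return {
        "Name": PARAMETER_NAMES[role],
        "Value": value,
        "Type": "String",
        "Tier": "Standard",
        "DataType": "text",
    }


# Example usage (needs boto3 and credentials allowed to call ssm:PutParameter):
#   import boto3
#   with open("broker-agent-config.json") as f:  # hypothetical local file
#       request = put_parameter_request("broker", f.read())
#   boto3.client("ssm").put_parameter(**request)
```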

### Step 3: Install the CloudWatch agent and apply the configuration using a CloudFormation template
<a name="Solution-Kafka-Agent-Step3"></a>

You can use AWS CloudFormation to install the agent and configure it to use the CloudWatch agent configuration that you created in the previous steps.

**To install and configure the CloudWatch agent for this solution**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the Kafka workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **CWAgentInstallationStack**.

1. In the **Parameters** section, specify the following:

   1. For **CloudWatchAgentConfigSSM**, enter the name of the Systems Manager parameter for the agent configuration that you created earlier, such as **AmazonCloudWatch-Kafka-Broker-Configuration** for brokers, **AmazonCloudWatch-Kafka-Producer-Configuration** for producers, and **AmazonCloudWatch-Kafka-Consumer-Configuration** for consumers.

   1. To select the target instances, you have two options.

      1. For **InstanceIds**, specify a comma-delimited list of instance IDs where you want to install the CloudWatch agent with this configuration. You can list a single instance or several instances.

      1. If you are deploying at scale, you can specify the **TagKey** and the corresponding **TagValue** to target all EC2 instances with this tag and value. If you specify a **TagKey**, you must specify a corresponding **TagValue**. (For an Auto Scaling group, specify **aws:autoscaling:groupName** for the **TagKey** and specify the Auto Scaling group name for the **TagValue** to deploy to all instances within the Auto Scaling group.)

         If you specify both the **InstanceIds** and the **TagKey** parameters, the **InstanceIds** take precedence and the tags are ignored.

1. Review the settings, then choose **Create stack**. 

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json). 
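To create the same stack with the AWS SDK for Python (Boto3), you can pass the template URL and parameters to **CreateStack**. The following sketch builds those arguments and mirrors the rules described above: a **TagKey** requires a **TagValue**, and **InstanceIds** take precedence when both are given. It assumes that the template's other parameters, if any, have usable defaults and that no `Capabilities` acknowledgment is required; add one if CloudFormation asks for it. Create the client in the Region where your Kafka workload runs. The stack name, instance IDs, and Region in the usage comment are hypothetical.

```python
from __future__ import annotations

AGENT_TEMPLATE_URL = (
    "https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/"
    "CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json"
)


def agent_stack_request(
    stack_name: str,
    config_parameter: str,
    instance_ids: list[str] | None = None,
    tag_key: str | None = None,
    tag_value: str | None = None,
) -> dict:
    """Build CreateStack arguments for the agent installation template."""
    if bool(tag_key) != bool(tag_value):
        raise ValueError("TagKey and TagValue must be specified together")
    if not instance_ids and not tag_key:
        raise ValueError("specify InstanceIds or a TagKey/TagValue pair")
    parameters = {"CloudWatchAgentConfigSSM": config_parameter}
    if instance_ids:
        # If both are given, the template uses InstanceIds and ignores the tags.
        parameters["InstanceIds"] = ",".join(instance_ids)
    if tag_key:
        parameters["TagKey"] = tag_key
        parameters["TagValue"] = tag_value
    return {
        "StackName": stack_name,
        "TemplateURL": AGENT_TEMPLATE_URL,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
    }


# Example usage (needs boto3 and CloudFormation permissions):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-west-2")  # your workload's Region
#   request = agent_stack_request(
#       "CWAgentInstallationStack",
#       "AmazonCloudWatch-Kafka-Broker-Configuration",
#       instance_ids=["i-0123456789abcdef0"],
#   )
#   cfn.create_stack(**request)
#   cfn.get_waiter("stack_create_complete").wait(StackName=request["StackName"])
```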

**Note**  
After this step is completed, this Systems Manager parameter is associated with the CloudWatch agents running in the targeted instances. This means that:
+ If the Systems Manager parameter is deleted, the agent will stop.
+ If the Systems Manager parameter is edited, the configuration changes will automatically apply to the agent at the scheduled frequency, which is 30 days by default.

If you want to immediately apply changes to this Systems Manager parameter, you must run this step again. For more information about associations, see [Working with associations in Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/state-manager-associations.html).

### Step 4: Verify the agent setup is configured properly
<a name="Solution-Kafka-Agent-Step4"></a>

You can verify whether the CloudWatch agent is installed by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running). If the CloudWatch agent is not installed and running, make sure you have set up everything correctly.
+ Be sure you have attached a role with correct permissions for the EC2 instance as described in [Step 1: Ensure the target EC2 instances have the required IAM permissions](#Solution-Kafka-Agent-Step1).
+ Be sure you have correctly configured the JSON for the Systems Manager parameter. Follow the steps in [Troubleshooting installation of the CloudWatch agent with CloudFormation](Install-CloudWatch-Agent-New-Instances-CloudFormation.md#CloudWatch-Agent-CloudFormation-troubleshooting).

If everything is set up correctly, then you should see the Kafka metrics being published to CloudWatch. You can check the CloudWatch console to verify they are being published.

**To verify that Kafka metrics are being published to CloudWatch**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Metrics**, **All metrics**.

1. Make sure you've selected the Region where you deployed the solution, and choose **Custom namespaces**, **CWAgent**.

1. Search for the metrics mentioned in the agent configuration section of this document, such as `kafka.partition.offline` for brokers, `kafka.consumer.fetch-rate` for consumers, or `kafka.producer.request-rate` for producers. If you see results for these metrics, then the metrics are being published to CloudWatch.
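You can also check for the metrics programmatically. The following Python sketch collects metric names from **ListMetrics** response pages in the `CWAgent` namespace and reports which roles have not published their sample metric yet. The role-to-metric mapping uses metric names from the agent configurations above. Newly published metrics can take some time to appear in **ListMetrics**, so an empty result right after deployment doesn't necessarily indicate a problem.

```python
from __future__ import annotations

from collections.abc import Iterable

# One sample metric per role, taken from the agent configurations in this section.
EXPECTED_METRICS = {
    "broker": "kafka.partition.offline",
    "producer": "kafka.producer.request-rate",
    "consumer": "kafka.consumer.fetch-rate",
}


def published_metric_names(pages: Iterable[dict]) -> set[str]:
    """Collect MetricName values from ListMetrics response pages."""
    return {metric["MetricName"] for page in pages for metric in page.get("Metrics", [])}


def roles_without_metrics(published: set[str], roles: Iterable[str] = EXPECTED_METRICS) -> list[str]:
    """Return the roles whose sample metric has not appeared yet."""
    return sorted(role for role in roles if EXPECTED_METRICS[role] not in published)


# Example usage (needs boto3; run in the Region where you deployed the solution):
#   import boto3
#   paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
#   pages = paginator.paginate(Namespace="CWAgent")
#   print(roles_without_metrics(published_metric_names(pages), roles=["broker"]))
```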

## Create the Kafka solution dashboard
<a name="Solution-Kafka-Dashboard"></a>

This dashboard displays the newly emitted metrics for both Kafka and the underlying JVM. It provides a top contributor view of the health of your Kafka workload across producers, brokers, and consumers. The top contributor view displays the top 10 contributors in each metric widget, which lets you identify outliers at a glance.

The solution dashboard doesn't display EC2 metrics. To view EC2 metrics, you'll need to use the EC2 automatic dashboard to see EC2 vended metrics and use the EC2 console dashboard to see EC2 metrics that are collected by the CloudWatch agent. For more information about automatic dashboards for AWS services, see [Viewing a CloudWatch dashboard for a single AWS service](CloudWatch_Automatic_Dashboards_Focus_Service.md).

To create the dashboard, you can use the following options:
+ Use CloudWatch console to create the dashboard.
+ Use AWS CloudFormation console to deploy the dashboard.
+ Download the AWS CloudFormation infrastructure as code and integrate it as part of your continuous integration (CI) automation.

By using the CloudWatch console to create a dashboard, you can preview the dashboard before you create it and incur charges.

**Note**  
The dashboard created with CloudFormation in this solution displays metrics from the Region where the solution is deployed. Be sure to create the CloudFormation stack in the Region where your JVM and Kafka metrics are published.  
If you've specified a custom namespace other than `CWAgent` in the CloudWatch agent configuration, you'll have to change the CloudFormation template for the dashboard to replace `CWAgent` with the customized namespace you are using.

**To create the dashboard via CloudWatch Console**

1. Open the CloudWatch Console **Create Dashboard** using this link: [https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=ApacheKafkaOnEc2&referrer=os-catalog](https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=ApacheKafkaOnEc2&referrer=os-catalog).

1. Verify that the selected Region on the console is the Region where the Kafka workload is running.

1. Enter the name of the dashboard, then choose **Create Dashboard**.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **KafkaDashboard-us-east-1**.

1. Preview the dashboard and choose **Save** to create the dashboard.

**To create the dashboard via CloudFormation**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Kafka_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Kafka_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the Kafka workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **KafkaDashboardStack**.

1. In the **Parameters** section, specify the name of the dashboard under the **DashboardName** parameter.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **KafkaDashboard-us-east-1**.

1. Under **Capabilities and transforms**, acknowledge the access capabilities for transforms. CloudFormation doesn't add any IAM resources for this stack.

1. Review the settings, then choose **Create stack**. 

1. After the stack status is **CREATE_COMPLETE**, choose the **Resources** tab under the created stack and then choose the link under **Physical ID** to go to the dashboard. You can also access the dashboard in the CloudWatch console by choosing **Dashboards** in the left navigation pane of the console, and finding the dashboard name under **Custom Dashboards**.

If you want to edit the template file to customize it for any purpose, you can use the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use this link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Kafka_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Kafka_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).
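
If you prefer to script the garbage collection customization instead of editing the downloaded template by hand, the following Python sketch shows one way to do it. It assumes only that the template is valid JSON (the download URL ends in `.json`); the helper names `replace_in_strings` and `rewrite_template_file` are hypothetical. The replacement is a plain substring swap in every string value, not an edit targeted at a single widget.

```python
import json
from typing import Any


def replace_in_strings(node: Any, old: str, new: str) -> Any:
    """Return a copy of a parsed JSON document with `old` replaced by `new`
    in every string value. Object keys are left unchanged."""
    if isinstance(node, str):
        return node.replace(old, new)
    if isinstance(node, list):
        return [replace_in_strings(item, old, new) for item in node]
    if isinstance(node, dict):
        return {key: replace_in_strings(value, old, new) for key, value in node.items()}
    # Numbers, booleans, and null are returned unchanged.
    return node


def rewrite_template_file(src_path: str, dst_path: str, old: str, new: str) -> None:
    """Hypothetical wrapper: read a template, apply the replacement, save a copy."""
    with open(src_path, encoding="utf-8") as f:
        template = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(replace_in_strings(template, old, new), f, indent=2)
```

For example, `rewrite_template_file("dashboard-template-1.0.0.json", "dashboard-custom.json", "G1 Young Generation", "Parallel GC")` writes a copy you could upload with **Upload a template file**. Because every string is rewritten, other widgets that mention the same collector name change too, so review the result before uploading.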

### Get started with the Kafka dashboard
<a name="Solution-Kafka-Dashboard-GetStarted"></a>

Here are a few tasks that you can try out with the new Kafka dashboard. These tasks let you validate that the dashboard is working correctly and give you some hands-on experience using it to monitor a Kafka cluster. As you try these out, you'll get familiar with navigating the dashboard and interpreting the visualized metrics.

**Using the dropdown lists**

The dashboard provides dropdown lists at the top that you can use to filter and select the specific Kafka cluster, producer, and consumer groups that you want to monitor.
+ To display metrics for a specific Kafka cluster, select that cluster name in the **Kafka Cluster** dropdown list. 
+ To display metrics for a specific Kafka producer group, select that producer group name in the **Kafka Producer** dropdown list. 
+ To display metrics for a specific Kafka consumer group, select that consumer group name in the **Kafka Consumer Group** dropdown list. 

**Verify cluster health**

From the **Cluster Overview** section, find the **Partitions Under Replicated** and **In-Sync Replicas** widgets. These should ideally be zero or a small number. A large value for any of these metrics could indicate issues with the Kafka cluster that need investigation. 

**Investigate broker performance**

In the **Brokers** section, find the **Failed Fetch Requests** and **Failed Producer Requests** widgets. These show the number of failed requests for fetch and produce operations, respectively. High failure rates could indicate issues with the brokers or network connectivity that require further investigation.

**Monitor producer performance**

In the **Producer Group Overview** section, find the **Average Request Rate**, **Average Request Latency**, and **Average Record Send/Error Rate** widgets. These will give you an overview of how the producers in the selected group are performing. You can also drill down to view metrics for specific producers and topics in the **Producers** section.

**Monitor consumer lag**

In the **Consumer Group Overview** section, find the **Consumer Lag** widget. This shows how far behind the consumers are in processing messages from the latest offsets in the partitions they are subscribed to. Ideally, the consumer lag should be low or zero. A high consumer lag could indicate that the consumers are unable to keep up with the rate of data production, leading to potential data loss or delays in processing. You can also drill down to view metrics for specific consumers and topics in the **Consumers** section.
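
To make the lag definition concrete, here is a small Python illustration. It is not how the agent or the Kafka client computes the metric. Per-partition lag is taken as the partition's latest (log-end) offset minus the consumer's current position, and the maximum across partitions is roughly what the `kafka.consumer.records-lag-max` measurement reports. The function names and offsets are made up for the example.

```python
from typing import Dict, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


def partition_lag(log_end_offsets: Dict[Partition, int],
                  consumer_positions: Dict[Partition, int]) -> Dict[Partition, int]:
    """Lag per partition = latest offset minus the consumer's position.
    Partitions without a known latest offset are skipped."""
    return {
        tp: max(log_end_offsets[tp] - pos, 0)  # clamp: a position never exceeds the log end
        for tp, pos in consumer_positions.items()
        if tp in log_end_offsets
    }


def max_lag(lags: Dict[Partition, int]) -> int:
    """Largest lag across partitions (0 when there are none)."""
    return max(lags.values(), default=0)
```

With latest offsets of 1200 and 800 and consumer positions of 1150 and 800 on partitions 0 and 1 of a topic, the lags are 50 and 0, so the maximum lag is 50.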

## Configure the agent for multiple Kafka roles on the same instance
<a name="Kafka-Multiple-Roles"></a>

The individual configurations for Kafka roles listed in [CloudWatch agent configuration for this solution](#Solution-Kafka-CloudWatch-Agent) apply only when the producer, consumer, and broker roles are deployed on separate EC2 instances, without any overlap. If you are running multiple Kafka roles on the same Amazon EC2 instances, you have two options:
+ Create a single agent configuration file that lists and configures all metrics for all the Kafka roles deployed on that instance. If you are going to use Systems Manager to manage agent configuration, this is the preferred option.

  If you choose this option and the multiple Kafka roles are part of the same JVM process, you must specify the same endpoint for each Kafka role in the agent configuration. If the multiple Kafka roles are part of different JVM processes, the endpoint for each role can be different depending on the JMX port set for that process.
+ Create separate agent configuration files for each Kafka role, and configure the agent to apply all of these configuration files. For instructions for applying multiple configuration files, see [Creating multiple CloudWatch agent configuration files](create-cloudwatch-agent-configuration-file.md#CloudWatch-Agent-multiple-config-files).

The following example shows a CloudWatch agent configuration where the producer and consumer roles are running on one instance as part of the same JVM process. In this case, the port number must be the same in both the producer and consumer parts of the configuration below. If instead the two roles were running as part of different JVM processes, you could specify different port numbers for each, according to the JMX port of each individual JVM process.

```
{
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:port-number",
          "kafka-producer": {
            "measurement": [
              "kafka.producer.request-rate",
              "kafka.producer.byte-rate",
              "kafka.producer.request-latency-avg",
              "kafka.producer.response-rate",
              "kafka.producer.record-error-rate",
              "kafka.producer.record-send-rate"
            ]
          },
          "append_dimensions": {
            "ProducerGroupName": "ProducerGroupName"
          }
        },
        {
          "endpoint": "localhost:port-number",
          "kafka-consumer": {
            "measurement": [
              "kafka.consumer.fetch-rate",
              "kafka.consumer.total.bytes-consumed-rate",
              "kafka.consumer.records-consumed-rate",
              "kafka.consumer.bytes-consumed-rate",
              "kafka.consumer.records-lag-max"
            ]
          },
          "append_dimensions": {
            "ConsumerGroupName": "ConsumerGroupName"
          }
        }
      ]
    }
  }
}
```
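
When the producer and consumer roles share one JVM, the value that must stay identical across the roles is the JMX endpoint. If you generate the configuration programmatically, a minimal Python sketch like the following builds the combined file from per-role blocks and sets the same `localhost:<port>` endpoint on every `jmx` entry, so the ports cannot drift apart. The function name and the port `9999` are illustrative assumptions, and the measurement lists are trimmed for brevity.

```python
import copy
import json
from typing import Dict, List


def combine_roles_same_jvm(jmx_port: int, role_blocks: List[Dict]) -> Dict:
    """Build one agent configuration in which every JMX entry targets the same JVM."""
    endpoint = f"localhost:{jmx_port}"
    jmx_entries = []
    for block in role_blocks:
        entry = copy.deepcopy(block)  # leave the caller's blocks untouched
        entry["endpoint"] = endpoint  # same process, so same JMX port
        jmx_entries.append(entry)
    return {
        "metrics": {
            "namespace": "CWAgent",
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {"jmx": jmx_entries},
        }
    }


# Measurement lists trimmed for brevity; use the full lists from the example above.
producer = {
    "kafka-producer": {"measurement": ["kafka.producer.request-rate"]},
    "append_dimensions": {"ProducerGroupName": "ProducerGroupName"},
}
consumer = {
    "kafka-consumer": {"measurement": ["kafka.consumer.fetch-rate"]},
    "append_dimensions": {"ConsumerGroupName": "ConsumerGroupName"},
}
config = combine_roles_same_jvm(9999, [producer, consumer])
print(json.dumps(config, indent=2))
```

If the roles ran in separate JVM processes instead, you would assign each entry its own endpoint rather than a shared one.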

# CloudWatch solution: Tomcat workload on Amazon EC2
<a name="Solution-Tomcat-On-EC2"></a>

This solution helps you configure out-of-the-box metric collection using the CloudWatch agent for Tomcat servers running on EC2 instances. It also helps you set up a preconfigured CloudWatch dashboard. For general information about all CloudWatch observability solutions, see [CloudWatch observability solutions](Monitoring-Solutions.md).

**Topics**
+ [Requirements](#Solution-Tomcat-On-EC2-Requirements)
+ [Benefits](#Solution-Tomcat-On-EC2-Benefits)
+ [Costs](#Solution-Tomcat-On-EC2-Costs)
+ [CloudWatch agent configuration for this solution](#Solution-Tomcat-CloudWatch-Agent)
+ [Deploy the agent for your solution](#Solution-Tomcat-Agent-Deploy)
+ [Create the Tomcat solution dashboard](#Solution-Tomcat-Dashboard)

## Requirements
<a name="Solution-Tomcat-On-EC2-Requirements"></a>

This solution is relevant for the following conditions:
+ Supported versions: Tomcat versions 9, 10.1, and 11 (beta)
+ Compute: Amazon EC2
+ Supports up to 500 EC2 instances across all Tomcat workloads in a given AWS Region
+ Latest version of the CloudWatch agent
+ SSM Agent installed on the EC2 instances

**Note**  
AWS Systems Manager Agent (SSM Agent) is preinstalled on some [Amazon Machine Images (AMIs)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ami-preinstalled-agent.html) provided by AWS and trusted third parties. If the agent isn't installed, you can install it manually using the procedure for your operating system type.
+ [Manually installing and uninstalling SSM Agent on EC2 instances for Linux](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html)
+ [Manually installing and uninstalling SSM Agent on EC2 instances for macOS](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-macos.html)
+ [Manually installing and uninstalling SSM Agent on EC2 instances for Windows Server](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-windows.html)

## Benefits
<a name="Solution-Tomcat-On-EC2-Benefits"></a>

The solution delivers Tomcat server monitoring, providing valuable insights for the following use cases:
+ Detect Tomcat server errors and performance issues.
+ Monitor network traffic for data transfer problems.
+ Track thread usage and active user sessions.
+ Analyze underlying JVM performance for Tomcat server.

Below are the key advantages of the solution:
+ Automates metric collection for Apache Tomcat and the underlying JVM using CloudWatch agent configuration, eliminating manual instrumentation.
+ Provides a pre-configured, consolidated CloudWatch dashboard for Apache Tomcat and JVM metrics. The dashboard will automatically handle metrics from new Tomcat EC2 instances configured using the solution, even if those metrics don't exist when you first create the dashboard. It also allows you to group the metrics into logical applications for easier focus and management.

The following image is an example of the dashboard for this solution.

![\[Example dashboard for Apache Tomcat solution.\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/TomcatDashboard.png)


## Costs
<a name="Solution-Tomcat-On-EC2-Costs"></a>

This solution creates and uses resources in your account. You are charged for standard usage, including the following:
+ All metrics collected by the CloudWatch agent are charged as custom metrics. The number of metrics used by this solution depends on the number of EC2 hosts.
  + Each Tomcat host configured for the solution publishes a total of 27 metrics plus one metric (`disk_used_percent`) for which the metric count depends on the number of disk paths for that host.
+ One custom dashboard.
+ API operations requested by the CloudWatch agent to publish the metrics. With the default configuration for this solution, the CloudWatch agent calls **PutMetricData** once every minute for each EC2 host. This means the **PutMetricData** API is called 30 × 24 × 60 = 43,200 times in a 30-day month for each EC2 host.

For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

The pricing calculator can help you estimate approximate monthly costs for using this solution. 

**To use the pricing calculator to estimate your monthly solution costs**

1. Open the [Amazon CloudWatch pricing calculator](https://calculator.aws/#/createCalculator/CloudWatch).

1. In the **Metrics** section, for **Number of metrics**, enter **(27 + average number of disk paths per EC2 host) × number of EC2 instances configured for this solution**.

1. In the **APIs** section, for **Number of API requests**, enter **43200 × number of EC2 instances configured for this solution**.

   By default, the solution performs one **PutMetricData** operation each minute for each EC2 host.

1. In the **Dashboards and Alarms** section, for **Number of Dashboards**, enter **1**.

1. You can see your monthly estimated costs at the bottom of the pricing calculator.
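
The two calculator inputs above are simple arithmetic. As a sketch, the following Python reproduces the usage counts you would enter, using the 27-metric base and the one-call-per-minute default stated in this section. It does not look up prices, which come from the pricing page; the function and class names are illustrative.

```python
from dataclasses import dataclass

# One PutMetricData call per minute per host over a 30-day month.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200

# Per-host metric count from the Costs section, excluding disk_used_percent.
BASE_METRICS_PER_HOST = 27


@dataclass
class MonthlyUsage:
    custom_metrics: int
    put_metric_data_requests: int
    dashboards: int


def estimate_usage(num_instances: int, avg_disk_paths: float) -> MonthlyUsage:
    """Calculator inputs for this solution, following the formulas above."""
    metrics = round((BASE_METRICS_PER_HOST + avg_disk_paths) * num_instances)
    requests = MINUTES_PER_30_DAY_MONTH * num_instances
    return MonthlyUsage(metrics, requests, dashboards=1)


print(estimate_usage(num_instances=10, avg_disk_paths=2))
```

For 10 instances with an average of two disk paths each, this gives 290 custom metrics, 432,000 **PutMetricData** requests, and one dashboard.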

## CloudWatch agent configuration for this solution
<a name="Solution-Tomcat-CloudWatch-Agent"></a>

The CloudWatch agent is software that runs continuously and autonomously on your servers and in containerized environments. It collects metrics, logs, and traces from your infrastructure and applications and sends them to CloudWatch and X-Ray.

For more information about the CloudWatch agent, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).

The agent configuration in this solution collects the foundational metrics for Tomcat, JVM, and EC2. The CloudWatch agent can be configured to collect more JVM metrics than the dashboard displays by default. For a list of all Tomcat metrics that you can collect, see [Collect Tomcat metrics](CloudWatch-Agent-JMX-metrics.md#CloudWatch-Agent-Tomcat-metrics). For a list of all JVM metrics that you can collect, see [Collect JVM metrics](CloudWatch-Agent-JMX-metrics.md#CloudWatch-Agent-JVM-metrics). For a list of Amazon EC2 metrics, see [Metrics collected by the CloudWatch agent on Linux and macOS instances](metrics-collected-by-CloudWatch-agent.md#linux-metrics-enabled-by-CloudWatch-agent).

**Expose JMX ports for the Tomcat server**

The CloudWatch agent relies on JMX to collect the metrics related to the Tomcat server and JVM process. To make this possible, you must expose the JMX port from your servers. To enable a JMX port for monitoring and management, set system properties for your Tomcat servers. You can use the environment variable `CATALINA_OPTS` to set the required system properties for Tomcat. Review the startup scripts and configuration files of your Tomcat server to find the best place to set the environment variable. Be sure that you specify an unused port number. You must restart the server after the change.

```
export CATALINA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=<<port-number>> -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
```

This example sets up unauthenticated JMX. If your security policies or requirements call for JMX with password authentication or SSL for remote access, refer to the [JMX documentation](https://docs.oracle.com/en/java/javase/17/management/monitoring-and-management-using-jmx-technology.html) to set the required properties.

To verify the JMX port, run `ps aux | grep jmxremote.port`. The results should show that the JMX port was set on the JVM processes.
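
Besides `ps`, you can confirm that something is accepting connections on the port. The Python sketch below only attempts a TCP connection from the instance itself; it assumes the port is reachable on `127.0.0.1` and does not prove that the listener speaks JMX. The function name is hypothetical.

```python
import socket


def is_port_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Run before the change, a `False` result suggests the chosen port is unused; after restarting Tomcat, a `True` result suggests the JVM is listening on it.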

### Agent configuration for Tomcat solution
<a name="Solution-Agent-Configuration-Tomcat-Solution"></a>

The metrics collected by the agent are defined in the agent configuration. The solution provides agent configurations to collect the recommended metrics with suitable dimensions for the solution’s dashboard.

The steps for deploying the solution are described later in [Deploy the agent for your solution](#Solution-Tomcat-Agent-Deploy). The following information is intended to help you understand how to customize the agent configuration for your environment.

You must customize some parts of the following agent configuration for your environment:
+ `port-number` – The JMX port number that you configured in the previous section. It appears in the `endpoint` line of the configuration.
+ `AppName` – This is used as a dimension for the Tomcat application metrics collected. Provide a meaningful name that represents the grouping for the instances that run the Tomcat application.
+ `ProcessGroupName` – This is used as a dimension for the JVM metrics collected for Tomcat hosts. Set it to the same value as `AppName` so that the solution dashboard can show the JVM metrics of a Tomcat application group alongside its server metrics.

For example, if you have two Tomcat apps running in the same AWS account, one for the `billing-system` application and another for the `order-system` application, you can set the `AppName` and `ProcessGroupName` dimensions accordingly in the agent configuration of each instance.
+ For the `billing-system` application instances, set `AppName=billing-system` and `ProcessGroupName=billing-system`.
+ For the `order-system` application instances, set `AppName=order-system` and `ProcessGroupName=order-system`.

When you follow these guidelines, the solution will automatically group the metrics based on the `AppName` and `ProcessGroupName` dimensions. The dashboard will include dropdown options to select and view metrics for a specific Tomcat application, allowing you to monitor the performance of individual applications separately.

### Agent configuration for Tomcat hosts
<a name="Solution-Agent-Configuration-Tomcat-Host"></a>

Use the following CloudWatch agent configuration on EC2 instances where your Tomcat applications are deployed. You will store this configuration as a parameter in Systems Manager Parameter Store, as detailed later in [Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store](#Solution-Tomcat-Agent-Step2).

Replace *AppName* with a meaningful name that represents the Tomcat application the instances are part of. Replace *port-number* with the JMX port of your Tomcat server. If JMX was enabled with password authentication or SSL for remote access, see [Collect Java Management Extensions (JMX) metrics](CloudWatch-Agent-JMX-metrics.md) for information about setting up TLS or authorization in agent configuration as required.

The EC2 metrics in this configuration (those defined outside the `jmx` block) work only for Linux and macOS instances. If you are using Windows instances, you can omit these metrics from the configuration. For information about metrics collected on Windows instances, see [Metrics collected by the CloudWatch agent on Windows Server instances](metrics-collected-by-CloudWatch-agent.md#windows-metrics-enabled-by-CloudWatch-agent).

```
{
  "metrics": {
    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:port-number",
          "tomcat": {
            "measurement": [
              "tomcat.sessions",
              "tomcat.errors",
              "tomcat.processing_time",
              "tomcat.traffic",
              "tomcat.max_time",
              "tomcat.request_count",
              "tomcat.threads"
            ]
          },
          "append_dimensions": {
            "AppName": "AppName"
          }
        },
        {
          "endpoint": "localhost:port-number",
          "jvm": {
            "measurement": [
              "jvm.classes.loaded",
              "jvm.gc.collections.count",
              "jvm.gc.collections.elapsed",
              "jvm.memory.heap.committed",
              "jvm.memory.heap.max",
              "jvm.memory.heap.used",
              "jvm.memory.nonheap.committed",
              "jvm.memory.nonheap.max",
              "jvm.memory.nonheap.used",
              "jvm.threads.count"
            ]
          },
          "append_dimensions": {
            "ProcessGroupName": "AppName"
          }
        }
      ],
      "disk": {
        "measurement": [
          "used_percent"
        ]
      },
      "mem": {
        "measurement": [
          "used_percent"
        ]
      },
      "swap": {
        "measurement": [
          "used_percent"
        ]
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ]
      }
    }
  }
}
```
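
If you fill in the placeholders with a script, a blind text replacement of `AppName` would also rewrite the dimension key of the same name. A minimal Python sketch, assuming the configuration above is saved unchanged, parses the JSON and sets only the `endpoint` values and the `AppName` and `ProcessGroupName` dimension values. The function name and the example values `billing-system` and `9010` are illustrative.

```python
import json
from typing import Dict


def customize_tomcat_config(config_text: str, app_name: str, jmx_port: int) -> Dict:
    """Fill in the port-number and AppName placeholders of the Tomcat agent configuration."""
    config = json.loads(config_text)
    for entry in config["metrics"]["metrics_collected"]["jmx"]:
        entry["endpoint"] = f"localhost:{jmx_port}"
        dims = entry.get("append_dimensions", {})
        # Both dimensions get the same value so server and JVM metrics group together.
        for key in ("AppName", "ProcessGroupName"):
            if key in dims:
                dims[key] = app_name
    return config
```

You could then serialize the result with `json.dumps(config, indent=2)` and paste it as the parameter value in Step 2.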

## Deploy the agent for your solution
<a name="Solution-Tomcat-Agent-Deploy"></a>

There are several approaches for installing the CloudWatch agent, depending on the use case. We recommend using Systems Manager for this solution. It provides a console experience and makes it simpler to manage a fleet of managed servers within a single AWS account. The instructions in this section use Systems Manager and are intended for when you don’t have the CloudWatch agent running with existing configurations. You can check whether the CloudWatch agent is running by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running).

If you are already running the CloudWatch agent on the EC2 hosts where the Tomcat application is deployed and managing the agent configurations, you can skip the instructions in this section and follow your existing deployment mechanism to update the configuration. Be sure to merge the agent configuration for this solution with your existing agent configuration, and then deploy the merged configuration. If you are using Systems Manager to store and manage the configuration for the CloudWatch agent, you can merge the configuration into the existing parameter value. For more information, see [Managing CloudWatch agent configuration files](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/create-store-cloudwatch-configurations.html).
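
How you merge depends on your existing configuration. As one possible approach, the Python sketch below deep-merges the solution configuration into an existing one: nested objects are merged key by key, lists such as `jmx` and `measurement` are combined without duplicates, and when both sides set the same scalar (for example `namespace`) the existing value is kept. That conflict rule is an assumption of this sketch; if your existing namespace isn't `CWAgent`, the dashboard template also needs the change described in the dashboard section.

```python
import copy
from typing import Any, Dict


def merge_agent_configs(existing: Dict[str, Any], solution: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new configuration containing everything from both inputs."""
    merged = copy.deepcopy(existing)
    _merge_into(merged, solution)
    return merged


def _merge_into(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    for key, value in source.items():
        if key not in target:
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            _merge_into(target[key], value)
        elif isinstance(target[key], list) and isinstance(value, list):
            # Union of list items, preserving order and skipping duplicates.
            for item in value:
                if item not in target[key]:
                    target[key].append(copy.deepcopy(item))
        # Otherwise both sides set a scalar: keep the existing value.
```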

**Note**  
Using Systems Manager to deploy the following CloudWatch agent configurations will replace or overwrite any existing CloudWatch agent configuration on your EC2 instances. You can modify this configuration to suit your unique environment or use case. The metrics defined in this solution are the minimum required for the recommended dashboard. 

The deployment process includes the following steps:
+ Step 1: Ensure that the target EC2 instances have the required IAM permissions.
+ Step 2: Store the recommended agent configuration file in the Systems Manager Parameter Store.
+ Step 3: Install the CloudWatch agent on one or more EC2 instances using an AWS CloudFormation stack.
+ Step 4: Verify the agent setup is configured properly.

### Step 1: Ensure the target EC2 instances have the required IAM permissions
<a name="Solution-Tomcat-Agent-Step1"></a>

You must grant permission for Systems Manager to install and configure the CloudWatch agent. You must also grant permission for the CloudWatch agent to publish telemetry from your EC2 instance to CloudWatch. Make sure that the IAM role attached to the instance has the **CloudWatchAgentServerPolicy** and **AmazonSSMManagedInstanceCore** IAM policies attached.
+ After the role is created, attach the role to your EC2 instances. Follow the steps in [Launch an instance with an IAM role](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#launch-instance-with-role) to attach a role while launching a new EC2 instance. To attach a role to an existing EC2 instance, follow the steps in [Attach an IAM role to an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#attach-iam-role).
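
To double-check the role before deploying, you could compare its attached managed policies against the two required ones. This Python sketch assumes the AWS SDK for Python (Boto3) is installed and credentials are configured for the fetch step; the comparison itself is plain Python. It only inspects managed policies attached to the role, so permissions granted through inline policies aren't detected. The function names are illustrative.

```python
from typing import Dict, List, Set

REQUIRED_POLICIES = {"CloudWatchAgentServerPolicy", "AmazonSSMManagedInstanceCore"}


def missing_required_policies(attached: List[Dict[str, str]]) -> Set[str]:
    """Names of required managed policies that are not in `attached`."""
    attached_names = {policy["PolicyName"] for policy in attached}
    return REQUIRED_POLICIES - attached_names


def fetch_attached_policies(role_name: str) -> List[Dict[str, str]]:
    """List managed policies attached to an IAM role (requires Boto3 and AWS credentials)."""
    import boto3  # imported lazily so the comparison above works without Boto3

    iam = boto3.client("iam")
    policies: List[Dict[str, str]] = []
    for page in iam.get_paginator("list_attached_role_policies").paginate(RoleName=role_name):
        policies.extend(page["AttachedPolicies"])
    return policies
```

For a hypothetical role named `MyInstanceRole`, `missing_required_policies(fetch_attached_policies("MyInstanceRole"))` returns an empty set when both policies are attached.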

### Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store
<a name="Solution-Tomcat-Agent-Step2"></a>

Parameter Store simplifies the installation of the CloudWatch agent on an EC2 instance by securely storing and managing configuration parameters, eliminating the need for hard-coded values. This ensures a more secure and flexible deployment process, enabling centralized management and easier updates to configurations across multiple instances.

Use the following steps to store the recommended CloudWatch agent configuration file as a parameter in Parameter Store.

**To create the CloudWatch agent configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. In the **Name** box, enter a name that you'll use to reference the CloudWatch agent configuration file in later steps. For example, **AmazonCloudWatch-Tomcat-Configuration**.

   1. (Optional) In the **Description** box, type a description for the parameter.

   1. For **Parameter tier**, choose **Standard**. 

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the corresponding JSON block that was listed in [Agent configuration for Tomcat hosts](#Solution-Agent-Configuration-Tomcat-Host). Be sure to customize the grouping dimension value and port number as described.

   1. Choose **Create parameter**. 
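
The console steps above can also be scripted. The following Python sketch mirrors the chosen settings (**Standard** tier, **String** type, **text** data type) and assumes Boto3 and credentials are available for the final call. It minifies the JSON and checks the 4 KB value limit of standard-tier parameters before sending. The function names are illustrative.

```python
import json
from typing import Any, Dict

# Standard-tier parameter values are limited to 4 KB.
STANDARD_TIER_MAX_BYTES = 4096


def build_put_parameter_request(name: str, agent_config: Dict[str, Any]) -> Dict[str, Any]:
    """Keyword arguments for the SSM PutParameter call, matching the console settings above."""
    value = json.dumps(agent_config, separators=(",", ":"))  # minified to save space
    size = len(value.encode("utf-8"))
    if size > STANDARD_TIER_MAX_BYTES:
        raise ValueError(f"configuration is {size} bytes; the Standard tier allows {STANDARD_TIER_MAX_BYTES}")
    return {"Name": name, "Value": value, "Type": "String", "Tier": "Standard", "DataType": "text"}


def store_parameter(request: Dict[str, Any]) -> None:
    """Create the parameter (requires Boto3 and AWS credentials)."""
    import boto3

    boto3.client("ssm").put_parameter(**request)
```

To update a parameter that already exists, the PutParameter API also accepts `Overwrite=True`.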

### Step 3: Install the CloudWatch agent and apply the configuration using an AWS CloudFormation template
<a name="Solution-Tomcat-Agent-Step3"></a>

You can use AWS CloudFormation to install the agent and configure it to use the CloudWatch agent configuration that you created in the previous steps.

**To install and configure the CloudWatch agent for this solution**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the Tomcat workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **CWAgentInstallationStack**.

1. In the **Parameters** section, specify the following:

   1. For **CloudWatchAgentConfigSSM**, enter the name of the Systems Manager parameter for the agent configuration that you created earlier, such as **AmazonCloudWatch-Tomcat-Configuration**.

   1. To select the target instances, you have two options.

      1. For **InstanceIds**, specify a comma-delimited list of instance IDs where you want to install the CloudWatch agent with this configuration. You can list a single instance or several instances.

      1. If you are deploying at scale, you can specify the **TagKey** and the corresponding **TagValue** to target all EC2 instances with this tag and value. If you specify a **TagKey**, you must specify a corresponding **TagValue**. (For an Auto Scaling group, specify **aws:autoscaling:groupName** for the **TagKey** and specify the Auto Scaling group name for the **TagValue** to deploy to all instances within the Auto Scaling group.)

         If you specify both the **InstanceIds** and the **TagKey** parameters, the **InstanceIds** will take precedence and the tags will be ignored.

1. Review the settings, then choose **Create stack**. 
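
If you launch the same template from a script rather than the **Quick create stack** wizard, the values you entered become CloudFormation stack parameters. The Python sketch below builds that parameter list from the parameter names shown in the wizard (`CloudWatchAgentConfigSSM`, `InstanceIds`, `TagKey`, `TagValue`) and encodes the rules described above: a `TagKey` needs a `TagValue`, and instance IDs take precedence, so tags are omitted when both are given. Whether the template needs other parameters or capabilities isn't covered here, and the function name is illustrative.

```python
from typing import Dict, List, Optional, Sequence


def build_stack_parameters(config_parameter: str,
                           instance_ids: Optional[Sequence[str]] = None,
                           tag_key: Optional[str] = None,
                           tag_value: Optional[str] = None) -> List[Dict[str, str]]:
    """Stack parameters for the CloudWatch agent installation template."""
    if tag_key and not tag_value:
        raise ValueError("TagKey requires a corresponding TagValue")
    if not instance_ids and not tag_key:
        raise ValueError("specify InstanceIds or a TagKey/TagValue pair")

    params = [{"ParameterKey": "CloudWatchAgentConfigSSM", "ParameterValue": config_parameter}]
    if instance_ids:
        # InstanceIds take precedence, so tag targeting is not sent at all.
        params.append({"ParameterKey": "InstanceIds", "ParameterValue": ",".join(instance_ids)})
    else:
        params.append({"ParameterKey": "TagKey", "ParameterValue": tag_key})
        params.append({"ParameterKey": "TagValue", "ParameterValue": tag_value})
    return params
```

The resulting list has the shape that the CloudFormation `CreateStack` API expects for its `Parameters` argument, for example when calling Boto3's `create_stack(StackName=..., TemplateURL=..., Parameters=...)`.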

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

**Note**  
After this step is completed, this Systems Manager parameter is associated with the CloudWatch agents running in the targeted instances. This means that:
+ If the Systems Manager parameter is deleted, the agent will stop.
+ If the Systems Manager parameter is edited, the configuration changes are automatically applied to the agent at the scheduled frequency, which is 30 days by default.
+ If you want to immediately apply changes to this Systems Manager parameter, you must run this step again. For more information about associations, see [Working with associations in Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/state-manager-associations.html).

### Step 4: Verify the agent setup is configured properly
<a name="Solution-Tomcat-Agent-Step4"></a>

You can verify whether the CloudWatch agent is installed by following the steps in [Verify that the CloudWatch agent is running](troubleshooting-CloudWatch-Agent.md#CloudWatch-Agent-troubleshooting-verify-running). If the CloudWatch agent is not installed and running, make sure you have set up everything correctly.
+ Be sure you have attached a role with correct permissions for the EC2 instance as described in [Step 1: Ensure the target EC2 instances have the required IAM permissions](#Solution-Tomcat-Agent-Step1).
+ Be sure you have correctly configured the JSON for the Systems Manager parameter. Follow the steps in [Troubleshooting installation of the CloudWatch agent with CloudFormation](Install-CloudWatch-Agent-New-Instances-CloudFormation.md#CloudWatch-Agent-CloudFormation-troubleshooting).

If everything is set up correctly, then you should see the Tomcat metrics being published to CloudWatch. You can check the CloudWatch console to verify they are being published.

**To verify that Tomcat metrics are being published to CloudWatch**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Metrics**, **All metrics**.

1. Make sure you've selected the Region where you deployed the solution, and choose **Custom namespaces**, **CWAgent**.

1. Search for the metrics mentioned in the agent configuration section of this document, such as `tomcat.errors`. If you see results for these metrics, then the metrics are being published to CloudWatch.
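
The same check can be run from a script. This Python sketch assumes Boto3 and credentials for the listing step, and assumes that the published metric names match the measurement names in the agent configuration (for example `tomcat.errors`). `ListMetrics` returns only metrics that have recently received data, so a newly configured agent may need some time before its metrics appear. The function names are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Set


def missing_metric_names(published: Iterable[Dict], expected: Iterable[str],
                         namespace: str = "CWAgent") -> Set[str]:
    """Expected metric names that do not appear in `published` for the given namespace."""
    seen = {m["MetricName"] for m in published if m.get("Namespace") == namespace}
    return set(expected) - seen


def list_namespace_metrics(namespace: str = "CWAgent", region: Optional[str] = None) -> List[Dict]:
    """All metrics in a namespace (requires Boto3 and AWS credentials)."""
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    metrics: List[Dict] = []
    for page in cloudwatch.get_paginator("list_metrics").paginate(Namespace=namespace):
        metrics.extend(page["Metrics"])
    return metrics
```

For example, `missing_metric_names(list_namespace_metrics(region="us-east-1"), {"tomcat.errors"})` returns an empty set once `tomcat.errors` is being published in that Region.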

## Create the Tomcat solution dashboard
<a name="Solution-Tomcat-Dashboard"></a>

This dashboard displays the newly emitted metrics for both the Tomcat application server and the underlying JVM. It provides a top contributor view of the health of your Tomcat workload, in which each metric widget displays the top 10 contributors. This allows you to identify outliers at a glance. The dashboard also includes an overview of the cluster that aggregates and presents metrics across all instances, providing a high-level summary of the overall health and operational state of the cluster.

The solution dashboard doesn't display EC2 metrics. To view EC2 metrics, you'll need to use the EC2 automatic dashboard to see EC2 vended metrics and use the EC2 console dashboard to see EC2 metrics that are collected by the CloudWatch agent. For more information about automatic dashboards for AWS services, see [Viewing a CloudWatch dashboard for a single AWS service](CloudWatch_Automatic_Dashboards_Focus_Service.md).

To create the dashboard, you can use the following options:
+ Use CloudWatch console to create the dashboard.
+ Use AWS CloudFormation console to deploy the dashboard.
+ Download the AWS CloudFormation infrastructure as code and integrate it as part of your continuous integration (CI) automation.

By using the CloudWatch console to create a dashboard, you can preview the dashboard before you actually create it and are charged for it.

**Note**  
The dashboard created with CloudFormation in this solution displays metrics from the Region where the solution is deployed. Be sure to create the CloudFormation stack in the Region where your Tomcat metrics are published.  
If you've specified a custom namespace other than `CWAgent` in the CloudWatch agent configuration, you'll have to change the CloudFormation template for the dashboard to replace `CWAgent` with the customized namespace you are using.

**Note**  
Solution dashboards currently display garbage collection-related metrics only for the G1 Garbage Collector, which is the default collector in recent Java versions. If you are using a different garbage collection algorithm, the widgets pertaining to garbage collection are empty. However, you can customize these widgets by changing the dashboard CloudFormation template and applying the appropriate garbage collection type to the name dimension of the garbage collection-related metrics. For example, if you are using parallel garbage collection, change `name="G1 Young Generation"` to `name="Parallel GC"` in the garbage collection count metric `jvm.gc.collections.count`.

**To create the dashboard via CloudWatch Console**

1. Open the CloudWatch console **Create Dashboard** page using this link: [https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=ApacheTomcatOnEc2&referrer=os-catalog](https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=ApacheTomcatOnEc2&referrer=os-catalog).

1. Verify that the selected Region on the console is the Region where the Tomcat workload is running.

1. Enter the name of the dashboard, then choose **Create Dashboard**.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **TomcatDashboard-us-east-1**.

1. Preview the dashboard and choose **Save** to create the dashboard.

**To create the dashboard via CloudFormation**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Tomcat_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Tomcat_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the Tomcat workload is running.

1. For **Stack name**, enter a name to identify this stack, such as **TomcatDashboard-us-east-1**.

1. In the **Parameters** section, specify the name of the dashboard under the **DashboardName** parameter.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **TomcatDashboard-us-east-1**.

1. Acknowledge access capabilities for transforms under **Capabilities and transforms**. Note that CloudFormation doesn't add any IAM resources.

1. Review the settings, then choose **Create stack**. 

1. After the stack status is **CREATE_COMPLETE**, choose the **Resources** tab under the created stack and then choose the link under **Physical ID** to go to the dashboard. You can also access the dashboard in the CloudWatch console by choosing **Dashboards** in the left navigation pane of the console, and finding the dashboard name under **Custom Dashboards**.

If you want to edit the template file to customize it for any purpose, you can use the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use this link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Tomcat_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/Tomcat_EC2/CloudWatch/CFN/v1.0.0/dashboard-template-1.0.0.json).
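If you want to script the garbage collector change described in this section's note, a minimal Python sketch could rewrite your downloaded copy of the template before you upload it. The sketch assumes that the template stores the collector name as the literal string `G1 Young Generation`. Inspect your copy before you rely on it. `Parallel GC` is the example value from the note, and the local file names are hypothetical.

```python
from pathlib import Path


def replace_gc_name(template_text: str, old: str = "G1 Young Generation",
                    new: str = "Parallel GC") -> str:
    """Swap the garbage collector name used in the dashboard's metric dimensions."""
    if old not in template_text:
        raise ValueError(f"{old!r} not found; check the template before editing it")
    return template_text.replace(old, new)


if __name__ == "__main__":
    # Hypothetical file names: the downloaded template and the edited copy to upload.
    source = Path("dashboard-template-1.0.0.json")
    if source.exists():
        edited = replace_gc_name(source.read_text(encoding="utf-8"))
        Path("dashboard-template-edited.json").write_text(edited, encoding="utf-8")
```

Raising an error when the expected name is missing means a template layout you didn't anticipate fails loudly instead of being silently left unchanged.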


### Get started with the Tomcat monitoring dashboard
<a name="Solution-Tomcat-GetStarted"></a>

Here are a few tasks that you can try out with the new Tomcat dashboard. These tasks allow you to validate that the dashboard is working correctly and give you some hands-on experience using it to monitor a Tomcat application. As you try these out, you'll get familiar with navigating the dashboard and interpreting the visualized metrics.

**Using the dropdown list**

The dashboard provides a dropdown list at the top that you can use to filter and select the specific Tomcat application that you want to monitor. To display metrics for a specific Tomcat application, select that application name in the **Tomcat App** dropdown list. 

**Verify application health**

From the **App Overview** section, find the **Requests**, **Errors**, and **Error Rate** widgets. These provide a high-level summary of the application's request handling performance. Look for any abnormally high error counts or rates, which could indicate issues that need investigation.
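The **Error Rate** widget relates the two counts next to it. The following Python sketch assumes that the rate is errors divided by requests over the same period. Check the widget's metric math in your dashboard to confirm this. The per-server counts are hypothetical, and so is the 5% threshold, which is not a recommendation from this solution.

```python
def error_rate_percent(requests: float, errors: float) -> float:
    """Errors as a percentage of requests; 0.0 when there were no requests."""
    if requests <= 0:
        return 0.0
    return 100.0 * errors / requests


# Hypothetical per-server counts for one period: (requests, errors).
servers = {"tomcat-a": (1200, 6), "tomcat-b": (900, 81), "tomcat-c": (0, 0)}
THRESHOLD_PERCENT = 5.0  # hypothetical alerting threshold

for name, (requests, errors) in servers.items():
    rate = error_rate_percent(requests, errors)
    flag = "investigate" if rate > THRESHOLD_PERCENT else "ok"
    print(f"{name}: {rate:.1f}% {flag}")
```

Looking at the rate as well as the raw count matters: `tomcat-b` has fewer requests than `tomcat-a` but a much higher share of failures.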

**Monitor request processing**

In the **Request Processing Time** section, find the **Max Time** and **Total Time to Process All Requests** widgets. These metrics help you identify potential performance bottlenecks in request processing. Look for any servers with significantly higher max processing times compared to others. 

**Analyze network traffic**

In the **Network Traffic** section, find the **Sent Traffic** and **Received Traffic** widgets. These show the amount of data being sent and received by the application over the network. Unexpectedly high traffic levels could indicate potential issues with network saturation or inefficient data transfer.

**Investigate thread usage**

In the **Sessions and Threads** section, find the **Busy Threads Count**, **Threads Count**, and **Sessions** widgets. These metrics provide insights into the application's thread management and active user sessions. Look for any servers with an abnormally high number of busy threads or sessions, which could indicate potential resource constraints.

# CloudWatch solution: Amazon EC2 health
<a name="Solution-EC2-Health"></a>

This solution helps you configure out-of-the-box metric collection using the CloudWatch agent for workloads running on EC2 instances. Additionally, it helps you set up a preconfigured CloudWatch dashboard.

**Topics**
+ [Requirements](#Solution-EC2-Health-Requirements)
+ [Benefits](#Solution-EC2-Health-Benefits)
+ [Costs](#Solution-EC2-Health-Costs)
+ [CloudWatch agent configuration for this solution](#Solution-EC2-Health-Agent-Config)
+ [Deploy the agent for your solution](#Solution-EC2-Health-Deploy)
+ [Create the EC2 Health solution dashboard](#Solution-EC2-Health-Dashboard)
+ [Get started with the EC2 Health solution dashboard](#Solution-EC2-Health-Dashboard-Usage)

## Requirements
<a name="Solution-EC2-Health-Requirements"></a>

This solution is relevant for the following conditions:
+ Compute: Amazon EC2
+ Platform: Linux and macOS
+ Supports up to 500 EC2 instances in a given AWS Region
+ Latest version of the CloudWatch agent
+ SSM Agent installed on each EC2 instance

**Note**  
AWS Systems Manager Agent (SSM Agent) is preinstalled on some [Amazon Machine Images (AMIs)](https://docs.aws.amazon.com/systems-manager/latest/userguide/ami-preinstalled-agent.html) provided by AWS and trusted third parties. If the agent isn't installed, you can install it manually using the procedure for your operating system type:
+ [Manually installing and uninstalling SSM Agent on EC2 instances for Linux](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-linux.html)
+ [Manually installing and uninstalling SSM Agent on EC2 instances for macOS](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-macos.html)
+ [Manually installing and uninstalling SSM Agent on EC2 instances for Windows Server](https://docs.aws.amazon.com/systems-manager/latest/userguide/manually-install-ssm-agent-windows.html)

## Benefits
<a name="Solution-EC2-Health-Benefits"></a>

The solution delivers EC2 server monitoring using the CloudWatch agent, providing additional system-level metrics on top of the standard `AWS/EC2` namespace metrics for the following use cases:
+ Detect CPU performance issues and resource constraints.
+ Monitor disk utilization and storage capacity across different disks throughout your EC2 instances.
+ Track memory usage patterns and potential memory leaks.
+ Analyze I/O operations and their impact on overall performance.
+ Observe network traffic patterns and potential anomalies.

Below are the key advantages of the solution:
+ Automates metric collection for EC2 instances, eliminating manual instrumentation.
+ Provides a pre-configured, consolidated CloudWatch dashboard for EC2 instance metrics. The dashboard will automatically handle metrics from new EC2 instances configured using the solution, even if those metrics don't exist when you first create the dashboard. It also allows you to observe EC2 instances managed via Auto Scaling groups.

The following image is an example of the dashboard for this solution.

![\[Example of EC2 Health dashboard\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/EC2HealthDashboard.png)


## Costs
<a name="Solution-EC2-Health-Costs"></a>

This solution creates and uses resources in your account. You are charged for standard usage, including the following:
+ All metrics collected by the CloudWatch agent are charged as custom metrics. The number of metrics used by this solution depends on the number of EC2 hosts.

  The total number of CloudWatch agent metrics depends on the configuration of disks. Excluding disk and diskio metrics, the solution publishes six metrics. The number of disk metrics (`disk_used_percent`, `disk_inodes_free`) depends on the number of distinct `device`/`fstype`/`path` dimension combinations. The number of diskio metrics (`diskio_io_time`) depends on the number of distinct `name` dimension values. For example, a single t2.micro instance launched with the default settings in the EC2 console produces a total of 22 CloudWatch agent metrics (4 CPU, 12 disk, 4 diskio, 1 memory, and 1 swap). Vended metrics, such as those in the `AWS/EC2` namespace, are provided free of charge.
+ One custom dashboard.
+ API operations requested by the CloudWatch agent to publish the metrics. With the default configuration for this solution, the CloudWatch agent calls **PutMetricData** once every minute for each EC2 host. This means the **PutMetricData** API is called `30*24*60=43,200` times in a 30-day month for each EC2 host.
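You can check the arithmetic in this list with a short Python sketch. It reproduces the t2.micro example under one assumption derived from the figures above: 6 `device`/`fstype`/`path` combinations (2 disk metrics each, giving 12) and 4 `name` values for `diskio_io_time`. Substitute the counts from your own instances.

```python
BASE_METRICS = 6  # 4 CPU + 1 memory + 1 swap, independent of disk layout
DISK_MEASUREMENTS = 2  # disk_used_percent and disk_inodes_free
DISKIO_MEASUREMENTS = 1  # diskio_io_time


def agent_metrics_per_host(disk_paths: int, diskio_devices: int) -> int:
    """Custom metrics one host publishes with this solution's agent configuration."""
    return (BASE_METRICS
            + DISK_MEASUREMENTS * disk_paths
            + DISKIO_MEASUREMENTS * diskio_devices)


def put_metric_data_calls(days: int = 30, calls_per_minute: int = 1) -> int:
    """PutMetricData calls per host over a period at the default 60-second interval."""
    return days * 24 * 60 * calls_per_minute


print(agent_metrics_per_host(disk_paths=6, diskio_devices=4))  # 22
print(put_metric_data_calls())  # 43200
```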

For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

The pricing calculator can help you estimate approximate monthly costs for using this solution.

**To use the pricing calculator to estimate your monthly solution costs**

1. Open the [Amazon CloudWatch pricing calculator](https://calculator.aws/#/createCalculator/CloudWatch).

1. In the **Metrics** section, for **Number of metrics**, enter **(6 + total count of disk and diskio metrics per EC2 host, as described above) × number of EC2 instances configured for this solution**.

1. In the **APIs** section, for **Number of API requests**, enter **43200 × number of EC2 instances configured for this solution**.

   By default, the solution performs one **PutMetricData** operation each minute for each EC2 host.

1. In the **Dashboards and Alarms** section, for **Number of Dashboards**, enter **1**.

1. You can see your monthly estimated costs at the bottom of the pricing calculator.
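The calculator inputs from these steps scale linearly with the number of instances. The following minimal Python sketch assumes that every instance has the same disk layout. It uses the t2.micro example (12 disk and 4 diskio metrics per host) and a hypothetical fleet of 50 instances. If your instances differ, add up the per-host counts instead.

```python
from dataclasses import dataclass


@dataclass
class CalculatorInputs:
    number_of_metrics: int
    number_of_api_requests: int
    number_of_dashboards: int


def calculator_inputs(instances: int, disk_and_diskio_metrics_per_host: int) -> CalculatorInputs:
    """Values to enter in the CloudWatch pricing calculator for this solution."""
    return CalculatorInputs(
        number_of_metrics=(6 + disk_and_diskio_metrics_per_host) * instances,
        number_of_api_requests=43_200 * instances,  # one PutMetricData call per minute per host
        number_of_dashboards=1,
    )


# t2.micro example: 12 disk + 4 diskio metrics per host; hypothetical 50-instance fleet.
print(calculator_inputs(instances=50, disk_and_diskio_metrics_per_host=16))
```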

## CloudWatch agent configuration for this solution
<a name="Solution-EC2-Health-Agent-Config"></a>

The CloudWatch agent is software that runs continuously and autonomously on your servers and in containerized environments. It collects metrics, logs, and traces from your infrastructure and applications and sends them to CloudWatch and X-Ray.

For more information about the CloudWatch agent, see [Collect metrics, logs, and traces with the CloudWatch agent](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html).

The agent configuration in this solution collects a set of metrics to help you get started monitoring and observing your EC2 instances. The CloudWatch agent can be configured to collect more EC2 metrics than the dashboard displays by default. For a list of the metrics that the CloudWatch agent can collect on Linux and macOS instances, see [Metrics collected by the CloudWatch agent on Linux and macOS instances](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html#linux-metrics-enabled-by-CloudWatch-agent). For information about metrics collected on Windows instances, see [Metrics collected by the CloudWatch agent on Windows Server instances](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html#windows-metrics-enabled-by-CloudWatch-agent).

### Agent configuration for EC2 Health solution
<a name="Solution-EC2-Health-Agent-Config-Details"></a>

The metrics collected by the agent are defined in the agent configuration. The solution provides agent configurations to collect the recommended metrics with suitable dimensions for the solution's dashboard.

The steps for deploying the solution are described later in [Deploy the agent for your solution](#Solution-EC2-Health-Deploy). The following information is intended to help you understand how to customize the agent configuration for your environment.

**Note**  
If an EC2 instance is not part of an Auto Scaling group, the CloudWatch agent drops the `AutoScalingGroupName` dimension entirely. This prevents dimensions with null or empty values. Each metric widget in the solution dashboard searches for metrics both with and without the `AutoScalingGroupName` dimension, so the same dashboard supports all EC2 instances where the solution is applied.

If you modify the agent configuration, you must apply the same changes to the solution's accompanying dashboard. For example, if you decide to omit the `ImageId` dimension, you must also remove it from the metric search expressions used in the dashboard widgets.
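The dimension behavior described in the note above can be pictured with a small Python simulation. This is not the agent's code. It is a simplified model that assumes a hypothetical metadata dictionary and omits any `append_dimensions` entry that has no value, which is how the note describes the handling of `AutoScalingGroupName`.

```python
import re

APPEND_DIMENSIONS = {
    "InstanceId": "${aws:InstanceId}",
    "InstanceType": "${aws:InstanceType}",
    "ImageId": "${aws:ImageId}",
    "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
}
PLACEHOLDER = re.compile(r"^\$\{aws:(\w+)\}$")


def resolve_dimensions(template: dict[str, str], metadata: dict[str, str]) -> dict[str, str]:
    """Substitute ${aws:...} placeholders; omit dimensions with no value (simplified model)."""
    resolved = {}
    for key, value in template.items():
        match = PLACEHOLDER.match(value)
        actual = metadata.get(match.group(1), "") if match else value
        if actual:
            resolved[key] = actual
    return resolved


# Hypothetical instance metadata: one standalone instance, one in an Auto Scaling group.
standalone = {"InstanceId": "i-0123", "InstanceType": "t3.micro", "ImageId": "ami-0abc"}
in_asg = {**standalone, "AutoScalingGroupName": "web-asg"}

print(sorted(resolve_dimensions(APPEND_DIMENSIONS, standalone)))
print(sorted(resolve_dimensions(APPEND_DIMENSIONS, in_asg)))
```

The two instances end up with different dimension sets, so CloudWatch treats their metrics as distinct. This is why the dashboard's search expressions must match metrics both with and without `AutoScalingGroupName`.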

### Agent configuration for EC2 Instances
<a name="Solution-EC2-Health-Agent-Config-Instance"></a>

Use the following CloudWatch agent configuration on Amazon EC2 instances where your workloads are deployed.

```
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}",
      "ImageId": "${aws:ImageId}",
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    },
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "resources": [
          "*"
        ],
        "dimensions": [
          ["device", "fstype", "path"]
        ]
      },
      "diskio": {
        "measurement": [
          "io_time"
        ],
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "used_percent"
        ]
      },
      "swap": {
        "measurement": [
          "used_percent"
        ]
      }
    }
  }
}
```
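Before you store the configuration, you can sanity-check it locally. The following Python sketch parses a trimmed copy of the JSON above and lists the metric names it expects the agent to publish. The naming rule is an assumption based on the metric names used elsewhere in this document, such as `disk_used_percent`, `diskio_io_time`, and `mem_used_percent`: each measurement is prefixed with its section name unless it already starts with that name.

```python
import json

# Trimmed copy of the "metrics_collected" section from the configuration above.
CONFIG = json.loads("""
{
  "metrics": {
    "metrics_collected": {
      "cpu": {"measurement": ["cpu_usage_idle", "cpu_usage_iowait",
                              "cpu_usage_user", "cpu_usage_system"], "totalcpu": true},
      "disk": {"measurement": ["used_percent", "inodes_free"], "resources": ["*"]},
      "diskio": {"measurement": ["io_time"], "resources": ["*"]},
      "mem": {"measurement": ["used_percent"]},
      "swap": {"measurement": ["used_percent"]}
    }
  }
}
""")


def published_metric_names(config: dict) -> list[str]:
    """Metric names as they should appear in CloudWatch (assumed section-prefix rule)."""
    names = []
    for section, settings in config["metrics"]["metrics_collected"].items():
        prefix = f"{section}_"
        for measurement in settings.get("measurement", []):
            names.append(measurement if measurement.startswith(prefix) else prefix + measurement)
    return names


for name in published_metric_names(CONFIG):
    print(name)
```

To check your full file instead of the trimmed copy, load it with `json.load` and pass the result to the same function. A JSON syntax error surfaces immediately as a `json.JSONDecodeError`.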

## Deploy the agent for your solution
<a name="Solution-EC2-Health-Deploy"></a>

There are several approaches for installing the CloudWatch agent, depending on the use case. We recommend using Systems Manager for this solution. It provides a console experience and makes it simpler to manage a fleet of managed servers within a single AWS account. The instructions in this section use Systems Manager and are intended for when you don't have the CloudWatch agent running with existing configurations. You can check whether the CloudWatch agent is running by following the steps in [Verify that the CloudWatch agent is running](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/troubleshooting-CloudWatch-Agent.html#CloudWatch-Agent-troubleshooting-verify-running).

If you are already running the CloudWatch agent on the EC2 hosts and managing the agent configurations, you can skip the instructions in this section and follow your existing deployment mechanism to update the configuration. Be sure to merge the EC2 Health agent configuration with your existing agent configuration, and then deploy the merged configuration. If you are using Systems Manager to store and manage the configuration for the CloudWatch agent, you can merge the configuration into the existing parameter value. For more information, see [Managing CloudWatch agent configuration files](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/create-store-cloudwatch-configurations.html).

**Note**  
Using Systems Manager to deploy the following CloudWatch agent configuration replaces any existing CloudWatch agent configuration on your EC2 instances. You can modify this configuration to suit your unique environment or use case. The metrics defined in this configuration are the minimum required for the dashboard provided by the solution.

The deployment process includes the following steps:
+ Step 1: Ensure that the target EC2 instances have the required IAM permissions.
+ Step 2: Store the recommended agent configuration file in the Systems Manager Parameter Store.
+ Step 3: Install the CloudWatch agent on one or more EC2 instances using a CloudFormation stack.
+ Step 4: Verify the agent setup is configured properly.

### Step 1: Ensure the target EC2 instances have the required IAM permissions
<a name="Solution-EC2-Health-Deploy-Step1"></a>

You must grant permission for Systems Manager to install and configure the CloudWatch agent. You must also grant permission for the CloudWatch agent to publish telemetry from your EC2 instance to CloudWatch. Make sure that the IAM role attached to the instance has the **CloudWatchAgentServerPolicy** and **AmazonSSMManagedInstanceCore** IAM policies attached.

After you create or choose the role, attach it to your EC2 instances. To attach a role to an EC2 instance, follow the steps in [Attach an IAM role to an instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/attach-iam-role.html).
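If you manage IAM from scripts rather than the console, you can attach the two managed policies with the AWS CLI `aws iam attach-role-policy` command. The following Python sketch only builds those commands. The role name `EC2HealthAgentRole` is hypothetical. You would run the printed commands, or pass the lists to `subprocess.run`, with credentials that allow IAM changes.

```python
import shlex

# AWS managed policies required by this solution.
MANAGED_POLICIES = [
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
]


def attach_policy_commands(role_name: str) -> list[list[str]]:
    """AWS CLI invocations that attach the solution's required policies to a role."""
    return [
        ["aws", "iam", "attach-role-policy", "--role-name", role_name, "--policy-arn", arn]
        for arn in MANAGED_POLICIES
    ]


for command in attach_policy_commands("EC2HealthAgentRole"):  # hypothetical role name
    print(shlex.join(command))
```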

### Step 2: Store the recommended CloudWatch agent configuration file in Systems Manager Parameter Store
<a name="Solution-EC2-Health-Deploy-Step2"></a>

Parameter Store simplifies the installation of the CloudWatch agent on an EC2 instance by securely storing and managing configuration parameters, eliminating the need for hard-coded values. This ensures a more secure and flexible deployment process, enabling centralized management and easier updates to configurations across multiple instances.

Use the following steps to store the recommended CloudWatch agent configuration file as a parameter in Parameter Store. 

**To create the CloudWatch agent configuration file as a parameter**

1. Open the AWS Systems Manager console at [https://console.aws.amazon.com/systems-manager/](https://console.aws.amazon.com/systems-manager/).

1. Verify that the selected Region on the console is the Region where the EC2 instances are running.

1. From the navigation pane, choose **Application Management**, **Parameter Store**.

1. Follow these steps to create a new parameter for the configuration.

   1. Choose **Create parameter**.

   1. In the **Name** box, enter a name that you'll use to reference the CloudWatch agent configuration file in later steps. For example, **AmazonCloudWatch-EC2Health-Configuration**.

   1. (Optional) In the **Description** box, type a description for the parameter.

   1. For **Parameter tier**, choose **Standard**.

   1. For **Type**, choose **String**.

   1. For **Data type**, choose **text**.

   1. In the **Value** box, paste the agent configuration JSON provided earlier in this document.

   1. Choose **Create parameter**.
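The console steps above correspond to the AWS CLI `aws ssm put-parameter` command with `--type String`, `--tier Standard`, and `--data-type text`. The following Python sketch builds that command. It also validates and minifies the configuration JSON and checks the result against the 4 KB value limit of the Standard tier (taken here as 4,096 bytes). The parameter name matches the example above. The local file name `ec2-health-config.json` is hypothetical.

```python
import json
import shlex
from pathlib import Path

STANDARD_TIER_LIMIT_BYTES = 4096  # Standard parameters hold values up to 4 KB


def put_parameter_command(name: str, config_text: str) -> list[str]:
    """Validate the agent JSON, minify it, and build the matching put-parameter call."""
    compact = json.dumps(json.loads(config_text), separators=(",", ":"))
    size = len(compact.encode("utf-8"))
    if size > STANDARD_TIER_LIMIT_BYTES:
        raise ValueError(f"configuration is {size} bytes; too large for the Standard tier")
    return ["aws", "ssm", "put-parameter",
            "--name", name,
            "--type", "String",
            "--tier", "Standard",
            "--data-type", "text",
            "--value", compact]


if __name__ == "__main__":
    config_file = Path("ec2-health-config.json")  # hypothetical local copy of the JSON above
    if config_file.exists():
        command = put_parameter_command("AmazonCloudWatch-EC2Health-Configuration",
                                        config_file.read_text(encoding="utf-8"))
        print(shlex.join(command))
```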

### Step 3: Install the CloudWatch agent and apply the configuration using a CloudFormation template
<a name="Solution-EC2-Health-Deploy-Step3"></a>

You can use CloudFormation to install the agent and configure it to use the CloudWatch agent configuration that you created in the previous steps.

**To install and configure the CloudWatch agent for this solution**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the EC2 instances are running.

1. For **Stack name**, enter a name to identify this stack, such as **CWAgentInstallationStack**.

1. In the **Parameters** section, specify the following:

   1. For **CloudWatchAgentConfigSSM**, enter the name of the Systems Manager parameter for the agent configuration that you created earlier, such as **AmazonCloudWatch-EC2Health-Configuration**.

   1. To select the target instances, you have two options.

      1. For **InstanceIds**, specify a comma-delimited list of instance IDs where you want to install the CloudWatch agent with this configuration. You can list a single instance or several instances.

      1. If you are deploying at scale, you can specify the **TagKey** and the corresponding **TagValue** to target all EC2 instances with this tag and value. If you specify a **TagKey**, you must specify a corresponding **TagValue**. (For an Auto Scaling group, specify **aws:autoscaling:groupName** for the **TagKey** and specify the Auto Scaling group name for the **TagValue** to deploy to all instances within the Auto Scaling group.)

      If you specify both the **InstanceIds** and **TagKey** parameters, **InstanceIds** takes precedence and the tags are ignored.

1. Review the settings, then choose **Create stack**.
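The targeting rules above can be summarized as a small decision function. This Python sketch restates the rules from this procedure and is not taken from the template itself. Treating "neither option specified" as an error is an assumption. The instance IDs and tag values are hypothetical.

```python
def describe_targets(instance_ids: str = "", tag_key: str = "", tag_value: str = "") -> dict:
    """Restate which instances the stack targets, following the rules in this procedure."""
    ids = [i.strip() for i in instance_ids.split(",") if i.strip()]
    if ids:
        # InstanceIds takes precedence; any tag settings are ignored.
        return {"instance_ids": ids}
    if tag_key:
        if not tag_value:
            raise ValueError("TagKey requires a corresponding TagValue")
        return {"tag": (tag_key, tag_value)}
    # Assumption: at least one targeting option must be provided.
    raise ValueError("specify InstanceIds, or a TagKey and TagValue")


print(describe_targets("i-0abc, i-0def", tag_key="env", tag_value="prod"))
print(describe_targets(tag_key="aws:autoscaling:groupName", tag_value="web-asg"))
```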

If you want to edit the template file first to customize it, choose the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use the following link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/CloudWatchAgent/CFN/v1.0.0/cw-agent-installation-template-1.0.0.json).

**Note**  
After this step is completed, this Systems Manager parameter is associated with the CloudWatch agents running on the targeted instances. This means that:
+ If the Systems Manager parameter is deleted, the agent stops.
+ If the Systems Manager parameter is edited, the configuration changes are automatically applied to the agent at the scheduled frequency, which is 30 days by default.
+ If you want to immediately apply changes to this Systems Manager parameter, you must run this step again. For more information about associations, see [Working with associations in AWS Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/state-manager-associations.html).

### Step 4: Verify the agent setup is configured properly
<a name="Solution-EC2-Health-Deploy-Step4"></a>

You can verify whether the CloudWatch agent is installed by following the steps in [Verify that the CloudWatch agent is running](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/troubleshooting-CloudWatch-Agent.html#CloudWatch-Agent-troubleshooting-verify-running). If the CloudWatch agent is not installed and running, make sure you have set up everything correctly.
+ Be sure you have attached a role with correct permissions for the EC2 instance as described in [Step 1: Ensure the target EC2 instances have the required IAM permissions](#Solution-EC2-Health-Deploy-Step1).
+ Be sure you have correctly configured the JSON for the Systems Manager parameter. Follow the steps in [Troubleshooting installation of the CloudWatch agent with CloudFormation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent-New-Instances-CloudFormation.html#CloudWatch-Agent-CloudFormation-troubleshooting).
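On Linux instances, the agent's control script can report its status with `-a status` and prints a JSON status document. The following Python sketch interprets that output. The script path is the default Linux install location, so verify it on your instances. The sample string is illustrative, and the sketch relies only on the `status` field.

```python
import json

# Default Linux location of the agent control script; verify it on your instances.
STATUS_COMMAND = ["sudo", "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl",
                  "-m", "ec2", "-a", "status"]


def agent_is_running(status_output: str) -> bool:
    """True if the status document reports the agent as running."""
    try:
        return json.loads(status_output).get("status") == "running"
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as not running.
        return False


# Illustrative output. On an instance, you could capture the real output with
# subprocess.run(STATUS_COMMAND, capture_output=True, text=True).stdout
sample = '{"status": "running", "configstatus": "configured"}'
print(agent_is_running(sample))  # True
```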

**To verify that EC2 health metrics are being published to CloudWatch**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Metrics**, **All metrics**.

1. Make sure you've selected the Region where you deployed the solution, and choose **Custom namespaces**, **CWAgent**.

1. Search for the metrics mentioned in the agent configuration section of this document, such as `mem_used_percent`. If you see results for these metrics, then the metrics are being published to CloudWatch.

## Create the EC2 Health solution dashboard
<a name="Solution-EC2-Health-Dashboard"></a>

This dashboard displays the newly emitted EC2 Health metrics. It provides a top contributor view for the health of your EC2 instances in a single Region. The top contributor view displays the top 10 per metric widget. This allows you to identify outliers at a glance.

To create the dashboard, you can use the following options:
+ Use CloudWatch console to create the dashboard.
+ Use AWS CloudFormation console to deploy the dashboard.
+ Download the AWS CloudFormation infrastructure as code and integrate it as part of your continuous integration (CI) automation.

By using the CloudWatch console to create a dashboard, you can preview the dashboard before you create it and incur charges.

**Note**  
The dashboard created with CloudFormation in this solution displays metrics from the Region where the solution is deployed. Be sure to create the CloudFormation stack in the Region where your EC2 metrics are published.  
If you've specified a custom namespace other than `CWAgent` in the CloudWatch agent configuration, you'll have to change the CloudFormation template for the dashboard to replace `CWAgent` with the customized namespace you are using.
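If your agent configuration uses a custom namespace, the following minimal Python sketch updates a downloaded copy of the dashboard template. It assumes that `CWAgent` appears in the template only as the namespace, so confirm this before you rely on the replacement. It matches whole words only and reports how many occurrences it changed. `MyCompanyAgent` is a hypothetical namespace, and the output file name is hypothetical.

```python
import re
from pathlib import Path

NAMESPACE_PATTERN = re.compile(r"\bCWAgent\b")


def set_namespace(template_text: str, namespace: str) -> tuple[str, int]:
    """Replace whole-word CWAgent occurrences; return the new text and the count replaced."""
    # A function replacement keeps any backslashes in the namespace literal.
    return NAMESPACE_PATTERN.subn(lambda _: namespace, template_text)


if __name__ == "__main__":
    template = Path("dashboard-template-linux-macos-1.0.0.json")  # downloaded template
    if template.exists():
        text, count = set_namespace(template.read_text(encoding="utf-8"), "MyCompanyAgent")
        print(f"replaced {count} occurrence(s)")
        Path("dashboard-template-custom-namespace.json").write_text(text, encoding="utf-8")
```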

**To create the dashboard via CloudWatch Console**

1. Open the CloudWatch console **Create Dashboard** page using this link: [https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=Ec2LinuxMacOsHealth&referrer=os-catalog](https://console.aws.amazon.com/cloudwatch/home?#dashboards?dashboardTemplate=Ec2LinuxMacOsHealth&referrer=os-catalog).

1. Verify that the selected Region on the console is the Region where the EC2 instances are running.

1. Enter the name of the dashboard, then choose **Create Dashboard**.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **EC2HealthDashboard-us-east-1**.

1. Preview the dashboard and choose **Save** to create the dashboard.

**To create the dashboard via CloudFormation**

1. Open the CloudFormation **Quick create stack** wizard using this link: [https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/EC2_Health/CloudWatch/CFN/v1.0.0/dashboard-template-linux-macos-1.0.0.json](https://console.aws.amazon.com/cloudformation/home?#/stacks/quickcreate?templateURL=https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/EC2_Health/CloudWatch/CFN/v1.0.0/dashboard-template-linux-macos-1.0.0.json).

1. Verify that the selected Region on the console is the Region where the EC2 instances are running.

1. For **Stack name**, enter a name to identify this stack, such as `EC2HealthDashboardStack`.

1. In the **Parameters** section, specify the name of the dashboard under the **DashboardName** parameter.

   To easily differentiate this dashboard from similar dashboards in other Regions, we recommend including the Region name in the dashboard name, such as **EC2HealthDashboard-us-east-1**.

1. Acknowledge access capabilities for transforms under **Capabilities and transforms**. Note that CloudFormation doesn't add any IAM resources.

1. Review the settings, then choose **Create stack**.

1. After the stack status is **CREATE_COMPLETE**, choose the **Resources** tab under the created stack and then choose the link under **Physical ID** to go to the dashboard. You can also access the dashboard in the CloudWatch console by choosing **Dashboards** in the left navigation pane of the console, and finding the dashboard name under **Custom Dashboards**.

If you want to edit the template file to customize it for any purpose, you can use the **Upload a template file** option under **Create Stack Wizard** to upload the edited template. For more information, see [Creating a stack on CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html). You can use this link to download the template: [https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/EC2_Health/CloudWatch/CFN/v1.0.0/dashboard-template-linux-macos-1.0.0.json](https://aws-observability-solutions-prod-us-east-1.s3.us-east-1.amazonaws.com/EC2_Health/CloudWatch/CFN/v1.0.0/dashboard-template-linux-macos-1.0.0.json).

## Get started with the EC2 Health solution dashboard
<a name="Solution-EC2-Health-Dashboard-Usage"></a>

Here are a few tasks that you can try out with the new EC2 monitoring dashboard. These tasks allow you to validate that the dashboard is working correctly and give you some hands-on experience using it to monitor EC2 instances. As you try these out, you'll get familiar with navigating the dashboard and interpreting the visualized metrics.

Monitor the various CPU utilization metrics  
In the **CPU** section, examine the array of CPU usage metrics. These provide insight into how CPU resources are being utilized across different activities like user processes, system tasks, and I/O operations. Look for instances with consistently high utilization or unusual patterns, which might indicate the need for scaling or optimization.

Analyze disk utilization across different devices  
Navigate to the **Disk** section to find the storage usage and inode availability metrics. These help you identify instances that are running low on storage space or file system resources. Pay attention to any instances approaching high disk usage levels, as this could lead to performance issues or service disruptions.

Investigate memory utilization patterns  
In the **Memory** section, observe the graph which plots memory utilization over time. This shows how much of the available memory is being used by each instance. Look for patterns or spikes in memory usage that might correlate with specific times or events. High memory utilization could indicate the need for instance resizing or application optimization.

Correlate patterns across core utilization metrics  
Compare metrics across sections and look for related utilization patterns. For example, a workload running a log rotation process could show regular increases in **CPU** and **memory** utilization, followed by a decrease in **disk** utilization.

Inspect network activity  
In the **Network** section, examine the inbound and outbound network traffic metrics, both in terms of data volume and packet count. These give you insight into the network activity for your EC2 instances. Look for regular or anomalous spikes in network traffic, and for imbalances between inbound and outbound data.