


# Monitor your cluster performance and view logs
<a name="eks-observe"></a>

You can monitor your clusters in Amazon EKS using many available monitoring and logging tools. Your Amazon EKS log data can be streamed to AWS services or to partner tools for data analysis. Many services available in the AWS Management Console provide data for troubleshooting your Amazon EKS issues. You can also use an AWS-supported open-source solution for [monitoring Amazon EKS infrastructure](https://docs.aws.amazon.com/grafana/latest/userguide/solution-eks.html).

After selecting **Clusters** in the left navigation pane of the Amazon EKS console, you can view cluster health and details by choosing your cluster’s name and choosing the **Observability** tab. To view details about any existing Kubernetes resources that are deployed to your cluster, see [View Kubernetes resources in the AWS Management Console](view-kubernetes-resources.md).

Monitoring is an important part of maintaining the reliability, availability, and performance of Amazon EKS and your AWS solutions. We recommend that you collect monitoring data from all of the parts of your AWS solution. That way, you can more easily debug a multi-point failure if one occurs. Before you start monitoring Amazon EKS, make sure that your monitoring plan addresses the following questions.
+ What are your goals? Do you need real-time notifications if your clusters scale dramatically?
+ What resources need to be observed?
+ How frequently do you need to observe these resources? Does your company want to respond quickly to risks?
+ What tools do you intend to use? If you already run AWS Fargate as part of your launch, then you can use the built-in [log router](fargate-logging.md).
+ Who will perform the monitoring tasks?
+ Who should receive notifications when something goes wrong?

## Monitoring and logging on Amazon EKS
<a name="logging-monitoring"></a>

Amazon EKS provides built-in tools for monitoring and logging. For supported versions, the observability dashboard gives visibility into the performance of your cluster. It helps you to quickly detect, troubleshoot, and remediate issues. In addition to monitoring features, it includes lists based on the control plane audit logs. The Kubernetes control plane exposes a number of metrics that can also be scraped outside of the console.

Control plane logging records all API calls to your clusters, audit information capturing which users performed which actions on your clusters, and role-based information. For more information, see [Logging and monitoring on Amazon EKS](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/amazon-eks-logging-monitoring.html) in the *AWS Prescriptive Guidance*.

Amazon EKS control plane logging provides audit and diagnostic logs directly from the Amazon EKS control plane to CloudWatch Logs in your account. These logs make it easy for you to secure and run your clusters. You can select the exact log types you need, and logs are sent as log streams to a group for each Amazon EKS cluster in CloudWatch. For more information, see [Send control plane logs to CloudWatch Logs](control-plane-logs.md).
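For example, control plane logging can be turned on with the AWS CLI, and each enabled log type is delivered as a stream in a per-cluster log group. A minimal sketch, assuming a hypothetical cluster named `my-cluster`:

```shell
# Hypothetical cluster name; substitute your own.
CLUSTER_NAME="my-cluster"

# Enabled log types are delivered as streams in this CloudWatch log group:
LOG_GROUP="/aws/eks/${CLUSTER_NAME}/cluster"
echo "$LOG_GROUP"

# To turn on specific log types (requires AWS credentials; shown for reference):
#   aws eks update-cluster-config --name "$CLUSTER_NAME" \
#     --logging '{"clusterLogging":[{"types":["api","audit","authenticator"],"enabled":true}]}'
```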

**Note**  
When you check the Amazon EKS authenticator logs in Amazon CloudWatch, you might see entries that contain text similar to the following example.  

```
level=info msg="mapping IAM role" groups="[]" role="arn:aws:iam::111122223333:role/XXXXXXXXXXXXXXXXXX-NodeManagerRole-XXXXXXXX" username="eks:node-manager"
```
Entries that contain this text are expected. The `username` is an Amazon EKS internal service role that performs specific operations for managed node groups and Fargate.  
For low-level, customizable logging, [Kubernetes logging](https://kubernetes.io/docs/concepts/cluster-administration/logging/) is available.
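When reviewing exported authenticator log entries, you can filter for these internal service-role lines yourself. A small sketch using `grep` on the example entry above:

```shell
# The example authenticator entry from above, stored for filtering.
LOG_LINE='level=info msg="mapping IAM role" groups="[]" role="arn:aws:iam::111122223333:role/XXXXXXXXXXXXXXXXXX-NodeManagerRole-XXXXXXXX" username="eks:node-manager"'

# Usernames that start with "eks:" identify Amazon EKS internal service roles.
MATCH=$(echo "$LOG_LINE" | grep -o 'username="eks:[^"]*"')
echo "$MATCH"
```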

Amazon EKS is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon EKS. CloudTrail captures all API calls for Amazon EKS as events. The calls captured include calls from the Amazon EKS console and code calls to the Amazon EKS API operations. For more information, see [Log API calls as AWS CloudTrail events](logging-using-cloudtrail.md).

The Kubernetes API server exposes a number of metrics that are useful for monitoring and analysis. For more information, see [Monitor your cluster metrics with Prometheus](prometheus.md).
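The API server serves these metrics in Prometheus exposition format (for example, via `kubectl get --raw /metrics`). A sketch of parsing such a scrape offline; the sample values below are illustrative, not real output:

```shell
# Illustrative sample of Prometheus exposition format from the API server.
cat > /tmp/metrics.txt <<'EOF'
# HELP apiserver_request_total Counter of apiserver requests
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 12345
apiserver_request_total{code="429",verb="LIST"} 7
EOF

# With cluster access you would populate the file with:
#   kubectl get --raw /metrics > /tmp/metrics.txt

# Sum the requests that were throttled (HTTP 429):
THROTTLED=$(awk '/^apiserver_request_total\{.*code="429"/ { sum += $2 } END { print sum+0 }' /tmp/metrics.txt)
echo "$THROTTLED"
```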

To configure Fluent Bit for custom Amazon CloudWatch logs, see [Setting up Fluent Bit](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html#Container-Insights-FluentBit-setup) in the *Amazon CloudWatch User Guide*.

## Amazon EKS monitoring and logging tools
<a name="eks-monitor-tools"></a>

Amazon Web Services provides various tools that you can use to monitor Amazon EKS. You can configure some tools to set up automatic monitoring, but some require manual calls. We recommend that you automate monitoring tasks as much as your environment and existing toolset allows.

The following table describes various monitoring tool options.


| Areas | Tool | Description | Setup | 
| --- | --- | --- | --- | 
|  Control plane  |   [Observability dashboard](observability-dashboard.md)   |  For supported versions, the observability dashboard gives visibility into the performance of your cluster. It helps you to quickly detect, troubleshoot, and remediate issues.  |   [Setup procedure](observability-dashboard.md)   | 
|  Applications / control plane  |   [Prometheus](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html)   |  Prometheus can be used to monitor metrics and alerts for applications and the control plane.  |   [Setup procedure](prometheus.md)   | 
|  Applications  |   [CloudWatch Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)   |  CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices.  |   [Setup procedure](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html)   | 
|  Applications  |   [AWS Distro for OpenTelemetry (ADOT)](https://aws-otel.github.io/docs/introduction)   |  ADOT can collect and send correlated metrics, trace data, and metadata to AWS monitoring services or partners. It can be set up through CloudWatch Container Insights.  |   [Setup procedure](opentelemetry.md)   | 
|  Applications  |   [Amazon DevOps Guru](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-devops-guru-coverage-amazon-eks-metrics-cluster/)   |  Amazon DevOps Guru detects anomalies in node-level operational performance and availability.  |   [Setup procedure](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html)   | 
|  Applications  |   [AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html)   |   AWS X-Ray receives trace data about your application. This trace data includes incoming and outgoing requests and metadata about the requests. For Amazon EKS, the implementation requires the OpenTelemetry add-on.  |   [Setup procedure](https://docs.aws.amazon.com/xray/latest/devguide/xray-instrumenting-your-app.html)   | 
|  Applications  |   [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html)   |  CloudWatch provides some basic Amazon EKS metrics for free on supported versions. You can expand this functionality with the CloudWatch Observability Operator to handle collecting metrics, logs, and trace data.  |   [Setup procedure](cloudwatch.md)   | 

The following table describes various logging tool options.


| Areas | Tool | Description | Setup | 
| --- | --- | --- | --- | 
|  Control plane  |   [Observability dashboard](observability-dashboard.md)   |  For supported versions, the observability dashboard shows lists based on the control plane audit logs. It also includes links to control plane logs in Amazon CloudWatch.  |   [Setup procedure](observability-dashboard.md)   | 
|  Applications  |   [Amazon CloudWatch Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html)   |  Amazon CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices.  |   [Setup procedure](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-quickstart.html)   | 
|  Control plane  |   [Amazon CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html)   |  You can send audit and diagnostic logs directly from the Amazon EKS control plane to CloudWatch Logs in your account.  |   [Setup procedure](control-plane-logs.md)   | 
|  Control plane  |   [AWS CloudTrail](logging-using-cloudtrail.md)   |  AWS CloudTrail logs API calls made by a user, role, or AWS service.  |   [Setup procedure](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html)   | 
|  Multiple areas for AWS Fargate instances  |   [AWS Fargate log router](fargate-logging.md)   |  For AWS Fargate instances, the log router streams logs to AWS services or partner tools. It uses [AWS for Fluent Bit](https://github.com/aws/aws-for-fluent-bit).  |   [Setup procedure](fargate-logging.md)   | 

# Monitor your cluster with the observability dashboard
<a name="observability-dashboard"></a>

The Amazon EKS console includes an observability dashboard that gives visibility into the performance of your cluster. The information it provides helps you to quickly detect, troubleshoot, and remediate issues. You can open the applicable section of the observability dashboard by choosing an item in the **Health and performance summary**. This summary is included in several places, including the **Observability** tab.

The observability dashboard is split into several tabs.

## Summary
<a name="observability-summary"></a>

The **Health and performance summary** lists the quantity of items in various categories. Each number acts as a hyperlink to a location in the observability dashboard with a list for that category.

## Cluster health
<a name="observability-cluster-health"></a>

 **Cluster health** provides important notifications, some of which may require action from you as soon as possible. With this list, you can see descriptions and the affected resources. Cluster health includes two tables: **Health issues** and **Configuration insights**. To refresh the status of **Health issues**, choose the refresh button ( ↻ ). **Configuration insights** update automatically once every 24 hours and can’t be manually refreshed.

For more information about **Health issues**, see [Cluster health FAQs and error codes with resolution paths](troubleshooting.md#cluster-health-status). For more information about **Configuration insights**, see [Prepare for Kubernetes version upgrades and troubleshoot misconfigurations with cluster insights](cluster-insights.md).
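Outside the console, the same health issues are available from the EKS API. A sketch of extracting the issue codes; the payload below is an abridged, illustrative shape of what `aws eks describe-cluster --name my-cluster --query 'cluster.health'` returns:

```shell
# Abridged, illustrative health payload (a real one comes from the command above).
cat > /tmp/health.json <<'EOF'
{"issues":[{"code":"Ec2SecurityGroupNotFound","message":"We could not find one or more security groups associated with the cluster.","resourceIds":["sg-1234567890abcdef0"]}]}
EOF

# Pull out just the issue codes without extra tooling:
CODES=$(grep -o '"code":"[^"]*"' /tmp/health.json | cut -d'"' -f4)
echo "$CODES"
```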

## Control plane monitoring
<a name="observability-control-plane"></a>

The **Control plane monitoring** tab is divided into three sections, each of which helps you to monitor and troubleshoot your cluster’s control plane.

### Metrics
<a name="observability-metrics"></a>

For clusters that are Kubernetes version `1.28` and above, the **Metrics** section shows graphs of several metrics gathered for various control plane components.

You can set the time period used by the X-axis of every graph by making selections at the top of the section. You can refresh data with the refresh button ( ↻ ). For each separate graph, the vertical ellipses button ( ⋮ ) opens a menu with options from CloudWatch.

These metrics and more are automatically available as basic monitoring metrics in CloudWatch under the `AWS/EKS` namespace. For more information, see [Basic monitoring and detailed monitoring](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-metrics-basic-detailed.html) in the *Amazon CloudWatch User Guide*. To get more detailed metrics, visualization, and insights, see [Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html) in the *Amazon CloudWatch User Guide*. Or if you prefer Prometheus based monitoring, see [Monitor your cluster metrics with Prometheus](prometheus.md).

The following table describes available metrics.


| Metric | Description | 
| --- | --- | 
|  APIServer Requests  |  The requests per minute made to the API server.  | 
|  APIServer Total Requests 4XX  |  The count of API server requests per minute that had HTTP 4XX response codes (client-side errors).  | 
|  APIServer Total Requests 5XX  |  The count of API server requests per minute that had HTTP 5XX response codes (server-side errors).  | 
|  APIServer Total Requests 429  |  The count of API server requests per minute that had HTTP 429 response codes (too many requests).  | 
|  Storage size  |  The storage database (`etcd`) size.  | 
|  Scheduler attempts  |  The number of attempts to schedule pods by the results "unschedulable", "error", and "scheduled".  | 
|  Pending pods  |  The number of pending pods by queue type of "active", "backoff", "unschedulable", and "gated".  | 
|  API server request latency  |  The latency for API server requests.  | 
|  API server current inflight requests  |  The current in-flight requests for the API server.  | 
|  Webhook requests  |  The webhook requests per minute.  | 
|  Webhook request rejections  |  The count of webhook requests that were rejected.  | 
|  Webhook request latency P99  |  The 99th percentile latency of external, third-party webhook requests.  | 
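The names in the table above are console display labels. The underlying CloudWatch metrics live in the `AWS/EKS` namespace and can be enumerated with `aws cloudwatch list-metrics --namespace AWS/EKS`. A sketch of extracting metric names from a response; the metric names and shape below are invented placeholders, not the actual metric identifiers:

```shell
# Invented placeholder list-metrics response shape; real data comes from:
#   aws cloudwatch list-metrics --namespace AWS/EKS
cat > /tmp/eks-metrics.json <<'EOF'
{"Metrics":[
 {"Namespace":"AWS/EKS","MetricName":"ExampleAPIServerRequests","Dimensions":[{"Name":"ClusterName","Value":"my-cluster"}]},
 {"Namespace":"AWS/EKS","MetricName":"ExampleStorageSize","Dimensions":[{"Name":"ClusterName","Value":"my-cluster"}]}
]}
EOF

# Extract the metric names:
NAMES=$(grep -o '"MetricName":"[^"]*"' /tmp/eks-metrics.json | cut -d'"' -f4)
echo "$NAMES"
```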

### CloudWatch Log Insights
<a name="observability-log-insights"></a>

The **CloudWatch Log Insights** section shows various lists based on the control plane audit logs. The Amazon EKS control plane logs need to be turned on to use this feature, which you can do from the **View control plane logs in CloudWatch** section.

When enough time has passed to collect data, you can **Run all queries** or choose **Run query** for a single list at a time. You incur additional CloudWatch costs whenever you run queries. Choose the time period of results you want to view at the top of the section. If you want more advanced control for any query, choose **View in CloudWatch** to update the query in CloudWatch to fit your needs.

For more information, see [Analyzing log data with CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) in the *Amazon CloudWatch Logs User Guide*.
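The generated lists use CloudWatch Logs Insights query syntax. As a point of reference, a hand-written query over your cluster's control plane log group might look like the following sketch, which lists recent audit-log events (the log-stream filter pattern here is illustrative):

```
fields @timestamp, @message
| filter @logStream like /kube-apiserver-audit/
| sort @timestamp desc
| limit 20
```

You can run and adapt queries like this from **View in CloudWatch**, scoped to your cluster's control plane log group.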

### View control plane logs in CloudWatch
<a name="observability-cp-logs"></a>

Choose **Manage logging** to update the log types that are available. It takes several minutes for the logs to appear in CloudWatch Logs after you enable logging. When enough time has passed, choose any of the **View** links in this section to navigate to the applicable log.

For more information, see [Send control plane logs to CloudWatch Logs](control-plane-logs.md).

## Cluster insights
<a name="observability-cluster-insights"></a>

The **Upgrade insights** table both surfaces issues and recommends corrective actions, accelerating the validation process for upgrading to new Kubernetes versions. Amazon EKS automatically scans clusters against a list of potential Kubernetes version upgrade impacting issues. The **Upgrade insights** table lists the insight checks performed by Amazon EKS against this cluster, along with their associated statuses.

Amazon EKS maintains and periodically refreshes the list of insight checks to be performed based on evaluations of changes in the Kubernetes project as well as Amazon EKS service changes tied to new versions. The Amazon EKS console automatically refreshes the status of each insight, which can be seen in the last refresh time column.

For more information, see [Prepare for Kubernetes version upgrades and troubleshoot misconfigurations with cluster insights](cluster-insights.md).

## Node health issues
<a name="observability-node-health-issues"></a>

The Amazon EKS node monitoring agent automatically reads node logs to detect health issues. Regardless of the auto repair setting, all node health issues are reported so that you can investigate as needed. If an issue type is listed without a description, you can read the description in its popover element.

When you refresh the page, any resolved issues will disappear from the list. If auto repair is enabled, you could temporarily see some health issues that will be resolved without action from you. Issues that are not supported by auto repair may require manual action from you depending on the type.

For node health issues to be reported, your cluster must use Amazon EKS Auto Mode or have the node monitoring agent add-on. For more information, see [Detect node health issues and enable automatic node repair](node-health.md).

## EKS Capabilities
<a name="observability-capabilities"></a>

The **Capabilities** section shows the status and health of your EKS Capability resources in the cluster. Health and status notifications for both capabilities and their managed Kubernetes resources in your cluster can be monitored here. When you refresh the page, any resolved issues will disappear from the list.

For more information, see [Working with capability resources](working-with-capabilities.md).

# Monitor Kubernetes workload traffic with Container Network Observability
<a name="network-observability"></a>

Amazon EKS provides enhanced network observability features that provide deeper insights into your container networking environment. These capabilities help you better understand, monitor, and troubleshoot your Kubernetes network landscape in AWS. With enhanced container network observability, you can leverage granular, network-related metrics for better proactive anomaly detection across cluster traffic, cross-AZ flows, and AWS services. Using these metrics, you can measure system performance and visualize the underlying metrics using your preferred observability stack.

In addition, Amazon EKS now provides network monitoring visualizations in the AWS console that accelerate and enhance precise troubleshooting for faster root cause analysis. You can also leverage these visual capabilities to pinpoint top-talkers and network flows causing retransmissions and retransmission timeouts, eliminating blind spots during incidents.

These capabilities are enabled by [Amazon CloudWatch Network Flow Monitor](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor.html).

## Use cases
<a name="_use_cases"></a>

### Measure network performance to detect anomalies
<a name="_measure_network_performance_to_detect_anomalies"></a>

Many teams standardize on an observability stack that lets them measure system performance, visualize system metrics, and receive alarms when a specific threshold is breached. Container network observability in EKS aligns with this by exposing key system metrics that you can scrape to broaden observability of your system’s network performance at the pod and worker node level.

### Leverage console visualizations for more precise troubleshooting
<a name="_leverage_console_visualizations_for_more_precise_troubleshooting"></a>

In the event of an alarm from your monitoring system, you may want to home in on the cluster and workload where an issue originated. To support this, you can leverage visualizations in the EKS console that narrow the scope of investigation at a cluster level, and accelerate the disclosure of the network flows responsible for the most retransmissions, retransmission timeouts, and the volume of data transferred.

### Track top-talkers in your Amazon EKS environment
<a name="_track_top_talkers_in_your_amazon_eks_environment"></a>

Many teams run EKS as the foundation for their platforms, making it the focal point for an application environment’s network activity. Using the network monitoring capabilities in this feature, you can track which workloads are responsible for the most traffic (measured by data volume) within the cluster, across AZs, as well as traffic to external destinations within AWS (DynamoDB and S3) and beyond the AWS cloud (the internet or on-premises). Additionally, you can monitor the performance of each of these flows based on retransmissions, retransmission timeouts, and data transferred.

## Features
<a name="_features"></a>

1. Performance metrics - This feature allows you to scrape network-related system metrics for pods and worker nodes directly from the Network Flow Monitor (NFM) Agent running in your EKS cluster.

1. Service map - This feature dynamically visualizes intercommunication between workloads in the cluster, allowing you to quickly disclose key metrics (retransmissions - RT, retransmission timeouts - RTO, and data transferred - DT) associated with network flows between communicating pods.

1. Flow table - With this table, you can monitor the top talkers across the Kubernetes workloads in your cluster from three different angles: AWS service view, cluster view, and external view. For each view, you can see the retransmissions, retransmission timeouts, and data transferred between the source pod and its destination.
   +  AWS service view: Shows top talkers to AWS services (DynamoDB and S3)
   + Cluster view: Shows top talkers within the cluster (east to west)
   + External view: Shows top talkers to cluster-external destinations outside AWS 

## Get started
<a name="_get_started"></a>

To get started, enable Container Network Observability in the EKS console for a new or existing cluster. This automates the creation of Network Flow Monitor (NFM) dependencies ([Scope](https://docs.aws.amazon.com/networkflowmonitor/2.0/APIReference/API_CreateScope.html) and [Monitor](https://docs.aws.amazon.com/networkflowmonitor/2.0/APIReference/API_CreateMonitor.html) resources). In addition, you must install the [Network Flow Monitor Agent add-on](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-agents-kubernetes-eks.html). Alternatively, you can install these dependencies using the AWS CLI, [EKS APIs](https://docs.aws.amazon.com/eks/latest/APIReference/API_Operations_Amazon_Elastic_Kubernetes_Service.html) (for the add-on), [NFM APIs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-API-operations.html), or Infrastructure as Code (such as [Terraform](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/networkflowmonitor_monitor)). Once these dependencies are in place, you can configure your preferred monitoring tool to scrape network performance metrics for pods and worker nodes from the NFM agent. To visualize the network activity and performance of your workloads, navigate to the **Network** tab of the cluster’s observability dashboard in the EKS console.

When using Network Flow Monitor in EKS, you can maintain your existing observability workflow and technology stack while leveraging a set of additional features which further enable you to understand and optimize the network layer of your EKS environment. You can learn more about the [Network Flow Monitor pricing here](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor.pricing.html).

## Prerequisites and important notes
<a name="_prerequisites_and_important_notes"></a>

1. As mentioned above, if you enable Container Network Observability from the EKS console, the underlying NFM resource dependencies (Scope and Monitor) will be automatically created on your behalf, and you will be guided through the installation process of the EKS add-on for NFM.

1. If you want to enable this feature using Infrastructure as Code (IaC) like Terraform, you will have to define the following dependencies in your IaC: NFM Scope, NFM Monitor, EKS add-on for NFM. In addition, you’ll have to grant the [relevant permissions](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/CloudWatchNetworkFlowMonitorAgentPublishPolicy.html) to the EKS add-on using [Pod Identity](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html) or [IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

1. You must be running a minimum version of 1.1.0 for the NFM agent’s EKS add-on.

1. You have to use v6.21.0 or higher of the [Terraform AWS Provider](https://github.com/hashicorp/terraform-provider-aws) for support of Network Flow Monitor resources.
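You can confirm the installed NFM agent add-on version from the CLI and compare it against the 1.1.0 minimum noted above. A sketch; the installed version string below is an illustrative value:

```shell
# With credentials, read the installed version (shown for reference):
#   aws eks describe-addon --cluster-name my-cluster \
#     --addon-name aws-network-flow-monitoring-agent \
#     --query 'addon.addonVersion' --output text

INSTALLED="v1.1.2-eksbuild.1"   # illustrative value; use the command output
MINIMUM="1.1.0"

# sort -V orders version strings; if the minimum sorts first, we're at or above it.
LOWEST=$(printf '%s\n' "${INSTALLED#v}" "$MINIMUM" | sort -V | head -n1)
if [ "$LOWEST" = "$MINIMUM" ]; then
    RESULT="version OK"
else
    RESULT="upgrade required"
fi
echo "$RESULT"
```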

### Required IAM permissions
<a name="_required_iam_permissions"></a>

#### EKS add-on for NFM agent
<a name="_eks_add_on_for_nfm_agent"></a>

You can use the `CloudWatchNetworkFlowMonitorAgentPublishPolicy` [AWS managed policy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/CloudWatchNetworkFlowMonitorAgentPublishPolicy.html) with Pod Identity. This policy contains permissions for the NFM agent to send telemetry reports (metrics) to a Network Flow Monitor endpoint.

```
{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Allow",
      "Action" : [
        "networkflowmonitor:Publish"
      ],
      "Resource" : "*"
    }
  ]
}
```

#### Container Network Observability in the EKS console
<a name="_container_network_observability_in_the_eks_console"></a>

The following permissions are required to enable the feature and visualize the service map and flow table in the console.

```
{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect": "Allow",
      "Action": [
        "networkflowmonitor:ListScopes",
        "networkflowmonitor:ListMonitors",
        "networkflowmonitor:GetScope",
        "networkflowmonitor:GetMonitor",
        "networkflowmonitor:CreateScope",
        "networkflowmonitor:CreateMonitor",
        "networkflowmonitor:TagResource",
        "networkflowmonitor:StartQueryMonitorTopContributors",
        "networkflowmonitor:StopQueryMonitorTopContributors",
        "networkflowmonitor:GetQueryStatusMonitorTopContributors",
        "networkflowmonitor:GetQueryResultsMonitorTopContributors"
      ],
      "Resource": "*"
    }
  ]
}
```

## Using AWS CLI, EKS API and NFM API
<a name="using_shared_aws_cli_eks_api_and_nfm_api"></a>

```
#!/bin/bash

# Script to create required Network Flow Monitor resources
set -e

CLUSTER_NAME="my-eks-cluster"
REGION="us-west-2"
AGENT_NAMESPACE="amazon-network-flow-monitor"
ADDON_NAME="aws-network-flow-monitoring-agent"

# Get the account ID and derive the cluster ARN
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
CLUSTER_ARN="arn:aws:eks:${REGION}:${ACCOUNT_ID}:cluster/${CLUSTER_NAME}"

echo "Creating Network Flow Monitor resources..."

# Check if the Network Flow Monitor agent is running in the cluster
echo "Checking for Network Flow Monitor agent in cluster..."
if kubectl get pods -n "$AGENT_NAMESPACE" --no-headers 2>/dev/null | grep -q "Running"; then
    echo "Network Flow Monitor agent exists and is running in the cluster"
else
    echo "Network Flow Monitor agent not found. Installing as EKS add-on..."
    aws eks create-addon \
        --cluster-name "$CLUSTER_NAME" \
        --addon-name "$ADDON_NAME" \
        --region "$REGION"
    echo "Network Flow Monitor add-on installation initiated"
fi

echo "Cluster ARN: $CLUSTER_ARN"
echo "Account ID: $ACCOUNT_ID"

# Check for existing scope
echo "Checking for existing Network Flow Monitor Scope..."
EXISTING_SCOPE=$(aws networkflowmonitor list-scopes --region $REGION --query 'scopes[0].scopeArn' --output text 2>/dev/null || echo "None")

if [ "$EXISTING_SCOPE" != "None" ] && [ "$EXISTING_SCOPE" != "null" ]; then
    echo "Using existing scope: $EXISTING_SCOPE"
    SCOPE_ARN=$EXISTING_SCOPE
else
    echo "Creating new Network Flow Monitor Scope..."
    SCOPE_RESPONSE=$(aws networkflowmonitor create-scope \
        --targets "[{\"targetIdentifier\":{\"targetId\":{\"accountId\":\"${ACCOUNT_ID}\"},\"targetType\":\"ACCOUNT\"},\"region\":\"${REGION}\"}]" \
        --region $REGION \
        --output json)

    SCOPE_ARN=$(echo $SCOPE_RESPONSE | jq -r '.scopeArn')
    echo "Scope created: $SCOPE_ARN"
fi

# Create Network Flow Monitor with EKS Cluster as local resource
echo "Creating Network Flow Monitor..."
MONITOR_RESPONSE=$(aws networkflowmonitor create-monitor \
    --monitor-name "${CLUSTER_NAME}-monitor" \
    --local-resources "type=AWS::EKS::Cluster,identifier=${CLUSTER_ARN}" \
    --scope-arn "$SCOPE_ARN" \
    --region $REGION \
    --output json)

MONITOR_ARN=$(echo $MONITOR_RESPONSE | jq -r '.monitorArn')

echo "Monitor created: $MONITOR_ARN"

echo "Network Flow Monitor setup complete!"
echo "Monitor ARN: $MONITOR_ARN"
echo "Scope ARN: $SCOPE_ARN"
echo "Local Resource: AWS::EKS::Cluster (${CLUSTER_ARN})"
```

## Using Infrastructure as Code (IaC)
<a name="_using_infrastructure_as_code_iac"></a>

### Terraform
<a name="_terraform"></a>

If you are using Terraform to manage your AWS cloud infrastructure, you can include the following resource configurations to enable Container Network Observability for your cluster.

#### NFM Scope
<a name="_nfm_scope"></a>

```
data "aws_caller_identity" "current" {}

resource "aws_networkflowmonitor_scope" "example" {
  target {
    region = "us-east-1"
    target_identifier {
      target_type = "ACCOUNT"
      target_id {
        account_id = data.aws_caller_identity.current.account_id
      }
    }
  }

  tags = {
    Name = "example"
  }
}
```

#### NFM Monitor
<a name="_nfm_monitor"></a>

```
resource "aws_networkflowmonitor_monitor" "example" {
  monitor_name = "eks-cluster-name-monitor"
  scope_arn    = aws_networkflowmonitor_scope.example.scope_arn

  local_resource {
    type       = "AWS::EKS::Cluster"
    identifier = aws_eks_cluster.example.arn
  }

  remote_resource {
    type       = "AWS::Region"
    identifier = "us-east-1" # this must be the same region that the cluster is in
  }

  tags = {
    Name = "example"
  }
}
```

#### EKS add-on for NFM
<a name="_eks_add_on_for_nfm"></a>

```
resource "aws_eks_addon" "example" {
  cluster_name                = aws_eks_cluster.example.name
  addon_name                  = "aws-network-flow-monitoring-agent"
}
```

## How does it work?
<a name="_how_does_it_work"></a>

### Performance metrics
<a name="_performance_metrics"></a>

#### System metrics
<a name="_system_metrics"></a>

If you are running third party (3P) tooling to monitor your EKS environment (such as Prometheus and Grafana), you can scrape the supported system metrics directly from the Network Flow Monitor agent. These metrics can be sent to your monitoring stack to expand measurement of your system’s network performance at the pod and worker node level. The available metrics are listed in the table, under Supported system metrics.

![\[Illustration of scraping system metrics\]](http://docs.aws.amazon.com/eks/latest/userguide/images/nfm-eks-metrics-workflow.png)


To enable these metrics, override the following environment variables through the add-on’s advanced configuration during the installation process (see: https://aws.amazon.com/blogs/containers/amazon-eks-add-ons-advanced-configuration/):

```
OPEN_METRICS:
    Enable or disable open metrics. Disabled if not supplied
    Type: String
    Values: ["on", "off"]
OPEN_METRICS_ADDRESS:
    Listening IP address for open metrics endpoint. Defaults to 127.0.0.1 if not supplied
    Type: String
OPEN_METRICS_PORT:
    Listening port for open metrics endpoint. Defaults to 80 if not supplied
    Type: Integer
    Range: [0..65535]
```
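If you enable these variables (for example, `OPEN_METRICS` on, `OPEN_METRICS_ADDRESS` set to `0.0.0.0`, and `OPEN_METRICS_PORT` set to `9400`), a Prometheus scrape job can pull from the agent Pods directly. The job below is only a sketch: the Pod label used to select the agent Pods and the port are assumptions, so check the labels on the agent DaemonSet Pods in your cluster and use the port you configured.

```
scrape_configs:
- job_name: nfm-agent
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only the NFM agent Pods. The label key/value here is an assumption;
  # verify against the aws-network-flow-monitoring-agent DaemonSet Pods.
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
    action: keep
    regex: aws-network-flow-monitoring-agent
  # Scrape each Pod IP on the configured OPEN_METRICS_PORT (9400 in this example).
  - source_labels: [__meta_kubernetes_pod_ip]
    action: replace
    regex: (.+)
    replacement: $1:9400
    target_label: __address__
```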

#### Flow level metrics
<a name="_flow_level_metrics"></a>

In addition, Network Flow Monitor captures network flow data along with flow level metrics: retransmissions, retransmission timeouts, and data transferred. This data is processed by Network Flow Monitor and visualized in the EKS console to surface the traffic in your cluster’s environment and show how it’s performing based on these flow level metrics.

The diagram below depicts a workflow in which both types of metrics (system and flow level) can be leveraged to gain more operational intelligence.

![\[Illustration of workflow with different performance metrics\]](http://docs.aws.amazon.com/eks/latest/userguide/images/nfm-eks-metrics-types-workflow.png)


1. The platform team can collect and visualize system metrics in their monitoring stack. With alerting in place, they can detect network anomalies or issues impacting pods or worker nodes using the system metrics from the NFM agent.

1. As a next step, platform teams can leverage the native visualizations in the EKS console to further narrow the scope of investigation and accelerate troubleshooting based on flow representations and their associated metrics.

Important note: The scraping of system metrics from the NFM agent and the process of the NFM agent pushing flow-level metrics to the NFM backend are independent processes.

##### Supported system metrics
<a name="_supported_system_metrics"></a>

Important note: system metrics are exported in [OpenMetrics](https://openmetrics.io/) format.


| Metric name | Type | Dimensions | Description | 
| --- | --- | --- | --- | 
|  `ingress_flow`  |  Gauge  |  `instance_id`, `iface`, `pod`, `namespace`, `node`  |  Ingress TCP flow count (TcpPassiveOpens)  | 
|  `egress_flow`  |  Gauge  |  `instance_id`, `iface`, `pod`, `namespace`, `node`  |  Egress TCP flow count (TcpActiveOpens)  | 
|  `ingress_packets`  |  Gauge  |  `instance_id`, `iface`, `pod`, `namespace`, `node`  |  Ingress packet count (delta)  | 
|  `egress_packets`  |  Gauge  |  `instance_id`, `iface`, `pod`, `namespace`, `node`  |  Egress packet count (delta)  | 
|  `ingress_bytes`  |  Gauge  |  `instance_id`, `iface`, `pod`, `namespace`, `node`  |  Ingress bytes count (delta)  | 
|  `egress_bytes`  |  Gauge  |  `instance_id`, `iface`, `pod`, `namespace`, `node`  |  Egress bytes count (delta)  | 
|  `bw_in_allowance_exceeded`  |  Gauge  |  `instance_id`, `eni`, `node`  |  Packets queued/dropped due to inbound bandwidth limit  | 
|  `bw_out_allowance_exceeded`  |  Gauge  |  `instance_id`, `eni`, `node`  |  Packets queued/dropped due to outbound bandwidth limit  | 
|  `pps_allowance_exceeded`  |  Gauge  |  `instance_id`, `eni`, `node`  |  Packets queued/dropped due to bidirectional PPS limit  | 
|  `conntrack_allowance_exceeded`  |  Gauge  |  `instance_id`, `eni`, `node`  |  Packets dropped due to connection tracking limit  | 
|  `linklocal_allowance_exceeded`  |  Gauge  |  `instance_id`, `eni`, `node`  |  Packets dropped due to local proxy service PPS limit  | 

##### Supported flow level metrics
<a name="_supported_flow_level_metrics"></a>


| Metric name | Type | Description | 
| --- | --- | --- | 
|  TCP retransmissions  |  Counter  |  Number of times a sender resends a packet that was lost or corrupted during transmission.  | 
|  TCP retransmission timeouts  |  Counter  |  Number of times a sender initiated a waiting period to determine if a packet was lost in transit.  | 
|  Data (bytes) transferred  |  Counter  |  Volume of data transferred between a source and destination for a given flow.  | 

### Service map and flow table
<a name="_service_map_and_flow_table"></a>

![\[Illustration of how NFM works with EKS\]](http://docs.aws.amazon.com/eks/latest/userguide/images/nfm-eks-workflow.png)


1. When installed, the Network Flow Monitor agent runs as a DaemonSet on every worker node and collects the top 500 network flows (based on volume of data transferred) every 30 seconds.

1. These network flows are sorted into the following categories: Intra AZ, Inter AZ, EC2 → S3, EC2 → DynamoDB (DDB), and Unclassified. Each flow has 3 metrics associated with it: retransmissions, retransmission timeouts, and data transferred (in bytes).
   + Intra AZ - network flows between pods in the same AZ
   + Inter AZ - network flows between pods in different AZs
   + EC2 → S3 - network flows from pods to S3
   + EC2 → DDB - network flows from pods to DDB
   + Unclassified - network flows from pods to the Internet or on-prem

1. Network flows from the Network Flow Monitor Top Contributors API are used to power the following experiences in the EKS console:
   + Service map: Visualization of network flows within the cluster (Intra AZ and Inter AZ).
   + Flow table: Table presentation of network flows within the cluster (Intra AZ and Inter AZ), from pods to AWS services (EC2 → S3 and EC2 → DDB), and from pods to external destinations (Unclassified).

The network flows pulled from the Top Contributors API are scoped to a 1 hour time range, and can include up to 500 flows from each category. For the service map, this means up to 1000 flows can be sourced and presented from the Intra AZ and Inter AZ flow categories over a 1 hour time range. For the flow table, this means that up to 3000 network flows can be sourced and presented from all 6 network flow categories over a 2 hour time range.
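The categorization above amounts to a small decision function. The following sketch is hypothetical Python with field names invented for illustration; the actual Top Contributors API response shape differs.

```python
def classify_flow(src_az, dest_az=None, dest_service=None):
    """Classify a network flow the way the EKS console categorizes it.

    src_az:       Availability Zone of the source pod (e.g. "us-east-1a")
    dest_az:      AZ of the destination pod, if the destination is in the cluster
    dest_service: "S3" or "DDB" if the destination is one of those AWS services
    """
    if dest_service == "S3":
        return "EC2 -> S3"
    if dest_service == "DDB":
        return "EC2 -> DDB"
    if dest_az is not None:
        # Both endpoints are pods in the cluster.
        return "Intra AZ" if src_az == dest_az else "Inter AZ"
    # Destination is the Internet or an on-premises endpoint.
    return "Unclassified"

print(classify_flow("us-east-1a", dest_az="us-east-1a"))  # Intra AZ
print(classify_flow("us-east-1a", dest_az="us-east-1b"))  # Inter AZ
print(classify_flow("us-east-1a", dest_service="S3"))     # EC2 -> S3
print(classify_flow("us-east-1a"))                        # Unclassified
```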

#### Example: Service map
<a name="_example_service_map"></a>

 *Deployment view* 

![\[Illustration of service map with ecommerce app in deployment view\]](http://docs.aws.amazon.com/eks/latest/userguide/images/ecommerce-deployment.png)


 *Pod view* 

![\[Illustration of service map with ecommerce app in pod view\]](http://docs.aws.amazon.com/eks/latest/userguide/images/ecommerce-pod.png)


 *Deployment view* 

![\[Illustration of service map with photo-gallery app in deployment view\]](http://docs.aws.amazon.com/eks/latest/userguide/images/photo-gallery-deployment.png)


 *Pod view* 

![\[Illustration of service map with photo-gallery app in pod view\]](http://docs.aws.amazon.com/eks/latest/userguide/images/photo-gallery-pod.png)


#### Example: Flow table
<a name="_example_flow_table"></a>

 *AWS service view* 

![\[Illustration of flow table view\]](http://docs.aws.amazon.com/eks/latest/userguide/images/aws-service-view.png)


 *Cluster view* 

![\[Illustration of flow table in cluster view\]](http://docs.aws.amazon.com/eks/latest/userguide/images/cluster-view.png)


## Considerations and limitations
<a name="_considerations_and_limitations"></a>
+ Container Network Observability in EKS is only available in regions where [Network Flow Monitor is supported](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-NetworkFlowMonitor-Regions.html).
+ Supported system metrics are in OpenMetrics format, and can be directly scraped from the Network Flow Monitor (NFM) agent.
+ To enable Container Network Observability in EKS using Infrastructure as Code (IaC) like [Terraform](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/networkflowmonitor_monitor), you need to have these dependencies defined and created in your configurations: NFM scope, NFM monitor and the NFM agent.
+ Network Flow Monitor supports up to approximately 5 million flows per minute, which corresponds to approximately 5,000 EC2 instances (EKS worker nodes) with the Network Flow Monitor agent installed. Installing agents on more than 5,000 instances may affect monitoring performance until additional capacity is available.
+ You must run version 1.1.0 or later of the NFM agent’s EKS add-on.
+ You must use v6.21.0 or later of the [Terraform AWS Provider](https://github.com/hashicorp/terraform-provider-aws) for support of Network Flow Monitor resources.
+ To enrich the network flows with pod metadata, your pods should be running in their own isolated network namespace, not the host network namespace.

# Monitor your cluster metrics with Prometheus
<a name="prometheus"></a>

 [Prometheus](https://prometheus.io/) is a monitoring and time series database that scrapes endpoints. It provides the ability to query, aggregate, and store collected data. You can also use it for alerting and alert aggregation. This topic explains how to set up Prometheus as either a managed or open source option. Monitoring Amazon EKS control plane metrics is a common use case.

Amazon Managed Service for Prometheus is a Prometheus-compatible monitoring and alerting service that makes it easy to monitor containerized applications and infrastructure at scale. It is a fully-managed service that automatically scales the ingestion, storage, querying, and alerting of your metrics. It also integrates with AWS security services to enable fast and secure access to your data. You can use the open-source PromQL query language to query your metrics and alert on them. Also, you can use alert manager in Amazon Managed Service for Prometheus to set up alerting rules for critical alerts. You can then send these critical alerts as notifications to an Amazon SNS topic.

There are several different options for using Prometheus with Amazon EKS:
+ You can turn on Prometheus metrics when first creating an Amazon EKS cluster or you can create your own Prometheus scraper for existing clusters. Both of these options are covered by this topic.
+ You can deploy Prometheus using Helm. For more information, see [Deploy Prometheus using Helm](deploy-prometheus.md).
+ You can view control plane raw metrics in Prometheus format. For more information, see [Fetch control plane raw metrics in Prometheus format](view-raw-metrics.md).

## Step 1: Turn on Prometheus metrics
<a name="turn-on-prometheus-metrics"></a>

**Important**  
Amazon Managed Service for Prometheus resources are outside of the cluster lifecycle and need to be maintained independent of the cluster. When you delete your cluster, make sure to also delete any applicable scrapers to stop applicable costs. For more information, see [Find and delete scrapers](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html#AMP-collector-list-delete) in the *Amazon Managed Service for Prometheus User Guide*.

Prometheus discovers and collects metrics from your cluster through a pull-based model called scraping. Scrapers are set up to gather data from your cluster infrastructure and containerized applications. When you turn on the option to send Prometheus metrics, Amazon Managed Service for Prometheus provides a fully managed agentless scraper.

If you haven’t created the cluster yet, you can turn on the option to send metrics to Prometheus when first creating the cluster. In the Amazon EKS console, this option is in the **Configure observability** step of creating a new cluster. For more information, see [Create an Amazon EKS cluster](create-cluster.md).

If you already have an existing cluster, you can create your own Prometheus scraper. To do this in the Amazon EKS console, navigate to your cluster’s **Observability** tab and choose the **Add scraper** button. If you would rather do so with the AWS API or AWS CLI, see [Create a scraper](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html#AMP-collector-create) in the *Amazon Managed Service for Prometheus User Guide*.

The following options are available when creating the scraper with the Amazon EKS console.

 **Scraper alias**   
(Optional) Enter a unique alias for the scraper.

 **Destination**   
Choose an Amazon Managed Service for Prometheus workspace. A workspace is a logical space dedicated to the storage and querying of Prometheus metrics. With this workspace, you will be able to view Prometheus metrics across the accounts that have access to it. The **Create new workspace** option tells Amazon EKS to create a workspace on your behalf using the **Workspace alias** you provide. With the **Select existing workspace** option, you can select an existing workspace from a dropdown list. For more information about workspaces, see [Managing workspaces](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-manage-ingest-query.html) in the *Amazon Managed Service for Prometheus User Guide*.

 **Service access**   
This section summarizes the permissions you grant when sending Prometheus metrics:  
+ Allow Amazon Managed Service for Prometheus to describe the scraped Amazon EKS cluster
+ Allow remote writing to the Amazon Managed Prometheus workspace
If the `AmazonManagedScraperRole` already exists, the scraper uses it. Choose the `AmazonManagedScraperRole` link to see the **Permission details**. If the `AmazonManagedScraperRole` doesn’t exist already, choose the **View permission details** link to see the specific permissions you are granting by sending Prometheus metrics.

 **Subnets**   
Modify the subnets that the scraper will inherit as needed. If you need to add a grayed out subnet option, go back to the create cluster **Specify networking** step.

 **Scraper configuration**   
Modify the scraper configuration in YAML format as needed. To do so, use the form or upload a replacement YAML file. For more information, see [Scraper configuration](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html#AMP-collector-configuration) in the *Amazon Managed Service for Prometheus User Guide*.

Amazon Managed Service for Prometheus refers to the agentless scraper that is created alongside the cluster as an AWS managed collector. For more information about AWS managed collectors, see [Ingest metrics with AWS managed collectors](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector.html) in the *Amazon Managed Service for Prometheus User Guide*.

**Important**  
If you create a Prometheus scraper using the AWS CLI or AWS API, you need to adjust its configuration to give the scraper in-cluster permissions. For more information, see [Configuring your Amazon EKS cluster](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html#AMP-collector-eks-setup) in the *Amazon Managed Service for Prometheus User Guide*.
If you have a Prometheus scraper created before November 11, 2024 that uses the `aws-auth` `ConfigMap` instead of access entries, you need to update it to access additional metrics from the Amazon EKS cluster control plane. For the updated configuration, see [Manually configuring Amazon EKS for scraper access](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html#AMP-collector-eks-manual-setup) in the *Amazon Managed Service for Prometheus User Guide*.

## Step 2: Use the Prometheus metrics
<a name="use-prometheus-metrics"></a>

For more information about how to use the Prometheus metrics after you turn them on for your cluster, see the [Amazon Managed Service for Prometheus User Guide](https://docs.aws.amazon.com/prometheus/latest/userguide/what-is-Amazon-Managed-Service-Prometheus.html).

## Step 3: Manage Prometheus scrapers
<a name="viewing-prometheus-scraper-details"></a>

To manage scrapers, choose the **Observability** tab in the Amazon EKS console. A table shows a list of scrapers for the cluster, including information such as the scraper ID, alias, status, and creation date. You can add more scrapers, edit scrapers, delete scrapers, or view more information about the current scrapers.

To see more details about a scraper, choose the scraper ID link. For example, you can view the ARN, environment, workspace ID, IAM role, configuration, and networking information. You can use the scraper ID as input to Amazon Managed Service for Prometheus API operations such as [DescribeScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DescribeScraper.html), [UpdateScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_UpdateScraper.html), and [DeleteScraper](https://docs.aws.amazon.com/prometheus/latest/APIReference/API_DeleteScraper.html). For more information on using the Prometheus API, see the [Amazon Managed Service for Prometheus API Reference](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-APIReference.html).

# Deploy Prometheus using Helm
<a name="deploy-prometheus"></a>

As an alternative to using Amazon Managed Service for Prometheus, you can deploy Prometheus into your cluster with Helm. If you already have Helm installed, you can check your version with the `helm version` command. Helm is a package manager for Kubernetes clusters. For more information about Helm and how to install it, see [Deploy applications with Helm on Amazon EKS](helm.md).

After you configure Helm for your Amazon EKS cluster, you can use it to deploy Prometheus with the following steps.

1. Create a Prometheus namespace.

   ```
   kubectl create namespace prometheus
   ```

1. Add the `prometheus-community` chart repository.

   ```
   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
   ```

1. Deploy Prometheus.

   ```
   helm upgrade -i prometheus prometheus-community/prometheus \
       --namespace prometheus \
       --set alertmanager.persistence.storageClass="gp2" \
       --set server.persistentVolume.storageClass="gp2"
   ```
**Note**  
If you get the error `Error: failed to download "stable/prometheus" (hint: running helm repo update may help)` when executing this command, run `helm repo update prometheus-community`, and then try running the Step 2 command again.

   If you get the error `Error: rendered manifests contain a resource that already exists`, run `helm uninstall your-release-name -n namespace`, then try running the Step 3 command again.

1. Verify that all of the Pods in the `prometheus` namespace are in the `READY` state.

   ```
   kubectl get pods -n prometheus
   ```

   An example output is as follows.

   ```
   NAME                                             READY   STATUS    RESTARTS   AGE
   prometheus-alertmanager-59b4c8c744-r7bgp         1/2     Running   0          48s
   prometheus-kube-state-metrics-7cfd87cf99-jkz2f   1/1     Running   0          48s
   prometheus-node-exporter-jcjqz                   1/1     Running   0          48s
   prometheus-node-exporter-jxv2h                   1/1     Running   0          48s
   prometheus-node-exporter-vbdks                   1/1     Running   0          48s
   prometheus-pushgateway-76c444b68c-82tnw          1/1     Running   0          48s
   prometheus-server-775957f748-mmht9               1/2     Running   0          48s
   ```

1. Use `kubectl` to port forward the Prometheus console to your local machine.

   ```
   kubectl --namespace=prometheus port-forward deploy/prometheus-server 9090
   ```

1. Point a web browser to `http://localhost:9090` to view the Prometheus console.

1. Choose a metric from the **- insert metric at cursor** menu, then choose **Execute**. Choose the **Graph** tab to show the metric over time. The following image shows `container_memory_usage_bytes` over time.  
![\[Prometheus metrics\]](http://docs.aws.amazon.com/eks/latest/userguide/images/prometheus-metric.png)

1. From the top navigation bar, choose **Status**, then **Targets**.  
![\[Prometheus console\]](http://docs.aws.amazon.com/eks/latest/userguide/images/prometheus.png)

   All of the Kubernetes endpoints that are connected to Prometheus using service discovery are displayed.

# Fetch control plane raw metrics in Prometheus format
<a name="view-raw-metrics"></a>

The Kubernetes control plane exposes a number of metrics that are represented in a [Prometheus format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md). These metrics are useful for monitoring and analysis. They are exposed internally through metrics endpoints, and can be accessed without fully deploying Prometheus. However, deploying Prometheus makes it easier to analyze metrics over time.

To view the raw metrics output, replace `endpoint` and run the following command.

```
kubectl get --raw endpoint
```

This command allows you to pass any endpoint path and returns the raw response. The output lists different metrics line-by-line, with each line including a metric name, tags, and a value.

```
metric_name{tag="value"[,...]} value
```
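A line in this format can be picked apart with a short script. This is a minimal parser for illustration: the full exposition format also allows comments, escaping inside label values, and timestamps, which this ignores.

```python
import re

# metric name, optional {tag="value",...} block, then the sample value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'  # metric name
    r'(?:\{(?P<tags>[^}]*)\})?'             # optional tag block
    r'\s+(?P<value>\S+)$'                   # sample value
)

def parse_metric_line(line):
    """Parse one 'metric_name{tag="value"} value' line into (name, tags, value)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    tags = {}
    if m.group("tags"):
        for pair in m.group("tags").split(","):
            key, _, raw = pair.partition("=")
            tags[key.strip()] = raw.strip().strip('"')
    return m.group("name"), tags, float(m.group("value"))

name, tags, value = parse_metric_line(
    'rest_client_requests_total{code="200",method="GET"} 1.326086e+06'
)
print(name, tags["method"], value)  # rest_client_requests_total GET 1326086.0
```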

## Fetch metrics from the API server
<a name="fetch-metrics"></a>

The general API server endpoint is exposed on the Amazon EKS control plane. This endpoint is primarily useful when looking at a specific metric.

```
kubectl get --raw /metrics
```

An example output is as follows.

```
[...]
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="127.0.0.1:21362",method="POST"} 4994
rest_client_requests_total{code="200",host="127.0.0.1:443",method="DELETE"} 1
rest_client_requests_total{code="200",host="127.0.0.1:443",method="GET"} 1.326086e+06
rest_client_requests_total{code="200",host="127.0.0.1:443",method="PUT"} 862173
rest_client_requests_total{code="404",host="127.0.0.1:443",method="GET"} 2
rest_client_requests_total{code="409",host="127.0.0.1:443",method="POST"} 3
rest_client_requests_total{code="409",host="127.0.0.1:443",method="PUT"} 8
# HELP ssh_tunnel_open_count Counter of ssh tunnel total open attempts
# TYPE ssh_tunnel_open_count counter
ssh_tunnel_open_count 0
# HELP ssh_tunnel_open_fail_count Counter of ssh tunnel failed open attempts
# TYPE ssh_tunnel_open_fail_count counter
ssh_tunnel_open_fail_count 0
```

This raw output returns verbatim what the API server exposes.

## Fetch control plane metrics with `metrics.eks.amazonaws.com`
<a name="fetch-metrics-prometheus"></a>

For clusters that are Kubernetes version `1.28` and above, Amazon EKS also exposes metrics under the API group `metrics.eks.amazonaws.com`. These metrics cover control plane components such as `kube-scheduler` and `kube-controller-manager`.

**Note**  
If you have a webhook configuration that could block the creation of the new `APIService` resource `v1.metrics.eks.amazonaws.com` on your cluster, the metrics endpoint feature might not be available. You can verify that in the `kube-apiserver` audit log by searching for the `v1.metrics.eks.amazonaws.com` keyword.

### Fetch `kube-scheduler` metrics
<a name="fetch-metrics-scheduler"></a>

To retrieve `kube-scheduler` metrics, use the following command.

```
kubectl get --raw "/apis/metrics.eks.amazonaws.com/v1/ksh/container/metrics"
```

An example output is as follows.

```
# TYPE scheduler_pending_pods gauge
scheduler_pending_pods{queue="active"} 0
scheduler_pending_pods{queue="backoff"} 0
scheduler_pending_pods{queue="gated"} 0
scheduler_pending_pods{queue="unschedulable"} 18
# HELP scheduler_pod_scheduling_attempts [STABLE] Number of attempts to successfully schedule a pod.
# TYPE scheduler_pod_scheduling_attempts histogram
scheduler_pod_scheduling_attempts_bucket{le="1"} 79
scheduler_pod_scheduling_attempts_bucket{le="2"} 79
scheduler_pod_scheduling_attempts_bucket{le="4"} 79
scheduler_pod_scheduling_attempts_bucket{le="8"} 79
scheduler_pod_scheduling_attempts_bucket{le="16"} 79
scheduler_pod_scheduling_attempts_bucket{le="+Inf"} 81
[...]
```
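Histogram buckets in this output are cumulative: each `le` bucket counts attempts at or below that bound. From the sample above, 79 of the 81 pods were scheduled on a single attempt, and the remaining 2 fall only in the `+Inf` bucket. A quick calculation:

```python
# Cumulative bucket counts taken from the sample scheduler output above.
buckets = {1: 79, 2: 79, 4: 79, 8: 79, 16: 79, float("inf"): 81}

total = buckets[float("inf")]                      # every observation lands in +Inf
within_one_attempt = buckets[1] / total            # fraction scheduled on the first attempt
# Pods needing more than 16 attempts: difference between +Inf and the le="16" bucket.
over_sixteen = buckets[float("inf")] - buckets[16]

print(f"{within_one_attempt:.1%} scheduled on the first attempt; "
      f"{over_sixteen} needed more than 16 attempts")
```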

### Fetch `kube-controller-manager` metrics
<a name="fetch-metrics-controller"></a>

To retrieve `kube-controller-manager` metrics, use the following command.

```
kubectl get --raw "/apis/metrics.eks.amazonaws.com/v1/kcm/container/metrics"
```

An example output is as follows.

```
[...]
workqueue_work_duration_seconds_sum{name="pvprotection"} 0
workqueue_work_duration_seconds_count{name="pvprotection"} 0
workqueue_work_duration_seconds_bucket{name="replicaset",le="1e-08"} 0
workqueue_work_duration_seconds_bucket{name="replicaset",le="1e-07"} 0
workqueue_work_duration_seconds_bucket{name="replicaset",le="1e-06"} 0
workqueue_work_duration_seconds_bucket{name="replicaset",le="9.999999999999999e-06"} 0
workqueue_work_duration_seconds_bucket{name="replicaset",le="9.999999999999999e-05"} 19
workqueue_work_duration_seconds_bucket{name="replicaset",le="0.001"} 109
workqueue_work_duration_seconds_bucket{name="replicaset",le="0.01"} 139
workqueue_work_duration_seconds_bucket{name="replicaset",le="0.1"} 181
workqueue_work_duration_seconds_bucket{name="replicaset",le="1"} 191
workqueue_work_duration_seconds_bucket{name="replicaset",le="10"} 191
workqueue_work_duration_seconds_bucket{name="replicaset",le="+Inf"} 191
workqueue_work_duration_seconds_sum{name="replicaset"} 4.265655885000002
[...]
```

### Understand the scheduler and controller manager metrics
<a name="scheduler-controller-metrics"></a>

The following table describes the scheduler and controller manager metrics that are made available for Prometheus style scraping. For more information about these metrics, see [Kubernetes Metrics Reference](https://kubernetes.io/docs/reference/instrumentation/metrics/) in the Kubernetes documentation.


| Metric | Control plane component | Description | 
| --- | --- | --- | 
|  `scheduler_pending_pods`  |  scheduler  |  The number of Pods that are waiting to be scheduled onto a node for execution.  | 
|  `scheduler_schedule_attempts_total`  |  scheduler  |  The number of attempts made to schedule Pods.  | 
|  `scheduler_preemption_attempts_total`  |  scheduler  |  The number of attempts made by the scheduler to schedule higher priority Pods by evicting lower priority ones.  | 
|  `scheduler_preemption_victims`  |  scheduler  |  The number of Pods that have been selected for eviction to make room for higher priority Pods.  | 
|  `scheduler_pod_scheduling_attempts`  |  scheduler  |  The number of attempts to successfully schedule a Pod.  | 
|  `scheduler_scheduling_attempt_duration_seconds`  |  scheduler  |  Indicates how quickly or slowly the scheduler is able to find a suitable place for a Pod to run based on various factors like resource availability and scheduling rules.  | 
|  `scheduler_pod_scheduling_sli_duration_seconds`  |  scheduler  |  The end-to-end latency for a Pod being scheduled, from the time the Pod enters the scheduling queue. This might involve multiple scheduling attempts.  | 
|  `cronjob_controller_job_creation_skew_duration_seconds`  |  controller manager  |  The time between when a cronjob is scheduled to be run, and when the corresponding job is created.  | 
|  `workqueue_depth`  |  controller manager  |  The current depth of queue.  | 
|  `workqueue_adds_total`  |  controller manager  |  The total number of adds handled by workqueue.  | 
|  `workqueue_queue_duration_seconds`  |  controller manager  |  The time in seconds an item stays in workqueue before being requested.  | 
|  `workqueue_work_duration_seconds`  |  controller manager  |  The time in seconds processing an item from workqueue takes.  | 

## Deploy a Prometheus scraper to consistently scrape metrics
<a name="deploy-prometheus-scraper"></a>

To deploy a Prometheus scraper to consistently scrape the metrics, use the following configuration:

```
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
data:
  prometheus.yml: |-
    global:
      scrape_interval: 30s
    scrape_configs:
    # apiserver metrics
    - job_name: apiserver-metrics
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels:
          [
            __meta_kubernetes_namespace,
            __meta_kubernetes_service_name,
            __meta_kubernetes_endpoint_port_name,
          ]
        action: keep
        regex: default;kubernetes;https
    # Scheduler metrics
    - job_name: 'ksh-metrics'
      kubernetes_sd_configs:
      - role: endpoints
      metrics_path: /apis/metrics.eks.amazonaws.com/v1/ksh/container/metrics
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels:
          [
            __meta_kubernetes_namespace,
            __meta_kubernetes_service_name,
            __meta_kubernetes_endpoint_port_name,
          ]
        action: keep
        regex: default;kubernetes;https
    # Controller Manager metrics
    - job_name: 'kcm-metrics'
      kubernetes_sd_configs:
      - role: endpoints
      metrics_path: /apis/metrics.eks.amazonaws.com/v1/kcm/container/metrics
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels:
          [
            __meta_kubernetes_namespace,
            __meta_kubernetes_service_name,
            __meta_kubernetes_endpoint_port_name,
          ]
        action: keep
        regex: default;kubernetes;https
---
apiVersion: v1
kind: Pod
metadata:
  name: prom-pod
spec:
  containers:
  - name: prom-container
    image: prom/prometheus
    ports:
    - containerPort: 9090
    volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus/
  volumes:
  - name: config-volume
    configMap:
      name: prometheus-conf
```

The following `ClusterRole` rule is required for the Pod to access the new metrics endpoint.

```
rules:
- apiGroups:
  - metrics.eks.amazonaws.com
  resources:
  - kcm/metrics
  - ksh/metrics
  verbs:
  - get
```

To patch the role being used, you can use the following command.

```
kubectl patch clusterrole <role-name> --type=json -p='[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "verbs": ["get"],
      "apiGroups": ["metrics.eks.amazonaws.com"],
      "resources": ["kcm/metrics", "ksh/metrics"]
    }
  }
]'
```

Then you can view the Prometheus dashboard by proxying the port of the Prometheus scraper to your local port.

```
kubectl port-forward pods/prom-pod 9090:9090
```

For your Amazon EKS cluster, the core Kubernetes control plane metrics are also ingested into Amazon CloudWatch Metrics under the `AWS/EKS` namespace. To view them, open the [CloudWatch console](https://console.aws.amazon.com/cloudwatch/home#logs:prefix=/aws/eks) and select **All metrics** from the left navigation pane. On the **Metrics** selection page, choose the `AWS/EKS` namespace and a metrics dimension for your cluster.

# Monitor cluster data with Amazon CloudWatch
<a name="cloudwatch"></a>

Amazon CloudWatch is a monitoring service that collects metrics and logs from your cloud resources. CloudWatch provides basic Amazon EKS metrics at no additional cost for new clusters running Kubernetes version `1.28` and above. For enhanced observability features, you can install the CloudWatch Observability Operator as an Amazon EKS add-on.

## Basic metrics in Amazon CloudWatch
<a name="cloudwatch-basic-metrics"></a>

For clusters that are Kubernetes version `1.28` and above, you get CloudWatch vended metrics for free in the `AWS/EKS` namespace. The following table gives a list of the basic metrics that are available for the supported versions. Every metric listed has a frequency of one minute.


| Metric name | Description | 
| --- | --- | 
|   `apiserver_flowcontrol_current_executing_seats`   |  The number of seats currently in use for executing API requests. Seat allocation is determined by the priority-level and flow-schema configuration in the Kubernetes API Priority and Fairness [feature](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/).  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_schedule_attempts_total`   |  The number of total attempts by the scheduler to schedule Pods in the cluster for a given period. This metric helps monitor the scheduler’s workload and can indicate scheduling pressure or potential issues with Pod placement.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_schedule_attempts_SCHEDULED`   |  The number of successful attempts by the scheduler to schedule Pods to nodes in the cluster for a given period.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_schedule_attempts_UNSCHEDULABLE`   |  The number of attempts to schedule Pods that were unschedulable for a given period due to valid constraints, such as insufficient CPU or memory on a node.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_schedule_attempts_ERROR`   |  The number of attempts to schedule Pods that failed for a given period due to an internal problem with the scheduler itself, such as API Server connectivity issues.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_pending_pods`   |  The number of total pending Pods to be scheduled by the scheduler in the cluster for a given period.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_pending_pods_ACTIVEQ`   |  The number of pending Pods in `activeQ` that are waiting to be scheduled in the cluster for a given period.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_pending_pods_UNSCHEDULABLE`   |  The number of pending Pods that the scheduler attempted to schedule and failed, and are kept in an unschedulable state for retry.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_pending_pods_BACKOFF`   |  The number of pending Pods in `backoffQ` in a backoff state that are waiting for their backoff period to expire.  **Units:** Count  **Valid statistics:** Sum  | 
|   `scheduler_pending_pods_GATED`   |  The number of pending Pods that are currently waiting in a gated state as they cannot be scheduled until they meet required conditions.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_request_total`   |  The number of HTTP requests made across all the API servers in the cluster.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_request_total_4XX`   |  The number of HTTP requests made to all the API servers in the cluster that resulted in `4XX` (client error) status codes.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_request_total_429`   |  The number of HTTP requests made to all the API servers in the cluster that resulted in a `429` status code, which occurs when clients exceed rate-limiting thresholds.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_request_total_5XX`   |  The number of HTTP requests made to all the API servers in the cluster that resulted in `5XX` (server error) status codes.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_request_total_LIST_PODS`   |  The number of `LIST` Pods requests made to all the API servers in the cluster.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_request_duration_seconds_PUT_P99`   |  The 99th percentile of latency for `PUT` requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all `PUT` requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_request_duration_seconds_PATCH_P99`   |  The 99th percentile of latency for `PATCH` requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all `PATCH` requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_request_duration_seconds_POST_P99`   |  The 99th percentile of latency for `POST` requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all `POST` requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_request_duration_seconds_GET_P99`   |  The 99th percentile of latency for `GET` requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all `GET` requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_request_duration_seconds_LIST_P99`   |  The 99th percentile of latency for `LIST` requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all `LIST` requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_request_duration_seconds_DELETE_P99`   |  The 99th percentile of latency for `DELETE` requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all `DELETE` requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_current_inflight_requests_MUTATING`   |  The number of mutating requests (`POST`, `PUT`, `DELETE`, `PATCH`) currently being processed across all API servers in the cluster. This metric represents requests that are in-flight and haven’t completed processing yet.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_current_inflight_requests_READONLY`   |  The number of read-only requests (`GET`, `LIST`) currently being processed across all API servers in the cluster. This metric represents requests that are in-flight and haven’t completed processing yet.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_request_total`   |  The number of admission webhook requests made across all API servers in the cluster.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_request_total_ADMIT`   |  The number of mutating admission webhook requests made across all API servers in the cluster.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_request_total_VALIDATING`   |  The number of validating admission webhook requests made across all API servers in the cluster.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_rejection_count`   |  The number of admission webhook requests made across all API servers in the cluster that were rejected.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_rejection_count_ADMIT`   |  The number of mutating admission webhook requests made across all API servers in the cluster that were rejected.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_rejection_count_VALIDATING`   |  The number of validating admission webhook requests made across all API servers in the cluster that were rejected.  **Units:** Count  **Valid statistics:** Sum  | 
|   `apiserver_admission_webhook_admission_duration_seconds`   |  The 99th percentile of latency for third-party admission webhook requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all third-party admission webhook requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_admission_webhook_admission_duration_seconds_ADMIT_P99`   |  The 99th percentile of latency for third-party mutating admission webhook requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all third-party mutating admission webhook requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_admission_webhook_admission_duration_seconds_VALIDATING_P99`   |  The 99th percentile of latency for third-party validating admission webhook requests calculated from all requests across all API servers in the cluster. Represents the response time below which 99% of all third-party validating admission webhook requests are completed.  **Units:** Seconds  **Valid statistics:** Average  | 
|   `apiserver_storage_size_bytes`   |  The physical size in bytes of the etcd storage database file used by the API servers in the cluster. This metric represents the actual disk space allocated for the storage.  **Units:** Bytes  **Valid statistics:** Maximum  | 
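
Besides the console, you can retrieve these basic metrics with the AWS CLI. The following is a hedged sketch using `scheduler_pending_pods` from the table above; replace the cluster name, Region, and time range with your own, and confirm the `ClusterName` dimension against what the console shows for your cluster.

```
aws cloudwatch get-metric-statistics \
    --namespace AWS/EKS \
    --metric-name scheduler_pending_pods \
    --dimensions Name=ClusterName,Value=my-cluster \
    --statistics Sum \
    --period 60 \
    --start-time 2025-01-01T00:00:00Z \
    --end-time 2025-01-01T01:00:00Z \
    --region region-code
```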

## Amazon CloudWatch Observability Operator
<a name="cloudwatch-operator"></a>

Amazon CloudWatch Observability collects real-time logs, metrics, and trace data. It sends them to [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) and [AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html). You can install this add-on to enable both CloudWatch Application Signals and CloudWatch Container Insights with enhanced observability for Amazon EKS. This helps you monitor the health and performance of your infrastructure and containerized applications. The Amazon CloudWatch Observability Operator is designed to install and configure the necessary components.

Amazon EKS supports the CloudWatch Observability Operator as an [Amazon EKS add-on](eks-add-ons.md). The add-on enables Container Insights on both Linux and Windows worker nodes in the cluster. To enable Container Insights on Windows, the Amazon EKS add-on version must be `1.5.0` or higher. Currently, CloudWatch Application Signals isn’t supported on Amazon EKS Windows.
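
As a sketch, you can install the add-on with the AWS CLI. The add-on name `amazon-cloudwatch-observability` is the published name at the time of writing; replace the cluster name and Region with your own.

```
aws eks create-addon \
    --cluster-name my-cluster \
    --addon-name amazon-cloudwatch-observability \
    --region region-code
```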

The following topics describe how to get started using the CloudWatch Observability Operator for your Amazon EKS cluster.
+ For instructions on installing this add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html) in the *Amazon CloudWatch User Guide*.
+ For more information about CloudWatch Application Signals, see [Application Signals](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Monitoring-Sections.html) in the *Amazon CloudWatch User Guide*.
+ For more information about Container Insights, see [Using Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html) in the *Amazon CloudWatch User Guide*.

# Send control plane logs to CloudWatch Logs
<a name="control-plane-logs"></a>

Amazon EKS control plane logging provides audit and diagnostic logs directly from the Amazon EKS control plane to CloudWatch Logs in your account. These logs make it easy for you to secure and run your clusters. You can select the exact log types you need, and logs are sent as log streams to a group for each Amazon EKS cluster in CloudWatch. You can use CloudWatch subscription filters to perform real-time analysis on the logs or to forward them to other services (the logs are Base64 encoded and compressed in gzip format). For more information, see [Amazon CloudWatch logging](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).
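
Any consumer of a subscription filter payload must reverse both encodings (Base64, then gzip) before parsing the JSON. The following minimal, self-contained sketch builds a sample payload the same way and decodes it back; the `logGroup` value is only illustrative.

```
# Build a sample payload the way CloudWatch Logs does (gzip, then Base64),
# then reverse both steps to recover the JSON.
payload=$(printf '{"logGroup":"/aws/eks/my-cluster/cluster"}' | gzip -c | base64)
echo "$payload" | base64 --decode | gunzip
# → {"logGroup":"/aws/eks/my-cluster/cluster"}
```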

**Note**  
Amazon EKS control plane logs are delivered to CloudWatch Logs within a few minutes. However, log delivery is best effort.

You can start using Amazon EKS control plane logging by choosing which log types you want to enable for each new or existing Amazon EKS cluster. You can enable or disable each log type on a per-cluster basis using the AWS Management Console, AWS CLI (version `1.16.139` or higher), or through the Amazon EKS API. When enabled, logs are automatically sent from the Amazon EKS cluster to CloudWatch Logs in the same account.

When you use Amazon EKS control plane logging, you’re charged standard Amazon EKS pricing for each cluster that you run. You are charged the standard CloudWatch Logs data ingestion and storage costs for any logs sent to CloudWatch Logs from your clusters. You are also charged for any AWS resources, such as Amazon EC2 instances or Amazon EBS volumes, that you provision as part of your cluster.

The following cluster control plane log types are available. Each log type corresponds to a component of the Kubernetes control plane. To learn more about these components, see [Kubernetes Components](https://kubernetes.io/docs/concepts/overview/components/) in the Kubernetes documentation.

 **API server (`api`)**   
Your cluster’s API server is the control plane component that exposes the Kubernetes API. If you enable API server logs when you launch the cluster, or shortly thereafter, the logs include API server flags that were used to start the API server. For more information, see [kube-apiserver](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/) and the [audit policy](https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L1129-L1255) in the Kubernetes documentation.

 **Audit (`audit`)**   
Kubernetes audit logs provide a record of the individual users, administrators, or system components that have affected your cluster. For more information, see [Auditing](https://kubernetes.io/docs/tasks/debug-application-cluster/audit/) in the Kubernetes documentation.

 **Authenticator (`authenticator`)**   
Authenticator logs are unique to Amazon EKS. These logs represent the control plane component that Amazon EKS uses for Kubernetes [Role Based Access Control](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) (RBAC) authentication using IAM credentials. For more information, see [Organize and monitor cluster resources](eks-managing.md).

 **Controller manager (`controllerManager`)**   
The controller manager manages the core control loops that are shipped with Kubernetes. For more information, see [kube-controller-manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/) in the Kubernetes documentation.

 **Scheduler (`scheduler`)**   
The scheduler component manages when and where to run Pods in your cluster. For more information, see [kube-scheduler](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/) in the Kubernetes documentation.

## Enable or disable control plane logs
<a name="enabling-control-plane-log-export"></a>

By default, cluster control plane logs aren’t sent to CloudWatch Logs. You must enable each log type individually to send logs for your cluster. CloudWatch Logs ingestion, archive storage, and data scanning rates apply to enabled control plane logs. For more information, see [CloudWatch pricing](https://aws.amazon.com/cloudwatch/pricing/).

To update the control plane logging configuration, Amazon EKS requires up to five available IP addresses in each subnet. When you enable a log type, the logs are sent with a log verbosity level of `2`.

You can enable or disable control plane logs with either the [AWS Management Console](#control-plane-console) or the [AWS CLI](#control-plane-cli).

### AWS Management Console
<a name="control-plane-console"></a>

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the name of the cluster to display your cluster information.

1. Choose the **Observability** tab.

1. In the **Control plane logging** section, choose **Manage logging**.

1. For each individual log type, choose whether the log type should be turned on or turned off. By default, each log type is turned off.

1. Choose **Save changes** to finish.

### AWS CLI
<a name="control-plane-cli"></a>

1. Check your AWS CLI version with the following command.

   ```
   aws --version
   ```

   If your AWS CLI version is earlier than `1.16.139`, you must first update to the latest version. To install or upgrade the AWS CLI, see [Installing the AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/userguide/installing.html) in the *AWS Command Line Interface User Guide*.

1. Update your cluster’s control plane log export configuration with the following AWS CLI command. Replace *my-cluster* with your cluster name and *region-code* with your AWS Region, and specify the log types that you want to enable.
**Note**  
The following command sends all available log types to CloudWatch Logs.

   ```
   aws eks update-cluster-config \
       --region region-code \
       --name my-cluster \
       --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
   ```

   An example output is as follows.

   ```
   {
       "update": {
           "id": "883405c8-65c6-4758-8cee-2a7c1340a6d9",
           "status": "InProgress",
           "type": "LoggingUpdate",
           "params": [
               {
                   "type": "ClusterLogging",
                   "value": "{\"clusterLogging\":[{\"types\":[\"api\",\"audit\",\"authenticator\",\"controllerManager\",\"scheduler\"],\"enabled\":true}]}"
               }
           ],
           "createdAt": 1553271814.684,
           "errors": []
       }
   }
   ```

1. Monitor the status of your log configuration update with the following command, using the cluster name and the update ID that were returned by the previous command. Your update is complete when the status appears as `Successful`.

   ```
   aws eks describe-update \
       --region region-code \
       --name my-cluster \
       --update-id 883405c8-65c6-4758-8cee-2a7c1340a6d9
   ```

   An example output is as follows.

   ```
   {
       "update": {
           "id": "883405c8-65c6-4758-8cee-2a7c1340a6d9",
           "status": "Successful",
           "type": "LoggingUpdate",
           "params": [
               {
                   "type": "ClusterLogging",
                   "value": "{\"clusterLogging\":[{\"types\":[\"api\",\"audit\",\"authenticator\",\"controllerManager\",\"scheduler\"],\"enabled\":true}]}"
               }
           ],
           "createdAt": 1553271814.684,
           "errors": []
       }
   }
   ```
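
To disable log types, run the same `update-cluster-config` command with `"enabled":false` for the types you no longer want delivered. For example, the following sketch turns off all five log types; replace the names as before.

```
aws eks update-cluster-config \
    --region region-code \
    --name my-cluster \
    --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":false}]}'
```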

## View cluster control plane logs
<a name="viewing-control-plane-logs"></a>

After you have enabled any of the control plane log types for your Amazon EKS cluster, you can view them on the CloudWatch console.

To learn more about viewing, analyzing, and managing logs in CloudWatch, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/).

1. Open the [CloudWatch console](https://console.aws.amazon.com/cloudwatch/home#logs:prefix=/aws/eks). The link opens the console and displays your current available log groups and filters them with the `/aws/eks` prefix.

1. Choose the cluster that you want to view logs for. The log group name format is `/aws/eks/my-cluster/cluster`.

1. Choose the log stream to view. The following list describes the log stream name format for each log type.
**Note**  
As log stream data grows, the log stream names are rotated. When multiple log streams exist for a particular log type, you can view the latest log stream by looking for the log stream name with the latest **Last event time**.
   +  **Kubernetes API server component logs (`api`)** – `kube-apiserver-1234567890abcdef01234567890abcde`
   +  **Audit (`audit`)** – `kube-apiserver-audit-1234567890abcdef01234567890abcde`
   +  **Authenticator (`authenticator`)** – `authenticator-1234567890abcdef01234567890abcde`
   +  **Controller manager (`controllerManager`)** – `kube-controller-manager-1234567890abcdef01234567890abcde`
   +  **Scheduler (`scheduler`)** – `kube-scheduler-1234567890abcdef01234567890abcde`

1. Look through the events of the log stream.

   For example, you should see the initial API server flags for the cluster when viewing the top of `kube-apiserver-1234567890abcdef01234567890abcde`.
**Note**  
If you don’t see the API server logs at the beginning of the log stream, then it is likely that the API server log file was rotated on the server before you enabled API server logging. Any log files that are rotated before API server logging is enabled can’t be exported to CloudWatch.

However, you can create a new cluster with the same Kubernetes version and enable the API server logging when you create the cluster. Clusters with the same platform version have the same flags enabled, so your flags should match the new cluster’s flags. When you finish viewing the flags for the new cluster in CloudWatch, you can delete the new cluster.
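
You can also retrieve control plane log events from the command line rather than the console. The following is a sketch using `aws logs filter-log-events`; the stream-name prefix matches the audit log stream format listed above, and *my-cluster* and the Region are placeholders to replace.

```
aws logs filter-log-events \
    --region region-code \
    --log-group-name /aws/eks/my-cluster/cluster \
    --log-stream-name-prefix kube-apiserver-audit \
    --limit 25
```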

# Log API calls as AWS CloudTrail events
<a name="logging-using-cloudtrail"></a>

Amazon EKS is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or AWS service in Amazon EKS. CloudTrail captures all API calls for Amazon EKS as events. This includes calls from the Amazon EKS console and from code calls to the Amazon EKS API operations.

If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket. This includes events for Amazon EKS. If you don’t configure a trail, you can still view the most recent events in the CloudTrail console in **Event history**. Using the information that CloudTrail collects, you can determine several details about a request. For example, you can determine when the request was made to Amazon EKS, the IP address where the request was made from, and who made the request.

To learn more about CloudTrail, see the [AWS CloudTrail User Guide](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/).

**Topics**
+ [View helpful references for AWS CloudTrail](service-name-info-in-cloudtrail.md)
+ [Analyze AWS CloudTrail log file entries](understanding-service-name-entries.md)
+ [View metrics for Amazon EC2 Auto Scaling groups](enable-asg-metrics.md)

# View helpful references for AWS CloudTrail
<a name="service-name-info-in-cloudtrail"></a>

When you create your AWS account, CloudTrail is also enabled on your AWS account. When any activity occurs in Amazon EKS, that activity is recorded in a CloudTrail event along with other AWS service events in **Event history**. You can view, search, and download recent events in your AWS account. For more information, see [Viewing events with CloudTrail event history](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html).

For an ongoing record of events in your AWS account, including events for Amazon EKS, create a trail. A *trail* enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when you create a trail in the console, the trail applies to all AWS Regions. The trail logs events from all AWS Regions in the AWS partition and delivers the log files to the Amazon S3 bucket that you specify. Additionally, you can configure other AWS services to further analyze and act upon the event data that’s collected in CloudTrail logs. For more information, see the following resources.
+  [Overview for creating a trail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html) 
+  [CloudTrail supported services and integrations](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-aws-service-specific-topics.html#cloudtrail-aws-service-specific-topics-integrations) 
+  [Configuring Amazon SNS notifications for CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/getting_notifications_top_level.html) 
+  [Receiving CloudTrail log files from multiple regions](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/receive-cloudtrail-log-files-from-multiple-regions.html) and [Receiving CloudTrail log files from multiple accounts](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-receive-logs-from-multiple-accounts.html) 

All Amazon EKS actions are logged by CloudTrail and are documented in the [Amazon EKS API Reference](https://docs.aws.amazon.com/eks/latest/APIReference/). For example, calls to the [CreateCluster](https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateCluster.html), [ListClusters](https://docs.aws.amazon.com/eks/latest/APIReference/API_ListClusters.html) and [DeleteCluster](https://docs.aws.amazon.com/eks/latest/APIReference/API_DeleteCluster.html) sections generate entries in the CloudTrail log files.
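
For a quick look at recent Amazon EKS API activity without creating a trail, you can query **Event history** from the CLI. The following is a sketch; replace the Region, and note that the event source value matches the `eventSource` field shown in the log entry examples below.

```
aws cloudtrail lookup-events \
    --region region-code \
    --lookup-attributes AttributeKey=EventSource,AttributeValue=eks.amazonaws.com \
    --max-results 10
```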

Every event or log entry contains information about the type of IAM identity that made the request, and which credentials were used. If temporary credentials were used, the entry shows how the credentials were obtained.

For more information, see the [CloudTrail userIdentity element](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-user-identity.html).

# Analyze AWS CloudTrail log file entries
<a name="understanding-service-name-entries"></a>

A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that you specify. CloudTrail log files contain one or more log entries. An event represents a single request from any source and includes information about the requested action. This includes information such as the date and time of the action and the request parameters that were used. CloudTrail log files aren’t an ordered stack trace of the public API calls, so they don’t appear in any specific order.

The following example shows a CloudTrail log entry that demonstrates the [CreateCluster](https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateCluster.html) action.

```
{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "IAMUser",
    "principalId": "AKIAIOSFODNN7EXAMPLE",
    "arn": "arn:aws:iam::111122223333:user/username",
    "accountId": "111122223333",
    "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
    "userName": "username"
  },
  "eventTime": "2018-05-28T19:16:43Z",
  "eventSource": "eks.amazonaws.com",
  "eventName": "CreateCluster",
  "awsRegion": "region-code",
  "sourceIPAddress": "205.251.233.178",
  "userAgent": "PostmanRuntime/6.4.0",
  "requestParameters": {
    "resourcesVpcConfig": {
      "subnetIds": [
        "subnet-a670c2df",
        "subnet-4f8c5004"
      ]
    },
    "roleArn": "arn:aws:iam::111122223333:role/AWSServiceRoleForAmazonEKS-CAC1G1VH3ZKZ",
    "clusterName": "test"
  },
  "responseElements": {
    "cluster": {
      "clusterName": "test",
      "status": "CREATING",
      "createdAt": 1527535003.208,
      "certificateAuthority": {},
      "arn": "arn:aws:eks:region-code:111122223333:cluster/test",
      "roleArn": "arn:aws:iam::111122223333:role/AWSServiceRoleForAmazonEKS-CAC1G1VH3ZKZ",
      "version": "1.10",
      "resourcesVpcConfig": {
        "securityGroupIds": [],
        "vpcId": "vpc-21277358",
        "subnetIds": [
          "subnet-a670c2df",
          "subnet-4f8c5004"
        ]
      }
    }
  },
  "requestID": "a7a0735d-62ab-11e8-9f79-81ce5b2b7d37",
  "eventID": "eab22523-174a-499c-9dd6-91e7be3ff8e3",
  "readOnly": false,
  "eventType": "AwsApiCall",
  "recipientAccountId": "111122223333"
}
```

## Log entries for Amazon EKS service-linked roles
<a name="eks-service-linked-role-ct"></a>

The Amazon EKS service-linked roles make API calls to AWS resources. CloudTrail log entries with `username: AWSServiceRoleForAmazonEKS` and `username: AWSServiceRoleForAmazonEKSNodegroup` appear for calls made by the Amazon EKS service-linked roles. For more information about Amazon EKS and service-linked roles, see [Using service-linked roles for Amazon EKS](using-service-linked-roles.md).

The following example shows a CloudTrail log entry that demonstrates a [DeleteInstanceProfile](https://docs.aws.amazon.com/IAM/latest/APIReference/API_DeleteInstanceProfile.html) action made by the `AWSServiceRoleForAmazonEKSNodegroup` service-linked role, noted in the `sessionContext`.

```
{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROA3WHGPEZ7SJ2CW55C5:EKS",
        "arn": "arn:aws:sts::111122223333:assumed-role/AWSServiceRoleForAmazonEKSNodegroup/EKS",
        "accountId": "111122223333",
        "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AROA3WHGPEZ7SJ2CW55C5",
                "arn": "arn:aws:iam::111122223333:role/aws-service-role/eks-nodegroup.amazonaws.com/AWSServiceRoleForAmazonEKSNodegroup",
                "accountId": "111122223333",
                "userName": "AWSServiceRoleForAmazonEKSNodegroup"
            },
            "webIdFederationData": {},
            "attributes": {
                "mfaAuthenticated": "false",
                "creationDate": "2020-02-26T00:56:33Z"
            }
        },
        "invokedBy": "eks-nodegroup.amazonaws.com"
    },
    "eventTime": "2020-02-26T00:56:34Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "DeleteInstanceProfile",
    "awsRegion": "region-code",
    "sourceIPAddress": "eks-nodegroup.amazonaws.com",
    "userAgent": "eks-nodegroup.amazonaws.com",
    "requestParameters": {
        "instanceProfileName": "eks-11111111-2222-3333-4444-abcdef123456"
    },
    "responseElements": null,
    "requestID": "11111111-2222-3333-4444-abcdef123456",
    "eventID": "11111111-2222-3333-4444-abcdef123456",
    "eventType": "AwsApiCall",
    "recipientAccountId": "111122223333"
}
```

# View metrics for Amazon EC2 Auto Scaling groups
<a name="enable-asg-metrics"></a>

Amazon EKS managed node groups have Amazon EC2 Auto Scaling group metrics enabled by default at no additional charge. The Auto Scaling group sends sampled data to Amazon CloudWatch every minute. You can filter these metrics by the name of the Auto Scaling group. They give you continuous visibility into the history of the Auto Scaling group that powers your managed node group, such as changes in the size of the group over time. Auto Scaling group metrics are available in the [Amazon CloudWatch](https://aws.amazon.com/cloudwatch) console or the Auto Scaling console. For more information, see [Monitor CloudWatch metrics for your Auto Scaling groups and instances](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-cloudwatch-monitoring.html).

With Auto Scaling group metrics collection, you can monitor the scaling of managed node groups. Auto Scaling group metrics report the minimum, maximum, and desired size of an Auto Scaling group. You can create an alarm that fires if the number of nodes in a node group falls below the minimum size, which can indicate an unhealthy node group. Tracking node group size is also useful for adjusting the maximum count so that your data plane doesn't run out of capacity.
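The alarm described above can be sketched locally. The following Python snippet builds the parameter set you might pass to CloudWatch's `PutMetricAlarm` API for a node group whose in-service node count drops below its minimum. The alarm naming scheme, evaluation periods, and Auto Scaling group name are hypothetical choices for illustration; `GroupInServiceInstances` in the `AWS/AutoScaling` namespace is the group metric that reports instances currently in service.

```python
# Sketch: parameters for a CloudWatch alarm that fires when the number of
# in-service nodes drops below the node group's minimum size.
# The alarm name and evaluation settings are hypothetical examples.

def below_minimum_alarm(asg_name: str, min_size: int) -> dict:
    """Build a PutMetricAlarm parameter set for an undersized node group."""
    return {
        "AlarmName": f"{asg_name}-below-min-size",   # hypothetical naming scheme
        "Namespace": "AWS/AutoScaling",              # namespace for ASG group metrics
        "MetricName": "GroupInServiceInstances",     # nodes currently in service
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Minimum",
        "Period": 60,                                # ASG metrics arrive every minute
        "EvaluationPeriods": 3,                      # require 3 consecutive breaches
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": float(min_size),
    }

params = below_minimum_alarm("eks-my-nodegroup-asg", 2)
print(params["MetricName"])  # → GroupInServiceInstances
```

You could pass a dictionary like this to CloudWatch through your SDK of choice, after adding an alarm action such as an SNS topic notification.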

If you prefer not to have these metrics collected, you can disable all or only some of them. For example, you might do this to avoid noise in your CloudWatch dashboards. For more information, see [Amazon CloudWatch metrics for Amazon EC2 Auto Scaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-cloudwatch-monitoring.html).

# Send metric and trace data with ADOT Operator
<a name="opentelemetry"></a>

Amazon EKS supports using the AWS Management Console, AWS CLI, and Amazon EKS API to install and manage the [AWS Distro for OpenTelemetry (ADOT)](https://aws-otel.github.io/) Operator. This makes it easier for your applications running on Amazon EKS to send metric and trace data to multiple monitoring services, such as [Amazon CloudWatch](https://console.aws.amazon.com/cloudwatch), [Prometheus](https://console.aws.amazon.com/prometheus), and [X-Ray](https://console.aws.amazon.com/xray).

For more information, see [Getting Started with AWS Distro for OpenTelemetry using EKS Add-Ons](https://aws-otel.github.io/docs/getting-started/adot-eks-add-on) in the AWS Distro for OpenTelemetry documentation.

# View aggregated data about cluster resources with the EKS Dashboard
<a name="cluster-dashboard"></a>

## What is the Amazon EKS Dashboard?
<a name="_what_is_the_amazon_eks_dashboard"></a>

The Amazon EKS Dashboard provides consolidated visibility into your Kubernetes clusters across multiple AWS Regions and AWS Accounts. With this dashboard, you can:
+ Track clusters scheduled for end-of-support auto-upgrades within the next 90 days.
+ Project EKS control plane costs for clusters in extended support.
+ Review clusters with insights that need attention before upgrading.
+ Identify managed node groups running specific AMI versions.
+ Monitor cluster support type distribution (standard versus extended support).
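The first capability above, tracking clusters scheduled for end-of-support auto-upgrades within 90 days, can be illustrated with a local sketch. The following Python snippet flags clusters whose auto-upgrade date falls inside that window. The cluster names and dates are invented sample data, not the output of any EKS API.

```python
# Sketch: flag clusters whose end-of-support auto-upgrade falls within 90 days,
# mirroring the dashboard's 90-day tracking window.
# Cluster names and dates below are hypothetical sample data.
from datetime import date, timedelta

def due_within(clusters: dict[str, date], today: date, days: int = 90) -> list[str]:
    """Return cluster names whose auto-upgrade date is within `days` of today."""
    cutoff = today + timedelta(days=days)
    return [name for name, when in clusters.items() if today <= when <= cutoff]

sample = {
    "prod-cluster": date(2025, 8, 1),
    "dev-cluster": date(2026, 3, 1),
}
print(due_within(sample, today=date(2025, 6, 1)))  # → ['prod-cluster']
```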

The EKS dashboard integrates with EKS Cluster Insights to surface issues with your clusters, such as use of deprecated Kubernetes APIs. For more information, see [Prepare for Kubernetes version upgrades and troubleshoot misconfigurations with cluster insights](cluster-insights.md).

**Note**  
The EKS Dashboard is not real-time; it updates every 12 hours. For real-time cluster monitoring, see [Monitor your cluster performance and view logs](eks-observe.md).

## How does the dashboard use AWS Organizations?
<a name="how_does_the_dashboard_use_shared_aws_organizations"></a>

The Amazon EKS Dashboard requires integration with AWS Organizations. It uses AWS Organizations to securely gather cluster information across accounts, providing centralized management and governance as your AWS infrastructure scales.

If AWS Organizations isn’t enabled for your infrastructure, see the [AWS Organizations User Guide](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html) for setup instructions.

### Cross-region and cross-account access
<a name="_cross_region_and_cross_account_access"></a>

The EKS Dashboard can see cluster resources in any account that is a member of the AWS organization. To generate a list of AWS accounts in your organization, see [Export details for all accounts in an organization](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_accounts_export.html).

The dashboard is generated in the us-east-1 AWS Region, and you must be logged in to that Region to see it. The dashboard aggregates data across AWS Regions, but doesn’t include AWS GovCloud (US) or China Regions.

### Key terms
<a name="_key_terms"></a>
+  **AWS Organization**: A unified management structure for multiple AWS accounts.
+  **Management account**: The primary account that controls the AWS Organization.
+  **Member account**: Any account within the organization except the management account.
+  **Delegated administrator**: A member account granted specific cross-account administrative permissions. From the management account, you can select one delegated administrator account per AWS service.
+  **Trusted access**: Authorization for the EKS Dashboard to access cluster information across organizational accounts.
+  **Service-linked role (SLR)**: A unique type of IAM role linked directly to an AWS service. The EKS Dashboard uses an SLR to read information about your accounts and organization.

For more information, see [Terminology and concepts](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_getting-started_concepts.html) in the AWS Organizations User Guide.

### General overview
<a name="_general_overview"></a>

1. Access the management account of your AWS Organization.
   + The steps to access the management account depend on how you have configured your AWS Organization. For example, you might access the management account via AWS [Identity Center](https://aws.amazon.com/iam/identity-center/) or [Okta](https://www.okta.com/partners/aws/).

1. Enable Trusted access through the EKS Console.

1. Assign a Delegated administrator using their AWS Account ID.

1. Switch to the Delegated administrator account.

1. Access the enhanced EKS Console with organization-wide visibility.

## Enable the EKS Dashboard using the AWS console
<a name="enable_the_eks_dashboard_using_the_shared_aws_console"></a>

**Important**  
You must be logged in to the Management Account of your AWS Organization to enable the EKS Dashboard.

### Access EKS Dashboard settings
<a name="_access_eks_dashboard_settings"></a>

1. Confirm the following:

   1. You have AWS Organizations enabled and configured.

   1. You are logged into the Management account of the organization.

   1. You are viewing the AWS Management Console in the us-east-1 region.

1. Navigate to the EKS console.

1. In the left sidebar, open Dashboard Settings.

### Set up access to the Amazon EKS Dashboard
<a name="_set_up_access_to_the_amazon_eks_dashboard"></a>

1. Find the AWS Account ID of the AWS Account you want to allow to view the EKS Dashboard.

   1. This step is optional, but suggested. Without a delegated administrator, you can access the dashboard only from the management account, and as a best practice you should limit use of the management account.

1. Click **Enable trusted access**.

   1. You can now view the dashboard from the management account.

1. Click **Register delegated administrator** and enter the AWS Account ID of the account you will use to view the dashboard.

   1. You can now view the dashboard from the delegated administrator account or the management account.

For information about permissions required to enable the dashboard, see [Minimum IAM policies required](cluster-dashboard-orgs.md#dashboard-iam-policy).

## View the EKS dashboard
<a name="_view_the_eks_dashboard"></a>

1. Log in to the delegated administrator account (suggested) or the management account.

1. Log in to the us-east-1 region.

1. Go to the EKS service, and select Dashboard from the left sidebar.

**Note**  
 [Review the IAM permissions required to view the EKS dashboard.](cluster-dashboard-orgs.md#eks-dashboard-view-policy) 

## Configure the dashboard
<a name="_configure_the_dashboard"></a>

You can configure the dashboard view and filter resources.

### Available resources
<a name="_available_resources"></a>
+  **Clusters**: View aggregated information about the status and location of EKS Clusters.
  + Clusters with health issues.
  + Clusters on EKS Extended Support.
  + Breakdown of clusters by Kubernetes version.
+  **Managed node groups**: Review Managed node groups and EC2 Instances.
  + Node groups by AMI type, such as Amazon Linux or Bottlerocket.
  + Node group health issues.
  + Instance type distribution.
+  **Add-ons**: See which Amazon EKS Add-ons you have installed, and their status.
  + Number of installations per add-on.
  + Add-ons with health issues.
  + Version distribution per add-on.

### Available views
<a name="_available_views"></a>
+  **Graph view** 
  + A customizable widget view displaying graphs and visualizations of the selected resource.
  + Changes to the Graph view, such as removing a widget, are visible to all users of the EKS Dashboard.
+  **Resource view** 
  + A list view of the selected resource, supporting filters.
+  **Map view** 
  + View the geographic distribution of the selected resource.

### Filter the EKS dashboard
<a name="_filter_the_eks_dashboard"></a>

You can filter the EKS Dashboard by:
+  AWS Account
+ Organizational unit, defined by AWS Organizations
+  AWS Region
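The three filters above can be illustrated with a local sketch. The following Python snippet applies account, organizational unit, and Region filters to a hypothetical list of cluster records. The record shape, account IDs, and OU IDs are invented for illustration and do not reflect a documented EKS API response.

```python
# Sketch: the three dashboard filters (account, organizational unit, Region)
# applied to a local list of cluster records. All sample data is hypothetical.

def filter_clusters(clusters, account=None, ou=None, region=None):
    """Keep clusters matching every filter that was supplied."""
    def keep(c):
        return ((account is None or c["account"] == account)
                and (ou is None or c["ou"] == ou)
                and (region is None or c["region"] == region))
    return [c for c in clusters if keep(c)]

sample = [
    {"name": "prod", "account": "111122223333", "ou": "ou-ab12-11111111", "region": "us-east-1"},
    {"name": "dev", "account": "444455556666", "ou": "ou-ab12-22222222", "region": "eu-west-1"},
]
print([c["name"] for c in filter_clusters(sample, region="us-east-1")])  # → ['prod']
```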

## Disable the EKS dashboard using the AWS console
<a name="disable_the_eks_dashboard_using_the_shared_aws_console"></a>

1. Confirm the following:

   1. You have AWS Organizations enabled and configured.

   1. You are logged into the Management account of the organization.

   1. You are viewing the AWS Management Console in the us-east-1 region.

1. Navigate to the EKS console.

1. In the left sidebar, open Dashboard Settings.

1. Click **Disable trusted access**.

## Troubleshoot the EKS dashboard
<a name="_troubleshoot_the_eks_dashboard"></a>

### Issue enabling EKS dashboard
<a name="_issue_enabling_eks_dashboard"></a>
+ You must be logged in to the management account of an AWS Organization.
  + If you do not have an AWS Organization, create one. Learn how to [Create and configure an organization](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_tutorials_basic.html).
  + If your AWS account is already a member of an AWS Organization, identify the administrator of the organization.
+ You must be logged in to the AWS account with sufficient IAM permissions to create and update AWS Organizations resources.

### Issue viewing the EKS dashboard
<a name="_issue_viewing_the_eks_dashboard"></a>
+ You must be logged in to one of the following AWS accounts:
  + The management account of the AWS Organization
  + A delegated administrator account, identified in the EKS dashboard settings of the management account.
+ If you’ve recently enabled the EKS Dashboard, note that initial data population can take up to 12 hours.
+  [Try re-enabling the dashboard using the CLI](cluster-dashboard-orgs.md#dashboard-enable-cli), including creating the service linked role.

### Dashboard widgets move unexpectedly
<a name="_dashboard_widgets_move_unexpectedly"></a>
+ The EKS Dashboard saves the configurable widget view at the AWS Account level. If you change the widget view, other people using the same AWS account will see the changes.

# Configure EKS Dashboard integration with AWS Organizations
<a name="cluster-dashboard-orgs"></a>

This section provides step-by-step instructions for configuring the EKS Dashboard’s integration with AWS Organizations. You’ll learn how to enable and disable trusted access between services, as well as how to register and deregister delegated administrator accounts. Each configuration task can be performed using either the AWS console or the AWS CLI.

## Enable trusted access
<a name="_enable_trusted_access"></a>

Trusted access authorizes the EKS Dashboard to securely access cluster information across all accounts in your organization.

### Using the AWS console
<a name="using_the_shared_aws_console"></a>

1. Log in to the management account of your AWS Organization.

1. Navigate to the EKS console in the us-east-1 region.

1. In the left sidebar, select Dashboard Settings.

1. Click **Enable trusted access**.

**Note**  
When you enable trusted access through the EKS console, the system automatically creates the `AWSServiceRoleForAmazonEKSDashboard` service-linked role. This automatic creation does not occur if you enable trusted access using the AWS CLI or AWS Organizations console.

### Using the AWS CLI
<a name="dashboard-enable-cli"></a>

1. Log in to the management account of your AWS Organization.

1. Run the following commands:

   ```
   aws iam create-service-linked-role --aws-service-name dashboard.eks.amazonaws.com
   aws organizations enable-aws-service-access --service-principal eks.amazonaws.com
   ```
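To confirm that trusted access took effect, you can check whether `eks.amazonaws.com` appears in the output of `aws organizations list-aws-service-access-for-organization`. The following Python sketch parses that JSON; the sample response below is hypothetical, and the `EnabledServicePrincipals` field name is an assumption based on the documented response shape.

```python
# Sketch: check whether EKS trusted access is enabled by parsing the JSON that
# `aws organizations list-aws-service-access-for-organization` returns.
# The sample response is hypothetical, not captured from a real account.
import json

def eks_access_enabled(response_json: str) -> bool:
    """True if eks.amazonaws.com appears among the enabled service principals."""
    doc = json.loads(response_json)
    return any(p.get("ServicePrincipal") == "eks.amazonaws.com"
               for p in doc.get("EnabledServicePrincipals", []))

sample = json.dumps({
    "EnabledServicePrincipals": [
        {"ServicePrincipal": "eks.amazonaws.com", "DateEnabled": "2025-01-15T00:00:00Z"}
    ]
})
print(eks_access_enabled(sample))  # → True
```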

## Disable trusted access
<a name="_disable_trusted_access"></a>

Disabling trusted access revokes the EKS Dashboard’s permission to access cluster information across your organization’s accounts.

### Using the AWS console
<a name="using_the_shared_aws_console"></a>

1. Log in to the management account of your AWS Organization.

1. Navigate to the EKS Console in the us-east-1 region.

1. In the left sidebar, select Dashboard Settings.

1. Click **Disable trusted access**.

### Using the AWS CLI
<a name="using_the_shared_aws_cli"></a>

1. Log in to the management account of your AWS Organization.

1. Run the following command:

   ```
   aws organizations disable-aws-service-access --service-principal eks.amazonaws.com
   ```

## Enable a delegated administrator account
<a name="_enable_a_delegated_administrator_account"></a>

A delegated administrator is a member account that’s granted permission to access the EKS Dashboard.

### Using the AWS console
<a name="using_the_shared_aws_console"></a>

1. Log in to the management account of your AWS Organization.

1. Navigate to the EKS console in the us-east-1 region.

1. In the left sidebar, select Dashboard Settings.

1. Click **Register delegated administrator**.

1. Enter the Account ID of the AWS Account you want to choose as delegated administrator.

1. Confirm the registration.

### Using the AWS CLI
<a name="using_the_shared_aws_cli"></a>

1. Log in to the management account of your AWS Organization.

1. Run the following command, replacing `123456789012` with your account ID:

   ```
   aws organizations register-delegated-administrator --account-id 123456789012 --service-principal eks.amazonaws.com
   ```

## Disable a delegated administrator account
<a name="_disable_a_delegated_administrator_account"></a>

Disabling a delegated administrator removes the account’s permission to access the EKS Dashboard.

### Using the AWS console
<a name="using_the_shared_aws_console"></a>

1. Log in to the management account of your AWS Organization.

1. Navigate to the EKS console in the us-east-1 region.

1. In the left sidebar, select Dashboard Settings.

1. Locate the delegated administrator in the list.

1. Click **Deregister** next to the account you want to remove as delegated administrator.

### Using the AWS CLI
<a name="using_the_shared_aws_cli"></a>

1. Log in to the management account of your AWS Organization.

1. Run the following command, replacing `123456789012` with the account ID of the delegated administrator:

   ```
   aws organizations deregister-delegated-administrator --account-id 123456789012 --service-principal eks.amazonaws.com
   ```

## Minimum IAM policies required
<a name="dashboard-iam-policy"></a>

This section outlines the minimum IAM policies required to enable trusted access and delegate an administrator for the EKS Dashboard integration with AWS Organizations.

### Policy for enabling trusted access
<a name="_policy_for_enabling_trusted_access"></a>

To enable trusted access between EKS Dashboard and AWS Organizations, you need the following permissions:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "organizations:EnableAWSServiceAccess",
                "organizations:DescribeOrganization",
                "organizations:ListAWSServiceAccessForOrganization"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "arn:aws:iam::*:role/aws-service-role/dashboard.eks.amazonaws.com/AWSServiceRoleForAmazonEKSDashboard"
        }
    ]
}
```
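As a quick local check, the following Python sketch verifies that a candidate IAM policy document allows every action listed in the policy above. It is a simplified sketch: it assumes plain action lists in `Allow` statements and ignores wildcards, `Deny` statements, and resource constraints, all of which a real policy evaluation would consider.

```python
# Sketch: verify a candidate IAM policy document allows every action the
# trusted-access policy requires. Simplified: ignores wildcards, Deny
# statements, and Resource/Condition constraints.
import json

REQUIRED = {
    "organizations:EnableAWSServiceAccess",
    "organizations:DescribeOrganization",
    "organizations:ListAWSServiceAccessForOrganization",
    "iam:CreateServiceLinkedRole",
}

def allowed_actions(policy_json: str) -> set[str]:
    """Collect actions from every Allow statement in the policy."""
    actions = set()
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") == "Allow":
            a = stmt.get("Action", [])
            actions.update([a] if isinstance(a, str) else a)
    return actions

def covers_required(policy_json: str) -> bool:
    return REQUIRED <= allowed_actions(policy_json)

sample = json.dumps({"Statement": [
    {"Effect": "Allow", "Action": sorted(REQUIRED), "Resource": "*"}]})
print(covers_required(sample))  # → True
```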

### Policy for delegating an administrator
<a name="_policy_for_delegating_an_administrator"></a>

To register or deregister a delegated administrator for the EKS Dashboard, you need the following permissions:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "organizations:RegisterDelegatedAdministrator",
                "organizations:DeregisterDelegatedAdministrator",
                "organizations:ListDelegatedAdministrators"
            ],
            "Resource": "*"
        }
    ]
}
```

### Policy to view EKS Dashboard
<a name="eks-dashboard-view-policy"></a>

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonEKSDashboardReadOnly",
            "Effect": "Allow",
            "Action": [
                "eks:ListDashboardData",
                "eks:ListDashboardResources",
                "eks:DescribeClusterVersions"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonOrganizationsReadOnly",
            "Effect": "Allow",
            "Action": [
                "organizations:DescribeOrganization",
                "organizations:ListAWSServiceAccessForOrganization",
                "organizations:ListRoots",
                "organizations:ListAccountsForParent",
                "organizations:ListOrganizationalUnitsForParent"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AmazonOrganizationsDelegatedAdmin",
            "Effect": "Allow",
            "Action": [
                "organizations:ListDelegatedAdministrators"
            ],
            "Resource": [
                "*"
            ],
            "Condition": {
                "StringEquals": {
                    "organizations:ServicePrincipal": "eks.amazonaws.com"
                }
            }
        }
    ]
}
```

**Note**  
These policies must be attached to the IAM principal (user or role) in the management account of your AWS Organization. Member accounts cannot enable trusted access or delegate administrators.