

# Amazon EKS
<a name="automation-ref-eks"></a>

 AWS Systems Manager Automation provides predefined runbooks for Amazon Elastic Kubernetes Service. For more information about runbooks, see [Working with runbooks](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-documents.html). For information about how to view runbook content, see [View runbook content](automation-runbook-reference.md#view-automation-json). 

**Topics**
+ [`AWS-CreateEKSClusterWithFargateProfile`](aws-createeksclusterwithfargateprofile.md)
+ [`AWS-CreateEKSClusterWithNodegroup`](aws-createeksclusterwithnodegroup.md)
+ [`AWS-DeleteEKSCluster`](automation-aws-deleteekscluster.md)
+ [`AWS-MigrateToNewEKSSelfManagedNodeGroup`](aws-migratetoneweksselfmanagedlinuxnodegroup.md)
+ [`AWSPremiumSupport-TroubleshootEKSCluster`](automation-awspremiumsupport-troubleshootekscluster.md)
+ [`AWSSupport-TroubleshootEKSWorkerNode`](automation-awssupport-troubleshooteksworkernode.md)
+ [`AWS-UpdateEKSCluster`](automation-updateekscluster.md)
+ [`AWS-UpdateEKSManagedNodeGroup`](aws-updateeksmanagednodegroup.md)
+ [`AWS-UpdateEKSSelfManagedLinuxNodeGroups`](aws-updateeksselfmanagedlinuxnodegroup.md)
+ [`AWSSupport-CollectEKSLinuxNodeStatistics`](automation-awssupport-collectekslinuxnodestatistics.md)
+ [`AWSSupport-CollectEKSInstanceLogs`](automation-awssupport-collecteksinstancelogs.md)
+ [`AWSSupport-SetupK8sApiProxyForEKS`](automation-awssupport-setupk8sapiproxyforeks.md)
+ [`AWSSupport-TroubleshootEbsCsiDriversForEks`](automation-awssupport-troubleshoot-ebs-csi-drivers-for-eks.md)
+ [`AWSSupport-TroubleshootEKSALBControllerIssues`](automation-awssupport-troubleshoot-eks-alb-controller-issues.md)
+ [`AWSSupport-TroubleshootEKSDNSFailure`](automation-awssupport-troubleshooteksdnsfailure.md)

# `AWS-CreateEKSClusterWithFargateProfile`
<a name="aws-createeksclusterwithfargateprofile"></a>

 **Description** 

 The `AWS-CreateEKSClusterWithFargateProfile` runbook creates an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses AWS Fargate for capacity.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-CreateEKSClusterWithFargateProfile) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) A unique name for the cluster.
+ ClusterRoleArn

  Type: String

  Description: (Required) The ARN of the IAM role that provides permissions for the Kubernetes control plane to make calls to AWS API operations on your behalf.
+ FargateProfileName

  Type: String

  Description: (Required) The name of the Fargate profile.
+ FargateProfileRoleArn

  Type: String

  Description: (Required) The ARN of the Amazon EKS Pod execution IAM role.
+ FargateProfileSelectors

  Type: String

  Description: (Required) The selectors to match pods to the Fargate profile.
+ SubnetIds

  Type: StringList

  Description: (Required) The IDs of the subnets you want to use for your Amazon EKS cluster. Amazon EKS creates elastic network interfaces in these subnets for communication between your nodes and the Kubernetes control plane. You must specify at least two subnet IDs.
+ EKSEndpointPrivateAccess

  Type: Boolean

  Default: True

  Description: (Optional) Set this value to `True` to allow private access for your cluster's Kubernetes API server endpoint. If you enable private access, Kubernetes API requests from within your cluster's VPC use the private VPC endpoint. If you disable private access and you have nodes or AWS Fargate pods in the cluster, then ensure that `publicAccessCidrs` include the necessary CIDR blocks for communication with the nodes or Fargate pods.
+ EKSEndpointPublicAccess

  Type: Boolean

  Default: False

  Description: (Optional) Set this value to `False` to disable public access to your cluster's Kubernetes API server endpoint. If you disable public access, your cluster's Kubernetes API server can only receive requests from within the VPC where it was launched.
+ PublicAccessCIDRs

  Type: StringList

  Description: (Optional) The CIDR blocks that are allowed access to your cluster's public Kubernetes API server endpoint. Communication to the endpoint from addresses outside of the CIDR blocks that you specify is denied. If you've disabled private endpoint access and you have nodes or Fargate pods in the cluster, then ensure that you specify the necessary CIDR blocks.
+ SecurityGroupIds

  Type: StringList

  Description: (Optional) Specify one or more security groups to associate with the elastic network interfaces created in your account by Amazon EKS.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `ssm:GetAutomationExecution`
+ `ssm:StartAutomationExecution`
+ `ec2:DescribeRouteTables`
+ `ec2:DescribeSubnets`
+ `ec2:DescribeVpcs`
+ `eks:CreateCluster`
+ `eks:CreateFargateProfile`
+ `eks:DescribeCluster`
+ `eks:DescribeFargateProfile`
+ `iam:CreateServiceLinkedRole`
+ `iam:GetRole`
+ `iam:ListAttachedRolePolicies`
+ `iam:PassRole`
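
The actions listed above can be collected into an IAM policy document for the role you pass in `AutomationAssumeRole`. The following is a minimal sketch, not an official policy: the `build_policy` helper and the `"*"` resource are assumptions made for illustration, and you would normally narrow the resource to specific ARNs.

```python
import json

# Actions copied from the "Required IAM permissions" list above.
REQUIRED_ACTIONS = [
    "ssm:GetAutomationExecution",
    "ssm:StartAutomationExecution",
    "ec2:DescribeRouteTables",
    "ec2:DescribeSubnets",
    "ec2:DescribeVpcs",
    "eks:CreateCluster",
    "eks:CreateFargateProfile",
    "eks:DescribeCluster",
    "eks:DescribeFargateProfile",
    "iam:CreateServiceLinkedRole",
    "iam:GetRole",
    "iam:ListAttachedRolePolicies",
    "iam:PassRole",
]


def build_policy(actions, resource="*"):
    """Return an IAM policy document that allows the given actions.

    Hypothetical helper: it emits a single Allow statement. The "*"
    resource is a placeholder and should be scoped down where possible.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_policy(REQUIRED_ACTIONS), indent=2))
```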

 **Document Steps** 
+ CreateEKSCluster (aws:executeAwsApi) - Creates an Amazon EKS cluster.
+ VerifyEKSClusterIsActive (aws:waitForAwsResourceProperty) - Verifies the cluster state is `ACTIVE`.
+ CreateFargateProfile (aws:executeAwsApi) - Creates a Fargate profile for the cluster.
+ VerifyFargateProfileIsActive (aws:waitForAwsResourceProperty) - Verifies the Fargate profile state is `ACTIVE`.

 **Outputs** 

 `CreateEKSCluster.CreateClusterResponse`   
Description: Response received from the `CreateCluster` API call.

 `CreateFargateProfile.CreateFargateProfileResponse`   
Description: Response received from the `CreateFargateProfile` API call.
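
If you start this runbook programmatically, the Systems Manager `StartAutomationExecution` API expects every parameter value as a list of strings. The sketch below assembles the required parameters and enforces the two-subnet minimum described for `SubnetIds`. The helper name is hypothetical, and passing `FargateProfileSelectors` as a JSON-encoded string is an assumption based on its `String` type, so check the runbook content for the exact format it accepts.

```python
import json


def build_fargate_cluster_parameters(cluster_name, cluster_role_arn,
                                     profile_name, profile_role_arn,
                                     selectors, subnet_ids):
    """Build the Parameters map for AWS-CreateEKSClusterWithFargateProfile.

    Hypothetical helper. Every value is wrapped in a list because
    Automation parameters are passed as lists of strings.
    """
    if len(subnet_ids) < 2:
        raise ValueError("SubnetIds must contain at least two subnet IDs")
    return {
        "ClusterName": [cluster_name],
        "ClusterRoleArn": [cluster_role_arn],
        "FargateProfileName": [profile_name],
        "FargateProfileRoleArn": [profile_role_arn],
        # Assumption: selectors are serialized to a single JSON string.
        "FargateProfileSelectors": [json.dumps(selectors)],
        "SubnetIds": list(subnet_ids),
    }


# Usage with boto3 (not executed here; requires AWS credentials):
#   import boto3
#   boto3.client("ssm").start_automation_execution(
#       DocumentName="AWS-CreateEKSClusterWithFargateProfile",
#       Parameters=build_fargate_cluster_parameters(...),
#   )
```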

# `AWS-CreateEKSClusterWithNodegroup`
<a name="aws-createeksclusterwithnodegroup"></a>

 **Description** 

 The `AWS-CreateEKSClusterWithNodegroup` runbook creates an Amazon Elastic Kubernetes Service (Amazon EKS) cluster using a node group for capacity.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-CreateEKSClusterWithNodegroup) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) A unique name for the cluster.
+ ClusterRoleArn

  Type: String

  Description: (Required) The ARN of the IAM role that provides permissions for the Kubernetes control plane to make calls to AWS API operations on your behalf.
+ NodegroupName

  Type: String

  Description: (Required) A unique name for the node group.
+ NodegroupRoleArn

  Type: String

  Description: (Required) The ARN of the IAM role to associate with your node group. The Amazon EKS worker node kubelet daemon makes calls to AWS APIs on your behalf. Nodes receive permissions for these API calls through an IAM instance profile and associated policies. Before you can launch nodes and register them into a cluster, you must create an IAM role for those nodes to use when they are launched.
+ SubnetIds

  Type: StringList

  Description: (Required) The IDs of the subnets you want to use for your Amazon EKS cluster. Amazon EKS creates elastic network interfaces in these subnets for communication between your nodes and the Kubernetes control plane. You must specify at least two subnet IDs.
+ EKSEndpointPrivateAccess

  Type: Boolean

  Default: True

  Description: (Optional) Set this value to `True` to allow private access for your cluster's Kubernetes API server endpoint. If you enable private access, Kubernetes API requests from within your cluster's VPC use the private VPC endpoint. If you disable private access and you have nodes or AWS Fargate pods in the cluster, then ensure that `publicAccessCidrs` include the necessary CIDR blocks for communication with the nodes or Fargate pods.
+ EKSEndpointPublicAccess

  Type: Boolean

  Default: False

  Description: (Optional) Set this value to `False` to disable public access to your cluster's Kubernetes API server endpoint. If you disable public access, your cluster's Kubernetes API server can only receive requests from within the VPC where it was launched.
+ PublicAccessCIDRs

  Type: StringList

  Description: (Optional) The CIDR blocks that are allowed access to your cluster's public Kubernetes API server endpoint. Communication to the endpoint from addresses outside of the CIDR blocks that you specify is denied. If you've disabled private endpoint access and you have nodes or Fargate pods in the cluster, then ensure that you specify the necessary CIDR blocks.
+ SecurityGroupIds

  Type: StringList

  Description: (Optional) Specify one or more security groups to associate with the elastic network interfaces created in your account by Amazon EKS.
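
When private endpoint access is disabled, nodes reach the Kubernetes API server through the public endpoint, so the addresses they connect from must fall inside `PublicAccessCIDRs`. The following sketch checks that condition before you start the runbook. The function name and the sample addresses are illustrative assumptions; the source addresses would typically be the public IPs or NAT gateway addresses your nodes use.

```python
import ipaddress


def uncovered_addresses(public_access_cidrs, source_addresses):
    """Return the source addresses that no CIDR block in the list covers.

    Hypothetical pre-flight check: an empty result means every address
    is allowed by the PublicAccessCIDRs value you plan to pass.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in public_access_cidrs]
    return [
        address
        for address in source_addresses
        if not any(ipaddress.ip_address(address) in net for net in networks)
    ]


if __name__ == "__main__":
    # Documentation-range addresses used purely as sample data.
    missing = uncovered_addresses(
        ["203.0.113.0/24"], ["203.0.113.10", "198.51.100.7"]
    )
    print("Addresses not covered:", missing)
```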

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `ssm:StartAutomationExecution`
+ `ssm:GetAutomationExecution`
+ `ec2:DescribeSubnets`
+ `eks:CreateCluster`
+ `eks:CreateNodegroup`
+ `eks:DescribeCluster`
+ `eks:DescribeNodegroup`
+ `iam:CreateServiceLinkedRole`
+ `iam:GetRole`
+ `iam:ListAttachedRolePolicies`
+ `iam:PassRole`

 **Document Steps** 
+ CreateEKSCluster (aws:executeAwsApi) - Creates an Amazon EKS cluster.
+ VerifyEKSClusterIsActive (aws:waitForAwsResourceProperty) - Verifies the cluster state is `ACTIVE`.
+ CreateNodegroup (aws:executeAwsApi) - Creates a node group for the cluster.
+ VerifyNodegroupIsActive (aws:waitForAwsResourceProperty) - Verifies the node group state is `ACTIVE`.

 **Outputs** 
+ `CreateEKSCluster.CreateClusterResponse`: Response received from the `CreateCluster` API call.
+ `CreateNodegroup.CreateNodegroupResponse`: Response received from the `CreateNodegroup` API call.

# `AWS-DeleteEKSCluster`
<a name="automation-aws-deleteekscluster"></a>

 **Description** 

 This runbook deletes the resources associated with an Amazon EKS cluster, including node groups and Fargate profiles. Optionally, you can choose to delete all self-managed nodes, the CloudFormation stacks used to create the nodes, and the VPC CloudFormation stack for your cluster. For more information about deleting a cluster, see [Deleting a cluster](https://docs.aws.amazon.com/eks/latest/userguide/delete-cluster.html) in the *Amazon EKS User Guide*. 

**Note**  
 If you have active services in your cluster that are associated with a load balancer, you must delete those services before deleting the cluster. If you don't, the system can't delete the load balancers. Use the following procedure to find and delete services before you run the `AWS-DeleteEKSCluster` runbook. 

**To locate and delete services in your cluster**

1. Install the Kubernetes command line utility, `kubectl`. For more information, see [Installing kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) in the *Amazon EKS User Guide*. 

1. Run the following command to list all services running in your cluster.

   ```
   kubectl get svc --all-namespaces
   ```

1. Run the following command to delete any services that have an associated EXTERNAL-IP value. These services are fronted by a load balancer, and you must delete them in Kubernetes to allow the load balancer and associated resources to be properly released.

   ```
   kubectl delete svc service-name
   ```

 You can now run the `AWS-DeleteEKSCluster` runbook. 
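
For clusters with many services, the lookup in the procedure above can be scripted. The sketch below reads the JSON form of the listing (`kubectl get svc --all-namespaces -o json`) and returns the services of type `LoadBalancer`. Treating that type as the indicator of an associated load balancer is an assumption, so compare the result with the EXTERNAL-IP column before deleting anything. The function name is hypothetical.

```python
import json


def load_balancer_services(kubectl_json):
    """Return (namespace, name) pairs for services of type LoadBalancer.

    `kubectl_json` is the text printed by
    `kubectl get svc --all-namespaces -o json`.
    """
    data = json.loads(kubectl_json)
    found = []
    for item in data.get("items", []):
        if item.get("spec", {}).get("type") == "LoadBalancer":
            meta = item["metadata"]
            found.append((meta["namespace"], meta["name"]))
    return found


if __name__ == "__main__":
    import sys

    # Example: kubectl get svc --all-namespaces -o json | python this_script.py
    for namespace, name in load_balancer_services(sys.stdin.read()):
        # Print the delete command for review instead of running it.
        print(f"kubectl delete svc {name} --namespace {namespace}")
```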

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-DeleteEKSCluster) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ EKSClusterName

  Type: String

  Description: (Required) The name of the Amazon EKS Cluster to be deleted.
+ VPCCloudFormationStack

  Type: String

  Description: (Optional) The name of the CloudFormation stack for the VPC of the EKS cluster being deleted. If you specify a value, the runbook deletes this stack and any resources created by the stack.
+ VPCCloudFormationStackRole

  Type: String

  Description: (Optional) The ARN of an IAM role that CloudFormation assumes to delete the VPC CloudFormation stack. CloudFormation uses the role's credentials to make calls on your behalf.
+ SelfManagedNodeStacks

  Type: String

  Description: (Optional) A comma-separated list of CloudFormation stack names for self-managed nodes. If you specify a value, the runbook deletes these CloudFormation stacks.
+ SelfManagedNodeStacksRole

  Type: String

  Description: (Optional) The ARN of an IAM role that CloudFormation assumes to delete the self-managed node stacks. CloudFormation uses the role's credentials to make calls on your behalf.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `sts:AssumeRole` 
+  `eks:ListNodegroups` 
+  `eks:DeleteNodegroup` 
+  `eks:ListFargateProfiles` 
+  `eks:DeleteFargateProfile` 
+  `eks:DeleteCluster` 
+  `cloudformation:DescribeStacks` 
+  `cloudformation:DeleteStack` 

 **Document Steps** 
+  `aws:executeScript` - DeleteNodeGroups: Find and delete all node groups in the EKS cluster. 
+  `aws:executeScript` - DeleteFargateProfiles: Find and delete all Fargate profiles in the EKS cluster. 
+  `aws:executeScript` - DeleteSelfManagedNodes: Delete all self-managed nodes and the CloudFormation stacks used to create the nodes. 
+  `aws:executeScript` - DeleteEKSCluster: Delete EKS cluster. 
+  `aws:executeScript` - DeleteVPCCloudFormationStack: Delete the VPC CloudFormation stack. 

# `AWS-MigrateToNewEKSSelfManagedNodeGroup`
<a name="aws-migratetoneweksselfmanagedlinuxnodegroup"></a>

 **Description** 

 The `AWS-MigrateToNewEKSSelfManagedNodeGroup` runbook helps you create a new Amazon Elastic Kubernetes Service (Amazon EKS) Linux node group to migrate your existing application to. For more information, see [Migrating to a new node group](https://docs.aws.amazon.com/eks/latest/userguide/migrate-stack.html) in the *Amazon EKS User Guide*.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-MigrateToNewEKSSelfManagedLinuxNodeGroup) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ OldStackName

  Type: String

  Description: (Required) The name or stack ID of your existing CloudFormation stack.
+ NewStackName

  Type: String

  Description: (Optional) The name of the new CloudFormation stack that is created for your new node group. If you don't specify a value for this parameter, the stack name is created using the format: `NewNodeGroup-ClusterName-AutomationExecutionID`.
+ ClusterControlPlaneSecurityGroup

  Type: String

  Description: (Optional) The ID of the security group you want nodes to use to communicate with the Amazon EKS control plane. If you don't specify a value for this parameter, the security group specified in your existing CloudFormation stack is used.
+ NodeInstanceType

  Type: String

  Description: (Optional) The instance type that you want to use for the new node group. If you don't specify a value for this parameter, the instance type specified in your existing CloudFormation stack is used.
+ NodeGroupName

  Type: String

  Description: (Optional) The name of your new node group. If you don't specify a value for this parameter, the node group name specified in your existing CloudFormation stack is used.
+ NodeAutoScalingGroupDesiredCapacity

  Type: String

  Description: (Optional) The desired number of nodes to scale to when your new stack is created. This number must be greater than or equal to the `NodeAutoScalingGroupMinSize` value and less than or equal to the `NodeAutoScalingGroupMaxSize`. If you don't specify a value for this parameter, the node group desired capacity specified in your existing CloudFormation stack is used.
+ NodeAutoScalingGroupMaxSize

  Type: String

  Description: (Optional) The maximum number of nodes that your node group can scale out to. If you don't specify a value for this parameter, the node group maximum size specified in your existing CloudFormation stack is used.
+ NodeAutoScalingGroupMinSize

  Type: String

  Description: (Optional) The minimum number of nodes that your node group can scale in to. If you don't specify a value for this parameter, the node group minimum size specified in your existing CloudFormation stack is used.
+ NodeImageId

  Type: String

  Description: (Optional) The ID of the Amazon Machine Image (AMI) that you want the node group to use.
+ NodeImageIdSSMParam

  Type: String

  Description: (Optional) The public Systems Manager parameter for the AMI that you want the node group to use.
+ NodeVolumeSize

  Type: String

  Description: (Optional) The size of the root volume for your nodes in GiB. If you don't specify a value for this parameter, the node volume size specified in your existing CloudFormation stack is used.
+ NodeVolumeType

  Type: String

  Description: (Optional) The type of Amazon EBS volume you want to use for the root volume of your nodes. If you don't specify a value for this parameter, the volume type specified in your existing CloudFormation stack is used.
+ KeyName

  Type: String

  Description: (Optional) The key pair you want to assign to your nodes. If you don't specify a value for this parameter, the key pair specified in your existing CloudFormation stack is used.
+ Subnets

  Type: StringList

  Description: (Optional) A comma-separated list of the subnet IDs that you want to use for your new node group. If you don't specify a value for this parameter, the subnets specified in your existing CloudFormation stack are used.
+ DisableIMDSv1

  Type: Boolean

  Description: (Optional) Specify `true` to disable Instance Metadata Service Version 1 (IMDSv1). By default, nodes support IMDSv1 and IMDSv2.
+ BootstrapArguments

  Type: String

  Description: (Optional) Additional arguments you want to pass to the node bootstrap script.
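
Two of the rules in the parameter descriptions above are easy to check before you start the runbook: the desired capacity must lie between the minimum and maximum sizes, and an omitted `NewStackName` falls back to the `NewNodeGroup-ClusterName-AutomationExecutionID` format. The following is a minimal sketch with hypothetical helper names. It mirrors the documented rules only and is not the runbook's own code.

```python
def validate_capacity(min_size, desired, max_size):
    """Check NodeAutoScalingGroupMinSize <= DesiredCapacity <= MaxSize.

    The runbook takes these values as strings, so they are converted
    to integers before comparison.
    """
    low, want, high = int(min_size), int(desired), int(max_size)
    if not low <= want <= high:
        raise ValueError(
            f"desired capacity {want} must be between {low} and {high}"
        )
    return low, want, high


def default_stack_name(cluster_name, execution_id):
    """Format the stack name the way the NewStackName description states."""
    return f"NewNodeGroup-{cluster_name}-{execution_id}"
```

Calling `validate_capacity("1", "3", "4")` returns the three values as integers, while `validate_capacity("2", "1", "4")` raises `ValueError`.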

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `ssm:StartAutomationExecution`
+ `ssm:GetAutomationExecution`
+ `ssm:GetParameters`
+ `autoscaling:CreateAutoScalingGroup`
+ `autoscaling:CreateOrUpdateTags`
+ `autoscaling:DeleteTags`
+ `autoscaling:DescribeAutoScalingGroups`
+ `autoscaling:DescribeScalingActivities`
+ `autoscaling:DescribeScheduledActions`
+ `autoscaling:SetDesiredCapacity`
+ `autoscaling:TerminateInstanceInAutoScalingGroup`
+ `autoscaling:UpdateAutoScalingGroup`
+ `cloudformation:CreateStack`
+ `cloudformation:DescribeStackResource`
+ `cloudformation:DescribeStacks`
+ `cloudformation:UpdateStack`
+ `ec2:AuthorizeSecurityGroupEgress`
+ `ec2:AuthorizeSecurityGroupIngress`
+ `ec2:CreateLaunchTemplateVersion`
+ `ec2:CreateLaunchTemplate`
+ `ec2:CreateSecurityGroup`
+ `ec2:CreateTags`
+ `ec2:DeleteLaunchTemplate`
+ `ec2:DeleteSecurityGroup`
+ `ec2:DescribeAvailabilityZones`
+ `ec2:DescribeImages`
+ `ec2:DescribeInstanceAttribute`
+ `ec2:DescribeInstanceStatus`
+ `ec2:DescribeInstances`
+ `ec2:DescribeKeyPairs`
+ `ec2:DescribeLaunchTemplateVersions`
+ `ec2:DescribeLaunchTemplates`
+ `ec2:DescribeSecurityGroups`
+ `ec2:DescribeSubnets`
+ `ec2:DescribeVpcs`
+ `ec2:RevokeSecurityGroupEgress`
+ `ec2:RevokeSecurityGroupIngress`
+ `ec2:RunInstances`
+ `ec2:TerminateInstances`
+ `iam:AddRoleToInstanceProfile`
+ `iam:AttachRolePolicy`
+ `iam:CreateInstanceProfile`
+ `iam:CreateRole`
+ `iam:GetInstanceProfile`
+ `iam:GetRole`
+ `iam:PassRole`

 **Document Steps** 
+ DetermineParameterValuesForNewNodeGroup (aws:executeScript) - Gathers the parameter values to use for the new node group.
+ CreateStack (aws:createStack) - Creates the CloudFormation stack for the new node group. 
+ GetNewStackNodeInstanceRole (aws:executeAwsApi) - Gets the node instance role. 
+ GetNewStackSecurityGroup (aws:executeAwsApi) - Gets the node security group. 
+ AddIngressRulesToNewNodeSecurityGroup (aws:executeAwsApi) - Adds ingress rules to the newly created security group so it can accept traffic from the one assigned to your previous node group. 
+ AddIngressRulesToOldNodeSecurityGroup (aws:executeAwsApi) - Adds ingress rules to the previous security group so it can accept traffic from the one assigned to your newly created node group. 
+ VerifyStackComplete (aws:assertAwsResourceProperty) - Verifies the new stack status is `CREATE_COMPLETE`. 

 **Outputs** 

DetermineParameterValuesForNewNodeGroup.NewStackParameters - The parameters used to create the new stack.

GetNewStackNodeInstanceRole.NewNodeInstanceRole - The node instance role for the new node group.

GetNewStackSecurityGroup.NewNodeSecurityGroup - The ID of the security group for the new node group.

DetermineParameterValuesForNewNodeGroup.NewStackName - The CloudFormation stack name for the new node group.

CreateStack.StackId - The CloudFormation stack ID for the new node group.

# `AWSPremiumSupport-TroubleshootEKSCluster`
<a name="automation-awspremiumsupport-troubleshootekscluster"></a>

 **Description** 

 The `AWSPremiumSupport-TroubleshootEKSCluster` runbook diagnoses common issues with an Amazon Elastic Kubernetes Service (Amazon EKS) cluster and its underlying infrastructure, and provides recommended remediation steps. 

**Important**  
Access to `AWSPremiumSupport-*` runbooks requires a Business \$1 Support, Enterprise Support or Unified Operations Subscription. For more information, see [Compare AWS Support Plans](https://aws.amazon.com/premiumsupport/plans/).

 If you specify a value for the `S3BucketName` parameter, the automation evaluates the policy status of the Amazon Simple Storage Service (Amazon S3) bucket you specify. To help with the security of the logs gathered from your EC2 instance, if the policy status `isPublic` is set to `true`, or if the access control list (ACL) grants `READ|WRITE` permissions to the `All Users` Amazon S3 predefined group, the logs are not uploaded. For more information about Amazon S3 predefined groups, see [Amazon S3 predefined groups](https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#specifying-grantee-predefined-groups) in the *Amazon Simple Storage Service User Guide*. 
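
You can approximate the bucket check described above before running the automation. The sketch below applies the two stated conditions to responses shaped like those of the Amazon S3 `GetBucketPolicyStatus` and `GetBucketAcl` operations. The function name is hypothetical and the logic is an approximation, so the runbook's own evaluation may differ in detail.

```python
# URI of the "All Users" predefined group in Amazon S3 ACLs.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def bucket_accepts_upload(policy_status, acl):
    """Return False if the bucket looks public by either documented rule.

    policy_status: dict shaped like a GetBucketPolicyStatus response.
    acl: dict shaped like a GetBucketAcl response.
    """
    if policy_status.get("PolicyStatus", {}).get("IsPublic"):
        return False
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        # Only READ and WRITE are checked, matching the text above.
        if (grantee.get("URI") == ALL_USERS_URI
                and grant.get("Permission") in {"READ", "WRITE"}):
            return False
    return True
```

With boto3, the two inputs could come from `get_bucket_policy_status(Bucket=...)` and `get_bucket_acl(Bucket=...)` on an S3 client. Note that `get_bucket_policy_status` raises an error when the bucket has no bucket policy, which this sketch does not handle.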

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSPremiumSupport-TroubleshootEKSCluster) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) The name of the Amazon EKS cluster that you want to troubleshoot.
+ S3BucketName

  Type: String

  Description: (Required) The name of the private Amazon S3 bucket where the report generated by the runbook should be uploaded.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `ssm:StartAutomationExecution` 
+  `ssm:GetAutomationExecution` 
+  `ec2:DescribeInstances` 
+  `ec2:DescribeInstanceTypes` 
+  `ec2:DescribeSubnets` 
+  `ec2:DescribeSecurityGroups` 
+  `ec2:DescribeRouteTables` 
+  `ec2:DescribeNatGateways` 
+  `ec2:DescribeVpcs` 
+  `ec2:DescribeNetworkAcls` 
+  `iam:GetInstanceProfile` 
+  `iam:ListInstanceProfiles` 
+  `iam:ListAttachedRolePolicies` 
+  `eks:DescribeCluster` 
+  `eks:ListNodegroups` 
+  `eks:DescribeNodegroup` 
+  `autoscaling:DescribeAutoScalingGroups` 

 In addition, the AWS Identity and Access Management (IAM) policy attached to the user or role that starts the automation must allow the `ssm:GetParameter` operation to the following public AWS Systems Manager parameters to get the latest recommended Amazon EKS Amazon Machine Image (AMI) for the worker nodes. 
+  `arn:aws:ssm:::parameter/aws/service/eks/optimized-ami/*/amazon-linux-2/recommended/image_id` 
+  `arn:aws:ssm:::parameter/aws/service/ami-windows-latest/Windows_Server-2019-English-Core-EKS_Optimized-*/image_id` 
+  `arn:aws:ssm:::parameter/aws/service/ami-windows-latest/Windows_Server-2019-English-Full-EKS_Optimized-*/image_id` 
+  `arn:aws:ssm:::parameter/aws/service/ami-windows-latest/Windows_Server-1909-English-Core-EKS_Optimized-*/image_id` 
+  `arn:aws:ssm:::parameter/aws/service/eks/optimized-ami/*/amazon-linux-2-gpu/recommended/image_id` 

To upload the report generated by the runbook to an Amazon S3 bucket, the following permissions are required for the bucket you specify.
+  `s3:GetBucketPolicyStatus` 
+  `s3:GetBucketAcl` 
+  `s3:PutObject` 

 **Document Steps** 
+  `aws:executeAwsApi` - Gathers details for the specified Amazon EKS cluster. 
+  `aws:executeScript` - Gathers details of the Amazon Elastic Compute Cloud (Amazon EC2) instances, Auto Scaling groups, AMIs, and Amazon EC2 GPU graphic instance types. 
+  `aws:executeScript` - Gathers details of the virtual private cloud (VPC), subnets, network address translation (NAT) gateways, subnet routes, security groups and network access control lists (ACLs) of the Amazon EKS cluster. 
+  `aws:executeScript` - Gathers details of attached IAM instance profiles and role policies. 
+  `aws:executeScript` - Gathers details of the Amazon S3 bucket you specify in the `S3BucketName` parameter. 
+  `aws:executeScript` - Classifies the Amazon VPC subnets as public or private. 
+  `aws:executeScript` - Checks the Amazon VPC subnets for tags that are required as part of an Amazon EKS cluster. 
+  `aws:executeScript` - Checks the Amazon VPC subnets for the tags that are required for Elastic Load Balancing subnets. 
+  `aws:executeScript` - Checks if the worker node Amazon EC2 instances use the latest Amazon EKS optimized AMIs. 
+  `aws:executeScript` - Checks the Amazon VPC security groups attached to worker nodes for the tags that are required. 
+  `aws:executeScript` - Checks the Amazon EKS cluster and worker node Amazon VPC security group rules for the recommended ingress rules to the Amazon EKS cluster. 
+  `aws:executeScript` - Checks the Amazon EKS cluster and worker node Amazon VPC security group rules for the recommended egress rules from the Amazon EKS cluster. 
+  `aws:executeScript` - Checks the network ACL configuration of the Amazon VPC subnets. 
+  `aws:executeScript` - Checks if the worker node Amazon EC2 instances have the required managed policies. 
+  `aws:executeScript` - Checks if the Auto Scaling groups have the necessary tags for cluster autoscaling. 
+  `aws:executeScript` - Checks if the worker node Amazon EC2 instances are connected to the internet. 
+  `aws:executeScript` - Generates a report based on the outputs from the previous steps. If a value is specified for the `S3BucketName` parameter, the generated report is uploaded to the Amazon S3 bucket. 

# `AWSSupport-TroubleshootEKSWorkerNode`
<a name="automation-awssupport-troubleshooteksworkernode"></a>

 **Description** 

 The `AWSSupport-TroubleshootEKSWorkerNode` runbook analyzes an Amazon Elastic Compute Cloud (Amazon EC2) worker node and Amazon Elastic Kubernetes Service (Amazon EKS) cluster to help you identify and troubleshoot common causes that prevent worker nodes from joining a cluster. The runbook outputs guidance to help you resolve any issues that are identified. 

**Important**  
 To successfully run this automation, the state of your Amazon EC2 worker node must be `running`, and the Amazon EKS cluster state must be `ACTIVE`. 

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSSupport-TroubleshootEKSWorkerNode) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) The name of the Amazon EKS cluster.
+ WorkerID

  Type: String

  Description: (Required) The ID of the Amazon EC2 worker node that failed to join the cluster.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `ec2:DescribeDhcpOptions` 
+  `ec2:DescribeImages` 
+  `ec2:DescribeInstanceAttribute` 
+  `ec2:DescribeInstances` 
+  `ec2:DescribeInstanceStatus` 
+  `ec2:DescribeNatGateways` 
+  `ec2:DescribeNetworkAcls` 
+  `ec2:DescribeNetworkInterfaces` 
+  `ec2:DescribeRouteTables` 
+  `ec2:DescribeSecurityGroups` 
+  `ec2:DescribeSubnets` 
+  `ec2:DescribeVpcAttribute` 
+  `ec2:DescribeVpcEndpoints` 
+  `ec2:DescribeVpcs` 
+  `eks:DescribeCluster` 
+  `iam:GetInstanceProfile` 
+  `iam:GetRole` 
+  `iam:ListAttachedRolePolicies` 
+  `ssm:DescribeInstanceInformation` 
+  `ssm:ListCommandInvocations` 
+  `ssm:ListCommands` 
+  `ssm:SendCommand` 

 **Document Steps** 
+  `aws:assertAwsResourceProperty` - Confirms that the Amazon EKS cluster you specify in the `ClusterName` parameter exists and is in an `ACTIVE` state. 
+  `aws:assertAwsResourceProperty` - Confirms that the Amazon EC2 worker node you specify in the `WorkerID` parameter exists and is in a `running` state. 
+  `aws:executeScript` - Runs a Python script that helps identify possible causes for the worker node failing to join the cluster. 
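
The first two steps above assert that the cluster is `ACTIVE` and the worker node is `running`. If you want to confirm both conditions yourself before starting the automation, the following sketch evaluates responses shaped like those of the Amazon EKS `DescribeCluster` and Amazon EC2 `DescribeInstances` operations. The function name is hypothetical.

```python
def preconditions_met(describe_cluster_response, describe_instances_response):
    """Return True when the cluster is ACTIVE and every instance is running.

    Both arguments are plain dicts in the shape the AWS APIs return,
    so the function can be exercised without calling AWS.
    """
    cluster_status = describe_cluster_response["cluster"]["status"]
    states = [
        instance["State"]["Name"]
        for reservation in describe_instances_response["Reservations"]
        for instance in reservation["Instances"]
    ]
    # An empty result means the instance ID was not found at all.
    return (
        cluster_status == "ACTIVE"
        and bool(states)
        and all(state == "running" for state in states)
    )
```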

# `AWS-UpdateEKSCluster`
<a name="automation-updateekscluster"></a>

 **Description** 

 The `AWS-UpdateEKSCluster` runbook helps you update your Amazon Elastic Kubernetes Service (Amazon EKS) cluster to the Kubernetes version that you want to use. 

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-UpdateEKSCluster) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) The name of your Amazon EKS cluster.
+ Version

  Type: String

  Description: (Required) The Kubernetes version that you want to update your cluster to.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `eks:DescribeUpdate` 
+  `eks:UpdateClusterVersion` 

 **Document Steps** 
+  `aws:executeAwsApi` - Updates the Kubernetes version that is used by your Amazon EKS cluster. 
+  `aws:waitForAwsResourceProperty` - Waits for the update status to be `Successful`. 
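
As a minimal sketch of starting this runbook programmatically, the following Python example builds the `Parameters` map from the parameters listed above and passes it to the boto3 `start_automation_execution` call. The helper names `build_update_cluster_parameters` and `start_update_cluster`, and the sample cluster name and version, are illustrative assumptions and not part of the runbook.

```python
from typing import Dict, List, Optional


def build_update_cluster_parameters(
    cluster_name: str,
    version: str,
    automation_assume_role: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Build the Parameters map for AWS-UpdateEKSCluster.

    Automation parameters are passed as a map of string lists.
    """
    if not cluster_name or not version:
        raise ValueError("ClusterName and Version are required")
    parameters = {"ClusterName": [cluster_name], "Version": [version]}
    if automation_assume_role:
        parameters["AutomationAssumeRole"] = [automation_assume_role]
    return parameters


def start_update_cluster(parameters: Dict[str, List[str]]) -> str:
    """Start the runbook and return the automation execution ID."""
    import boto3  # imported lazily so the builder above works without boto3

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(
        DocumentName="AWS-UpdateEKSCluster",
        Parameters=parameters,
    )
    return response["AutomationExecutionId"]


if __name__ == "__main__":
    # "my-cluster" and "1.29" are placeholder values.
    print(build_update_cluster_parameters("my-cluster", "1.29"))
```

Keeping the parameter builder separate from the API call makes it possible to validate the input without AWS credentials.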

# `AWS-UpdateEKSManagedNodeGroup`
<a name="aws-updateeksmanagednodegroup"></a>

 **Description** 

The `AWS-UpdateEKSManagedNodeGroup` runbook helps you update an Amazon Elastic Kubernetes Service (Amazon EKS) managed node group. You can choose either a `Version` or a `Configuration` update.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-UpdateEKSManagedNodeGroup) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) The name of the cluster whose node group you want to update.
+ NodeGroupName

  Type: String

  Description: (Required) The name of the node group to update.
+ UpdateType

  Type: String

  Valid values: Update Node Group Version | Update Node Group Configurations

  Default: Update Node Group Version

  Description: (Required) The type of update that you want to perform on the node group.

The following parameters apply only to the `Version` update type:
+ AMIReleaseVersion

  Type: String

  Description: (Optional) The version of the Amazon EKS optimized AMI that you want to use. By default, the latest version is used.
+ ForceUpgrade

  Type: Boolean

  Description: (Optional) If true, the update won't fail in response to a pod disruption budget violation.
+ KubernetesVersion

  Type: String

  Description: (Optional) The Kubernetes version to update the node group to.
+ LaunchTemplateId

  Type: String

  Description: (Optional) The ID of the launch template.
+ LaunchTemplateName

  Type: String

  Description: (Optional) The name of the launch template.
+ LaunchTemplateVersion

  Type: String

  Description: (Optional) The Amazon Elastic Compute Cloud (Amazon EC2) launch template version. This parameter is only valid if a node group was created from a launch template.

The following parameters apply only to the `Configuration` update type:
+ AddOrUpdateNodeGroupLabels

  Type: StringMap

  Description: (Optional) Kubernetes labels that you want to add or update.
+ AddOrUpdateKubernetesTaintsEffect

  Type: StringList

  Description: (Optional) The Kubernetes taints that you want to add or update.
+ MaxUnavailableNodeGroups

  Type: Integer

  Default: 0

  Description: (Optional) The maximum number of nodes that are unavailable at once during a version update.
+ MaxUnavailablePercentageNodeGroup

  Type: Integer

  Default: 0

  Description: (Optional) The percentage of nodes that are unavailable during a version update.
+ NodeGroupDesiredSize

  Type: Integer

  Default: 0

  Description: (Optional) The number of nodes that the managed node group should maintain.
+ NodeGroupMaxSize

  Type: Integer

  Default: 0

  Description: (Optional) The maximum number of nodes that the managed node group can scale out to.
+ NodeGroupMinSize

  Type: Integer

  Default: 0

  Description: (Optional) The minimum number of nodes that the managed node group can scale in to.
+ RemoveKubernetesTaintsEffect

  Type: StringList

  Description: (Optional) The Kubernetes taints that you want to remove.
+ RemoveNodeGroupLabels

  Type: StringList

  Description: (Optional) A comma-separated list of labels that you want to remove.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `eks:UpdateNodegroupConfig` 
+  `eks:UpdateNodegroupVersion` 

 **Document Steps** 
+  `aws:executeScript` - Updates an Amazon EKS cluster node group according to the values that you specify for the runbook input parameters. 
+  `aws:waitForAwsResourceProperty` - Waits for the cluster update status to be `Successful`. 
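
Because some parameters apply only to one update type, it can help to check the input before starting the automation. The following Python sketch groups the parameter names listed above by update type and rejects options that belong to the other type. The function name `build_node_group_parameters` is an illustrative assumption, all option values are passed as plain strings for simplicity, and the check is a client-side convenience. Whether the runbook itself rejects or ignores such parameters is not stated here.

```python
from typing import Dict, List

VERSION_UPDATE = "Update Node Group Version"
CONFIGURATION_UPDATE = "Update Node Group Configurations"

# Parameter names taken from the lists above, grouped by update type.
VERSION_PARAMETERS = {
    "AMIReleaseVersion", "ForceUpgrade", "KubernetesVersion",
    "LaunchTemplateId", "LaunchTemplateName", "LaunchTemplateVersion",
}
CONFIGURATION_PARAMETERS = {
    "AddOrUpdateNodeGroupLabels", "AddOrUpdateKubernetesTaintsEffect",
    "MaxUnavailableNodeGroups", "MaxUnavailablePercentageNodeGroup",
    "NodeGroupDesiredSize", "NodeGroupMaxSize", "NodeGroupMinSize",
    "RemoveKubernetesTaintsEffect", "RemoveNodeGroupLabels",
}


def build_node_group_parameters(
    cluster_name: str,
    node_group_name: str,
    update_type: str = VERSION_UPDATE,
    **options: str,
) -> Dict[str, List[str]]:
    """Return a Parameters map, rejecting options from the other update type."""
    if update_type == VERSION_UPDATE:
        allowed = VERSION_PARAMETERS
    elif update_type == CONFIGURATION_UPDATE:
        allowed = CONFIGURATION_PARAMETERS
    else:
        raise ValueError(f"Unknown UpdateType: {update_type!r}")

    unexpected = sorted(set(options) - allowed)
    if unexpected:
        raise ValueError(f"{unexpected} do not apply to {update_type!r}")

    parameters = {
        "ClusterName": [cluster_name],
        "NodeGroupName": [node_group_name],
        "UpdateType": [update_type],
    }
    for name, value in options.items():
        parameters[name] = [value]
    return parameters
```

For example, `build_node_group_parameters("my-cluster", "my-nodegroup", KubernetesVersion="1.29")` returns a map for a version update, while passing `NodeGroupMaxSize` with the default update type raises a `ValueError`.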

# `AWS-UpdateEKSSelfManagedLinuxNodeGroups`
<a name="aws-updateeksselfmanagedlinuxnodegroup"></a>

 **Description** 

The `AWS-UpdateEKSSelfManagedLinuxNodeGroups` runbook updates self-managed node groups in your Amazon Elastic Kubernetes Service (Amazon EKS) cluster using an AWS CloudFormation stack.

If your cluster uses auto scaling, we recommend scaling the deployment down to two replicas before using this runbook.

**To scale a deployment to two replicas**

1.  Install the Kubernetes command line utility, `kubectl`. For more information, see [Installing kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) in the *Amazon EKS User Guide*. 

1. Run the following command.

   ```
   kubectl scale deployments/cluster-autoscaler --replicas=2 -n kube-system
   ```

1. Run the `AWS-UpdateEKSSelfManagedLinuxNodeGroups` runbook. 

1. Scale the deployment back to the desired number of replicas by running the following command.

   ```
   kubectl scale deployments/cluster-autoscaler --replicas=number -n kube-system
   ```

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWS-UpdateEKSSelfManagedLinuxNodeGroups) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) The name of the Amazon EKS cluster.
+ NodeGroupName

  Type: String

  Description: (Required) The name of the managed node group.
+ ClusterControlPlaneSecurityGroup

  Type: String

  Description: (Required) The ID of the control plane security group.
+ DisableIMDSv1

  Type: Boolean

  Description: (Optional) Determines whether you want to allow Instance Metadata Service Version 1 (IMDSv1) and IMDSv2.
+ KeyName

  Type: String

  Description: (Optional) The key name for the instances.
+ NodeAutoScalingGroupDesiredCapacity

  Type: String

  Description: (Optional) The number of nodes that the node group should maintain.
+ NodeAutoScalingGroupMaxSize

  Type: String

  Description: (Optional) The maximum number of nodes that the node group can scale out to.
+ NodeAutoScalingGroupMinSize

  Type: String

  Description: (Optional) The minimum number of nodes that the node group can scale in to.
+ NodeInstanceType

  Type: String

  Default: t3.large

  Description: (Optional) The instance type that you want to use for the node group.
+ NodeImageId

  Type: String

  Description: (Optional) The ID of the Amazon Machine Image (AMI) that you want the node group to use.
+ NodeImageIdSSMParam

  Type: String

  Default: /aws/service/eks/optimized-ami/1.21/amazon-linux-2/recommended/image_id

  Description: (Optional) The public Systems Manager parameter for the AMI that you want the node group to use.
+ StackName

  Type: String

  Description: (Required) The name of the CloudFormation stack used to update the node group.
+ Subnets

  Type: String

  Description: (Required) A comma-separated list of the IDs for the subnets that you want your cluster to use.
+ VpcId

  Type: String

  Default: Default

  Description: (Required) The virtual private cloud (VPC) where your cluster is deployed.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `eks:CreateCluster` 
+  `eks:CreateNodegroup` 
+  `eks:DeleteNodegroup` 
+  `eks:DeleteCluster` 
+  `eks:DescribeCluster` 
+  `eks:DescribeNodegroup` 
+  `eks:ListClusters` 
+  `eks:ListNodegroups` 
+  `eks:UpdateClusterConfig` 
+  `eks:UpdateNodegroupConfig` 

 **Document Steps** 
+  `aws:executeScript` - Updates an Amazon EKS cluster node group according to the values that you specify for the runbook input parameters.
+  `aws:waitForAwsResourceProperty` - Waits for the CloudFormation stack update status to be returned. 

# `AWSSupport-CollectEKSLinuxNodeStatistics`
<a name="automation-awssupport-collectekslinuxnodestatistics"></a>

 **Description** 

The `AWSSupport-CollectEKSLinuxNodeStatistics` runbook collects Linux statistics from an Amazon EC2 instance that is part of an Amazon EKS cluster, and from a container running on the instance if a `containerd` container ID is specified. The Amazon EC2 instance has to be managed by AWS Systems Manager.

The host-level Linux statistics collected include:
+ OS information.
+ Network interface statistics - from `ethtool` and `/sys/class/net/interface/statistics` directory.
+ File descriptors counts.
+ Ephemeral ports counts.
+ A dump of `iptables` rules.
+ Check for a full conntrack table.

The container-level Linux statistics include:
+ Identifier information - image URI and labels.
+ Network interface statistics - from `ethtool` and `/sys/class/net/interface/statistics` directory.
+ Traceroute and DNS results if the `NetworkTargets` parameter is populated.
+ Packet capture analysis counts - TCP retransmissions, out-of-order packets, and so on.

The runbook collects data from various Linux distributions including Amazon Linux 2, Amazon Linux 2023 and Debian/Ubuntu. It uses the latest versions of the following images from the Amazon ECR public gallery:
+ `amazon-ecs-network-sidecar` image to gain access to troubleshooting tools.
+ `aws-cli` image to upload the statistics report JSON file and packet capture files to the specified Amazon S3 bucket.

**Important**  
This runbook does not support Fargate instances. This runbook may fail if the instance is shut down or disconnected during execution.

 **How does it work?** 

The runbook performs the following actions:
+ Verifies the target Amazon S3 bucket does not grant public read or write access.
+ Ensures the target Amazon EC2 instance is managed by Systems Manager and is in a running state.
+ Verifies the instance is running a Linux operating system.
+ Collects comprehensive Linux statistics from the Amazon EC2 instance and optionally from a specified container.
+ Uploads the collected statistics to the specified Amazon S3 bucket.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSSupport-CollectEKSLinuxNodeStatistics) 

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `s3:GetAccountPublicAccessBlock`
+ `s3:GetBucketPublicAccessBlock`
+ `s3:GetBucketAcl`
+ `s3:GetBucketPolicyStatus`
+ `s3:GetBucketLocation`
+ `s3:GetEncryptionConfiguration`
+ `s3:PutObject`
+ `ssm:DescribeInstanceInformation`
+ `ssm:SendCommand`
+ `ssm:GetCommandInvocation`
+ `ec2:DescribeInstances`

Example IAM policy:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetAccountPublicAccessBlock"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketPublicAccessBlock",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicyStatus",
                "s3:GetBucketLocation",
                "s3:GetEncryptionConfiguration"
            ],
            "Resource": "arn:aws:s3:::S3_BUCKET_NAME"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::S3_BUCKET_NAME/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeInstanceInformation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:SendCommand"
            ],
            "Resource": [
                "arn:aws:ssm:*:*:document/AWS-RunShellScript",
                "arn:aws:ec2:*:111122223333:instance/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetCommandInvocation"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        }
    ]
}
```
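
The placeholders in the example policy can be filled in programmatically. The following Python sketch rebuilds the same statements for a given bucket name and account ID. The function name `build_collect_statistics_policy` and the sample bucket name are illustrative assumptions.

```python
import json
from typing import Any, Dict, List, Union


def build_collect_statistics_policy(bucket_name: str, account_id: str) -> Dict[str, Any]:
    """Return the example policy with the bucket and account placeholders filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"

    def allow(actions: List[str], resource: Union[str, List[str]]) -> Dict[str, Any]:
        return {"Effect": "Allow", "Action": actions, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            allow(["s3:GetAccountPublicAccessBlock"], "*"),
            allow(
                [
                    "s3:GetBucketPublicAccessBlock",
                    "s3:GetBucketAcl",
                    "s3:GetBucketPolicyStatus",
                    "s3:GetBucketLocation",
                    "s3:GetEncryptionConfiguration",
                ],
                bucket_arn,
            ),
            # Object-level actions apply to keys inside the bucket.
            allow(["s3:PutObject"], f"{bucket_arn}/*"),
            allow(["ssm:DescribeInstanceInformation"], "*"),
            allow(
                ["ssm:SendCommand"],
                [
                    "arn:aws:ssm:*:*:document/AWS-RunShellScript",
                    f"arn:aws:ec2:*:{account_id}:instance/*",
                ],
            ),
            allow(["ssm:GetCommandInvocation"], "*"),
            allow(["ec2:DescribeInstances"], "*"),
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID.
    policy = build_collect_statistics_policy("amzn-s3-demo-bucket", "111122223333")
    print(json.dumps(policy, indent=4))
```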

 **Instructions** 

Follow these steps to configure the automation:

1. Navigate to [https://console.aws.amazon.com/systems-manager/documents/AWSSupport-CollectEKSLinuxNodeStatistics/description](https://console.aws.amazon.com/systems-manager/documents/AWSSupport-CollectEKSLinuxNodeStatistics/description) in Systems Manager under Documents.

1. Select Execute automation.

1. For the input parameters, enter the following:
   + **AutomationAssumeRole (Optional):**

     The Amazon Resource Name (ARN) of the IAM role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
   + **InstanceId (Required):**

     The ID of the Amazon EC2 instance to collect statistics from.
   + **S3BucketName (Required):**

     Name of the Amazon S3 bucket to export the JSON output from the Amazon EC2 instance as a file.
   + **S3KeyPrefix (Optional):**

     The Amazon S3 key prefix (sub-folder) to export the JSON output from the Amazon EC2 instance as a file. Default: `AWSSupport-CollectEKSLinuxNodeStatistics`.
   + **S3BucketOwnerRoleArn (Optional):**

     The ARN of the IAM role with permissions to get the Amazon S3 bucket and account block public access settings, the bucket encryption configuration, the bucket ACLs, and the bucket policy status, and to upload objects to the bucket. If this parameter is not specified, the runbook uses the `AutomationAssumeRole` (if specified) or the user that starts this runbook (if `AutomationAssumeRole` is not specified).
   + **S3BucketOwnerAccount (Optional):**

     The AWS account that owns the Amazon S3 bucket. If you do not specify this parameter, the runbook assumes that the bucket is in this account.
   + **ContainerId (Optional):**

     The ID of a container running on the specified Amazon EC2 instance.
   + **NetworkTargets (Optional):**

     A comma-separated list of IPv4 addresses and/or DNS names to test DNS resolution, and connectivity using traceroute.

1. Select Execute.

1. The automation initiates.

1. The document performs the following steps:
   + **`CheckBucketAccess`**:

     Checks if the target Amazon S3 bucket potentially grants read and/or write public access to its objects.
   + **`AssertInstanceIsSSMManaged`**:

     Ensures the target Amazon EC2 instance is managed by Systems Manager, otherwise the automation ends.
   + **`VerifyInstanceState`**:

     Verifies that the Amazon EC2 instance is in a running state before attempting to collect statistics.
   + **`BranchOnVerifyLinuxInstance`**:

     Verifies that the instance is a Linux instance before proceeding.
   + **`BranchOnVerifyInstanceRunning`**:

     Verifies that the instance is in a running state before proceeding.
   + **`CollectEKSLinuxNodeStatistics`**:

     Collects comprehensive Linux statistics from the Amazon EC2 instance including OS information, network interface statistics, file descriptors, ephemeral ports, firewall rules, and optionally container-level statistics.
   + **`GenerateStatisticsOutputS3Uri`**:

     Generates the full Amazon S3 URI to the Linux statistics files to be used as the automation document's output.

1. After the automation completes, review the Outputs section for the detailed results of the execution.
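
The console steps above can also be approximated with the AWS SDK. The following Python sketch builds the input parameters described in step 3, leaving out optional values that are not set, and starts the runbook with the boto3 `start_automation_execution` call. The helper names are illustrative assumptions, and the sketch assumes that `NetworkTargets` is passed as a single comma-separated string.

```python
from typing import Dict, List, Optional, Sequence

DOCUMENT_NAME = "AWSSupport-CollectEKSLinuxNodeStatistics"


def build_statistics_parameters(
    instance_id: str,
    s3_bucket_name: str,
    *,
    s3_key_prefix: Optional[str] = None,
    container_id: Optional[str] = None,
    network_targets: Optional[Sequence[str]] = None,
    automation_assume_role: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Build the Parameters map, omitting optional values that are not set."""
    parameters = {"InstanceId": [instance_id], "S3BucketName": [s3_bucket_name]}
    optional = {
        "S3KeyPrefix": s3_key_prefix,
        "ContainerId": container_id,
        "AutomationAssumeRole": automation_assume_role,
        # Assumption: the runbook expects one comma-separated string.
        "NetworkTargets": ",".join(network_targets) if network_targets else None,
    }
    for name, value in optional.items():
        if value:
            parameters[name] = [value]
    return parameters


def start_collection(parameters: Dict[str, List[str]]) -> str:
    """Start the runbook and return the automation execution ID."""
    import boto3  # imported lazily so the builder above works without boto3

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(
        DocumentName=DOCUMENT_NAME,
        Parameters=parameters,
    )
    return response["AutomationExecutionId"]
```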

**References**

Systems Manager Automation
+ [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/documents/AWSSupport-CollectEKSLinuxNodeStatistics/description)
+ [Run an automation](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-working-executing.html)
+ [Setting up an Automation](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-setup.html)
+ [Support Automation Workflows](https://aws.amazon.com/premiumsupport/technology/saw/)

# `AWSSupport-CollectEKSInstanceLogs`
<a name="automation-awssupport-collecteksinstancelogs"></a>

 **Description** 

 The `AWSSupport-CollectEKSInstanceLogs` runbook gathers operating system and Amazon Elastic Kubernetes Service (Amazon EKS) related log files from an Amazon Elastic Compute Cloud (Amazon EC2) instance to help you troubleshoot common issues. While the automation is gathering the associated log files, changes are made to the file system structure including the creation of temporary directories, the copying of log files to the temporary directories, and compressing the log files into an archive. This activity can result in increased `CPUUtilization` on the Amazon EC2 instance. For more information about `CPUUtilization`, see [Instance metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/viewing_metrics_with_cloudwatch.html#ec2-cloudwatch-metrics) in the *Amazon CloudWatch User Guide*. 

 If you specify a value for the `LogDestination` parameter, the automation evaluates the policy status of the Amazon Simple Storage Service (Amazon S3) bucket you specify. To help with the security of the logs gathered from your Amazon EC2 instance, if the policy status `isPublic` is set to `true`, or if the access control list (ACL) grants `READ|WRITE` permissions to the `All Users` Amazon S3 predefined group, the logs are not uploaded. For more information about Amazon S3 predefined groups, see [Amazon S3 predefined groups](https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#specifying-grantee-predefined-groups) in the *Amazon Simple Storage Service User Guide*. 
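
The bucket check described above can be sketched in Python as follows. This is an approximation of the documented rule and not the runbook's own code. It assumes the inputs have the shape of the boto3 `get_bucket_policy_status` and `get_bucket_acl` responses, and the function name `bucket_allows_upload` is an illustrative assumption.

```python
from typing import Any, Dict

# URI of the Amazon S3 predefined "All Users" group.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"
BLOCKING_PERMISSIONS = {"READ", "WRITE"}


def bucket_allows_upload(policy_status: Dict[str, Any], acl: Dict[str, Any]) -> bool:
    """Return False if the bucket is public by policy or by an All Users grant.

    policy_status: shaped like the get_bucket_policy_status response.
    acl: shaped like the get_bucket_acl response.
    """
    if policy_status.get("PolicyStatus", {}).get("IsPublic", False):
        return False
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if (
            grantee.get("URI") == ALL_USERS_URI
            and grant.get("Permission") in BLOCKING_PERMISSIONS
        ):
            return False
    return True
```

Only the `READ` and `WRITE` permissions named in the description are treated as blocking here. When calling Amazon S3 directly, note that `get_bucket_policy_status` returns an error for a bucket that has no bucket policy, so a caller would need to handle that case.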

**Note**  
This automation requires at least 10 percent of available disk space on the root Amazon Elastic Block Store (Amazon EBS) volume attached to your Amazon EC2 instance. If there is not enough available disk space on the root volume, the automation stops.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSSupport-CollectEKSInstanceLogs) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ EKSInstanceId

  Type: String

  Description: (Required) The ID of the Amazon EC2 instance in your Amazon EKS cluster that you want to collect logs from.
+ LogDestination

  Type: String

  Description: (Optional) The Amazon Simple Storage Service (Amazon S3) bucket in your account to upload the archived logs to.

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+  `ssm:StartAutomationExecution` 
+  `ssm:GetAutomationExecution` 
+  `ssm:SendCommand`

 **Required IAM permissions for the Amazon EC2 instance profile** 

The instance profile used by the `EKSInstanceId` must have the **AmazonSSMManagedInstanceCore** Amazon managed policy attached to it. 

 The instance profile must also be able to access the `LogDestination` Amazon S3 bucket so that it can upload the collected logs. The following is an example of an IAM policy that you can attach to the instance profile:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketAcl"
      ],
      "Resource": [
        "arn:aws:s3:::LogDestination/*",
        "arn:aws:s3:::LogDestination"
      ]
    }
  ]
}
```

If `LogDestination` uses AWS KMS encryption, then an additional statement must be added to the IAM policy, granting access to the AWS KMS key used in the encryption:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketAcl"
      ],
      "Resource": [
        "arn:aws:s3:::LogDestination/*",
        "arn:aws:s3:::LogDestination"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:REGION:ACCOUNT:key/KMS-KEY-ID"
    }
  ]
}
```

 **Document Steps** 
+  `aws:assertAwsResourceProperty` - Confirms the operating system of the value specified in the `EKSInstanceId` parameter is Linux. 
+  `aws:runCommand` - Gathers operating system and Amazon EKS related log files, compressing them into an archive in the `/var/log` directory. 
+  `aws:branch` - Confirms whether a value was specified for the `LogDestination` parameter. 
+  `aws:runCommand` - Uploads the log archive to the Amazon S3 bucket you specify in the `LogDestination` parameter. 

# `AWSSupport-SetupK8sApiProxyForEKS`
<a name="automation-awssupport-setupk8sapiproxyforeks"></a>

 **Description** 

The `AWSSupport-SetupK8sApiProxyForEKS` automation runbook creates an AWS Lambda function that acts as a proxy for making control plane API calls to the Amazon Elastic Kubernetes Service (Amazon EKS) cluster endpoint. It serves as a building block for runbooks that need to make control plane API calls to automate tasks and troubleshoot issues with an Amazon EKS cluster.

**Important**  
All the resources created by this automation are tagged so that they can be easily found. The tags used are:  
 `AWSSupport-SetupK8sApiProxyForEKS`: true 

**Note**  
The automation is a helper runbook and cannot be run as a standalone runbook. It is invoked as a child automation by runbooks that require control plane API calls to an Amazon EKS cluster.
Run the `Cleanup` operation after use to avoid incurring unwanted costs.

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux

**Parameters**
+ AutomationAssumeRole

  Type: String

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ ClusterName

  Type: String

  Description: (Required) The name of the Amazon Elastic Kubernetes Service cluster.
+ Operation

  Type: String

  Description: (Required) The operation to perform. `Setup` provisions the Lambda function in the account. `Cleanup` deprovisions the resources created during the setup phase.

  Allowed Values: `Setup` | `Cleanup`

  Default: Setup
+ LambdaRoleArn

  Type: String

  Description: (Optional) The ARN of the IAM role that allows the AWS Lambda function to access the required AWS services and resources. If no role is specified, the automation creates an IAM role for Lambda in your account named `Automation-K8sProxy-Role-<ExecutionId>` that includes the managed policies `AWSLambdaBasicExecutionRole` and `AWSLambdaVPCAccessExecutionRole`.

 **How does it work?** 

 The runbook performs the following steps: 
+ Validates that the automation is running as a child execution. The runbook will not work when invoked as a standalone runbook since it does not perform any meaningful work on its own.
+ Checks for an existing CloudFormation stack for the proxy Lambda function for the specified cluster.
  + If the stack exists, the existing infrastructure is re-used instead of re-creating it.
  + A reference counter is maintained using tags to ensure a runbook does not delete the infrastructure if it is being re-used by another runbook for the same cluster.
+ Performs the operation (`Setup` or `Cleanup`) specified for the invocation:
  + **Setup:** Creates or describes existing resources.
  + **Cleanup:** Removes provisioned resources, if the infrastructure is not being used by any other runbook.
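
The reference counting described above can be illustrated with a small Python sketch. This is not the runbook's implementation. The tag key `ReferenceCount` and the function names are hypothetical, because the tag that the runbook uses is not documented here. The sketch only shows the idea: each `Setup` increments a counter stored in a tag, each `Cleanup` decrements it, and resources are removed only when the counter reaches zero.

```python
from typing import Dict, Tuple

# Hypothetical tag key; the key used by the runbook is not documented here.
REFERENCE_COUNT_TAG = "ReferenceCount"


def register_user(tags: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the tags with the counter incremented (Setup)."""
    count = int(tags.get(REFERENCE_COUNT_TAG, "0")) + 1
    return {**tags, REFERENCE_COUNT_TAG: str(count)}


def release_user(tags: Dict[str, str]) -> Tuple[Dict[str, str], bool]:
    """Return updated tags and whether the resources can be deleted (Cleanup)."""
    count = max(int(tags.get(REFERENCE_COUNT_TAG, "0")) - 1, 0)
    updated = {**tags, REFERENCE_COUNT_TAG: str(count)}
    return updated, count == 0
```

With two parent runbooks sharing the same cluster proxy, the first `Cleanup` leaves the infrastructure in place and the second one allows it to be removed.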

 **Required IAM Permissions** 

The `AutomationAssumeRole` parameter requires the following permissions when `LambdaRoleArn` is not passed:
+  `cloudformation:CreateStack` 
+  `cloudformation:DescribeStacks` 
+  `cloudformation:DeleteStack` 
+  `cloudformation:UpdateStack` 
+  `ec2:CreateNetworkInterface` 
+  `ec2:DescribeNetworkInterfaces` 
+  `ec2:DescribeRouteTables` 
+  `ec2:DescribeSecurityGroups` 
+  `ec2:DescribeSubnets` 
+  `ec2:DescribeVpcs` 
+  `ec2:DeleteNetworkInterface` 
+  `eks:DescribeCluster` 
+  `lambda:CreateFunction` 
+  `lambda:DeleteFunction` 
+  `lambda:GetFunction` 
+  `lambda:ListTags` 
+  `lambda:TagResource` 
+  `lambda:UntagResource` 
+  `lambda:UpdateFunctionCode` 
+  `logs:CreateLogGroup` 
+  `logs:PutRetentionPolicy` 
+  `logs:TagResource` 
+  `logs:UntagResource` 
+  `logs:DescribeLogGroups` 
+  `logs:DescribeLogStreams` 
+  `logs:ListTagsForResource` 
+  `iam:CreateRole` 
+  `iam:AttachRolePolicy` 
+  `iam:DetachRolePolicy` 
+  `iam:PassRole` 
+  `iam:GetRole` 
+  `iam:DeleteRole` 
+  `iam:TagRole` 
+  `iam:UntagRole` 
+  `tag:GetResources` 
+  `tag:TagResources` 

When `LambdaRoleArn` is provided, the automation does not need to create the role and the following permissions can be excluded:
+  `iam:CreateRole` 
+  `iam:DeleteRole` 
+  `iam:TagRole` 
+  `iam:UntagRole` 
+  `iam:AttachRolePolicy` 
+  `iam:DetachRolePolicy` 

The following example policy shows the permissions required for `AutomationAssumeRole` when `LambdaRoleArn` is not passed:


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Action": [
                "tag:GetResources",
                "tag:TagResources",
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:DeleteNetworkInterface",
                "eks:DescribeCluster",
                "iam:GetRole",
                "cloudformation:DescribeStacks",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "lambda:GetFunction",
                "lambda:ListTags",
                "logs:ListTagsForResource"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "AllowActionsWithoutConditions"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                }
            },
            "Action": "iam:CreateRole",
            "Resource": [
                "arn:aws:iam::111122223333:role/Automation-K8sProxy*"
            ],
            "Effect": "Allow",
            "Sid": "AllowCreateRoleWithRequiredTag"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                }
            },
            "Action": [
                "iam:DeleteRole",
                "iam:TagRole",
                "iam:UntagRole"
            ],
            "Resource": [
                "arn:aws:iam::111122223333:role/Automation-K8sProxy*"
            ],
            "Effect": "Allow",
            "Sid": "IAMActions"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                },
                "StringLike": {
                    "iam:PolicyARN": [
                        "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
                        "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
                    ]
                }
            },
            "Action": [
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::111122223333:role/Automation-K8sProxy*"
            ],
            "Effect": "Allow",
            "Sid": "AttachRolePolicy"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                }
            },
            "Action": [
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:TagResource",
                "lambda:UntagResource",
                "lambda:UpdateFunctionCode"
            ],
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:Automation-K8sProxy*",
            "Effect": "Allow",
            "Sid": "LambdaActions"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                }
            },
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "cloudformation:UpdateStack"
            ],
            "Resource": "arn:aws:cloudformation:us-east-1:111122223333:stack/AWSSupport-SetupK8sApiProxyForEKS*",
            "Effect": "Allow",
            "Sid": "CloudFormationActions"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                }
            },
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:PutRetentionPolicy",
                "logs:TagResource",
                "logs:UntagResource"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:111122223333:log-group:/aws/lambda/Automation-K8sProxy*",
                "arn:aws:logs:us-east-1:111122223333:log-group:/aws/lambda/Automation-K8sProxy*:*"
            ],
            "Effect": "Allow",
            "Sid": "LogsActions"
        },
        {
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "lambda.amazonaws.com"
                }
            },
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::111122223333:role/Automation-K8sProxy-Role*"
            ],
            "Effect": "Allow",
            "Sid": "PassRoleToLambda"
        }
    ]
}
```


 If `LambdaRoleArn` is passed, make sure that the role has the [AWSLambdaBasicExecutionRole](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/details/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2Fservice-role%2FAWSLambdaBasicExecutionRole) policy attached for public clusters and, additionally, the [AWSLambdaVPCAccessExecutionRole](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/details/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2Fservice-role%2FAWSLambdaVPCAccessExecutionRole) policy for private clusters. 

 **Resources Created** 

The following resources are created during `Setup` operation:

1. AWS Lambda function

1. IAM Role: Lambda execution role, if not provided.

1. CloudWatch Log Group (Lambda Logs)

 *The Lambda function and execution role are retained until the `Cleanup` operation runs. The Lambda log group is retained for 30 days or until you delete it manually.* 

 **Instructions** 

The runbook is a helper utility designed to be executed from within other runbooks as a child automation. It creates the infrastructure that enables the parent runbook to make Amazon EKS Kubernetes control plane API calls. To use the runbook, follow these steps from the context of the parent automation.

1. **Setup Phase**: Invoke the automation using the `aws:executeAutomation` action from the runbook that needs to make Amazon EKS Kubernetes control plane API calls, with `Operation` set to `Setup`.

   Example of input parameters:

   ```
      {
        "AutomationAssumeRole": "<role-arn>",
        "ClusterName": "<eks-cluster-name>",
        "Operation": "Setup"
      }
   ```

   The output of the `aws:executeAutomation` step will contain the ARN of the proxy Lambda function.

1. **Using the Lambda Proxy**: Invoke the Lambda function inside an `aws:executeScript` action using `boto3`'s `Lambda.Client.invoke(...)` with a list of API call paths and a bearer token. The Lambda function performs HTTP `GET` calls to the specified paths, passing the bearer token in the authorization header.

   Example of Lambda invoke event:

   ```
      {
          "ApiCalls": ["/api/v1/pods/", ...],
          "BearerToken": "..."
      }
   ```
**Note**  
The bearer token must be generated as part of the parent automation script. Ensure that the principal executing the parent runbook has read-only permission to the specified Amazon EKS cluster.

1. **Cleanup Phase**: Invoke the automation using the `aws:executeAutomation` action from the runbook that made the Amazon EKS Kubernetes control plane API calls, with `Operation` set to `Cleanup`.

   Example of input parameters:

   ```
      {
        "AutomationAssumeRole": "<role-arn>",
        "ClusterName": "<eks-cluster-name>",
        "Operation": "Cleanup"
      }
   ```
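
Step 2 above can be sketched in Python as follows. This is a minimal illustration, not the runbook's own code: the helper names `build_proxy_event` and `invoke_proxy` are hypothetical, the bearer token is assumed to have been generated earlier in the parent script, and the structure of the function's response is not documented here, so the sketch returns the decoded payload unchanged.

```python
import json


def build_proxy_event(api_paths, bearer_token):
    """Build the invoke event in the shape shown above (ApiCalls and BearerToken)."""
    if not api_paths:
        raise ValueError("at least one API call path is required")
    return {"ApiCalls": list(api_paths), "BearerToken": bearer_token}


def invoke_proxy(function_arn, api_paths, bearer_token, lambda_client=None):
    """Invoke the proxy Lambda function synchronously and return the decoded payload."""
    if lambda_client is None:
        import boto3  # imported lazily so the event builder works without boto3

        lambda_client = boto3.client("lambda")
    event = build_proxy_event(api_paths, bearer_token)
    response = lambda_client.invoke(
        FunctionName=function_arn,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    # The response Payload is a streaming body; its structure is not assumed here.
    return json.loads(response["Payload"].read())
```

In a parent runbook, `function_arn` would be the `LambdaFunctionArn` output of the `Setup` step.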

 **Automation Steps** 

1.  **ValidateExecution** 
   + Verifies that the automation is not running as a standalone execution.

1.  **CheckForExistingStack** 
   + Checks if a CloudFormation stack was already provisioned for the specified cluster name.
   + Returns stack existence status and whether it's safe to delete.

1.  **BranchOnIsStackExists** 
   + Decision step that branches based on stack existence.
   + Routes to either update existing stack name or proceed with operation branching.

1.  **UpdateStackName** 
   + Updates the `StackName` variable with the existing stack's name.
   + Only executed if stack already exists.

1.  **BranchOnOperation** 
   + Routes the automation based on the `Operation` parameter (`Setup` or `Cleanup`).
   + For `Setup`: Routes to either create new stack or describe existing resources.
   + For `Cleanup`: Proceeds to stack deletion if safe to delete.

1.  **GetClusterNetworkConfig** 
   + Describes the Amazon EKS cluster to obtain VPC configuration.
   + Retrieves endpoint, VPC ID, subnet IDs, security group ID, and CA data.

1.  **ProvisionResources** 
   + Creates a CloudFormation stack with required resources.
   + Provisions Lambda function with necessary networking configuration.
   + Tags all resources for tracking and management.

1.  **DescribeStackResources** 
   + Retrieves information about the created/existing stack.
   + Gets the ARN of the provisioned Lambda function.

1.  **BranchOnIsLambdaDeploymentRequired** 
   + Determines if Lambda code deployment is needed.
   + Only proceeds to deployment for newly created stacks.

1.  **DeployLambdaFunctionCode** 
   + Deploys the Lambda function code using the deployment package.
   + Updates the function with the proxy implementation.

1.  **AssertLambdaAvailable** 
   + Verifies that the Lambda function code update was successful.
   + Waits for the function to be in `Successful` state.

1.  **PerformStackCleanup** 
   + Deletes the CloudFormation stack and associated resources.
   + Executed during the `Cleanup` operation or when the `Setup` operation fails.
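
The branching described in the steps above can be summarized as a small decision function. This is a reading of the step descriptions only, not the runbook's actual implementation: the function name `next_step`, its boolean inputs, and the `"End"` result are hypothetical, and the behavior when `Cleanup` finds no stack or an unsafe stack is an assumption.

```python
def next_step(operation, stack_exists, safe_to_delete):
    """Return the step the automation would route to from BranchOnOperation."""
    if operation == "Setup":
        # Existing stack: look up its Lambda ARN. Otherwise start provisioning.
        return "DescribeStackResources" if stack_exists else "GetClusterNetworkConfig"
    if operation == "Cleanup":
        # Assumption: nothing is deleted unless the stack exists and is safe to delete.
        if stack_exists and safe_to_delete:
            return "PerformStackCleanup"
        return "End"
    raise ValueError(f"unsupported Operation: {operation!r}")
```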

 **Outputs** 

*LambdaFunctionArn*: ARN of the proxy Lambda function

**References**

Systems Manager Automation
+ [Run an automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-working-executing.html)
+ [Setting up an Automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-setup.html)
+ [Support Automation Workflows](https://aws.amazon.com/premiumsupport/technology/saw/)

# `AWSSupport-TroubleshootEbsCsiDriversForEks`
<a name="automation-awssupport-troubleshoot-ebs-csi-drivers-for-eks"></a>

 **Description** 

 The `AWSSupport-TroubleshootEbsCsiDriversForEks` runbook helps troubleshoot issues with Amazon Elastic Block Store (Amazon EBS) volume mounts in Amazon Elastic Kubernetes Service (Amazon EKS) and issues with the Amazon EBS Container Storage Interface (CSI) driver. 

**Important**  
Currently the Amazon EBS CSI Driver running on AWS Fargate is not supported.

 **How does it work?** 

 The runbook `AWSSupport-TroubleshootEbsCsiDriversForEks` performs the following high-level steps: 
+ Verifies if the target Amazon EKS cluster exists and is in active state.
+ Deploys necessary authentication resources for making Kubernetes API calls based on whether the addon is Amazon EKS-managed or self-managed.
+ Performs Amazon EBS CSI controller health checks and diagnostics.
+ Runs IAM permissions checks on node roles and service account roles.
+ Diagnoses persistent volume creation issues for the specified application pod.
+ Checks node-to-pod scheduling and examines pod events.
+ Collects relevant Kubernetes and application logs, uploading them to the specified Amazon S3 bucket.
+ Performs node health checks and verifies connectivity with Amazon EC2 endpoints.
+ Reviews persistent volume block device attachments and mounting status.
+ Cleans up the authentication infrastructure created during troubleshooting.
+ Generates a comprehensive troubleshooting report combining all diagnostic results.

**Note**  
The Amazon EKS cluster's authentication mode must be set to either `API` or `API_AND_CONFIG_MAP`. We recommend using Amazon EKS Access entry. The runbook requires Kubernetes Role-based access control (RBAC) permissions to perform the necessary API calls.
If you don't specify an IAM role for the Lambda function (`LambdaRoleArn` parameter), the automation creates a role named `Automation-K8sProxy-Role-<ExecutionId>` in your account. This role includes the managed policies `AWSLambdaBasicExecutionRole` and `AWSLambdaVPCAccessExecutionRole`.
Some diagnostic steps require the Amazon EKS worker nodes to be Systems Manager managed instances. If the nodes aren't Systems Manager managed instances, steps that require Systems Manager access are skipped, but other checks continue.
The automation includes a cleanup step that removes authentication infrastructure resources. This cleanup step runs even when previous steps fail, which helps prevent orphaned resources in your AWS account.
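
The authentication mode requirement in the note above can be checked before you start the runbook. The following sketch assumes boto3 is installed and that the caller is allowed to call `eks:DescribeCluster`; the helper names are hypothetical. It reads `accessConfig.authenticationMode` from the `DescribeCluster` response.

```python
SUPPORTED_AUTH_MODES = {"API", "API_AND_CONFIG_MAP"}


def auth_mode_supported(describe_cluster_response):
    """Return True if the cluster's authentication mode is API or API_AND_CONFIG_MAP."""
    cluster = describe_cluster_response.get("cluster", {})
    mode = cluster.get("accessConfig", {}).get("authenticationMode")
    return mode in SUPPORTED_AUTH_MODES


def check_cluster(cluster_name, eks_client=None):
    """Describe the cluster and report whether its authentication mode is supported."""
    if eks_client is None:
        import boto3  # imported lazily so the pure helper works without boto3

        eks_client = boto3.client("eks")
    return auth_mode_supported(eks_client.describe_cluster(name=cluster_name))
```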

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSSupport-TroubleshootEbsCsiDriversForEks) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

/

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `ec2:DescribeIamInstanceProfileAssociations`
+ `ec2:DescribeInstanceStatus`
+ `ec2:GetEbsEncryptionByDefault`
+ `eks:DescribeAddon`
+ `eks:DescribeAddonVersions`
+ `eks:DescribeCluster`
+ `iam:GetInstanceProfile`
+ `iam:GetOpenIDConnectProvider`
+ `iam:GetRole`
+ `iam:ListOpenIDConnectProviders`
+ `iam:SimulatePrincipalPolicy`
+ `s3:GetBucketLocation`
+ `s3:GetBucketPolicyStatus`
+ `s3:GetBucketPublicAccessBlock`
+ `s3:GetBucketVersioning`
+ `s3:ListBucket`
+ `s3:ListBucketVersions`
+ `ssm:DescribeInstanceInformation`
+ `ssm:GetAutomationExecution`
+ `ssm:GetDocument`
+ `ssm:ListCommandInvocations`
+ `ssm:ListCommands`
+ `ssm:SendCommand`
+ `ssm:StartAutomationExecution`
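
One way to confirm that a role allows the actions listed above is the IAM policy simulator. The sketch below assumes boto3 and a caller that is itself allowed to call `iam:SimulatePrincipalPolicy`; the helper names are hypothetical. Simulation results are an approximation of the role's effective permissions, and the sketch does not follow pagination markers.

```python
REQUIRED_ACTIONS = [
    "ec2:DescribeIamInstanceProfileAssociations", "ec2:DescribeInstanceStatus",
    "ec2:GetEbsEncryptionByDefault", "eks:DescribeAddon", "eks:DescribeAddonVersions",
    "eks:DescribeCluster", "iam:GetInstanceProfile", "iam:GetOpenIDConnectProvider",
    "iam:GetRole", "iam:ListOpenIDConnectProviders", "iam:SimulatePrincipalPolicy",
    "s3:GetBucketLocation", "s3:GetBucketPolicyStatus", "s3:GetBucketPublicAccessBlock",
    "s3:GetBucketVersioning", "s3:ListBucket", "s3:ListBucketVersions",
    "ssm:DescribeInstanceInformation", "ssm:GetAutomationExecution", "ssm:GetDocument",
    "ssm:ListCommandInvocations", "ssm:ListCommands", "ssm:SendCommand",
    "ssm:StartAutomationExecution",
]


def denied_actions(simulation_response):
    """Return the sorted action names whose simulated decision is not 'allowed'."""
    return sorted(
        result["EvalActionName"]
        for result in simulation_response.get("EvaluationResults", [])
        if result.get("EvalDecision") != "allowed"
    )


def check_role(role_arn, iam_client=None):
    """Simulate the required actions against the role and list any that are denied."""
    if iam_client is None:
        import boto3  # imported lazily so the pure helper works without boto3

        iam_client = boto3.client("iam")
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn, ActionNames=REQUIRED_ACTIONS
    )
    return denied_actions(response)
```

An empty list from `check_role` suggests that the role's identity-based policies allow every listed action.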

 **Instructions** 

Follow these steps to configure the automation:

1. Create an SSM automation role named `TroubleshootEbsCsiDriversForEks-SSM-Role` in your account. Verify that the trust relationship contains the following policy.

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "",
               "Effect": "Allow",
               "Principal": {
                   "Service": "ssm.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

1. Attach the policy below to the IAM role to grant the required permissions to perform the specified actions on the specified resources.
   + If you plan to upload execution and resource logs to an Amazon S3 bucket, replace `amzn-s3-demo-bucket` in the `OptionalRestrictPutObjects` statement with the name of your bucket.
     + The bucket in the policy must match the value you specify for `S3BucketName` when you run the automation.
     + This permission is optional if you don't specify `S3BucketName`.
     + The Amazon S3 bucket must be private and in the same AWS Region where you run the automation.

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "OptionalRestrictPutObjects",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject"
               ],
               "Resource": [
                   "arn:aws:s3:::amzn-s3-demo-bucket/*"
               ]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "ec2:DescribeIamInstanceProfileAssociations",
                   "ec2:DescribeInstanceStatus",
                   "ec2:GetEbsEncryptionByDefault",
                   "eks:DescribeAddon",
                   "eks:DescribeAddonVersions",
                   "eks:DescribeCluster",
                   "iam:GetInstanceProfile",
                   "iam:GetOpenIDConnectProvider",
                   "iam:GetRole",
                   "iam:ListOpenIDConnectProviders",
                   "iam:SimulatePrincipalPolicy",
                   "s3:GetBucketLocation",
                   "s3:GetBucketPolicyStatus",
                   "s3:GetBucketPublicAccessBlock",
                   "s3:GetBucketVersioning",
                   "s3:ListBucket",
                   "s3:ListBucketVersions",
                   "ssm:DescribeInstanceInformation",
                   "ssm:GetAutomationExecution",
                   "ssm:GetDocument",
                   "ssm:ListCommandInvocations",
                   "ssm:ListCommands",
                   "ssm:SendCommand",
                   "ssm:StartAutomationExecution"
               ],
               "Resource": "*"
           },
           {
               "Sid": "SetupK8sApiProxyForEKSActions",
               "Effect": "Allow",
               "Action": [
                   "cloudformation:CreateStack",
                   "cloudformation:DeleteStack",
                   "cloudformation:DescribeStacks",
                   "cloudformation:UpdateStack",
                   "ec2:CreateNetworkInterface",
                   "ec2:DeleteNetworkInterface",
                   "ec2:DescribeNetworkInterfaces",
                   "ec2:DescribeRouteTables",
                   "ec2:DescribeSecurityGroups",
                   "ec2:DescribeSubnets",
                   "ec2:DescribeVpcs",
                   "eks:DescribeCluster",
                   "iam:CreateRole",
                   "iam:DeleteRole",
                   "iam:GetRole",
                   "iam:TagRole",
                   "iam:UntagRole",
                   "lambda:CreateFunction",
                   "lambda:DeleteFunction",
                   "lambda:GetFunction",
                   "lambda:InvokeFunction",
                   "lambda:ListTags",
                   "lambda:TagResource",
                   "lambda:UntagResource",
                   "lambda:UpdateFunctionCode",
                   "logs:CreateLogGroup",
                   "logs:CreateLogStream",
                   "logs:DescribeLogGroups",
                   "logs:DescribeLogStreams",
                   "logs:ListTagsForResource",
                   "logs:PutLogEvents",
                   "logs:PutRetentionPolicy",
                   "logs:TagResource",
                   "logs:UntagResource",
                   "ssm:DescribeAutomationExecutions",
                   "tag:GetResources",
                   "tag:TagResources"
               ],
               "Resource": "*"
           },
           {
               "Sid": "PassRoleToAutomation",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": [
                   "arn:aws:iam::*:role/TroubleshootEbsCsiDriversForEks-SSM-Role",
                   "arn:aws:iam::*:role/Automation-K8sProxy-Role-*"
               ],
               "Condition": {
                   "StringLikeIfExists": {
                       "iam:PassedToService": [
                           "lambda.amazonaws.com",
                           "ssm.amazonaws.com"
                       ]
                   }
               }
           },
           {
               "Sid": "AttachRolePolicy",
               "Effect": "Allow",
               "Action": [
                   "iam:AttachRolePolicy",
                   "iam:DetachRolePolicy"
               ],
               "Resource": "*",
               "Condition": {
                   "StringLikeIfExists": {
                       "iam:ResourceTag/AWSSupport-SetupK8sApiProxyForEKS": "true"
                   }
               }
           }
       ]
   }
   ```

------

1. Grant the required permissions for Amazon EKS cluster RBAC (Role-Based Access Control). The recommended approach is to create an Access Entry in your Amazon EKS cluster.

    In the Amazon EKS console, navigate to your cluster. For Amazon EKS access entries, verify your access configuration is set to `API_AND_CONFIG_MAP` or `API`. For steps to configure authentication mode for access entries, see [Setting up access entries](https://docs.aws.amazon.com//eks/latest/userguide/setting-up-access-entries.html). 

   Choose **Create access entry**.
   + For *IAM principal ARN*, select the IAM role you created for SSM automation in the previous step.
   + For *Type*, select `Standard`.

1. Add an access policy:
   + For *Access scope*, select `Cluster`.
   + For *Policy name*, select `AmazonEKSAdminViewPolicy`.

   Choose **Add policy**.

   If you are not using access entries to manage Kubernetes API permissions, you must update the `aws-auth` ConfigMap and create a role binding for your IAM user or role. Ensure that your IAM entity has the following read-only Kubernetes API permissions:
   + GET `/apis/apps/v1/namespaces/{namespace}/deployments/{name}`
   + GET `/apis/apps/v1/namespaces/{namespace}/replicasets/{name}`
   + GET `/apis/apps/v1/namespaces/{namespace}/daemonsets/{name}`
   + GET `/api/v1/nodes/{name}`
   + GET `/api/v1/namespaces/{namespace}/serviceaccounts/{name}`
   + GET `/api/v1/namespaces/{namespace}/persistentvolumeclaims/{name}`
   + GET `/api/v1/persistentvolumes/{name}`
   + GET `/apis/storage.k8s.io/v1/storageclasses/{name}`
   + GET `/api/v1/namespaces/{namespace}/pods/{name}`
   + GET `/api/v1/namespaces/{namespace}/pods`
   + GET `/api/v1/namespaces/{namespace}/pods/{name}/log`
   + GET `/api/v1/events`

1. Run the automation [AWSSupport-TroubleshootEbsCsiDriversForEks (console)](https://console.aws.amazon.com/systems-manager/documents/AWSSupport-TroubleshootEbsCsiDriversForEks/description).

1. Select **Execute automation**.

1. For the input parameters, enter the following:
   + **AutomationAssumeRole (Optional):**
     + Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows SSM Automation to perform the actions on your behalf. The role needs to be added to your Amazon EKS cluster access entry or RBAC permission to allow Kubernetes API calls.
     + Type: `AWS::IAM::Role::Arn`
     + Example: `TroubleshootEbsCsiDriversForEks-SSM-Role`
   + **EksClusterName:**
     + Description: The name of the target Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
     + Type: `String`
   + **ApplicationPodName:**
     + Description: The name of the Kubernetes application pod having issues with the Amazon EBS CSI driver.
     + Type: `String`
   + **ApplicationNamespace:**
     + Description: The Kubernetes namespace for the application pod having issues with the Amazon EBS CSI driver.
     + Type: `String`
   + **EbsCsiControllerDeploymentName (Optional):**
     + Description: (Optional) The deployment name for the Amazon EBS CSI controller pod.
     + Type: `String`
     + Default: `ebs-csi-controller`
   + **EbsCsiControllerNamespace (Optional):**
     + Description: (Optional) The Kubernetes namespace for the Amazon EBS CSI controller pod.
     + Type: `String`
     + Default: `kube-system`
   + **S3BucketName (Optional):**
     + Description: (Optional) The target Amazon S3 bucket name where the troubleshooting logs will be uploaded.
     + Type: `AWS::S3::Bucket::Name`
   + **LambdaRoleArn (Optional):**
     + Description: (Optional) The ARN of the IAM role that allows the AWS Lambda function to access the required AWS services and resources.
     + Type: `AWS::IAM::Role::Arn`

   Select **Execute**.

1. After the automation completes, review the *Outputs* section for the detailed results of the execution.
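
The console procedure above can also be started programmatically. The following sketch assumes boto3 and credentials that are allowed to call `ssm:StartAutomationExecution`; the helper names `build_parameters` and `start_troubleshooting` are hypothetical. The parameter names come from the list above, and Systems Manager expects each parameter value as a list of strings.

```python
DOCUMENT_NAME = "AWSSupport-TroubleshootEbsCsiDriversForEks"


def build_parameters(cluster_name, pod_name, namespace,
                     assume_role_arn=None, s3_bucket=None, lambda_role_arn=None):
    """Map the runbook inputs to the Parameters structure (name -> list of strings)."""
    params = {
        "EksClusterName": [cluster_name],
        "ApplicationPodName": [pod_name],
        "ApplicationNamespace": [namespace],
    }
    optional = {
        "AutomationAssumeRole": assume_role_arn,
        "S3BucketName": s3_bucket,
        "LambdaRoleArn": lambda_role_arn,
    }
    # Optional inputs are sent only when a value was provided.
    params.update({name: [value] for name, value in optional.items() if value})
    return params


def start_troubleshooting(ssm_client=None, **inputs):
    """Start the automation and return its execution ID."""
    if ssm_client is None:
        import boto3  # imported lazily so the parameter builder works without boto3

        ssm_client = boto3.client("ssm")
    response = ssm_client.start_automation_execution(
        DocumentName=DOCUMENT_NAME, Parameters=build_parameters(**inputs)
    )
    return response["AutomationExecutionId"]
```

For example, `start_troubleshooting(cluster_name="my-cluster", pod_name="my-pod", namespace="default")` starts the runbook with the default controller deployment name and namespace; the three values shown are placeholders.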

**References**

Systems Manager Automation
+ [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/documents/AWSSupport-TroubleshootEbsCsiDriversForEks/description)
+ [Run an automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-working-executing.html)
+ [Setting up an Automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-setup.html)
+ [Support Automation Workflows](https://aws.amazon.com/premiumsupport/technology/saw/)

For more information on Amazon EBS CSI Driver, see [Amazon EBS CSI Driver](https://docs.aws.amazon.com//eks/latest/userguide/ebs-csi.html).

# `AWSSupport-TroubleshootEKSALBControllerIssues`
<a name="automation-awssupport-troubleshoot-eks-alb-controller-issues"></a>

 **Description** 

 The `AWSSupport-TroubleshootEKSALBControllerIssues` automation runbook helps diagnose common issues that prevent the AWS Load Balancer Controller from properly provisioning and managing Application Load Balancer (ALB) and Network Load Balancer (NLB) for Kubernetes ingresses and services. 

 This runbook performs end-to-end validation of essential components including OIDC identity provider setup, IRSA configuration, networking prerequisites, ingress/service configuration, and resource quotas. It also captures controller logs and relevant Kubernetes resource configurations to help identify misconfigurations or operational issues. 

**Important**  
This automation runbook is designed for Amazon EKS clusters using Amazon Elastic Compute Cloud (Amazon EC2) node groups and does not currently support clusters running on AWS Fargate.

 **How does it work?** 

 The runbook `AWSSupport-TroubleshootEKSALBControllerIssues` performs the following high-level steps: 
+ Validates Amazon EKS cluster status, access entry configuration and OIDC provider setup.
+ Creates temporary Lambda proxy for Kubernetes API communication.
+ Checks AWS Load Balancer Controller deployment and service account configuration.
+ Verifies pod identity webhook and IAM role injection.
+ Validates subnet configuration and tagging for Application Load Balancer and Network Load Balancer provisioning.
+ Checks Application Load Balancer and Network Load Balancer account quotas against current usage.
+ Validates ingress and service resource annotations.
+ Checks worker node security group tagging for load balancer integration.
+ Collects controller pod logs for diagnostics.
+ Cleans up temporary authentication resources.
+ Generates diagnostic report with findings and remediation steps.

**Note**  
The Amazon EKS cluster must have an access entry configured for the IAM entity running this automation. The cluster's authentication mode must be set to either `API` or `API_AND_CONFIG_MAP`. Without proper access entry configuration, the automation will terminate during initial validation.
The `LambdaRoleArn` parameter is required and must have the AWS managed policies `AWSLambdaBasicExecutionRole` and `AWSLambdaVPCAccessExecutionRole` attached to allow the proxy function to communicate with the Kubernetes API.
The AWS Load Balancer Controller must be version `v2.1.1` or later.
The automation includes a cleanup step that removes temporary authentication infrastructure resources. This cleanup step runs even when previous steps fail, ensuring no orphaned resources remain in your AWS account.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSSupport-TroubleshootEKSALBControllerIssues) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

/

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `cloudformation:CreateStack`
+ `cloudformation:DeleteStack`
+ `cloudformation:DescribeStacks`
+ `cloudformation:UpdateStack`
+ `ec2:CreateNetworkInterface`
+ `ec2:DeleteNetworkInterface`
+ `ec2:DescribeInstances`
+ `ec2:DescribeNetworkInterfaces`
+ `ec2:DescribeRouteTables`
+ `ec2:DescribeSecurityGroups`
+ `ec2:DescribeSubnets`
+ `ec2:DescribeVpcs`
+ `eks:DescribeCluster`
+ `eks:ListAssociatedAccessPolicies`
+ `elasticloadbalancing:DescribeAccountLimits`
+ `elasticloadbalancing:DescribeLoadBalancers`
+ `iam:GetRole`
+ `iam:ListOpenIDConnectProviders`
+ `iam:PassRole`
+ `lambda:CreateFunction`
+ `lambda:DeleteFunction`
+ `lambda:GetFunction`
+ `lambda:InvokeFunction`
+ `lambda:ListTags`
+ `lambda:TagResource`
+ `lambda:UntagResource`
+ `lambda:UpdateFunctionCode`
+ `logs:CreateLogGroup`
+ `logs:CreateLogStream`
+ `logs:DescribeLogGroups`
+ `logs:DescribeLogStreams`
+ `logs:ListTagsForResource`
+ `logs:PutLogEvents`
+ `logs:PutRetentionPolicy`
+ `logs:TagResource`
+ `logs:UntagResource`
+ `ssm:DescribeAutomationExecutions`
+ `ssm:GetAutomationExecution`
+ `ssm:StartAutomationExecution`
+ `tag:GetResources`
+ `tag:TagResources`

 **Instructions** 

Follow these steps to configure and run the automation:

**Note**  
Before running the automation, configure the required IAM roles: one for Systems Manager Automation to execute the runbook, and another for Lambda to communicate with the Kubernetes API.  
Create an SSM automation role named `TroubleshootEKSALBController-SSM-Role` in your account. Verify that the trust relationship contains the following policy.  

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "",
               "Effect": "Allow",
               "Principal": {
                   "Service": "ssm.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```
Attach the following IAM policy to grant the required permissions:  

   ```
   {
               "Version": "2012-10-17",
               "Statement": [{
                   "Sid": "TroubleshootEKSALBControllerIssuesActions",
                   "Effect": "Allow",
                   "Action": [
                       "eks:DescribeCluster",
                       "eks:ListAssociatedAccessPolicies",
                       "iam:GetRole",
                       "iam:ListOpenIDConnectProviders",
                       "ssm:StartAutomationExecution",
                       "ssm:GetAutomationExecution",
                       "ssm:DescribeAutomationExecutions",
                       "ec2:DescribeSubnets",
                       "ec2:DescribeRouteTables",
                       "elasticloadbalancing:DescribeLoadBalancers",
                       "elasticloadbalancing:DescribeAccountLimits",
                       "ec2:DescribeInstances",
                       "ec2:DescribeNetworkInterfaces",
                       "ec2:DescribeSecurityGroups"
                   ],
                   "Resource": "*"
               },
               {
                   "Sid": "SetupK8sApiProxyForEKSActions",
                   "Effect": "Allow",
                   "Action": [
                       "cloudformation:CreateStack",
                       "cloudformation:DeleteStack",
                       "cloudformation:DescribeStacks",
                       "cloudformation:UpdateStack",
                       "ec2:CreateNetworkInterface",
                       "ec2:DeleteNetworkInterface",
                       "ec2:DescribeNetworkInterfaces",
                       "ec2:DescribeRouteTables",
                       "ec2:DescribeSecurityGroups",
                       "ec2:DescribeSubnets",
                       "ec2:DescribeVpcs",
                       "eks:DescribeCluster",
                       "iam:GetRole",
                       "lambda:CreateFunction",
                       "lambda:DeleteFunction",
                       "lambda:GetFunction",
                       "lambda:InvokeFunction",
                       "lambda:ListTags",
                       "lambda:TagResource",
                       "lambda:UntagResource",
                       "lambda:UpdateFunctionCode",
                       "logs:CreateLogGroup",
                       "logs:CreateLogStream",
                       "logs:DescribeLogGroups",
                       "logs:DescribeLogStreams",
                       "logs:ListTagsForResource",
                       "logs:PutLogEvents",
                       "logs:PutRetentionPolicy",
                       "logs:TagResource",
                       "logs:UntagResource",
                       "ssm:DescribeAutomationExecutions",
                       "tag:GetResources",
                       "tag:TagResources"
                   ],
                   "Resource": "*"
               },
               {
                   "Sid": "PassRoleToAutomation",
                   "Effect": "Allow",
                   "Action": "iam:PassRole",
                   "Resource": "*",
                   "Condition": {
                       "StringLikeIfExists": {
                           "iam:PassedToService": [
                               "lambda.amazonaws.com",
                               "ssm.amazonaws.com"
                           ]
                       }
                   }
               }]
           }
   ```
Configure an access entry for your Amazon EKS cluster. This is a mandatory requirement for the automation. For steps to configure the authentication mode for access entries, see [Setting up access entries](https://docs.aws.amazon.com//eks/latest/userguide/setting-up-access-entries.html).  
In the Amazon EKS console, navigate to your cluster and follow these steps:  
+ Under the **Access** section, verify that the authentication mode is set to either `API` or `API_AND_CONFIG_MAP`.
+ Choose **Create access entry** and configure:
  + For *IAM principal ARN*, select the IAM role you created (`TroubleshootEKSALBController-SSM-Role`).
  + For *Type*, select `Standard`.
+ Add an access policy:
  + For *Policy name*, select `AmazonEKSAdminViewPolicy`.
  + For *Access scope*, select `Cluster`.
+ Choose **Add policy**.
+ Verify the details and choose **Create**.

Create an IAM role for the Lambda function (referenced as `LambdaRoleArn` in the input parameters):  
Create a new IAM role with the following trust policy:  

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Effect": "Allow",
                 "Principal": {
                     "Service": "lambda.amazonaws.com"
                 },
                 "Action": "sts:AssumeRole"
             }
         ]
     }
     ```
Attach the following AWS managed policies to this role:  
+ `AWSLambdaBasicExecutionRole`
+ `AWSLambdaVPCAccessExecutionRole`

Note the ARN of this role. You need it for the `LambdaRoleArn` input parameter.
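
The Lambda role setup described above can be scripted. The following is a minimal sketch, assuming boto3, the standard `aws` partition, and a caller that is allowed to call `iam:CreateRole` and `iam:AttachRolePolicy`; the function name `create_lambda_proxy_role` and the role name you pass to it are hypothetical.

```python
import json

# Trust policy that lets the Lambda service assume the role, as shown above.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# ARNs of the two AWS managed policies named above (standard partition assumed).
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
]


def create_lambda_proxy_role(role_name, iam_client=None):
    """Create the role, attach both managed policies, and return the role ARN."""
    if iam_client is None:
        import boto3  # imported lazily so the constants can be used without boto3

        iam_client = boto3.client("iam")
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```

The returned ARN is the value to supply as `LambdaRoleArn`.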

1. Navigate to [AWSSupport-TroubleshootEKSALBControllerIssues](https://console.aws.amazon.com/systems-manager/documents/AWSSupport-TroubleshootEKSALBControllerIssues/description) in the AWS Systems Manager console.

1. Choose **Execute automation**.

1. For the input parameters enter the following:
   + **AutomationAssumeRole (Optional):**

     Type: AWS::IAM::Role::Arn

     Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.

     Allowed Pattern: `^arn:(?:aws|aws-cn|aws-us-gov):iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$`
   + **EksClusterName (Required):**

     Type: String

     Description: (Required) Name of the Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

     Allowed Pattern: `^[0-9A-Za-z][A-Za-z0-9-_]{0,99}$`
   + **ALBControllerDeploymentName (Optional):**

     Type: String

     Description: (Optional) The name of the AWS Load Balancer Controller deployment in your Amazon EKS cluster. This is typically 'aws-load-balancer-controller' unless you've customized it during installation.

     Allowed Pattern: `^[a-z0-9]([-.a-z0-9]{0,251}[a-z0-9])?$`

     Default: aws-load-balancer-controller
   + **ALBControllerNamespace (Optional):**

     Type: String

     Description: (Optional) The Kubernetes namespace where the AWS Load Balancer Controller is deployed. By default, this is 'kube-system', but it may be different if you've installed the controller in a custom namespace.

     Allowed Pattern: `^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$`

     Default: kube-system
   + **ServiceAccountName (Optional):**

     Type: String

     Description: (Optional) The name of the Kubernetes Service Account associated with the AWS Load Balancer Controller. This is typically 'aws-load-balancer-controller' unless customized during installation.

     Allowed Pattern: `^[a-z0-9]([-.a-z0-9]{0,251}[a-z0-9])?$`

     Default: aws-load-balancer-controller
   + **ServiceAccountNamespace (Optional):**

     Type: String

     Description: (Optional) The Kubernetes namespace where the Service Account for the AWS Load Balancer Controller is located. This is typically 'kube-system', but may differ if you've used a custom namespace.

     Allowed Pattern: `^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$`

     Default: kube-system
   + **IngressName (Optional):**

     Type: String

     Description: (Optional) Name of the Ingress resource to validate (Application Load Balancer). If not specified, Ingress validation will be skipped.

     Allowed Pattern: `^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$`

     Default: "" (empty string)
   + **IngressNamespace (Optional):**

     Type: String

     Description: (Optional) Namespace of the Ingress resource. Required if `IngressName` is specified.

     Allowed Pattern: `^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$`

     Default: "" (empty string)
   + **ServiceName (Optional):**

     Type: String

     Description: (Optional) Name of a specific Service resource whose Network Load Balancer (NLB) annotations you want to validate. If not specified, Service resource validation is skipped.

     Allowed Pattern: ^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$

     Default: "" (empty string)
   + **ServiceNamespace (Optional):**

     Type: String

     Description: (Optional) Namespace of the Service resource. Required if `ServiceName` is specified.

     Allowed Pattern: ^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$

     Default: "" (empty string)
   + **LambdaRoleArn (Required):**

     Type: AWS::IAM::Role::Arn

     Description: (Required) The ARN of the IAM role that allows the AWS Lambda (Lambda) function to access the required AWS services and resources. Attach the AWS managed policies `AWSLambdaBasicExecutionRole` and `AWSLambdaVPCAccessExecutionRole` to your Lambda function execution role.

     Allowed Pattern: ^arn:(?:aws|aws-cn|aws-us-gov):iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$

1. Choose **Execute**.

1. The automation initiates.

1. The document performs the following steps:

   1. **ValidateAccessEntryAndOIDCProvider:**

      Validates Amazon EKS cluster IAM setup by checking access entry permissions and OIDC provider configuration.

   1. **SetupK8sAuthenticationClient:**

      Runs the SAW runbook `AWSSupport-SetupK8sApiProxyForEKS` to set up a Lambda function that runs Amazon EKS API calls on the cluster.

   1. **VerifyALBControllerAndIRSASetup:**

      Checks whether the specified Service Account and the AWS Load Balancer Controller exist in their respective namespaces. Also checks the role annotation and trust policy of the controller's Service Account.

   1. **VerifyPodIdentityWebhookAndEnv:**

      Checks whether `pod-identity-webhook` is running. Also checks whether IRSA settings are injected into the pod's environment variables.

   1. **ValidateSubnetRequirements:**

      Checks that at least two subnets in two different Availability Zones have at least 8 available IP addresses, and that the subnets are tagged correctly for public and private load balancers.

   1. **CheckLoadBalancerLimitsAndUsage:**

      Compares the account limits against the number of existing Application Load Balancers and Network Load Balancers.

   1. **CheckIngressOrServiceAnnotations:**

      Checks for correct annotations and specifications in Ingress and Service resources to ensure they are properly configured for Application Load Balancer and Network Load Balancer usage.

   1. **CheckWorkerNodeSecurityGroupTags:**

      Verifies that exactly one security group attached to the worker nodes has the required cluster tag.

   1. **CaptureALBControllerLogs:**

      Retrieves the latest diagnostic logs from the AWS Load Balancer Controller pods running in the Amazon EKS cluster.

   1. **CleanupK8sAuthenticationClient:**

      Runs the SAW runbook `AWSSupport-SetupK8sApiProxyForEKS` with the `Cleanup` operation to remove the resources created as part of the automation.

   1. **GenerateReport:**

      Generates the automation report.

1. After the execution completes, review the Outputs section for the detailed results of the execution:

   1. **Report:**

      Provides a comprehensive summary of all checks performed, including the status of the Amazon EKS cluster, AWS Load Balancer Controller setup, IRSA configuration, subnet requirements, load balancer limits, Ingress and Service annotations, worker node security group tags, and AWS Load Balancer Controller logs. It also includes any identified issues and recommended remediation steps.
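The optional Kubernetes name and namespace inputs above can be checked locally before you start the automation. The following Python sketch is illustrative only: it assumes the allowed patterns read as shown in the code, and the `validate_inputs` helper and its messages are hypothetical, not part of the runbook.

```python
import re

# Allowed patterns transcribed from the parameter descriptions (assumed readings).
PATTERNS = {
    "ALBControllerNamespace": r"^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$",
    "ServiceAccountName": r"^[a-z0-9]([-.a-z0-9]{0,251}[a-z0-9])?$",
    "ServiceAccountNamespace": r"^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$",
    "IngressName": r"^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$",
    "IngressNamespace": r"^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$",
    "ServiceName": r"^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$",
    "ServiceNamespace": r"^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$",
}

# Documented defaults; the Ingress and Service inputs default to an empty string.
DEFAULTS = {
    "ALBControllerNamespace": "kube-system",
    "ServiceAccountName": "aws-load-balancer-controller",
    "ServiceAccountNamespace": "kube-system",
}


def validate_inputs(params):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    for name, pattern in PATTERNS.items():
        value = params.get(name, DEFAULTS.get(name, ""))
        if not re.match(pattern, value):
            problems.append(f"{name} does not match the allowed pattern: {value!r}")
    # A namespace is required whenever the matching resource name is supplied.
    for resource in ("Ingress", "Service"):
        if params.get(f"{resource}Name") and not params.get(f"{resource}Namespace"):
            problems.append(f"{resource}Namespace is required when {resource}Name is set")
    return problems


if __name__ == "__main__":
    print(validate_inputs({"IngressName": "web"}))
```

An empty result means only that the values are well formed. It does not confirm that the resources exist in the cluster.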

**References**

Systems Manager Automation
+ [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/documents/AWSSupport-TroubleshootEKSALBControllerIssues/description)
+ [Run an automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-working-executing.html)
+ [Setting up Automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-setup.html)
+ [Support Automation Workflows](https://aws.amazon.com/premiumsupport/technology/saw/)

Documentation related to AWS Load Balancer Controller
+ [AWS Load Balancer Controller](https://docs.aws.amazon.com//eks/latest/userguide/aws-load-balancer-controller.html)
+ [Setting up access entries](https://docs.aws.amazon.com//eks/latest/userguide/setting-up-access-entries.html)

# `AWSSupport-TroubleshootEKSDNSFailure`
<a name="automation-awssupport-troubleshooteksdnsfailure"></a>

 **Description** 

The `AWSSupport-TroubleshootEKSDNSFailure` runbook helps troubleshoot issues with CoreDNS pods and configuration in Amazon Elastic Kubernetes Service (Amazon EKS) when applications or pods encounter DNS resolution failures. The runbook validates VPC DNS settings, inspects the CoreDNS deployment and ConfigMap, checks Horizontal Pod Autoscaler (HPA) configuration, collects CoreDNS logs, and performs DNS resolution checks on worker nodes. Optionally, the runbook can create a probing Amazon Elastic Compute Cloud (Amazon EC2) instance in the same subnet as the problematic worker node to perform DNS resolution checks without requiring direct access to the node.

**Important**  
The Amazon EKS cluster's authentication mode must be set to `API` or `API_AND_CONFIG_MAP`. This runbook deploys an AWS Lambda (Lambda) function as a proxy to make authenticated Kubernetes API calls and cleans up all created resources at the end of execution.

 [Run this Automation (console)](https://console.aws.amazon.com/systems-manager/automation/execute/AWSSupport-TroubleshootEKSDNSFailure) 

**Document type**

Automation

**Owner**

Amazon

**Platforms**

Linux, macOS, Windows

**Parameters**
+ AutomationAssumeRole

  Type: AWS::IAM::Role::Arn

  Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform the actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
+ EksClusterName

  Type: String

  Description: (Required) The name of the target Amazon EKS cluster.
+ DnsName

  Type: String

  Default: amazon.com

  Description: (Optional) The stub domain suffix for the domain name that the application or pod is failing to resolve.
+ ProblematicNodeInstanceId

  Type: AWS::EC2::Instance::Id

  Description: (Optional) The instance ID of the worker node where the application experiencing DNS resolution errors is running. When provided, the runbook creates a probing Amazon EC2 instance in the same subnet to perform DNS resolution checks. Use this parameter if the worker node is not in a public subnet or if installing `bind-utils` on the worker node is not permitted.
+ CoreDnsNamespace

  Type: String

  Default: kube-system

  Description: (Optional) The Kubernetes namespace for CoreDNS pods.
+ S3BucketName

  Type: AWS::S3::Bucket::Name

  Description: (Optional) The name of the Amazon Simple Storage Service (Amazon S3) bucket where CoreDNS troubleshooting logs are uploaded.
+ LambdaRoleArn

  Type: AWS::IAM::Role::Arn

  Description: (Optional) The ARN of the IAM role for the Lambda proxy function. If not provided, the runbook creates a role named `Automation-K8sProxy-Role-<ExecutionId>` with the `AWSLambdaBasicExecutionRole` and `AWSLambdaVPCAccessExecutionRole` managed policies. It is recommended to provide your own role.
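As an illustration of how the parameters above map onto an API request, the following Python sketch builds the `Parameters` map for the Systems Manager `StartAutomationExecution` operation, where each value is a list of strings. The helper names `build_parameters` and `start_dns_troubleshooting` are hypothetical. The sketch assumes `boto3` is installed and credentials are configured when the second helper is called.

```python
def build_parameters(cluster_name, dns_name="amazon.com", node_instance_id=None,
                     coredns_namespace="kube-system", s3_bucket=None,
                     lambda_role_arn=None, assume_role_arn=None):
    """Build the Parameters map for the runbook (hypothetical helper)."""
    if not cluster_name:
        raise ValueError("EksClusterName is required")
    params = {
        "EksClusterName": [cluster_name],
        "DnsName": [dns_name],
        "CoreDnsNamespace": [coredns_namespace],
    }
    optional = {
        "ProblematicNodeInstanceId": node_instance_id,
        "S3BucketName": s3_bucket,
        "LambdaRoleArn": lambda_role_arn,
        "AutomationAssumeRole": assume_role_arn,
    }
    # Only send optional parameters that were actually supplied.
    for name, value in optional.items():
        if value:
            params[name] = [value]
    return params


def start_dns_troubleshooting(parameters, region_name=None):
    """Start the automation and return its execution ID."""
    import boto3  # imported lazily so build_parameters works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    response = ssm.start_automation_execution(
        DocumentName="AWSSupport-TroubleshootEKSDNSFailure",
        Parameters=parameters,
    )
    return response["AutomationExecutionId"]
```

For example, `build_parameters("my-cluster", dns_name="example.internal")` produces a map with three keys, where `my-cluster` and `example.internal` are placeholder values.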

**Required IAM permissions**

The `AutomationAssumeRole` parameter requires the following actions to use the runbook successfully.
+ `eks:DescribeCluster`
+ `ec2:DescribeVpcs`
+ `ec2:DescribeInstances`
+ `ec2:DescribeSubnets`
+ `ec2:DescribeSecurityGroups`
+ `cloudformation:CreateStack`
+ `cloudformation:DescribeStacks`
+ `cloudformation:DeleteStack`
+ `lambda:InvokeFunction`
+ `s3:GetBucketPublicAccessBlock`
+ `s3:GetBucketAcl`
+ `s3:PutObject`
+ `ssm:DescribeInstanceInformation`
+ `ssm:SendCommand`
+ `ssm:GetCommandInvocation`
+ `ssm:StartAutomationExecution`
+ `ssm:GetAutomationExecution`
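The actions above can be collected into an IAM policy document. The following Python sketch generates one as JSON. It is a starting point only: the `Sid` value and the `build_policy` helper are hypothetical, and `"Resource": "*"` is an assumption made for brevity that you should scope down to specific resources where the service supports it.

```python
import json

# Actions listed in the "Required IAM permissions" section.
REQUIRED_ACTIONS = [
    "eks:DescribeCluster",
    "ec2:DescribeVpcs",
    "ec2:DescribeInstances",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
    "cloudformation:CreateStack",
    "cloudformation:DescribeStacks",
    "cloudformation:DeleteStack",
    "lambda:InvokeFunction",
    "s3:GetBucketPublicAccessBlock",
    "s3:GetBucketAcl",
    "s3:PutObject",
    "ssm:DescribeInstanceInformation",
    "ssm:SendCommand",
    "ssm:GetCommandInvocation",
    "ssm:StartAutomationExecution",
    "ssm:GetAutomationExecution",
]


def build_policy(actions=REQUIRED_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TroubleshootEKSDNSFailure",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: narrow this for production use
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```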

 **Document Steps** 

1. `AssertIfTargetClusterExists` - Verifies that the Amazon EKS cluster specified in `EksClusterName` exists and is in the `ACTIVE` state. If the cluster is not found or not active, the runbook skips to `GenerateReport`.

1. `UpdateEksClusterExists` - Sets the internal `eksClusterExists` variable to `true` for use in report generation.

1. `GetVpcDnsSettings` - Retrieves the `enableDnsSupport` and `enableDnsHostnames` settings for the Amazon Virtual Private Cloud associated with the Amazon EKS cluster.

1. `BranchOnVpcDnsSettings` - Checks whether both VPC DNS settings are enabled. If either is disabled, the runbook skips to `GenerateReport`. Otherwise, proceeds to `DeployK8sAuthApisResources`.

1. `DeployK8sAuthApisResources` - Executes `AWSSupport-SetupK8sApiProxyForEKS` to deploy a Lambda function as a proxy for making authenticated Kubernetes API calls to the Amazon EKS cluster.

1. `RetrieveCoreDNSDeployment` - Retrieves information about the CoreDNS deployment, including pod readiness, container status, and the readiness of nodes hosting CoreDNS pods. Also retrieves the CoreDNS cluster IP.

1. `RetrieveAndInspectCoreDNSConfigMap` - Retrieves the CoreDNS ConfigMap from the Amazon EKS cluster and checks for configuration issues, including stub domain settings for the domain specified in `DnsName`.

1. `ValidateHpaConfiguration` - Checks whether a Horizontal Pod Autoscaler (HPA) is configured for the CoreDNS deployment in the specified namespace.

1. `CheckS3BucketPublicStatus` - Validates that the Amazon S3 bucket specified in `S3BucketName` does not allow public or anonymous read or write access.

1. `CollectLogToS3` - Collects CoreDNS pod logs and uploads them to the specified Amazon S3 bucket.

1. `BranchOnProblematicNodeInstanceId` - Checks whether `ProblematicNodeInstanceId` is provided and CoreDNS host nodes exist. If both conditions are met, proceeds to `VerifyThatProblematicNodeInstanceBelongsToCluster`. Otherwise, branches to `BranchOnCoreDnsDeployment`.

1. `VerifyThatProblematicNodeInstanceBelongsToCluster` - Confirms that the instance specified in `ProblematicNodeInstanceId` is a worker node in the Amazon EKS cluster.

1. `UpdateProblematicNodeInstanceStanding` - Sets the internal `problematicNodeInstanceStanding` variable to `true`.

1. `GetProblematicInstanceDetails` - Retrieves the AMI ID, instance type, subnet ID, and security group IDs of the problematic worker node for use when creating the probing Amazon EC2 instance.

1. `CreateProbingInfrastructure` - Creates an instance profile and probing Amazon EC2 instance via an AWS CloudFormation stack in the same subnet as the problematic worker node. The stack is named `AWSSupport-TroubleshootEKSDNSFailure-<ExecutionId>`.

1. `GetProbingInstanceId` - Retrieves the probing Amazon EC2 instance ID from the CloudFormation stack outputs.

1. `WaitForProbingInstanceSSMAgentStateToBeOnline` - Waits for the Systems Manager Agent (SSM Agent) on the probing Amazon EC2 instance to report an `Online` status before proceeding.

1. `RetrieveCoreDNSPodsIPFromProblematicNode` - Retrieves the CoreDNS pod IP addresses as seen from the problematic worker node.

1. `PerformDNSResolutionOnProbingEC2Instance` - Runs DNS resolution checks on the probing Amazon EC2 instance using the cluster IP and CoreDNS pod IP.

1. `DeleteCloudFormationStack` - Deletes the CloudFormation stack that created the probing Amazon EC2 instance and instance profile.

1. `UpdateCfnStackDeleted` - Sets the internal `cfnStackDeleted` variable to `true`.

1. `BranchOnCoreDnsDeployment` - Checks whether `ProblematicNodeInstanceId` was not provided and CoreDNS host nodes exist. If both conditions are met, proceeds to `PerformDNSResolutionOnCoreDnsWorkerNodes`. Otherwise, proceeds to `CleanupK8sAuthenticationInfrastructure`.

1. `BranchOnCoreDnsNodesExistForRunCommandSteps` - Checks whether CoreDNS host nodes exist before running `aws:runCommand` steps. If no nodes exist (for example, when CoreDNS has zero replicas), skips to `CleanupK8sAuthenticationInfrastructure`.

1. `PerformDNSResolutionOnCoreDnsWorkerNodes` - Runs DNS resolution checks directly on the CoreDNS worker nodes using the cluster IP and CoreDNS pod IP.

1. `VerifyNameserverMatchAndKubeProxyLogsAndIPTableEntries` - Verifies that the nameserver IP matches the cluster IP, checks kube-proxy pod access to the API server, and validates kube-dns iptables entries on the CoreDNS worker nodes.

1. `VerifyPPSThrottlingOnENIs` - Checks the DNS packets-per-second (PPS) limit per Elastic Network Interface (ENI) on the CoreDNS worker nodes to identify potential throttling.

1. `UpdateChecksOnNodes` - Sets the internal `checksOnNodes` variable to `true` to indicate that node-level checks were performed.

1. `CleanupK8sAuthenticationInfrastructure` - Executes `AWSSupport-SetupK8sApiProxyForEKS` with the `Cleanup` operation to remove the Lambda proxy function and associated resources created during the automation.

1. `UpdateK8sInfrastructreDeleted` - Sets the internal `K8sInfrastructreDeleted` variable to `true`.

1. `CleanUpAllResources` - Performs a comprehensive cleanup of any remaining resources, including the CloudFormation stack and Lambda proxy function, in case earlier cleanup steps did not complete successfully.

1. `CollectOutputFromAllRunCommandSteps` - Collects and consolidates the output from all `aws:runCommand` steps that were executed during the automation.

1. `GenerateReport` - Compiles the results from all preceding steps into a comprehensive evaluation report covering VPC DNS settings, CoreDNS deployment health, ConfigMap configuration, HPA configuration, log collection status, and DNS resolution check results.
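The branching described in the steps above can be summarized as a small decision function. The following Python sketch is a simplified model, not the runbook's implementation: the `ClusterState` type, its field names, and the returned labels are hypothetical, and the sketch leaves out the Kubernetes checks, log collection, and cleanup steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterState:
    """Hypothetical summary of the facts the branch steps evaluate."""
    cluster_active: bool
    vpc_dns_support: bool
    vpc_dns_hostnames: bool
    coredns_host_nodes: int
    problematic_node_instance_id: Optional[str] = None


def dns_check_path(state):
    """Return which DNS resolution path the documented branches would select."""
    # AssertIfTargetClusterExists: a missing or inactive cluster skips to the report.
    if not state.cluster_active:
        return "report_only"
    # BranchOnVpcDnsSettings: both VPC DNS settings must be enabled.
    if not (state.vpc_dns_support and state.vpc_dns_hostnames):
        return "report_only"
    # Both node-level branches require at least one CoreDNS host node.
    if state.coredns_host_nodes == 0:
        return "no_node_checks"
    # BranchOnProblematicNodeInstanceId: use a probing instance when an ID is given.
    if state.problematic_node_instance_id:
        return "probing_instance"
    # BranchOnCoreDnsDeployment: otherwise run checks on the CoreDNS worker nodes.
    return "coredns_worker_nodes"
```

In this model, a cluster with healthy VPC DNS settings, two CoreDNS host nodes, and no `ProblematicNodeInstanceId` takes the `coredns_worker_nodes` path.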

 **Outputs** 

`GenerateReport.EvalReport` - A comprehensive report of all DNS troubleshooting checks performed, including findings and recommended remediation steps.
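
As a sketch of how you might read this output programmatically, the following Python example extracts `GenerateReport.EvalReport` from a `GetAutomationExecution` response, where outputs are returned as a map of string lists. The helper names are hypothetical. The sketch assumes `boto3` is installed and credentials are configured when `fetch_eval_report` is called.

```python
def extract_eval_report(response):
    """Return the evaluation report text from a GetAutomationExecution response."""
    execution = response.get("AutomationExecution", {})
    outputs = execution.get("Outputs", {})
    values = outputs.get("GenerateReport.EvalReport", [])
    # Output values are lists of strings; join them into one block of text.
    return "\n".join(values) if values else None


def fetch_eval_report(execution_id, region_name=None):
    """Fetch the execution by ID and return its report, or None if absent."""
    import boto3  # imported lazily so extract_eval_report works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    response = ssm.get_automation_execution(AutomationExecutionId=execution_id)
    return extract_eval_report(response)
```

The report is available only after the `GenerateReport` step has run, so the function returns `None` for an execution that has not reached that step.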