Links to Amazon EMR on EKS best practices guides on GitHub
We've built the Amazon EMR on EKS Best Practices Guide on GitHub.
Security
Note
For more information on security with Amazon EMR on EKS, see Amazon EMR on EKS security best practices.
Encryption best practices:
Managing network security
Using AWS Secrets Manager to store secrets
PySpark job submission
PySpark job submission:
Storage
Using EBS volumes:
Using Amazon FSx for Lustre volumes:
Using instance store volumes:
Metastore integration
Using Hive metastore:
Using AWS Glue:
Debugging
Using Spark debugging:
Connecting to Spark UI on the driver pod
How to use self-hosted Spark history server with Amazon EMR on EKS
Troubleshooting Amazon EMR on EKS issues
Node placement
Using Kubernetes node selectors for single-AZ and other use cases.
Performance
Using Dynamic Resource Allocation (DRA)
By default, spark.dynamicAllocation.preallocateExecutors is enabled in Amazon EMR Spark. When
spark.dynamicAllocation.initialExecutors and spark.dynamicAllocation.minExecutors are
not set, Spark may request a large number of executors at startup based on estimated task counts, even for small
workloads. To avoid excessive container churn, use one of the following approaches:
Set spark.dynamicAllocation.initialExecutors or spark.dynamicAllocation.minExecutors to a value appropriate for your workload size.
Set spark.dynamicAllocation.preallocateExecutors.maxEstimatedTasks to a lower value to limit the number of executors requested at startup.
Set spark.dynamicAllocation.preallocateExecutors to false to disable executor preallocation entirely.
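Each of the three approaches above comes down to one or more Spark properties, which can be placed in the spark-defaults classification of a job's configurationOverrides. The Python sketch below builds that structure for each approach. The helper name dynamic_allocation_overrides, the strategy labels, and the numeric values are hypothetical and chosen only for illustration. The sketch assumes dynamic allocation is already enabled for the job. The resulting dictionary could be passed as the configurationOverrides argument of a StartJobRun request, for example through boto3's emr-containers client (start_job_run). The sketch itself makes no AWS calls.

```python
import json


def dynamic_allocation_overrides(strategy: str) -> dict:
    """Hypothetical helper: build EMR on EKS configurationOverrides that apply
    one executor-preallocation mitigation via the spark-defaults classification.

    Assumes dynamic allocation is already enabled for the job. The numeric
    values are illustrative placeholders, not tuning recommendations.
    """
    properties: dict[str, str] = {}
    if strategy == "initial":
        # Approach 1: give Spark an explicit starting executor count
        # instead of letting it estimate one from task counts.
        properties["spark.dynamicAllocation.initialExecutors"] = "2"
    elif strategy == "cap_estimate":
        # Approach 2: lower the cap on estimated tasks used to size
        # the initial executor request.
        properties["spark.dynamicAllocation.preallocateExecutors.maxEstimatedTasks"] = "100"
    elif strategy == "disable":
        # Approach 3: turn executor preallocation off entirely.
        properties["spark.dynamicAllocation.preallocateExecutors"] = "false"
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Spark properties go under the spark-defaults classification.
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    # Print the overrides for the "disable preallocation" approach.
    print(json.dumps(dynamic_allocation_overrides("disable"), indent=2))
```

Size the values to your own workload rather than copying them. Because Spark property values in spark-defaults are strings, the sketch quotes every value.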
EKS best practices
Cost optimization
Using Spot Instances:
Using AWS Outposts
Running Amazon EMR on EKS using AWS Outposts