Using job submitter classification
Overview
The Amazon EMR on EKS StartJobRun request creates a job submitter
pod (also known as the job-runner pod) to spawn the Spark driver. You can
use the emr-job-submitter classification to configure node selectors, add tolerations,
customize logging, and make other modifications to the job submitter pod.
The following settings are available under the emr-job-submitter
classification:
- jobsubmitter.node.selector.[selectorKey] - Adds to the node selector of the job submitter pod, with key selectorKey and the configuration value as the value. For example, if you set jobsubmitter.node.selector.identifier to myIdentifier, the job submitter pod gets a node selector with key identifier and value myIdentifier. Use this setting to control which nodes the job submitter pod can be placed on. To add multiple node selector keys, set multiple configurations with this prefix.
- jobsubmitter.label.[labelKey] - Adds to the labels of the job submitter pod, with key labelKey and the configuration value as the value. To add multiple labels, set multiple configurations with this prefix.
- jobsubmitter.annotation.[annotationKey] - Adds to the annotations of the job submitter pod, with key annotationKey and the configuration value as the value. To add multiple annotations, set multiple configurations with this prefix.
- jobsubmitter.node.toleration.[tolerationKey] - Adds tolerations to the job submitter pod. By default, no tolerations are added to the pod. The toleration's key is tolerationKey and the toleration's value is the configuration value. If the configuration value is a non-empty string, the operator is Equal. If the configuration value is "", the operator is Exists.
- jobsubmitter.node.toleration.[tolerationKey].[effect] - Adds a toleration effect to the toleration with the prefixed tolerationKey, for example jobsubmitter.node.toleration.dedicated.effect. This field is required when you add tolerations. The allowed values for the effect field are NoExecute, NoSchedule, and PreferNoSchedule.
- jobsubmitter.node.toleration.[tolerationKey].[tolerationSeconds] - Adds tolerationSeconds to the toleration with the prefixed tolerationKey, for example jobsubmitter.node.toleration.dedicated.tolerationSeconds. This field is optional and applies only when the effect is NoExecute.
- jobsubmitter.scheduler.name - Sets a custom schedulerName for the job submitter pod.
- jobsubmitter.logging - Enables or disables logging on the job submitter pod. When this is set to DISABLED, the logging container is removed from the job submitter pod, which disables any logging for this pod that is specified in the monitoringConfiguration, such as s3MonitoringConfiguration or cloudWatchMonitoringConfiguration. When this setting is not set or is set to any other value, logging on the job submitter pod is enabled.
- jobsubmitter.logging.image - Sets a custom image to be used for the logging container on the job submitter pod.
- jobsubmitter.logging.request.cores - Sets a custom value for the number of CPUs, in CPU units, for the logging container on the job submitter pod. By default, this is set to 100m.
- jobsubmitter.logging.request.memory - Sets a custom value for the amount of memory, in bytes, for the logging container on the job submitter pod. By default, this is set to 200Mi. A mebibyte (MiB) is 2^20 (1,048,576) bytes, slightly larger than a megabyte.
- jobsubmitter.container.image - Sets a custom image for the job submitter pod's job-runner container.
- jobsubmitter.container.image.pullPolicy - Sets the imagePullPolicy for the job submitter pod's containers. Kubernetes defines the values Always, IfNotPresent, and Never.
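The toleration rules in the list above can be made concrete with a short Python sketch. It only illustrates the documented rules and is not the service's actual implementation: the function name toleration_from_properties is hypothetical, and the output uses the field names of a Kubernetes toleration (Kubernetes spells the equality operator Equal).

```python
PREFIX = "jobsubmitter.node.toleration."
ALLOWED_EFFECTS = {"NoExecute", "NoSchedule", "PreferNoSchedule"}


def toleration_from_properties(properties, toleration_key):
    """Translate the three toleration settings for one key into a toleration dict.

    Hypothetical helper: it mirrors the rules described in this topic only.
    """
    base = PREFIX + toleration_key
    value = properties[base]

    # The effect setting is required whenever a toleration is added.
    effect = properties.get(base + ".effect")
    if effect not in ALLOWED_EFFECTS:
        raise ValueError(f"effect is required and must be one of {sorted(ALLOWED_EFFECTS)}")

    toleration = {"key": toleration_key, "effect": effect}
    if value == "":
        # An empty configuration value means "tolerate any value for this key".
        toleration["operator"] = "Exists"
    else:
        toleration["operator"] = "Equal"
        toleration["value"] = value

    # tolerationSeconds is optional and only meaningful with NoExecute.
    seconds = properties.get(base + ".tolerationSeconds")
    if seconds is not None:
        if effect != "NoExecute":
            raise ValueError("tolerationSeconds only applies when the effect is NoExecute")
        toleration["tolerationSeconds"] = int(seconds)
    return toleration


example_properties = {
    "jobsubmitter.node.toleration.dedicated": "graviton_machines",
    "jobsubmitter.node.toleration.dedicated.effect": "NoExecute",
    "jobsubmitter.node.toleration.dedicated.tolerationSeconds": "60",
}
example = toleration_from_properties(example_properties, "dedicated")
```

The key is passed in explicitly rather than parsed out of the setting name, because a toleration key can itself contain dots.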
We recommend placing job submitter pods on On-Demand Instances. If you place a job submitter pod on a Spot Instance, the job might fail when that instance is interrupted. You can also place the job submitter pod in a single Availability Zone, or select nodes by any Kubernetes labels that are applied to them.
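In practice, this placement advice comes down to setting keys under the node selector prefix. The following Python sketch shows one way to assemble the properties map for the emr-job-submitter classification. The helper name build_job_submitter_properties and its parameters are hypothetical and not part of any AWS SDK, and the Availability Zone value is a made-up illustration.

```python
def build_job_submitter_properties(node_selectors=None, labels=None,
                                   annotations=None, scheduler_name=None):
    """Assemble the properties map for the emr-job-submitter classification.

    Hypothetical helper: it only applies the documented key prefixes and
    performs no validation against Amazon EMR on EKS.
    """
    properties = {}
    for key, value in (node_selectors or {}).items():
        properties[f"jobsubmitter.node.selector.{key}"] = value
    for key, value in (labels or {}).items():
        properties[f"jobsubmitter.label.{key}"] = value
    for key, value in (annotations or {}).items():
        properties[f"jobsubmitter.annotation.{key}"] = value
    if scheduler_name is not None:
        properties["jobsubmitter.scheduler.name"] = scheduler_name
    return properties


# On-Demand capacity in a single Availability Zone (zone value is illustrative).
classification = {
    "classification": "emr-job-submitter",
    "properties": build_job_submitter_properties(
        node_selectors={
            "eks.amazonaws.com/capacityType": "ON_DEMAND",
            "topology.kubernetes.io/zone": "us-west-2a",
        },
    ),
}
```

The resulting dictionary has the same shape as one entry of applicationConfiguration in a StartJobRun request.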
Job submitter classification examples
StartJobRun request with On-Demand node placement for the job submitter pod
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "virtual-cluster-id",
  "executionRoleArn": "execution-role-arn",
  "releaseLabel": "emr-6.11.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py",
      "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.dynamicAllocation.enabled": "false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF
aws emr-containers start-job-run --cli-input-json file://./spark-python-in-s3-nodeselector-job-submitter.json
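For readers who submit jobs from Python instead of the AWS CLI, the same request can be sketched with boto3. This is a minimal illustration that assumes the boto3 package is installed and AWS credentials are configured. The function names build_start_job_run_request and submit are hypothetical, and the virtual cluster ID and role ARN are the same placeholders used in the CLI example.

```python
def build_start_job_run_request(virtual_cluster_id, execution_role_arn):
    """Return keyword arguments for StartJobRun, mirroring the CLI example."""
    return {
        "name": "spark-python-in-s3-nodeselector",
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": "emr-6.11.0-latest",
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": "s3://S3-prefix/trip-count.py",
                "sparkSubmitParameters": (
                    "--conf spark.driver.cores=5 --conf spark.executor.memory=20G "
                    "--conf spark.driver.memory=15G --conf spark.executor.cores=6"
                ),
            }
        },
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "spark-defaults",
                    "properties": {"spark.dynamicAllocation.enabled": "false"},
                },
                {
                    # Place the job submitter pod on On-Demand capacity.
                    "classification": "emr-job-submitter",
                    "properties": {
                        "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND"
                    },
                },
            ],
            "monitoringConfiguration": {
                "cloudWatchMonitoringConfiguration": {
                    "logGroupName": "/emr-containers/jobs",
                    "logStreamNamePrefix": "demo",
                },
                "s3MonitoringConfiguration": {"logUri": "s3://joblogs"},
            },
        },
    }


def submit(request):
    """Send the request with boto3 and return the job run ID."""
    import boto3  # imported here so building the request does not need the SDK

    client = boto3.client("emr-containers")
    response = client.start_job_run(**request)
    return response["id"]


request = build_start_job_run_request("virtual-cluster-id", "execution-role-arn")
```

Calling submit(request) sends the job. Building the request separately makes it possible to inspect or log the payload first.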
StartJobRun request with single-AZ node placement and Amazon EC2 instance type placement for the job submitter pod
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone",
        "jobsubmitter.node.selector.node.kubernetes.io/instance-type": "m5.4xlarge"
      }
    }
  ]
}
StartJobRun request with labels, annotations, and a custom scheduler for the job submitter pod
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.label.label1": "value1",
        "jobsubmitter.label.label2": "value2",
        "jobsubmitter.annotation.ann1": "value1",
        "jobsubmitter.annotation.ann2": "value2",
        "jobsubmitter.scheduler.name": "custom-scheduler"
      }
    }
  ]
}
StartJobRun request with a toleration applied to the job submitter pod with key dedicated, value graviton_machines, effect NoExecute, and a tolerationSeconds of 60 seconds
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.node.toleration.dedicated": "graviton_machines",
        "jobsubmitter.node.toleration.dedicated.effect": "NoExecute",
        "jobsubmitter.node.toleration.dedicated.tolerationSeconds": "60"
      }
    }
  ]
}
StartJobRun request with logging disabled for the job submitter pod
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.logging": "DISABLED"
      }
    }
  ],
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "/emr-containers/jobs",
      "logStreamNamePrefix": "demo"
    },
    "s3MonitoringConfiguration": {
      "logUri": "s3://joblogs"
    }
  }
}
StartJobRun request with custom logging container image, CPU, and memory for the job submitter pod
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.logging.image": "YOUR_ECR_IMAGE_URL",
        "jobsubmitter.logging.request.memory": "200Mi",
        "jobsubmitter.logging.request.cores": "0.5"
      }
    }
  ],
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "/emr-containers/jobs",
      "logStreamNamePrefix": "demo"
    },
    "s3MonitoringConfiguration": {
      "logUri": "s3://joblogs"
    }
  }
}
StartJobRun request with a custom job submitter container image and pull policy
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.container.image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/emr6.11_custom_repo",
        "jobsubmitter.container.image.pullPolicy": "kubernetes pull policy"
      }
    }
  ]
}