

Using job submitter classification

Overview

The Amazon EMR on EKS StartJobRun request creates a job submitter pod (also known as the job-runner pod) to spawn the Spark driver. You can use the emr-job-submitter classification to configure node selectors, add tolerations, customize logging, and make other modifications to the job submitter pod.

The following settings are available under the emr-job-submitter classification:

jobsubmitter.node.selector.[selectorKey]

Adds to the node selector of the job submitter pod, with key selectorKey and the value as the configuration value. For example, you can set jobsubmitter.node.selector.identifier to myIdentifier and the job submitter pod will have a node selector with a key identifier and a value myIdentifier. This can be used to specify which nodes the job submitter pod can be placed on. To add multiple node selector keys, set multiple configurations with this prefix.
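For example, a minimal sketch of setting multiple node selector keys at once (the node labels shown, eks.amazonaws.com/capacityType and kubernetes.io/arch, are standard EKS and Kubernetes node labels; substitute labels that exist on your cluster's nodes):

```json
{
  "classification": "emr-job-submitter",
  "properties": {
    "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND",
    "jobsubmitter.node.selector.kubernetes.io/arch": "arm64"
  }
}
```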

jobsubmitter.label.[labelKey]

Adds to the labels of the job submitter pod, with key labelKey and the value as the configuration value. To add multiple labels, set multiple configurations with this prefix.

jobsubmitter.annotation.[annotationKey]

Adds to the annotations of the job submitter pod, with key annotationKey and the value as the configuration value. To add multiple annotations, set multiple configurations with this prefix.

jobsubmitter.node.toleration.[tolerationKey]

Adds tolerations to the job submitter pod. By default, no tolerations are added to the pod. The toleration's key is tolerationKey and the toleration's value is the configuration value. If the configuration value is a non-empty string, the operator is Equal. If the configuration value is set to "", the operator is Exists.

jobsubmitter.node.toleration.[tolerationKey].[effect]

Adds a toleration effect to the prefixed tolerationKey. This field is required when adding tolerations. The allowed values for the effect field are NoExecute, NoSchedule, and PreferNoSchedule.

jobsubmitter.node.toleration.[tolerationKey].[tolerationSeconds]

Adds tolerationSeconds to the prefixed tolerationKey. Optional field. Only applicable when the effect is NoExecute.
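To illustrate the Exists operator described above, here is a hedged sketch of a toleration with an empty value, which tolerates any taint with the given key regardless of its value (the key dedicated is only an illustration; the effect field is still required):

```json
{
  "classification": "emr-job-submitter",
  "properties": {
    "jobsubmitter.node.toleration.dedicated": "",
    "jobsubmitter.node.toleration.dedicated.effect": "NoSchedule"
  }
}
```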

jobsubmitter.scheduler.name

Sets a custom schedulerName for the job submitter pod.

jobsubmitter.logging

Enables or disables logging on the job submitter pod. When this is set to DISABLED, the logging container is removed from the job submitter pod, which disables any logging for this pod specified in the monitoringConfiguration, such as s3MonitoringConfiguration or cloudWatchMonitoringConfiguration. When this setting is not set, or is set to any other value, logging on the job submitter pod is enabled.

jobsubmitter.logging.image

Sets a custom image to be used for the logging container on the job submitter pod.

jobsubmitter.logging.request.cores

Sets a custom value for the number of CPUs, in CPU units, for the logging container on the job submitter pod. By default, this is set to 100m.

jobsubmitter.logging.request.memory

Sets a custom value for the amount of memory for the logging container on the job submitter pod, expressed as a Kubernetes quantity. By default, this is set to 200Mi. A mebibyte (Mi) is a unit of measure that's similar to a megabyte.

jobsubmitter.container.image

Sets a custom image for the job submitter pod's job-runner container.

jobsubmitter.container.image.pullPolicy

Sets the imagePullPolicy for the job submitter pod's containers. Valid values are the standard Kubernetes pull policies: Always, IfNotPresent, and Never.
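A minimal sketch combining the custom image and pull policy settings (the image URI is a placeholder for your own ECR repository; IfNotPresent is one of the standard Kubernetes pull policies):

```json
{
  "classification": "emr-job-submitter",
  "properties": {
    "jobsubmitter.container.image": "account-id.dkr.ecr.region.amazonaws.com/my-custom-repo",
    "jobsubmitter.container.image.pullPolicy": "IfNotPresent"
  }
}
```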

We recommend placing job submitter pods on On-Demand Instances. Placing job submitter pods on Spot Instances might result in a job failure if the instance where the job submitter pod runs is subject to a Spot Instance interruption. You can also place the job submitter pod in a single Availability Zone, or target any Kubernetes labels that are applied to the nodes.

Job submitter classification examples

StartJobRun request with On-Demand node placement for the job submitter pod

cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "virtual-cluster-id",
  "executionRoleArn": "execution-role-arn",
  "releaseLabel": "emr-6.11.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py",
      "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.dynamicAllocation.enabled": "false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF

aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json

StartJobRun request with single-AZ node placement and Amazon EC2 instance type placement for the job submitter pod

"configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone", "jobsubmitter.node.selector.node.kubernetes.io/instance-type":"m5.4xlarge" } } ] }

StartJobRun request with labels, annotations, and a custom scheduler for the job submitter pod

"configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.label.label1": "value1", "jobsubmitter.label.label2": "value2", "jobsubmitter.annotation.ann1": "value1", "jobsubmitter.annotation.ann2": "value2", "jobsubmitter.scheduler.name": "custom-scheduler" } } ] }

StartJobRun request with a toleration applied to the job submitter pod with key dedicated, value graviton_machines, effect NoExecute, and a tolerationSeconds of 60 seconds

"configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.toleration.dedicated":"graviton_machines", "jobsubmitter.node.toleration.dedicated.effect":"NoExecute", "jobsubmitter.node.toleration.dedicated.tolerationSeconds":"60" } } ] }

StartJobRun request with logging disabled for the job submitter pod

"configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.logging": "DISABLED" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } }

StartJobRun request with custom logging container image, CPU, and memory for the job submitter pod

"configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.logging.image": "YOUR_ECR_IMAGE_URL", "jobsubmitter.logging.request.memory": "200Mi", "jobsubmitter.logging.request.cores": "0.5" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } }

StartJobRun request with a custom job submitter container image and pull policy

"configurationOverrides": { "applicationConfiguration": [ { "classification": "emr-job-submitter", "properties": { "jobsubmitter.container.image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/emr6.11_custom_repo", "jobsubmitter.container.image.pullPolicy": "kubernetes pull policy" } } ] }