

# How job configurations work
<a name="jobs-configurations-details"></a>

You use the rollout and abort configurations when you're deploying a job, and the timeout and retry configurations for job execution. The following sections show more information about how these configurations work.

**Topics**
+ [Job rollout, scheduling, and abort configurations](#job-rollout-abort-scheduling)
+ [Job execution timeout and retry configurations](#job-timeout-retry)

## Job rollout, scheduling, and abort configurations
<a name="job-rollout-abort-scheduling"></a>

You can use the job rollout, scheduling, and abort configurations to define how many devices receive the job document, schedule a job rollout, and determine the criteria for canceling a job.

### Job rollout configuration
<a name="job-rollout-configuration"></a>

You can specify how quickly targets are notified of a pending job execution. You can also create a staged rollout to manage updates, reboots, and other operations. To specify how your targets are notified, use job rollout rates.

#### Job rollout rates
<a name="job-rollout-using"></a>

You can create a rollout configuration by using either a constant rollout rate or an exponential rollout rate. To specify the maximum number of job targets to inform per minute, use a constant rollout rate.

AWS IoT jobs can be deployed using exponential rollout rates as various criteria and thresholds are met. If the number of failed jobs matches a set of criteria that you specify, then you can cancel the job rollout. You set the job rollout rate criteria when you create a job by using the [`JobExecutionsRolloutConfig`](https://docs.aws.amazon.com/iot/latest/apireference/API_JobExecutionsRolloutConfig.html) object. You also set the job abort criteria at job creation by using the [`AbortConfig`](https://docs.aws.amazon.com/iot/latest/apireference/API_AbortConfig.html) object.

The following example shows how rollout rates work. A job rollout with a base rate of 50 per minute, an increment factor of 2, and thresholds of 1,000 for both notified and succeeded devices works as follows: the job starts at a rate of 50 job executions per minute and continues at that rate until either 1,000 things have received job execution notifications or 1,000 successful job executions have occurred. The rate is then multiplied by the increment factor, which gives 100 job executions per minute.

The following table illustrates how the rollout would proceed over the first four increments.


|  |  |  |  |  | 
| --- |--- |--- |--- |--- |
|  Rollout rate per minute  |  50  |  100  |  200  |  400  | 
|  Number of notified devices or successful job executions to satisfy a rate increase  |  1,000  |  2,000  |  3,000  |  4,000  | 
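
The arithmetic behind the table can be sketched in a few lines of Python. The dictionary keys follow the `JobExecutionsRolloutConfig` API object; the helper `rollout_schedule` is a hypothetical name used only for illustration, and it reproduces the progression shown in the table rather than the service's internal scheduling.

```python
def rollout_schedule(base_rate, increment_factor, threshold_step, increments):
    """Return (rate_per_minute, threshold) pairs, one per increment.

    Mirrors the table above: the rate is multiplied by the increment
    factor at each step, and the threshold grows by threshold_step.
    """
    schedule = []
    rate = base_rate
    for step in range(1, increments + 1):
        schedule.append((rate, threshold_step * step))
        rate = rate * increment_factor
    return schedule


# Rollout configuration in the shape of JobExecutionsRolloutConfig.
rollout_config = {
    "exponentialRate": {
        "baseRatePerMinute": 50,
        "incrementFactor": 2.0,
        "rateIncreaseCriteria": {
            "numberOfNotifiedThings": 1000,
            "numberOfSucceededThings": 1000,
        },
    },
}

exponential = rollout_config["exponentialRate"]
schedule = rollout_schedule(
    exponential["baseRatePerMinute"],
    exponential["incrementFactor"],
    exponential["rateIncreaseCriteria"]["numberOfNotifiedThings"],
    increments=4,
)
```

With the values from the example, `schedule` holds the four rate and threshold pairs from the table.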

**Note**  
If you're at your maximum concurrent limit of 500 jobs (`isConcurrent = True`), then all active jobs remain in the `IN_PROGRESS` status and don't roll out any new job executions until the number of concurrent jobs is 499 or fewer (`isConcurrent = False`). This applies to snapshot and continuous jobs.  
If `isConcurrent = True`, the job is currently rolling out job executions to all devices in your target group. If `isConcurrent = False`, the job has completed the rollout of all job executions to all devices in your target group. It updates its status once all devices in your target group reach a terminal state, or once a threshold percentage of your target group does if you selected a job abort configuration. The job-level status for both `isConcurrent = True` and `isConcurrent = False` is `IN_PROGRESS`.  
For more information about active and concurrent job limits, see [Active and concurrent job limits](job-limits.md#job-limits-active-concurrent).

#### Job rollout rates for continuous jobs using dynamic thing groups
<a name="job-rollout-dynamic-groups"></a>

When you use a continuous job to roll out remote operations on your fleet, AWS IoT Jobs rolls out job executions for devices in your target thing group. For new devices that are added to the dynamic thing group, these job executions continue to roll out to those devices even after the job has been created.

The rollout configuration can control the rollout rates only for devices that are added to the group until job creation. After a job has been created, for any new devices, the job executions are created in near real time as soon as the devices join the target group.

### Job scheduling configuration
<a name="job-scheduling"></a>

You can schedule a continuous or snapshot job up to a year in advance using a pre-determined start time, end time, and end behavior for what will happen to each job execution upon reaching the end time. Additionally, you can create an optional recurring maintenance window with a flexible frequency, start time, and duration for continuous jobs to roll out a job document to all devices within the target group.

#### Job scheduling configurations
<a name="jobs-scheduling-without-maintenance-window"></a>

**Start time**

The start time of a scheduled job is the future date and time when the job begins rollout of the job document to all devices in the target group. Start time for a scheduled job applies to continuous jobs and snapshot jobs. When a scheduled job is initially created, it maintains a status state of `SCHEDULED`. Upon arriving at the `startTime` that you selected, it updates to `IN_PROGRESS` and begins the job document rollout. The `startTime` must be less than or equal to one year from the initial date and time that you created the scheduled job.

For more information on the syntax for `startTime` when using an API command or the AWS CLI, see [Timestamp](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-parameters-types.html#parameter-type-timestamp).

For a job with the optional scheduling configuration that takes place during a recurring maintenance window in a location observing daylight savings time (DST), the time will change by one hour when switching from DST to standard time and from standard time to DST.

**Note**  
The time zone displayed in the AWS Management Console is your current system time zone. However, these time zones will be converted into UTC in the system.

**End time**

The end time of a scheduled job is the future date and time that the job will stop rollout of the job document to any remaining devices in the target group. End time for a scheduled job applies to continuous jobs and snapshot jobs. After a scheduled job arrives at the selected `endTime`, and all job executions have reached a terminal state, it updates its status state from `IN_PROGRESS` to `COMPLETED`. The `endTime` must be less than or equal to two years from the initial date and time that you created the scheduled job. The minimum duration between `startTime` and `endTime` is 30 minutes. Job execution retry attempts will occur until the job reaches the `endTime`, then the `endBehavior` will dictate how to proceed.

For more information on the syntax for `endTime` when using an API command or the AWS CLI, see [Timestamp](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-parameters-types.html#parameter-type-timestamp).

For a job with the optional scheduling configuration that takes place during a recurring maintenance window in a location observing daylight savings time (DST), the time will change by one hour when switching from DST to standard time and from standard time to DST.

**Note**  
The time zone displayed in the AWS Management Console is your current system time zone. However, these time zones will be converted into UTC in the system.

**End behavior**

The end behavior of a scheduled job determines what happens to the job and all unfinished job executions when the job reaches the selected `endTime`.

The following lists the end behaviors that you can select from when creating the job or job template:
+ `STOP_ROLLOUT`
  + `STOP_ROLLOUT` stops the rollout of the job document to all remaining devices in the target group for the job. Additionally, all `QUEUED` and `IN_PROGRESS` job executions will continue until they reach a terminal state. This is the default end behavior unless you select `CANCEL` or `FORCE_CANCEL`.
+ `CANCEL`
  + `CANCEL` stops the rollout of the job document to all remaining devices in the target group for the job. Additionally, all `QUEUED` job executions will be cancelled while all `IN_PROGRESS` job executions will continue until they reach a terminal state.
+ `FORCE_CANCEL`
  + `FORCE_CANCEL` stops the rollout of the job document to all remaining devices in the target group for the job. Additionally, all `QUEUED` and `IN_PROGRESS` job executions will be cancelled.

**Note**  
To select an `endBehavior`, you must select an `endTime`.
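
As a rough illustration of the three end behaviors, the following Python sketch maps each behavior to its effect on unfinished job executions. The function `apply_end_behavior` is hypothetical and only models the rules listed above; it uses `CANCELED`, the spelling of that status in the Jobs API, for executions that are cancelled.

```python
def apply_end_behavior(end_behavior, execution_statuses):
    """Return the execution statuses after the job reaches its endTime."""
    if end_behavior == "STOP_ROLLOUT":
        # Nothing is cancelled; QUEUED and IN_PROGRESS executions continue.
        cancelled = set()
    elif end_behavior == "CANCEL":
        # Only QUEUED executions are cancelled.
        cancelled = {"QUEUED"}
    elif end_behavior == "FORCE_CANCEL":
        # Both QUEUED and IN_PROGRESS executions are cancelled.
        cancelled = {"QUEUED", "IN_PROGRESS"}
    else:
        raise ValueError(f"unknown end behavior: {end_behavior}")
    return ["CANCELED" if status in cancelled else status
            for status in execution_statuses]


statuses = ["QUEUED", "IN_PROGRESS", "SUCCEEDED"]
after_cancel = apply_end_behavior("CANCEL", statuses)
```

In all three cases the rollout to remaining devices stops; the behaviors differ only in what happens to executions that already exist.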

**Max duration**

The max duration of a scheduled job must be less than or equal to two years regardless of the `startTime` and `endTime`. 

The following table lists common duration scenarios of a scheduled job:


| **Scheduled Job example number** | **startTime** | **endTime** | **Max duration** | 
| --- | --- | --- | --- | 
|  1  |  Immediately after initial job creation.  |  One year after initial job creation.  |  One year  | 
|  2  |  One month after initial job creation.  |  13 months after initial job creation.  |  One year  | 
|  3  |  One year after initial job creation.  |  Two years after initial job creation.  |  One year  | 
|  4  |  Immediately after initial job creation.  |  Two years after initial job creation.  |  Two years  | 
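
The timing rules for `startTime` and `endTime` can be checked before you submit a job. The following is a minimal sketch, assuming that one year is 365 days and two years are 730 days; `validate_schedule` is a hypothetical helper, not part of any AWS SDK.

```python
from datetime import datetime, timedelta, timezone

ONE_YEAR = timedelta(days=365)    # assumption: one year is 365 days
TWO_YEARS = timedelta(days=730)   # assumption: two years are 730 days
MIN_DURATION = timedelta(minutes=30)


def validate_schedule(created_at, start_time, end_time):
    """Return a list of rule violations; an empty list means the schedule passes."""
    problems = []
    if start_time - created_at > ONE_YEAR:
        problems.append("startTime is more than one year after job creation")
    if end_time - created_at > TWO_YEARS:
        problems.append("endTime is more than two years after job creation")
    if end_time - start_time < MIN_DURATION:
        problems.append("endTime is less than 30 minutes after startTime")
    return problems


created = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Scenario 3 from the table: start after one year, end after two years.
scenario_3 = validate_schedule(created, created + ONE_YEAR, created + TWO_YEARS)

# An invalid schedule: the window between start and end is only 10 minutes.
too_short = validate_schedule(
    created,
    created + timedelta(days=1),
    created + timedelta(days=1, minutes=10),
)
```

Here `scenario_3` is empty because the schedule satisfies all three rules, while `too_short` reports the 30-minute minimum.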

#### Recurring maintenance window
<a name="jobs-scheduling-maintenance-window"></a>

The maintenance window is an optional configuration within the scheduling configuration of the AWS Management Console and `SchedulingConfig` within the `CreateJob` and `CreateJobTemplate` APIs. You can set up a recurring maintenance window with a predetermined start time, duration, and frequency (daily, weekly, or monthly) that the maintenance window occurs. Maintenance windows only apply to continuous jobs. The maximum duration of a recurring maintenance window is 23 hours, 50 minutes.

The following diagram illustrates the job status states for various scheduled job scenarios with an optional maintenance window:

![\[A diagram showing the lifecycle of a continuous job, progressing through states of SCHEDULED, IN_PROGRESS, CANCELLED, and DELETION_IN_PROGRESS upon certain events.\]](http://docs.aws.amazon.com/iot/latest/developerguide/images/job-states-diagram-scheduled-maintenance-window.png)


For more information about job status states, see [Jobs and job execution states](iot-jobs-lifecycle.md).

**Note**  
If a job arrives at the `endTime` during a maintenance window, it will update from `IN_PROGRESS` to `COMPLETED`. Additionally, any remaining job executions will follow the `endBehavior` for the job.

**Cron expressions**

For scheduled jobs rolling out the job document during a maintenance window with a custom frequency, the custom frequency is entered using a cron expression. A cron expression has six required fields, which are separated by white space. 

**Syntax**

```
cron(fields)
```


| **Field** | **Values** | **Wildcards** | 
| --- | --- | --- | 
|  Minutes  |  0-59  |  , - \* /  | 
|  Hours  |  0-23  |  , - \* /  | 
|  Day-of-month  |  1-31  |  , - \* ? / L W  | 
|  Month  |  1-12 or JAN-DEC  |  , - \* /  | 
|  Day-of-week  |  1-7 or SUN-SAT  |  , - \* ? L \#  | 
|  Year  |  1970-2199  |  , - \* /  | 

**Wildcards**
+ The **,** (comma) wildcard includes additional values. In the Month field, JAN,FEB,MAR would include January, February, and March.
+ The **-** (dash) wildcard specifies ranges. In the Day-of-month field, 1-15 would include days 1 through 15 of the specified month.
+ The **\*** (asterisk) wildcard includes all values in the field. In the Hours field, **\*** would include every hour. You can't use **\*** in both the Day-of-month and Day-of-week fields. If you use it in one, you must use **?** in the other.
+ The **/** (forward slash) wildcard specifies increments. In the Minutes field, you could enter 1/10 to specify every tenth minute, starting from the first minute of the hour (for example, the 11th, 21st, and 31st minute, and so on).
+ The **?** (question mark) wildcard specifies one or another. In the Day-of-month field, you could enter **7** and if you didn't care what day of the week the 7th was, you could enter **?** in the Day-of-week field.
+ The **L** wildcard in the Day-of-month or Day-of-week fields specifies the last day of the month or week.
+ The **W** wildcard in the Day-of-month field specifies a weekday. In the Day-of-month field, **3W** specifies the weekday closest to the third day of the month.
+ The **\#** wildcard in the Day-of-week field specifies a certain instance of the specified day of the week within a month. For example, `3#2` would be the second Tuesday of the month: the 3 refers to Tuesday because it is the third day of each week, and the 2 refers to the second day of that type within the month.

**Note**  
If you use a '\#' character, you can define only one expression in the Day-of-week field. For example, `"3#1,6#3"` isn't valid because it's interpreted as two expressions.

**Restrictions**
+ You can't specify the Day-of-month and Day-of-week fields in the same cron expression. If you specify a value (or a \*) in one of the fields, you must use a **?** in the other.

**Examples**

Refer to the following sample cron strings when using a cron expression for the `startTime` of a recurring maintenance window.


| **Minutes** | **Hours** | **Day of month** | **Month** | **Day of week** | **Year** | **Meaning** | 
| --- | --- | --- | --- | --- | --- | --- | 
| 0 | 10 | \* | \* | ? | \* |  Run at 10:00 am (UTC) every day  | 
| 15 | 12 | \* | \* | ? | \* |  Run at 12:15 pm (UTC) every day  | 
| 0 | 18 | ? | \* | MON-FRI | \* |  Run at 6:00 pm (UTC) every Monday through Friday  | 
| 0 | 8 | 1 | \* | ? | \* |  Run at 8:00 am (UTC) every first day of the month  | 
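
The sample rows can be assembled into `cron(fields)` strings programmatically. The sketch below is illustrative only: `build_cron` and `make_maintenance_window` are hypothetical helpers, the dictionary keys follow the maintenance window entry of `SchedulingConfig`, and the one-minute lower bound on the duration is an assumption. Only the Day-of-month and Day-of-week restriction and the maximum window duration are checked.

```python
MAX_WINDOW_MINUTES = 23 * 60 + 50  # 23 hours, 50 minutes


def build_cron(minutes, hours, day_of_month, month, day_of_week, year):
    """Join the six fields into a cron(...) string."""
    # One of Day-of-month and Day-of-week must be "?".
    if day_of_month != "?" and day_of_week != "?":
        raise ValueError("use '?' in either Day-of-month or Day-of-week")
    fields = [minutes, hours, day_of_month, month, day_of_week, year]
    return "cron(" + " ".join(str(field) for field in fields) + ")"


def make_maintenance_window(cron_expression, duration_minutes):
    """Build one maintenance window entry for a scheduling configuration."""
    # Assumption: a window lasts at least one minute.
    if not 1 <= duration_minutes <= MAX_WINDOW_MINUTES:
        raise ValueError("duration must be between 1 and 1430 minutes")
    return {"startTime": cron_expression, "durationInMinutes": duration_minutes}


# First and third rows of the examples table.
daily = build_cron(0, 10, "*", "*", "?", "*")
weekdays = build_cron(0, 18, "?", "*", "MON-FRI", "*")
daily_window = make_maintenance_window(daily, 120)
```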

#### Recurring maintenance window duration end logic
<a name="jobs-scheduling-maintenance-window-end-behavoir"></a>

When a job rollout during a maintenance window reaches the end of the current maintenance window occurrence duration, the following actions will occur:
+ The job stops all rollouts of the job document to any remaining devices in your target group. The rollout resumes at the `startTime` of the next maintenance window.
+ All job executions with a status of `QUEUED` will remain in `QUEUED` until the `startTime` of the next maintenance window occurrence. In the next window, they can switch to `IN_PROGRESS` when the device is ready to begin performing the actions specified in the job document.
+ All job executions with a status of `IN_PROGRESS` will continue performing the actions specified in the job document until they reach a terminal state. Any retry attempts as specified in `JobExecutionsRetryConfig` will take place at the `startTime` of the next maintenance window.

### Job abort configuration
<a name="job-abort-using"></a>

Use this configuration to define criteria for canceling a job when a threshold percentage of devices meets those criteria. For example, you can use this configuration to cancel a job in the following cases: 
+ When a threshold percentage of devices don't receive the job execution notifications, such as when your device is incompatible for an Over-The-Air (OTA) update. In this case, your device can report a `REJECTED` status.
+ When a threshold percentage of devices report failure for their job executions, such as when your device encounters a disconnection when attempting to download the job document from an Amazon S3 URL. In such cases, your device must be programmed to report the `FAILED` status to AWS IoT.
+ When a `TIMED_OUT` status is reported because the job execution times out for a threshold percentage of devices after the job executions have started.
+ When there are multiple retry failures. When you add a retry configuration, each retry attempt can incur additional charges to your AWS account. In such cases, canceling the job can cancel queued job executions and avoid retry attempts for these executions. For more information about the retry configuration and using it with the abort configuration, see [Job execution timeout and retry configurations](#job-timeout-retry).

You can set up a job abort condition by using the AWS IoT console or the AWS IoT Jobs API.
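
The following Python sketch shows what an abort configuration might look like and how a threshold check could be reasoned about. The dictionary keys follow the `AbortConfig` API object. The function `should_abort` is hypothetical, and it assumes that the percentage is computed over executions that have reached a terminal state and that the threshold is inclusive; the service's exact evaluation may differ.

```python
# Abort configuration in the shape of AbortConfig.
abort_config = {
    "criteriaList": [
        {
            "failureType": "FAILED",
            "action": "CANCEL",
            "thresholdPercentage": 20.0,
            "minNumberOfExecutedThings": 100,
        },
    ],
}


def should_abort(criterion, status_counts):
    """Return True if the counted terminal statuses meet the criterion.

    status_counts maps a terminal status to its number of executions.
    Assumption: the percentage is taken over all counted executions.
    """
    executed = sum(status_counts.values())
    if executed == 0 or executed < criterion["minNumberOfExecutedThings"]:
        return False
    if criterion["failureType"] == "ALL":
        matching = sum(status_counts.get(status, 0)
                       for status in ("FAILED", "REJECTED", "TIMED_OUT"))
    else:
        matching = status_counts.get(criterion["failureType"], 0)
    return 100.0 * matching / executed >= criterion["thresholdPercentage"]


criterion = abort_config["criteriaList"][0]
abort_now = should_abort(criterion, {"SUCCEEDED": 80, "FAILED": 20})
```

In this sketch, 20 failures out of 100 executed things reaches the 20 percent threshold, so `abort_now` is `True`. Fewer than 100 executed things never triggers the criterion.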

## Job execution timeout and retry configurations
<a name="job-timeout-retry"></a>

Use the job execution timeout configuration to receive [Jobs notifications](jobs-comm-notifications.md) when a job execution has been in progress for longer than the set duration. Use the job execution retry configuration to retry the execution when the job execution fails or times out.

### Job execution timeout configuration
<a name="job-timeout-configuration"></a>

Use the job execution timeout configuration to notify you whenever a job execution gets stuck in the `IN_PROGRESS` state for an unexpectedly long period of time. When the job is `IN_PROGRESS`, you can monitor the progress of your job execution.

#### Timers for job timeouts
<a name="job-timeout-timers"></a>

There are two types of timers: in-progress timers and step timers.

**In-progress timers**  
When you create a job or a job template, you can specify a value for the in-progress timer that's between 1 minute and 7 days. You can update the value of this timer until the start of your job execution. After your timer starts, it can't be updated, and the timer value applies to all job executions for the job. Whenever a job execution remains in the `IN_PROGRESS` status for longer than this interval, the job execution fails and switches to the terminal `TIMED_OUT` status. AWS IoT also publishes an MQTT notification.

**Step timers**  
You can also set a step timer that applies to only the job execution that you want to update. This timer has no effect on the in-progress timer. Each time you update a job execution, you can set a new value for the step timer. You can also create a new step timer when starting the next pending job execution for a thing. If the job execution remains in the `IN_PROGRESS` status for longer than the step timer interval, it fails and switches to the terminal `TIMED_OUT` status.

**Note**  
You can set the in-progress timer by using the AWS IoT console or the AWS IoT Jobs API. To specify the step timer, use the API.
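
A minimal sketch of building the in-progress timeout setting in Python follows. The key `inProgressTimeoutInMinutes` follows the `TimeoutConfig` API object; `make_timeout_config` is a hypothetical helper that enforces the range of 1 minute to 7 days described above.

```python
MIN_IN_PROGRESS_MINUTES = 1
MAX_IN_PROGRESS_MINUTES = 7 * 24 * 60  # 7 days expressed in minutes


def make_timeout_config(in_progress_minutes):
    """Return a timeout configuration, or raise if the value is out of range."""
    if not MIN_IN_PROGRESS_MINUTES <= in_progress_minutes <= MAX_IN_PROGRESS_MINUTES:
        raise ValueError(
            "in-progress timer must be between "
            f"{MIN_IN_PROGRESS_MINUTES} and {MAX_IN_PROGRESS_MINUTES} minutes"
        )
    return {"inProgressTimeoutInMinutes": in_progress_minutes}


timeout_config = make_timeout_config(20)
```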

#### How timers work for job timeouts
<a name="job-timeout-timers-works"></a>

The following illustrates the ways in which in-progress timeouts and step timeouts interact with each other in a 20-minute timeout period.

![\[A timeline showing an in-progress timer of 20 minutes with nested step timers of 7, 5, and 8 minutes.\]](http://docs.aws.amazon.com/iot/latest/developerguide/images/timeout-diagram.png)


The following shows the different steps:

1. **12:00 PM**  
   A new job is created with an in-progress timer of 20 minutes. The in-progress timer starts to run and the job execution switches to the `IN_PROGRESS` status.

1. **12:05 PM**  
   A new step timer with a value of 7 minutes is created. The job execution will now time out at 12:12 PM.

1. **12:10 PM**  
   A new step timer with a value of 5 minutes is created. When a new step timer is created, the previous step timer is discarded, and the job execution will now time out at 12:15 PM.

1. **12:13 PM**  
   A new step timer with a value of 9 minutes is created. The previous step timer is discarded and the job execution will now time out at 12:20 PM because the in-progress timer times out at 12:20 PM. The step timer can't exceed the in-progress timer's absolute bound.
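
The timeline above comes down to taking the earlier of two deadlines. The following Python sketch reproduces it; `effective_timeout` is a hypothetical helper, and the date is arbitrary.

```python
from datetime import datetime, timedelta


def effective_timeout(in_progress_deadline, step_started_at=None, step_minutes=None):
    """Return when the job execution times out.

    The latest step timer replaces any earlier one, and the result
    never extends past the in-progress timer's deadline.
    """
    if step_started_at is None or step_minutes is None:
        return in_progress_deadline
    step_deadline = step_started_at + timedelta(minutes=step_minutes)
    return min(step_deadline, in_progress_deadline)


start = datetime(2024, 1, 1, 12, 0)
in_progress_deadline = start + timedelta(minutes=20)  # 12:20 PM

# (minutes after 12:00 PM, step timer value in minutes) for each step above.
step_timers = [(5, 7), (10, 5), (13, 9)]
timeouts = [
    effective_timeout(
        in_progress_deadline, start + timedelta(minutes=offset), minutes
    ).strftime("%H:%M")
    for offset, minutes in step_timers
]
```

The last entry is capped at 12:20 because the 9-minute step timer would otherwise run until 12:22.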

### Job execution retry configuration
<a name="job-retry-configuration"></a>

You can use the retry configuration to retry the job execution when a certain set of criteria is met. A retry can be attempted when a job times out or when the device fails. To retry execution because of a timeout failure, you must enable the timeout configuration.

**How to use the retry configuration**  
Use the following steps to set up the retry configuration:

1. Determine whether to use the retry configuration for `FAILED`, `TIMED_OUT`, or both failure criteria. For the `TIMED_OUT` status, after the status is reported, AWS IoT Jobs automatically retries the job execution for the device.

1. For the `FAILED` status, check whether your job execution failure can be retried. If it's retryable, program your device to report a `FAILED` status to AWS IoT. The following section describes more about retryable and non-retryable failures. 

1. Specify the number of retries to use for each failure type by using the preceding information. For a single device, you can specify up to 10 retries for both failure types combined. The retry attempts stop automatically when an execution succeeds or when it reaches the specified number of attempts.

1. Add an abort configuration to cancel the job if there are repeated retry failures to avoid additional charges from being incurred with a large number of retry attempts.
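
The retry settings from these steps can be sketched as a small Python helper. The dictionary keys follow the `JobExecutionsRetryConfig` API object; `make_retry_config` is a hypothetical function that enforces the limit of 10 retries for both failure types combined.

```python
MAX_TOTAL_RETRIES = 10  # limit for both failure types combined, per device


def make_retry_config(failed_retries=0, timed_out_retries=0):
    """Build a retry configuration for FAILED and TIMED_OUT executions."""
    if failed_retries < 0 or timed_out_retries < 0:
        raise ValueError("retry counts can't be negative")
    if failed_retries + timed_out_retries > MAX_TOTAL_RETRIES:
        raise ValueError(f"at most {MAX_TOTAL_RETRIES} retries in total")
    criteria = []
    if failed_retries:
        criteria.append({"failureType": "FAILED",
                         "numberOfRetries": failed_retries})
    if timed_out_retries:
        criteria.append({"failureType": "TIMED_OUT",
                         "numberOfRetries": timed_out_retries})
    return {"criteriaList": criteria}


retry_config = make_retry_config(failed_retries=3, timed_out_retries=2)
```

Retrying `TIMED_OUT` executions only has an effect when a timeout configuration is also set on the job.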

**Note**  
When a job reaches the end of a recurring maintenance window occurrence, all `IN_PROGRESS` job executions will continue performing actions identified in the job document until they reach a terminal state. If a job execution reaches a terminal state of `FAILED` or `TIMED_OUT` outside of a maintenance window, a retry attempt will occur in the next window if the attempts aren't exhausted. At the `startTime` of the next maintenance window occurrence, a new job execution will be created and enter a status state of `QUEUED` until the device is ready to begin.

**Retry and abort configuration**  
Each retry attempt incurs additional charges to your AWS account. To avoid incurring additional charges from repeated retry failures, we recommend adding an abort configuration. For more information about pricing, see [AWS IoT Device Management pricing](https://aws.amazon.com/iot-device-management/pricing/).

You might encounter multiple retry failures when a high threshold percentage of your devices either time out or report failure. In this case, you can use the abort configuration to cancel the job and avoid any queued job executions or further retry attempts.

**Note**  
When the abort criteria are met for canceling a job execution, only `QUEUED` job executions are canceled. Any queued retries for the device will not be attempted. However, current job executions that have an `IN_PROGRESS` status will not be canceled.

Before retrying a failed job execution, we also recommend that you check whether your job execution failure is retryable, as described in the following section.

**Retry for failure type of `FAILED`**  
To attempt retries for the `FAILED` failure type, your devices must be programmed to report the `FAILED` status for a failed job execution to AWS IoT. Set the retry configuration with the criteria to retry `FAILED` job executions and specify the number of retries to be performed. When AWS IoT Jobs detects the `FAILED` status, it automatically attempts to retry the job execution for the device. The retries continue until the job execution succeeds or reaches the maximum number of retry attempts.

You can track each retry attempt and the job that's running on these devices. By tracking the execution status, after the specified number of retries have been attempted, you can use your device to report failures and initiate another retry attempt. 

**Retryable and non-retryable failures**  
Your job execution failure can be retryable or non-retryable. Each retry attempt can incur charges to your AWS account. To avoid incurring additional charges from multiple retry attempts, first consider checking whether your job execution failure is retryable. An example of a retryable failure is a connection error that your device encounters while attempting to download the job document from an Amazon S3 URL. If your job execution failure is retryable, program your device to report a `FAILED` status in case the job execution fails. Then, set the retry configuration to retry `FAILED` executions. 

If the execution can't be retried, to avoid retrying and potentially incurring additional charges to your account, we recommend that you program the device to report a `REJECTED` status to AWS IoT. Examples of non-retryable failures include when your device is incompatible with a job update, or when it experiences a memory error while executing a job. In these cases, AWS IoT Jobs will not retry the job execution because it retries the job execution only when it detects a `FAILED` or `TIMED_OUT` status. 

After you've determined that a job execution failure is retryable, if a retry attempt still fails, consider checking the device logs.
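
On the device side, the choice between a retryable and a non-retryable report could look like the following Python sketch. The error category names and the function `status_to_report` are hypothetical; they stand in for whatever error classification your device firmware uses. The statuses `FAILED` and `REJECTED` are the job execution status values that the retry logic distinguishes.

```python
# Hypothetical error categories, based on the examples in this section.
RETRYABLE_ERRORS = {"download_connection_error"}
NON_RETRYABLE_ERRORS = {"incompatible_device", "memory_error"}


def status_to_report(error_kind):
    """Pick the job execution status a device reports for an error."""
    if error_kind in RETRYABLE_ERRORS:
        # FAILED executions are retried when the retry configuration allows it.
        return "FAILED"
    if error_kind in NON_RETRYABLE_ERRORS:
        # REJECTED executions are never retried.
        return "REJECTED"
    raise ValueError(f"unclassified error: {error_kind}")


reported = status_to_report("download_connection_error")
```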

**Note**  
When a job with the optional scheduling configuration reaches its `endTime`, the selected `endBehavior` will stop the rollout of the job document to all remaining devices in the target group and dictate how to proceed with the remaining job executions. The attempts are retried if selected via the retry configuration. 

**Retry for failure type of `TIMED_OUT`**  
If you enable timeout when creating a job, then AWS IoT Jobs attempts to retry the job execution for the device when the status changes from `IN_PROGRESS` to `TIMED_OUT`. This status change can occur when the in-progress timer times out, or when a step timer that you specify expires while the job execution is `IN_PROGRESS`. The retries continue until the job execution succeeds or reaches the maximum number of retry attempts for this failure type.

**Continuous jobs and thing group membership updates**  
For continuous jobs with a job status of `IN_PROGRESS`, the number of retry attempts is reset to zero when there are updates to a thing's group membership. For example, consider that you specified five retry attempts and three retries have already been performed. If a thing is now removed from the thing group and then rejoins the group, such as with dynamic thing groups, the number of retry attempts is reset to zero. You can now perform five retry attempts for that thing instead of the two attempts that were remaining. In addition, when a thing is removed from the thing group, additional retry attempts are canceled.
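
The reset behavior can be modeled with a small counter. The class `RetryTracker` below is a hypothetical illustration of the rules in this paragraph for a single thing; it isn't how AWS IoT Jobs tracks retries internally.

```python
class RetryTracker:
    """Tracks retry attempts for one thing in a continuous job."""

    def __init__(self, max_retries):
        self.max_retries = max_retries
        self.attempted = 0
        self.in_group = True

    def remaining(self):
        # No retries are attempted while the thing is outside the group.
        if not self.in_group:
            return 0
        return self.max_retries - self.attempted

    def record_retry(self):
        if self.remaining() == 0:
            raise RuntimeError("no retry attempts remaining")
        self.attempted += 1

    def leave_group(self):
        # Leaving the thing group cancels additional retry attempts.
        self.in_group = False

    def rejoin_group(self):
        # Rejoining resets the number of retry attempts to zero.
        self.in_group = True
        self.attempted = 0


tracker = RetryTracker(max_retries=5)
for _ in range(3):
    tracker.record_retry()
remaining_before = tracker.remaining()   # two attempts left
tracker.leave_group()
remaining_outside = tracker.remaining()  # retries canceled
tracker.rejoin_group()
remaining_after = tracker.remaining()    # all five attempts available again
```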