Pause and resume¶
Durable functions can pause for seconds, hours, or days without keeping a Lambda invocation running. The SDK suspends execution, checkpoints the wait, and resumes the handler when the wait condition is met. This replaces traditional polling loops and long-timeout wait-for-reply code with structured pauses that incur no compute cost while suspended.
Three durable wait operations cover the common delayed-work scenarios:
- wait pauses for a fixed duration.
- waitForCallback waits for an external system to signal completion.
- waitForCondition polls a check function on a schedule.
Prefer wait over sleep¶
Do not call language-specific methods like setTimeout, time.sleep, or Thread.sleep
to pause a durable function. Those calls keep the invocation running for the whole
delay, and the timer starts again from zero on replay. The durable waits suspend the
execution and do not incur compute cost while suspended.
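To see why an in-process sleep starts over on replay while a durable wait does not, the following toy model may help. It is only a sketch and not the SDK's implementation: `ToyCheckpointStore`, `durable_wait_remaining`, and `sleep_remaining` are hypothetical names, and the clock is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpointStore:
    """Hypothetical stand-in for the durable operation history."""
    deadlines: dict = field(default_factory=dict)


def durable_wait_remaining(store, name, seconds, now):
    """Seconds still to wait for a named, checkpointed wait.

    The deadline is recorded the first time the wait is reached. Every
    replay reads the same deadline back instead of starting a new timer.
    """
    deadline = store.deadlines.setdefault(name, now + seconds)
    return max(0.0, deadline - now)


def sleep_remaining(seconds, now):
    """An in-process sleep keeps no record, so each replay waits in full."""
    return float(seconds)


store = ToyCheckpointStore()

# The first invocation reaches the wait at t=0 and asks for 300 seconds.
first = durable_wait_remaining(store, "cool-down", 300, now=0.0)

# The handler is replayed at t=200: only the remainder is left.
replayed = durable_wait_remaining(store, "cool-down", 300, now=200.0)

# Replayed after the deadline: the wait is already satisfied.
done = durable_wait_remaining(store, "cool-down", 300, now=450.0)

# A plain sleep replayed at t=200 would start over from the full duration.
naive = sleep_remaining(300, now=200.0)
```

In this model the name of the wait is the key that ties a replay back to the recorded deadline, which is one more reason to name waits consistently.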
Tip
Name every wait. Named waits show up in the operation history and CloudWatch, which keeps timelines legible in logs and test assertions.
Always set a callback timeout¶
waitForCallback suspends the execution until an external system calls the SDK's
callback success or failure endpoints. Without a callback timeout, a callback that
never arrives leaves the execution waiting until the execution timeout expires or an
operator intervenes, holding its resource slot the whole time.
from aws_durable_execution_sdk_python.config import (
    Duration,
    WaitForCallbackConfig,
)

result = context.wait_for_callback(
    lambda callback_id, ctx: approvals_service.request(
        id=event["orderId"], callback_id=callback_id
    ),
    name="wait-for-approval",
    config=WaitForCallbackConfig(timeout=Duration.from_hours(24)),
)
import java.time.Duration;
import software.amazon.lambda.durable.config.CallbackConfig;
import software.amazon.lambda.durable.config.WaitForCallbackConfig;

// Fail the callback if no approval arrives within 24 hours.
WaitForCallbackConfig config = WaitForCallbackConfig.builder()
    .callbackConfig(CallbackConfig.builder()
        .timeout(Duration.ofHours(24))
        .build())
    .build();

Approval outcome = context.waitForCallback(
    "wait-for-approval",
    Approval.class,
    (callbackId, ctx) -> approvalsService.request(input.orderId(), callbackId),
    config);
Danger
Always set a timeout on waitForCallback to avoid stalled executions.
When the timeout elapses, the SDK raises a callback timeout error. Either let it propagate to mark the step and the execution as failed, or catch it and handle it with compensatory actions.
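The catch-and-compensate option could look roughly like the sketch below. It runs without the SDK: `CallbackTimeoutError`, `FakeContext`, `request_approval`, and `release_stock` are hypothetical names introduced for illustration, and the real SDK's exception type and context will differ.

```python
class CallbackTimeoutError(Exception):
    """Hypothetical stand-in for the SDK's callback timeout error."""


class FakeContext:
    """Hypothetical test double whose callback wait always times out."""

    def wait_for_callback(self, submitter, name=None, config=None):
        # Hand the callback ID to the external system, then simulate
        # the timeout elapsing before any reply arrives.
        submitter("cb-123", self)
        raise CallbackTimeoutError(f"{name} timed out")


def request_approval(context, order_id, release_stock):
    """Return the approval result, or compensate when nobody answers in time."""
    requested = []
    try:
        return context.wait_for_callback(
            lambda callback_id, ctx: requested.append((order_id, callback_id)),
            name="wait-for-approval",
        )
    except CallbackTimeoutError:
        # Compensating action: undo the reservation made earlier in the workflow.
        release_stock(order_id)
        return {"orderId": order_id, "status": "expired"}


released = []
outcome = request_approval(FakeContext(), "order-42", released.append)
```

Catching the error keeps the execution alive, so the handler has to return a result that tells downstream code the approval expired rather than succeeded.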
Use heartbeats for long external operations¶
A heartbeat timeout fails the callback if the external system stops checking in, even if the overall timeout has not elapsed.
The external system has to call the SDK's heartbeat endpoint periodically while the work is in progress.
Warning
Without a heartbeat, a crashed external worker goes unnoticed until the overall timeout elapses: a 24-hour timeout means a 24-hour outage. Set the heartbeat timeout comfortably longer than the expected interval between heartbeats, but shorter than the overall operation timeout.
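The ordering of the three durations can be checked with a few lines of plain Python. This is a toy model of the rule described here, not SDK code: `validate_timeouts` and `callback_status` are hypothetical helpers, and the one-minute interval and five-minute heartbeat timeout are example values.

```python
from datetime import timedelta


def validate_timeouts(heartbeat_interval, heartbeat_timeout, overall_timeout):
    """Enforce: heartbeat interval < heartbeat timeout < overall timeout."""
    if heartbeat_timeout <= heartbeat_interval:
        raise ValueError("heartbeat timeout must exceed the heartbeat interval")
    if heartbeat_timeout >= overall_timeout:
        raise ValueError("heartbeat timeout must be shorter than the overall timeout")


def callback_status(elapsed, since_last_heartbeat, heartbeat_timeout, overall_timeout):
    """Classify a pending callback from two elapsed times."""
    if elapsed >= overall_timeout:
        return "timed-out"
    if since_last_heartbeat >= heartbeat_timeout:
        return "heartbeat-missed"
    return "waiting"


interval = timedelta(minutes=1)
heartbeat_timeout = timedelta(minutes=5)
overall = timedelta(hours=24)
validate_timeouts(interval, heartbeat_timeout, overall)

# Healthy worker: 10 minutes in, last heartbeat 30 seconds ago.
healthy = callback_status(
    timedelta(minutes=10), timedelta(seconds=30), heartbeat_timeout, overall
)

# Worker went silent 6 minutes ago: flagged after 5 minutes, not 24 hours.
crashed = callback_status(
    timedelta(minutes=16), timedelta(minutes=6), heartbeat_timeout, overall
)
```

With these values a crash surfaces within five minutes of the last heartbeat, while a worker that merely skips a few beats is not failed prematurely.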
Poll external services with waitForCondition¶
Use waitForCondition when the external system cannot call back and you have to poll
it for status instead.
The SDK runs your check function, applies a wait strategy between polls, and resumes when the check signals completion. Each poll is a step. The wait between polls suspends the execution.
Use an exponential backoff wait strategy so an unresponsive downstream system does not create a retry storm.
const final = await context.waitForCondition(
  "wait-for-job",
  async (state, ctx) => {
    const status = await jobService.getStatus(state.jobId);
    return { ...state, status };
  },
  {
    initialState: { jobId: event.jobId, status: "pending" },
    waitStrategy: (state, attempt) => {
      if (state.status === "completed") return { shouldContinue: false };
      const delaySeconds = Math.min(2 ** attempt, 60);
      return { shouldContinue: true, delay: { seconds: delaySeconds } };
    },
  },
);
from aws_durable_execution_sdk_python.config import Duration
from aws_durable_execution_sdk_python.waits import WaitForConditionConfig


def check(state, ctx):
    state["status"] = job_service.get_status(state["jobId"])
    return state


def wait_strategy(state, attempt):
    if state["status"] == "completed":
        return {"should_continue": False}
    delay = min(2 ** attempt, 60)
    return {"should_continue": True, "delay": Duration.from_seconds(delay)}


final_state = context.wait_for_condition(
    check,
    WaitForConditionConfig(
        initial_state={"jobId": event["jobId"], "status": "pending"},
        wait_strategy=wait_strategy,
    ),
    name="wait-for-job",
)
import java.time.Duration;
import java.util.Map;
import software.amazon.lambda.durable.TypeToken;
import software.amazon.lambda.durable.config.WaitForConditionConfig;
import software.amazon.lambda.durable.model.WaitForConditionResult;
import software.amazon.lambda.durable.retry.JitterStrategy;
import software.amazon.lambda.durable.retry.WaitStrategies;

var strategy = WaitStrategies.<Map<String, Object>>exponentialBackoff(
    60, Duration.ofSeconds(5), Duration.ofMinutes(1), 2.0, JitterStrategy.FULL);

var config = WaitForConditionConfig.<Map<String, Object>>builder()
    .initialState(Map.of("jobId", input.jobId(), "status", "pending"))
    .waitStrategy(strategy)
    .build();

Map<String, Object> finalState = context.waitForCondition(
    "wait-for-job",
    new TypeToken<>() {},
    (state, stepCtx) -> {
        String status = jobService.getStatus((String) state.get("jobId"));
        var updated = Map.<String, Object>of("jobId", state.get("jobId"), "status", status);
        return "completed".equals(status)
            ? WaitForConditionResult.stopPolling(updated)
            : WaitForConditionResult.continuePolling(updated);
    },
    config);
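To make the effect of the capped backoff concrete, the sketch below tabulates the delays that a `min(2 ** attempt, 60)` strategy like the ones above would produce, and applies a jitter step on top. The helper names `capped_backoff` and `full_jitter` are hypothetical, and reading "full" jitter as a uniform draw between zero and the computed delay is an assumption about the technique in general, not a statement about how the SDK implements it.

```python
import random


def capped_backoff(attempt, base=2.0, cap=60.0):
    """Delay in seconds: base ** attempt, never more than cap."""
    return min(base ** attempt, cap)


def full_jitter(delay, rng):
    """Pick a uniformly random delay between 0 and the computed delay."""
    return rng.uniform(0.0, delay)


# Delays for attempts 1 through 8 with the defaults above.
schedule = [capped_backoff(n) for n in range(1, 9)]
# schedule == [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0]

rng = random.Random(7)  # seeded only to make the example repeatable
jittered = [full_jitter(d, rng) for d in schedule]
```

The cap keeps a long-running job from being polled less than once a minute, while the jitter spreads out executions that started at the same moment so they do not all poll the downstream system together.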
Tip
Durable waits round up to a minimum of one second. Don't use waitForCondition to poll for state changes that need sub-second latency.