
Pause and resume

Durable functions can pause for seconds, hours, or days without keeping a Lambda invocation running. The SDK suspends execution, checkpoints the wait, and resumes the handler when the wait condition is met. This turns traditional polling loops and long-timeout wait-for-reply code into structured pauses that incur no compute charges while suspended.
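The following toy model in plain Python shows the suspend, checkpoint, and resume cycle. It is not the SDK. ToyContext, Suspend, and run are hypothetical names, and the scheduler that re-invokes the handler is reduced to advancing a clock by hand. On the first invocation, the handler records the wait in a checkpoint log and suspends. On the second, it replays from the top, finds the completed wait in the log, and runs to completion.

```python
from dataclasses import dataclass, field


class Suspend(Exception):
    """Hypothetical signal that ends the current invocation while a wait is pending."""


@dataclass
class ToyContext:
    """Hypothetical stand-in for the durable context (not the real SDK)."""
    checkpoints: dict = field(default_factory=dict)  # wait name -> time the wait ends
    now: float = 0.0                                 # simulated wall-clock seconds

    def wait(self, name: str, seconds: float) -> None:
        end = self.checkpoints.get(name)
        if end is None:
            # First time this wait is reached: checkpoint it, then suspend.
            self.checkpoints[name] = self.now + seconds
            raise Suspend(name)
        if self.now < end:
            # Re-invoked before the wait elapsed: suspend again.
            raise Suspend(name)
        # The checkpointed wait has elapsed, so replay moves straight past it.


def run(handler, ctx: ToyContext):
    """One invocation: replay the handler from the top; None means it suspended."""
    try:
        return handler(ctx)
    except Suspend:
        return None


def handler(ctx: ToyContext) -> str:
    ctx.wait("cool-off", 24 * 3600)
    return "done"


ctx = ToyContext()
first = run(handler, ctx)   # suspends at the wait; no invocation stays active
ctx.now += 24 * 3600        # the scheduler re-invokes once the wait is due
second = run(handler, ctx)  # replays, skips the completed wait, returns "done"
```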

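The durable wait operations cover three delayed-work scenarios: wait pauses for a fixed duration, waitForCallback pauses until an external system reports a result, and waitForCondition polls an external system until a condition is met.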

Prefer wait over sleep

Do not call language-native functions such as setTimeout, time.sleep, or Thread.sleep to pause a durable function. They keep the invocation running, and because their progress is never checkpointed, the pause starts over from zero whenever the handler replays. Durable waits suspend the execution and incur no compute cost while suspended.

```typescript
// Wait 24 hours with no active Lambda time.
await context.wait("cool-off", { hours: 24 });
```

```python
from aws_durable_execution_sdk_python.config import Duration

context.wait(Duration.from_hours(24), name="cool-off")
```

```java
import java.time.Duration;

context.wait("cool-off", Duration.ofHours(24));
```
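The cost of pausing with an in-process sleep can be worked out with simple arithmetic. The sketch below is a hypothetical model, not SDK code. active_pause_seconds counts how many seconds some invocation stays active while a pause elapses, given the points at which earlier invocations failed. An in-process sleep keeps no record of its progress, so each replay starts it again from zero. A durable wait is checkpointed and suspends, so the model counts no active time for it.

```python
from typing import List


def active_pause_seconds(pause: float, failures_at: List[float], durable: bool) -> float:
    """Hypothetical model: total seconds some invocation stays active while a
    pause elapses.

    failures_at lists, for each invocation that fails before the handler
    finishes, how many seconds after the pause began it failed. One final
    invocation then replays the handler to completion.
    """
    if durable:
        # A durable wait is checkpointed and the invocation suspends, so no
        # invocation is active while the pause elapses, and replays skip it.
        return 0.0
    total = 0.0
    for elapsed in failures_at:
        # An in-process sleep exists only in memory, so a failure discards
        # its progress; count the seconds actually spent sleeping.
        total += min(elapsed, pause)
    # The final replay starts the sleep again from zero and runs it in full.
    return total + pause


sleep_cost = active_pause_seconds(60, [40], durable=False)  # 40 + 60
wait_cost = active_pause_seconds(60, [40], durable=True)
```

One failure 40 seconds into a 60-second sleep already makes the pause cost 100 active seconds. The durable wait costs none in either case.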

Tip

Name every wait. Named waits show up in the operation history and CloudWatch, which keeps timelines legible in logs and test assertions.

Always set a callback timeout

waitForCallback suspends the execution until an external system calls the SDK's callback success or failure endpoint. Without a timeout, a callback that never arrives keeps the execution waiting until the overall execution timeout elapses, holding the resource slot unless an operator intervenes.

```typescript
const outcome = await context.waitForCallback(
  "wait-for-approval",
  async (callbackId) => {
    await approvalsService.request({ id: event.orderId, callbackId });
  },
  { timeout: { hours: 24 } },
);
```

```python
from aws_durable_execution_sdk_python.config import (
    Duration,
    WaitForCallbackConfig,
)

result = context.wait_for_callback(
    lambda callback_id, ctx: approvals_service.request(
        id=event["orderId"], callback_id=callback_id
    ),
    name="wait-for-approval",
    config=WaitForCallbackConfig(timeout=Duration.from_hours(24)),
)
```

```java
import java.time.Duration;
import software.amazon.lambda.durable.config.CallbackConfig;
import software.amazon.lambda.durable.config.WaitForCallbackConfig;

// Fail the callback if no result arrives within 24 hours.
WaitForCallbackConfig config = WaitForCallbackConfig.builder()
    .callbackConfig(CallbackConfig.builder()
        .timeout(Duration.ofHours(24))
        .build())
    .build();

Approval outcome = context.waitForCallback(
    "wait-for-approval",
    Approval.class,
    (callbackId, ctx) -> approvalsService.request(input.orderId(), callbackId),
    config);
```
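A small model shows why the timeout matters. The sketch below is hypothetical, and callback_deadline and resolve_callback are not SDK functions. A reply that arrives before the deadline completes the callback. Without a callback timeout, the only deadline left is the execution's own. The seven-day execution deadline is an arbitrary illustrative value, not a documented limit.

```python
from typing import Optional, Tuple

HOUR = 3600.0
DAY = 24 * HOUR


def callback_deadline(created_at: float, callback_timeout: Optional[float],
                      execution_deadline: float) -> float:
    """Time at which a pending callback stops waiting (hypothetical model)."""
    if callback_timeout is None:
        # No callback timeout: only the overall execution timeout ends the wait.
        return execution_deadline
    return min(created_at + callback_timeout, execution_deadline)


def resolve_callback(created_at: float, reply_at: Optional[float],
                     callback_timeout: Optional[float],
                     execution_deadline: float) -> Tuple[str, float]:
    """Return ("completed", reply time) or ("timed-out", time the wait ended)."""
    deadline = callback_deadline(created_at, callback_timeout, execution_deadline)
    if reply_at is not None and reply_at <= deadline:
        return "completed", reply_at
    return "timed-out", deadline


EXECUTION_DEADLINE = 7 * DAY  # illustrative value only

# The approver never replies.
with_timeout = resolve_callback(0.0, None, 24 * HOUR, EXECUTION_DEADLINE)
without_timeout = resolve_callback(0.0, None, None, EXECUTION_DEADLINE)
```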

Danger

Always set a timeout on waitForCallback to avoid stalled executions.

When the timeout elapses, the SDK raises a callback timeout error. Either let it propagate, which marks the step and the execution as failed, or catch it and run compensating actions.
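The catch-and-compensate option has roughly the following shape. This plain-Python sketch uses hypothetical names. CallbackTimeoutError stands in for the SDK's callback timeout error, whose real class name and module are not given here. wait_for_approval stands in for the wait_for_callback call and always times out, and the compensating actions are recorded as strings. In a real handler, you would likely run each compensating action as its own durable step so that it is checkpointed.

```python
class CallbackTimeoutError(Exception):
    """Hypothetical stand-in for the SDK's callback timeout error."""


def wait_for_approval(order_id: str) -> dict:
    # Hypothetical stand-in for context.wait_for_callback(...). Here the
    # approver never answers, so the callback timeout fires.
    raise CallbackTimeoutError(f"approval for {order_id} timed out")


def process_order(order_id: str, compensations: list) -> str:
    try:
        approval = wait_for_approval(order_id)
    except CallbackTimeoutError:
        # Catch the timeout and undo earlier side effects instead of failing.
        compensations.append(f"release-inventory:{order_id}")
        compensations.append(f"notify-customer:{order_id}")
        return "cancelled"
    return "approved" if approval.get("approved") else "rejected"


done = []
status = process_order("order-42", done)
```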

Use heartbeats for long external operations

A heartbeat timeout fails the callback if the external system stops checking in, even if the overall timeout has not elapsed.

The external system has to call the SDK's heartbeat endpoint periodically while the work is in progress.

```typescript
const outcome = await context.waitForCallback(
  "long-running-job",
  async (callbackId) => startJob({ jobId: event.jobId, callbackId }),
  {
    timeout: { hours: 24 },
    heartbeatTimeout: { minutes: 10 },
  },
);
```

```python
config = WaitForCallbackConfig(
    timeout=Duration.from_hours(24),
    heartbeat_timeout=Duration.from_minutes(10),
)
```

```java
// Pass cb to WaitForCallbackConfig.builder().callbackConfig(cb).build().
CallbackConfig cb = CallbackConfig.builder()
    .timeout(Duration.ofHours(24))
    .heartbeatTimeout(Duration.ofMinutes(10))
    .build();
```
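The two timeouts can be combined into a single status check. callback_status below is a hypothetical model, not documented SDK behavior. In particular, it assumes the heartbeat clock starts when the callback is created. In the example, the worker sends a heartbeat every five minutes and crashes after 30 minutes. With a 10-minute heartbeat timeout, the failure surfaces at the 40-minute mark instead of after 24 hours.

```python
from typing import List, Optional

MINUTE = 60.0
HOUR = 60 * MINUTE


def callback_status(now: float, created_at: float, heartbeats: List[float],
                    timeout: float, heartbeat_timeout: Optional[float]) -> str:
    """Hypothetical model of a pending callback's state at time `now`."""
    if now - created_at >= timeout:
        return "timed-out"  # the overall timeout always applies
    if heartbeat_timeout is not None:
        # Assumption: the heartbeat clock starts when the callback is created.
        last_seen = max((t for t in heartbeats if t <= now), default=created_at)
        if now - last_seen >= heartbeat_timeout:
            return "heartbeat-timed-out"  # the worker stopped checking in
    return "pending"


# The worker sends a heartbeat every 5 minutes, then crashes after its 30-minute check-in.
beats = [m * MINUTE for m in range(5, 31, 5)]
```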

Warning

Without a heartbeat timeout, a crashed external worker goes unnoticed until the full 24-hour timeout elapses. Set a heartbeat timeout comfortably longer than the expected interval between heartbeats, but shorter than the overall operation timeout.
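The sizing rule in this warning can be written as a quick configuration check. check_heartbeat_timeout is a hypothetical helper. The factor of 2 used for "comfortably longer" is an assumed safety margin, not an SDK default.

```python
from datetime import timedelta
from typing import List


def check_heartbeat_timeout(expected_interval: timedelta,
                            heartbeat_timeout: timedelta,
                            overall_timeout: timedelta,
                            margin: float = 2.0) -> List[str]:
    """Return problems with a heartbeat configuration; an empty list means OK."""
    problems = []
    # "Comfortably longer": allow at least `margin` heartbeat intervals (assumed factor).
    if heartbeat_timeout < expected_interval * margin:
        problems.append("heartbeat timeout too close to the heartbeat interval")
    # Otherwise the heartbeat timeout can never fire before the overall timeout.
    if heartbeat_timeout >= overall_timeout:
        problems.append("heartbeat timeout not shorter than the overall timeout")
    return problems
```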

Poll external services with waitForCondition

Use waitForCondition for external systems where you have to poll rather than create a callback.

The SDK runs your check function, applies a wait strategy between polls, and returns the final state once polling stops. Each poll is a step, and the wait between polls suspends the execution.

Use an exponential backoff wait strategy so an unresponsive downstream system does not create a retry storm.
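The TypeScript and Python examples below compute each delay as min(2 ** attempt, 60) seconds. The sketch below computes that schedule and then applies "full jitter", a common variant that picks a random delay between zero and the computed value so that many pollers started together do not stay in lockstep. It assumes attempt counts from 0. Both functions are hypothetical helpers. The Java example's JitterStrategy.FULL is assumed, not confirmed, to behave the same way.

```python
import random
from typing import List


def capped_exponential_delays(attempts: int, cap: int = 60) -> List[int]:
    """Seconds to wait after each poll: min(2 ** attempt, cap), attempt from 0 (assumed)."""
    return [min(2 ** attempt, cap) for attempt in range(attempts)]


def with_full_jitter(delays: List[int], rng: random.Random) -> List[float]:
    """Full jitter: replace each delay with a uniform draw from [0, delay]."""
    return [rng.uniform(0, d) for d in delays]


schedule = capped_exponential_delays(8)
jittered = with_full_jitter(schedule, random.Random(7))  # seeded for repeatability
```

Once the cap is reached, each execution polls an unresponsive service at most about once a minute, however long the outage lasts.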

```typescript
const final = await context.waitForCondition(
  "wait-for-job",
  async (state, ctx) => {
    const status = await jobService.getStatus(state.jobId);
    return { ...state, status };
  },
  {
    initialState: { jobId: event.jobId, status: "pending" },
    waitStrategy: (state, attempt) => {
      if (state.status === "completed") return { shouldContinue: false };
      const delaySeconds = Math.min(2 ** attempt, 60);
      return { shouldContinue: true, delay: { seconds: delaySeconds } };
    },
  },
);
```

```python
from aws_durable_execution_sdk_python.config import Duration
from aws_durable_execution_sdk_python.waits import WaitForConditionConfig


def check(state, ctx):
    state["status"] = job_service.get_status(state["jobId"])
    return state


def wait_strategy(state, attempt):
    if state["status"] == "completed":
        return {"should_continue": False}
    delay = min(2 ** attempt, 60)
    return {"should_continue": True, "delay": Duration.from_seconds(delay)}


final_state = context.wait_for_condition(
    check,
    WaitForConditionConfig(
        initial_state={"jobId": event["jobId"], "status": "pending"},
        wait_strategy=wait_strategy,
    ),
    name="wait-for-job",
)
```

```java
import java.time.Duration;
import java.util.Map;
import software.amazon.lambda.durable.TypeToken;
import software.amazon.lambda.durable.config.WaitForConditionConfig;
import software.amazon.lambda.durable.model.WaitForConditionResult;
import software.amazon.lambda.durable.retry.JitterStrategy;
import software.amazon.lambda.durable.retry.WaitStrategies;

var strategy = WaitStrategies.<Map<String, Object>>exponentialBackoff(
    60, Duration.ofSeconds(5), Duration.ofMinutes(1), 2.0, JitterStrategy.FULL);

var config = WaitForConditionConfig.<Map<String, Object>>builder()
    .initialState(Map.of("jobId", input.jobId(), "status", "pending"))
    .waitStrategy(strategy)
    .build();

Map<String, Object> finalState = context.waitForCondition(
    "wait-for-job",
    new TypeToken<>() {},
    (state, stepCtx) -> {
        String status = jobService.getStatus((String) state.get("jobId"));
        var updated = Map.<String, Object>of("jobId", state.get("jobId"), "status", status);
        return "completed".equals(status)
            ? WaitForConditionResult.stopPolling(updated)
            : WaitForConditionResult.continuePolling(updated);
    },
    config);
```
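The following toy polling loop in plain Python shows how the pieces of waitForCondition fit together. toy_wait_for_condition is hypothetical, not the SDK implementation. It assumes the check runs before the strategy is consulted and that attempt counts from 0, and it records delays instead of suspending. The fake job service reports completed on the fourth poll.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def toy_wait_for_condition(check: Callable[[State], State],
                           strategy: Callable[[State, int], Dict[str, Any]],
                           initial_state: State) -> Tuple[State, List[int]]:
    """Poll until the strategy says stop; return the final state and the delays."""
    state, delays, attempt = initial_state, [], 0
    while True:
        state = check(state)                 # in the SDK, each poll is a step
        decision = strategy(state, attempt)
        if not decision["should_continue"]:
            return state, delays
        delays.append(decision["delay"])     # in the SDK, this wait suspends
        attempt += 1


# Hypothetical job service: the job completes on the fourth poll.
statuses = iter(["pending", "running", "running", "completed"])


def check(state: State) -> State:
    return {**state, "status": next(statuses)}


def strategy(state: State, attempt: int) -> Dict[str, Any]:
    if state["status"] == "completed":
        return {"should_continue": False}
    return {"should_continue": True, "delay": min(2 ** attempt, 60)}


final_state, delays = toy_wait_for_condition(
    check, strategy, {"jobId": "job-1", "status": "pending"})
```

Because the state is threaded through each call, the check returns the full updated state rather than only the new status. The TypeScript example does the same with { ...state, status }.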

Tip

Durable waits last at least one second, and shorter durations round up to one second. Don't use waitForCondition to poll for sub-second, latency-sensitive state changes.
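The effect of the one-second floor is easy to estimate. effective_wait_seconds is a hypothetical model of the rule. Only the one-second minimum comes from the text above. Rounding fractional durations above one second up to a whole second is an assumption.

```python
import math
from datetime import timedelta


def effective_wait_seconds(requested: timedelta) -> int:
    """Hypothetical model: at least one second, fractions rounded up (assumed)."""
    return max(1, math.ceil(requested.total_seconds()))


# Five polls that each ask for a 200 ms wait.
requested = [timedelta(milliseconds=200)] * 5
requested_total = sum(r.total_seconds() for r in requested)
actual_total = sum(effective_wait_seconds(r) for r in requested)
```

In this model, five polls that request about one second of waiting in total actually wait five seconds.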

See also