Determinism during replay

The Durable Execution SDK checkpoints your code's progress so that the current invocation can end, and stop consuming compute, while it waits for a timer to elapse or for a result to become ready. When processing can resume, the AWS Lambda backend re-invokes the function.

Durable functions run your handler from the top on every invocation. A step that completed in an earlier invocation returns its checkpointed result on replay without re-executing the code inside the step. Anything outside a step or other durable operation runs every time the function replays.
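The replay rule can be seen in a toy model. The sketch below is not the SDK: `ToyContext` and its `step` method are hypothetical stand-ins, and the model assumes checkpoints are matched to steps by the order in which the handler calls them. It invokes the same handler twice against one checkpoint store, as a re-invocation would, and counts how often each piece of code runs.

```python
from __future__ import annotations

from typing import Any, Callable


class ToyContext:
    """Hypothetical stand-in for a durable context; not the SDK's API."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints  # persists across invocations
        self.position = 0  # index of the next step in this invocation

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.checkpoints):
            result = self.checkpoints[self.position]  # replay: reuse result
        else:
            result = fn()  # first run: execute, then checkpoint the result
            self.checkpoints.append(result)
        self.position += 1
        return result


counts = {"handler_body": 0, "step_body": 0}


def expensive_work() -> int:
    counts["step_body"] += 1
    return 42


def handler(context: ToyContext) -> int:
    counts["handler_body"] += 1  # outside any step: runs on every invocation
    return context.step(expensive_work)


store: list[Any] = []
first = handler(ToyContext(store))  # original invocation
second = handler(ToyContext(store))  # replay against the same checkpoints
print(first, second, counts)  # 42 42 {'handler_body': 2, 'step_body': 1}
```

The step body runs once; the handler code around it runs on both invocations.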

For replay to follow the same path, the code that runs on every invocation has to produce the same values each time. That property is determinism. Non-deterministic code in your handler body can send control flow down a different branch on replay, or hand a downstream step different inputs from the ones the original invocation used.
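A toy model shows how a downstream step can end up with a different input on replay. As before, `ToyContext` is a hypothetical stand-in (not the SDK's API) that matches checkpoints by call order. `itertools.count` stands in for any source that returns a new value on each call, and the `Suspend` exception simulates the first invocation ending after one step, for example at a wait.

```python
from __future__ import annotations

import itertools
from typing import Any, Callable


class ToyContext:
    """Hypothetical stand-in for a durable context; not the SDK's API."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints
        self.position = 0

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.checkpoints):
            result = self.checkpoints[self.position]
        else:
            result = fn()
            self.checkpoints.append(result)
        self.position += 1
        return result


class Suspend(Exception):
    """Simulates the invocation ending (for example, at a wait)."""


ticks = itertools.count(start=1)  # stand-in for a clock: new value per call


def handler(context: ToyContext, suspend_after_first_step: bool) -> dict:
    order_ref = next(ticks)  # non-deterministic and NOT inside a step
    reserved = context.step(lambda: f"reserved-{order_ref}")
    if suspend_after_first_step:
        raise Suspend()
    shipped = context.step(lambda: f"shipped-{order_ref}")
    return {"reserved": reserved, "shipped": shipped}


store: list[Any] = []
try:
    handler(ToyContext(store), suspend_after_first_step=True)
except Suspend:
    pass  # the backend later re-invokes the function
result = handler(ToyContext(store), suspend_after_first_step=False)
print(result)  # {'reserved': 'reserved-1', 'shipped': 'shipped-2'}
```

The two steps now disagree: the first step's checkpoint was built from `order_ref` 1, while the second step ran for the first time on replay with `order_ref` 2.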

Handler code must be deterministic

Any code that is not inside a durable operation must be a pure function of the handler inputs and the results of completed operations. Anything that depends on wall-clock time, a random source, an external service, the local file system, or mutable global state is non-deterministic and must run inside a durable operation.

Concrete examples of code that is not deterministic:

Time and identity: Date.now(), time.time(), Instant.now(), UUID generation, or anything that returns a different value each call.
External I/O: HTTP calls, database reads, AWS SDK calls, reading files.
Random numbers: Math.random(), random.random(), Random.

Non-deterministic code must be in a durable operation

A step checkpoints its return value. On replay the step returns the checkpointed value instead of running the underlying code. Wrapping a non-deterministic call in a step means the value will always be the result of the first successful completion of that code.


import { withDurableExecution, DurableContext } from "@aws/durable-execution-sdk-js";
import { randomUUID } from "crypto";

export const handler = withDurableExecution(
  async (event: { amount: number }, context: DurableContext) => {
    const transactionId = await context.step(
      "generate-transaction-id",
      async () => randomUUID(),
    );

    const receipt = await context.step("charge", async () => {
      return charge(event.amount, transactionId);
    });

    return { transactionId, receipt };
  },
);

import uuid
from aws_durable_execution_sdk_python import DurableContext, durable_execution, durable_step
from aws_durable_execution_sdk_python.types import StepContext


@durable_step
def generate_transaction_id(ctx: StepContext) -> str:
    return str(uuid.uuid4())


@durable_step
def charge(ctx: StepContext, amount: float, transaction_id: str) -> dict:
    return payment_service.charge(amount, transaction_id)


@durable_execution
def handler(event: dict, context: DurableContext) -> dict:
    transaction_id = context.step(generate_transaction_id())
    receipt = context.step(charge(event["amount"], transaction_id))
    return {"transactionId": transaction_id, "receipt": receipt}

import software.amazon.lambda.durable.DurableContext;
import software.amazon.lambda.durable.DurableHandler;
import java.util.UUID;

public class ChargeHandler implements DurableHandler<ChargeInput, Receipt> {
    @Override
    public Receipt handle(ChargeInput input, DurableContext context) {
        String transactionId = context.step(
            "generate-transaction-id",
            String.class,
            ctx -> UUID.randomUUID().toString());

        return context.step(
            "charge",
            Receipt.class,
            ctx -> paymentService.charge(input.amount(), transactionId));
    }
}

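Because the SDK checkpoints the result of generate-transaction-id, every replay sees the same transactionId, and the charge step always receives the same argument. Without the wrapper, the UUID generator (randomUUID(), uuid.uuid4(), or UUID.randomUUID()) would produce a new value on every replay. If the charge step then has to run again, for example because an earlier attempt failed before its result was checkpointed, it sends a different transaction ID, so the payment service cannot recognize the request as a duplicate and may charge the customer twice.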
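The double-charge risk can be reproduced with a toy payment service that records one charge per transaction ID, the way an idempotency key usually works. Everything here is hypothetical: `ToyContext`, `FakePaymentService`, and `run_until_success` are illustrative stand-ins, not SDK APIs. The model assumes the first charge attempt reaches the service but its response is lost, so nothing is checkpointed and the function is re-invoked.

```python
from __future__ import annotations

import uuid
from typing import Any, Callable


class ToyContext:
    """Hypothetical durable context: replays checkpointed results by call order."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints
        self.position = 0

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.checkpoints):
            result = self.checkpoints[self.position]
        else:
            result = fn()  # if fn raises, nothing is checkpointed
            self.checkpoints.append(result)
        self.position += 1
        return result


class FakePaymentService:
    """Records one charge per transaction ID, like an idempotency key."""

    def __init__(self) -> None:
        self.charges: dict[str, float] = {}
        self.lose_next_response = True

    def charge(self, amount: float, transaction_id: str) -> str:
        self.charges.setdefault(transaction_id, amount)  # duplicate IDs ignored
        if self.lose_next_response:
            self.lose_next_response = False
            raise ConnectionError("charged, but the response was lost")
        return f"receipt-{transaction_id}"


def run_until_success(handler: Callable[[ToyContext], str]) -> str:
    """Re-invoke the handler against the same checkpoints until it succeeds."""
    store: list[Any] = []
    while True:
        try:
            return handler(ToyContext(store))
        except ConnectionError:
            continue  # re-invocation: the handler replays from the top


def unwrapped_handler(context: ToyContext, service: FakePaymentService) -> str:
    transaction_id = str(uuid.uuid4())  # outside a step: new ID on every replay
    return context.step(lambda: service.charge(25.0, transaction_id))


def wrapped_handler(context: ToyContext, service: FakePaymentService) -> str:
    transaction_id = context.step(lambda: str(uuid.uuid4()))  # checkpointed once
    return context.step(lambda: service.charge(25.0, transaction_id))


unsafe = FakePaymentService()
run_until_success(lambda ctx: unwrapped_handler(ctx, unsafe))
safe = FakePaymentService()
run_until_success(lambda ctx: wrapped_handler(ctx, safe))
print(len(unsafe.charges), len(safe.charges))  # 2 1
```

With the ID generated outside a step, the retried charge carries a fresh ID and the service records a second charge; with the ID checkpointed, the retry reuses it and the service deduplicates.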

Tip

Wrap every non-deterministic call inside a step.

Pass data through return values, not closures

The code inside a step runs only until the step completes. On replay the step returns its checkpointed result without running that code, while local variables in the handler start over from their initial values. A step that hands back its work by assigning to, mutating, or pushing onto a variable outside the step therefore looks correct on the first invocation, then breaks as soon as the workflow replays after a crash or a wait: the variable is reset, and the side effect that filled it never runs again. Return the data from the step instead, and assign the result outside it.

// Wrong: total is mutated inside the step's closure. On replay the step
// returns its cached result without running the closure, so total stays 0.
export const handler = withDurableExecution(async (event, context) => {
  let total = 0;
  for (const item of event.items) {
    await context.step(`save-${item.id}`, async () => {
      await saveItem(item);
      total += item.price;
    });
  }
  return { total };
});

// Right: each step returns the new running total.
export const handler = withDurableExecution(async (event, context) => {
  let total = 0;
  for (const item of event.items) {
    total = await context.step(`save-${item.id}`, async () => {
      await saveItem(item);
      return total + item.price;
    });
  }
  return { total };
});

# Wrong: the step mutates `totals` as a side effect. On replay the step
# returns its cached result without running, so totals["total"] stays 0.
@durable_step
def save_and_add(ctx: StepContext, item: dict, totals: dict) -> None:
    save_item(item)
    totals["total"] += item["price"]


@durable_execution
def handler(event, context: DurableContext) -> dict:
    totals = {"total": 0}
    for item in event["items"]:
        context.step(save_and_add(item, totals), name=f"save-{item['id']}")
    return {"total": totals["total"]}


# Right: each step returns the new running total.
@durable_step
def save_and_accumulate(ctx: StepContext, item: dict, running_total: float) -> float:
    save_item(item)
    return running_total + item["price"]


@durable_execution
def handler(event, context: DurableContext) -> dict:
    total = 0.0
    for item in event["items"]:
        total = context.step(
            save_and_accumulate(item, total),
            name=f"save-{item['id']}",
        )
    return {"total": total}

// Right: each step returns the new running total.
double total = 0.0;
for (Item item : input.items()) {
    final double running = total;
    total = context.step(
        "save-" + item.id(),
        Double.class,
        ctx -> {
            saveItem(item);
            return running + item.price();
        });
}
return new Result(total);
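A toy replay makes the difference between updating a variable from inside a step and returning the value from the step concrete. It is built on a hypothetical `ToyContext` rather than the SDK, assumes checkpoints are matched by call order, and leaves out the actual saving of each item.

```python
from __future__ import annotations

from typing import Any, Callable


class ToyContext:
    """Hypothetical durable context: replays checkpointed results by call order."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints
        self.position = 0

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.checkpoints):
            result = self.checkpoints[self.position]
        else:
            result = fn()
            self.checkpoints.append(result)
        self.position += 1
        return result


ITEMS = [{"id": "a", "price": 5.0}, {"id": "b", "price": 7.5}]


def mutate_in_step(context: ToyContext) -> float:
    total = 0.0

    def save_and_add(item: dict) -> None:
        nonlocal total
        total += item["price"]  # side effect: skipped when the step replays

    for item in ITEMS:
        context.step(lambda: save_and_add(item))
    return total


def return_from_step(context: ToyContext) -> float:
    total = 0.0
    for item in ITEMS:
        running = total
        total = context.step(lambda: running + item["price"])  # checkpointed value
    return total


results = {}
for handler in (mutate_in_step, return_from_step):
    store: list[Any] = []
    first = handler(ToyContext(store))  # original invocation
    replay = handler(ToyContext(store))  # replay after a crash or a wait
    results[handler.__name__] = (first, replay)
print(results)
# {'mutate_in_step': (12.5, 0.0), 'return_from_step': (12.5, 12.5)}
```

Both handlers agree on the first invocation; only the one that passes the total through return values survives replay.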

For a list of independent items, map is simpler than an explicit loop: it runs the per-item operation in parallel, checkpoints each result, and returns a BatchResult you can reduce. Use an explicit loop when items depend on each other, as with a running total or chained transformations. Keep the loop itself deterministic so that every replay requests the same steps, with the same names, in the same order.
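One easy way to make a Python loop non-deterministic is to iterate over a `set` of strings: string hashing is randomized per process, so two invocations can visit the same set in a different order. A minimal sketch of the usual fix is to sort before looping so each replay requests the same step names in the same order. `plan_step_names` and the `save-` prefix are illustrative, not part of the SDK.

```python
def plan_step_names(item_ids: set[str]) -> list[str]:
    """Return step names in the same order in every Python process."""
    # Iterating over the set directly would follow hash order, which can
    # differ between processes because str hashing is randomized.
    return [f"save-{item_id}" for item_id in sorted(item_ids)]


print(plan_step_names({"order-3", "order-1", "order-2"}))
# ['save-order-1', 'save-order-2', 'save-order-3']
```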

Danger

Passing data out of a step through a closure or a shared object fails silently. The first invocation looks correct, but on replay the handler's local state starts over while the step returns its cached result without repeating the side effect.

Keep branches stable across replay

Control-flow decisions made outside steps must depend only on deterministic inputs. If an if or switch depends on something non-deterministic, replay can walk a different branch from the original invocation, so the operations it reaches no longer line up with the results that were checkpointed. Make the non-deterministic decision inside a step and branch on the step's return value.


// Wrong: replay may see different values of `new Date()`.
if (new Date().getHours() < 12) {
  await context.step("morning-work", async () => runMorning());
} else {
  await context.step("afternoon-work", async () => runAfternoon());
}

// Right: the SDK checkpoints the decision.
const shift = await context.step("pick-shift", async () => {
  return new Date().getHours() < 12 ? "morning" : "afternoon";
});
if (shift === "morning") {
  await context.step("morning-work", async () => runMorning());
} else {
  await context.step("afternoon-work", async () => runAfternoon());
}

from datetime import datetime

# Wrong: replay may see different values of datetime.now().
if datetime.now().hour < 12:
    context.step(run_morning(), name="morning-work")
else:
    context.step(run_afternoon(), name="afternoon-work")


# Right: the SDK checkpoints the decision.
@durable_step
def pick_shift(ctx: StepContext) -> str:
    return "morning" if datetime.now().hour < 12 else "afternoon"


shift = context.step(pick_shift(), name="pick-shift")
if shift == "morning":
    context.step(run_morning(), name="morning-work")
else:
    context.step(run_afternoon(), name="afternoon-work")

// Wrong: replay may see different values of LocalTime.now().
if (LocalTime.now().getHour() < 12) {
    context.step("morning-work", Void.class, ctx -> { runMorning(); return null; });
} else {
    context.step("afternoon-work", Void.class, ctx -> { runAfternoon(); return null; });
}

// Right: the SDK checkpoints the decision.
String shift = context.step(
    "pick-shift",
    String.class,
    ctx -> LocalTime.now().getHour() < 12 ? "morning" : "afternoon");
if ("morning".equals(shift)) {
    context.step("morning-work", Void.class, ctx -> { runMorning(); return null; });
} else {
    context.step("afternoon-work", Void.class, ctx -> { runAfternoon(); return null; });
}

The same rule applies to reading from external services. A flag fetched from a database outside a step may have changed by the time the function replays. Fetch the value inside a step, then branch on the returned value.

Warning

Feature flags, environment variables read at runtime, and configuration pulled from a remote store can all change between the first invocation and a replay. Capture each value inside a step so that every replay evaluates the same value.
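A toy replay shows what capturing a flag changes. `ToyContext` is a hypothetical stand-in that records the step names each run requests and matches checkpoints by call order; `flag_store` stands in for a remote configuration store whose flag flips between the first invocation and the replay.

```python
from __future__ import annotations

from typing import Any, Callable


class ToyContext:
    """Hypothetical durable context that records which steps a run requests."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints
        self.position = 0
        self.names: list[str] = []

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        self.names.append(name)
        if self.position < len(self.checkpoints):
            result = self.checkpoints[self.position]  # matched by call order
        else:
            result = fn()
            self.checkpoints.append(result)
        self.position += 1
        return result


flag_store = {"new-pricing": True}  # stand-in for a remote config store


def read_flag_outside_step(context: ToyContext) -> str:
    if flag_store["new-pricing"]:  # re-read on every replay
        return context.step("new-pricing", lambda: "new")
    return context.step("old-pricing", lambda: "old")


def read_flag_in_step(context: ToyContext) -> str:
    enabled = context.step("read-flag", lambda: flag_store["new-pricing"])
    if enabled:  # branches on the checkpointed value
        return context.step("new-pricing", lambda: "new")
    return context.step("old-pricing", lambda: "old")


runs = {}
for handler in (read_flag_outside_step, read_flag_in_step):
    flag_store["new-pricing"] = True
    store: list[Any] = []
    first, replay = ToyContext(store), ToyContext(store)
    handler(first)
    flag_store["new-pricing"] = False  # the flag flips before the replay
    handler(replay)
    runs[handler.__name__] = (first.names, replay.names)
print(runs)
```

In the first handler the replay asks for `old-pricing` even though `new-pricing` is what ran, and in this toy model it is handed the checkpoint recorded for `new-pricing`. The second handler replays the captured flag and requests the same steps both times.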
