Manage state¶
The result of a durable operation is recorded in a checkpoint. Checkpoints persist in the durable execution backend. Large checkpoints consume storage that counts against the Durable execution storage written (MB) metric, and slow the fetching and deserialization of the operation history during replay. Keep checkpointed values small: fetch bulk data inside the step that needs it rather than persisting it in a checkpoint.
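To make the cost concrete, here is a minimal sketch (plain Python, no SDK involved; the data shapes are made up) comparing the serialized size of a full response against a small reference. The checkpoint cost of a step return is roughly the serialized size of the value:

```python
import json

# Hypothetical step results: a full API response vs. just the reference.
full_response = {"id": "doc-123", "body": "x" * 500_000, "etag": "abc"}
reference = {"id": full_response["id"]}

# Checkpoint cost tracks the serialized size of what a step returns.
full_size = len(json.dumps(full_response))  # hundreds of kilobytes
ref_size = len(json.dumps(reference))       # a few dozen bytes
```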
What counts as durable state¶
Any data you pass as input to a handler, anything you return from a step or child context, a wait-for-condition state object, or a wait-for-callback result gets serialized and stored. Internal variables between steps do not persist across invocations.
A variable local to the handler that holds the result of a step does not by itself consume durable state. The same variable, if it holds a million-row result set, becomes persistent state the moment you return it from a step or from the handler.
// The step return value is checkpointed as durable state.
const value = await context.step("fetch", async () => fetchValue());
// Assigning or copying the local variable does NOT add to durable state.
const alias = value;
const copy = structuredClone(value);
// Returning the local variable from the handler DOES add it to durable state
// (the handler output is serialized and stored).
return { value };
import copy as copy_mod
# The step return value is checkpointed as durable state.
value = context.step(fetch_value())
# Assigning or copying the local variable does NOT add to durable state.
alias = value
dup = copy_mod.deepcopy(value)
# Returning the local variable from the handler DOES add it to durable state
# (the handler output is serialized and stored).
return {"value": value}
// The step return value is checkpointed as durable state.
Value value = context.step("fetch", Value.class, ctx -> fetchValue());
// Assigning or copying the local variable does NOT add to durable state.
Value alias = value;
Value copy = new Value(value); // copy constructor
// Returning the local variable from the handler DOES add it to durable state
// (the handler output is serialized and stored).
return new Result(value);
Store references, not payloads¶
A common pattern is to do the fetch inside the step, extract the identifier, and return the identifier from the step. The next step that needs the full payload does its own fetch by ID. For very large payloads, stage the data in Amazon S3, DynamoDB, or another store inside the first step and pass the key or version ID to the next step.
Warning
Returning full API responses from a step puts the entire response in the checkpoint. Extract the IDs and fields you actually need. Drop the rest.
// Wrong: full document returned and checkpointed.
const document = await context.step("fetch-document", async () => {
  return s3.getObject({ Bucket: "docs", Key: event.key });
});
await context.step("summarize", async () => summarize(document));

// Right: only the reference flows between steps.
const reference = await context.step("stage-document", async () => {
  const data = await s3.getObject({ Bucket: "docs", Key: event.key });
  const stagedKey = await stageForProcessing(data);
  return { bucket: "processing", key: stagedKey };
});
await context.step("summarize", async () => {
  const data = await s3.getObject(reference);
  return summarize(data);
});
# Wrong: full document returned and checkpointed.
@durable_step
def fetch_document(ctx: StepContext, key: str) -> dict:
    return s3.get_object(Bucket="docs", Key=key)

# Right: pass the reference, re-fetch in the next step.
@durable_step
def stage_document(ctx: StepContext, key: str) -> dict:
    data = s3.get_object(Bucket="docs", Key=key)
    staged_key = stage_for_processing(data)
    return {"bucket": "processing", "key": staged_key}

@durable_step
def summarize(ctx: StepContext, reference: dict) -> str:
    data = s3.get_object(Bucket=reference["bucket"], Key=reference["key"])
    return make_summary(data)

reference = context.step(stage_document(event["key"]))
summary = context.step(summarize(reference))
record DocumentRef(String bucket, String key) {}

DocumentRef reference = context.step(
    "stage-document",
    DocumentRef.class,
    ctx -> {
        var data = s3.getObject(b -> b.bucket("docs").key(input.key()));
        String stagedKey = stageForProcessing(data);
        return new DocumentRef("processing", stagedKey);
    });

String summary = context.step(
    "summarize",
    String.class,
    ctx -> {
        var data = s3.getObject(b -> b.bucket(reference.bucket()).key(reference.key()));
        return makeSummary(data);
    });
Keep the handler input small¶
The handler input is stored once at the start of the execution and read from execution state on every replay. A 2 MB event payload stays in execution state for the life of the execution and is fetched and deserialized on every replay. For large inputs, publish the payload to a store from the client, pass only a reference in the event, and fetch the payload inside the first step.
Danger
Large handler inputs sit in execution state for the life of the execution, count against the storage quota, and slow every replay. Move the payload to S3 or DynamoDB at the edge and pass only a key.
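A minimal sketch of the pattern, using an in-memory dict as a stand-in for the external store (in practice these calls would be `s3.put_object` / `s3.get_object`; the helper names and keys here are assumptions):

```python
import json

# In-memory stand-in for S3 or another external store.
STORE: dict[str, bytes] = {}

def publish_payload(key: str, payload: dict) -> str:
    """Client side: stage the large payload externally, return only the key."""
    STORE[key] = json.dumps(payload).encode()
    return key

# Client: stage ~2 MB of data, then invoke with a tiny event.
big_payload = {"rows": ["x" * 100] * 20_000}
event = {"payload_key": publish_payload("ingest/batch-42", big_payload)}

# First step inside the handler: fetch the payload by reference, so the
# execution state only ever holds the small event.
def fetch_payload(event: dict) -> dict:
    return json.loads(STORE[event["payload_key"]])
```

The event that enters the execution state stays a few dozen bytes no matter how large the staged payload grows.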
Keep concurrency results small¶
The concurrency operations map and parallel return a BatchResult containing every item result. The SDK automatically handles large batch results for you: when the full BatchResult exceeds 256 KB, the parent checkpoint stores a summary (total count, success count, failure count) instead of the full payload, and on replay reconstructs the full result from the per-item results in each child context's own checkpoint.
Even though this mechanism keeps the parent checkpoint small, the per-item results still persist in the child checkpoints. A map over 1,000 items that each return 5 KB stores 5 MB across the child checkpoints and consumes 1,000 operations before retries. Plan both item count and item size against AWS Lambda service quotas.
Keep per-item state small:
- Return only what the next step needs. Inside each per-item function, extract the identifier or summary and return it. The full per-item result stays out of the checkpoint entirely.
- Stage raw results to S3 or DynamoDB. Write each item's output to external storage from inside the per-item function, and return the key or version ID. The BatchResult then carries pointers, not payloads.
const results = await context.map(items, async (ctx, item) => {
  const outputKey = await ctx.step("process", async () => {
    const output = await processItem(item);
    return storeOutput(output); // returns an S3 key, not the output itself
  });
  return outputKey;
});
// `results` carries pointers, not payloads.
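The same stage-and-return-a-key idea, sketched in plain Python without the SDK so the data flow is explicit (the in-memory store and helper names are assumptions):

```python
# In-memory stand-in for S3; keys, not payloads, flow back to the parent.
STORE: dict[str, str] = {}

def process_item(item: int) -> str:
    # Pretend each item produces a ~5 KB output.
    return f"processed-{item}-" + "x" * 5_000

def store_output(output: str) -> str:
    key = f"outputs/item-{len(STORE)}"
    STORE[key] = output
    return key

def per_item(item: int) -> str:
    # What the per-item function returns is what gets checkpointed:
    # stage the heavy output, return only the key.
    return store_output(process_item(item))

keys = [per_item(i) for i in range(3)]
# `keys` plays the role of the BatchResult: pointers, not payloads.
```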
Tip
Treat the per-item result like a step return. If it is more than a few hundred bytes,
stage the payload in S3 and return a key. Use the FLAT nesting type to skip per-item
checkpointing when the work is cheap to re-run, lowering both operation count and
storage.
Custom serialization for heavy types¶
When the natural shape of your data is large or expensive to encode, configure a custom serdes on the operation. The serdes can compress, encrypt, or offload to external storage while returning only a pointer. See Serialization for details.
Tip
A custom serdes can offload heavy payloads transparently. Steps still look like they return the object, while the serdes writes it to S3 and returns a pointer behind the scenes.
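As a sketch of the offloading idea (the `serialize`/`deserialize` method names, the size threshold, and the in-memory store are all assumptions; match them to the serdes interface your SDK actually defines):

```python
import json
import uuid

# In-memory stand-in for S3 or another external store.
STORE: dict[str, bytes] = {}

class OffloadingSerdes:
    """Checkpoints a pointer for large values, inlines small ones."""

    def __init__(self, threshold: int = 1024):
        self.threshold = threshold

    def serialize(self, value) -> bytes:
        raw = json.dumps(value).encode()
        if len(raw) <= self.threshold:
            return json.dumps({"inline": value}).encode()
        key = f"offload/{uuid.uuid4()}"
        STORE[key] = raw                          # heavy payload goes to storage
        return json.dumps({"ref": key}).encode()  # checkpoint keeps a pointer

    def deserialize(self, data: bytes):
        envelope = json.loads(data)
        if "inline" in envelope:
            return envelope["inline"]
        return json.loads(STORE[envelope["ref"]])

serdes = OffloadingSerdes()
big = {"body": "x" * 10_000}
checkpointed = serdes.serialize(big)     # small pointer envelope, not 10 KB
restored = serdes.deserialize(checkpointed)
```

The calling code never sees the pointer: it serializes and deserializes the original value, while the checkpoint stores only the small envelope.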
See also¶
- Serialization: per-operation serdes customization.
- Map operation and parallel operation for per-item result patterns.