AGENTCOST01-BP01 Use the reflection pattern to design efficient agent reasoning loops

Unbounded reasoning loops consume tokens unpredictably and can drive up cost even for routine tasks. A bounded reflection pattern makes token budgets predictable while preserving decision quality.

Desired outcome:

You have explicit termination conditions for every agent: a maximum iteration count, a confidence threshold, and a per-session token budget.
You apply reflection selectively, triggering full self-correction only when initial output quality falls below a threshold.
You track per-cycle token consumption and decision quality so termination parameters can be tuned from data rather than guesswork.

Common anti-patterns:

Running agents without iteration limits or cost caps, allowing indefinite token consumption without progress toward the task.
Applying expensive reflection and self-correction to every output, regardless of whether the initial answer was already good.
Operating without per-cycle token instrumentation, so no one can tell which reasoning phase drives cost.
Relying on fixed iteration counts alone, with no confidence threshold, which either wastes tokens on unnecessary iterations or cuts off complex reasoning prematurely.
Building reflection patterns without budget guardrails, so unbounded loops consume tokens before alerts fire.

Benefits of establishing this best practice:

Predictable token consumption through bounded reasoning cycles with explicit termination conditions.
Selective reflection preserves decision quality for ambiguous cases while reducing token waste on straightforward tasks.
Cost-quality baselines reveal which reasoning patterns deliver the best trade-offs, enabling data-driven tuning of thresholds.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Every reflection loop rests on an implicit contract: another iteration should improve the answer by more than it costs. That contract usually holds for ambiguous tasks and often fails for straightforward ones. When the contract is never checked, agents reflect on every output regardless of whether reflection improves quality. The discipline is to emit a structured confidence signal alongside each action, inspect it in the orchestration layer, and short-circuit the loop when confidence clears a threshold. Otherwise the loop runs until it hits a hard iteration ceiling, which is both the slowest and most expensive outcome for the common case.
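
The short-circuit logic described above can be sketched in a few lines of Python. This is a minimal illustration, not an AgentCore API: the names StepResult, LoopOutcome, and run_bounded_reflection, the default limits, and the fake_step stand-in for a model call are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    answer: str
    confidence: float  # structured confidence signal in [0, 1]
    tokens_used: int   # tokens consumed by this cycle


@dataclass
class LoopOutcome:
    answer: str
    iterations: int
    tokens_used: int
    stop_reason: str   # "confidence", "token_budget", or "max_iterations"


def run_bounded_reflection(
    step: Callable[[Optional[str]], StepResult],
    max_iterations: int = 3,          # hypothetical default
    confidence_threshold: float = 0.8,  # hypothetical default
    token_budget: int = 4000,         # hypothetical default
) -> LoopOutcome:
    """Run `step` until one of three termination conditions fires."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    previous: Optional[str] = None
    tokens = 0
    for iteration in range(1, max_iterations + 1):
        result = step(previous)
        tokens += result.tokens_used
        previous = result.answer
        # Short-circuit: the answer already clears the confidence threshold.
        if result.confidence >= confidence_threshold:
            return LoopOutcome(previous, iteration, tokens, "confidence")
        # Stop once the session budget is spent, even if confidence is low.
        if tokens >= token_budget:
            return LoopOutcome(previous, iteration, tokens, "token_budget")
    # Hard ceiling: the slowest and most expensive exit.
    return LoopOutcome(previous, max_iterations, tokens, "max_iterations")


def fake_step(previous: Optional[str]) -> StepResult:
    # Stand-in for a model call: the second pass is more confident.
    if previous is None:
        return StepResult("draft", 0.5, 1200)
    return StepResult("revised", 0.9, 1200)


outcome = run_bounded_reflection(fake_step)
print(outcome.stop_reason, outcome.iterations, outcome.tokens_used)
```

Recording stop_reason on every session makes it possible to see how often the loop exits on confidence versus the iteration ceiling, which is the signal needed to tune the three limits.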

Enforcement matters as much as the contract. Iteration caps expressed only in the system prompt can be bypassed by adversarial inputs or prompt injection. Amazon Bedrock AgentCore Policy applies Cedar policies at the Amazon Bedrock AgentCore Gateway boundary, so requests that exceed iteration or token limits are rejected at the traffic layer rather than noticed after the limits have been exceeded. Amazon Bedrock AgentCore Runtime provides session-isolated execution and consumption-based pricing, so each session carries its own budget and one runaway session doesn't corrupt accounting for others.

Selective reflection separates ambiguity handling from cheaper routine work. Score the initial output with a lightweight rubric (a small model or a heuristic) and gate full reflection on that score. Tag reflection outcomes with the task category so you can see where reflection consistently improves quality and where it adds cost with no benefit. Categories that never benefit from reflection should have the trigger disabled entirely. Amazon Bedrock AgentCore Evaluations supports LLM-as-a-Judge assessment of decision quality, which gives you an objective confidence signal rather than a self-reported one from the agent being evaluated.
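
One way to picture the gate and the per-category tagging is the following Python sketch. It is illustrative only: ReflectionGate, its method names, the 0.7 score threshold, and the min_samples and min_gain cutoffs are hypothetical, and the rubric scores are assumed to come from whatever lightweight scorer you already use.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Set


@dataclass
class ReflectionGate:
    score_threshold: float = 0.7  # hypothetical rubric cutoff
    disabled_categories: Set[str] = field(default_factory=set)
    # category -> quality deltas (score after reflection minus score before)
    outcomes: DefaultDict[str, List[float]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def should_reflect(self, category: str, rubric_score: float) -> bool:
        """Trigger full reflection only for weak outputs in enabled categories."""
        if category in self.disabled_categories:
            return False
        return rubric_score < self.score_threshold

    def record(self, category: str, score_before: float, score_after: float) -> None:
        """Tag a reflection outcome with its task category."""
        self.outcomes[category].append(score_after - score_before)

    def disable_unhelpful(self, min_samples: int = 3, min_gain: float = 0.02) -> Set[str]:
        """Disable the trigger where the average gain stays below min_gain."""
        for category, deltas in self.outcomes.items():
            if len(deltas) >= min_samples and sum(deltas) / len(deltas) < min_gain:
                self.disabled_categories.add(category)
        return self.disabled_categories


gate = ReflectionGate()
for _ in range(3):
    gate.record("faq", 0.6, 0.6)              # reflection changed nothing
    gate.record("contract_review", 0.4, 0.8)  # reflection helped
gate.disable_unhelpful()
print(gate.should_reflect("faq", 0.1))              # False: trigger disabled
print(gate.should_reflect("contract_review", 0.4))  # True: below threshold
print(gate.should_reflect("contract_review", 0.9))  # False: already good
```

In practice min_samples would be much larger than 3, so that a category is not disabled on the strength of a handful of observations.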

The plan, execute, verify, and reflect phases within a reflection cycle differ in reasoning intensity. Route planning and verification to smaller, faster models and reserve the largest model for execution. The savings accumulate on the frequent, low-intensity phases and offset the higher per-token cost of the less frequent, high-intensity phase.
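
The arithmetic behind phase routing can be made concrete with a small Python sketch. Every number here is invented for illustration: the per-1K-token prices, the per-phase token counts, and the tier names "small" and "large" are assumptions, not published rates. Sending the reflect phase to the large model is also an assumption, since the guidance above only assigns planning, verification, and execution.

```python
from typing import Dict

# Hypothetical per-1K-token prices in USD; substitute your own rates.
PRICE_PER_1K: Dict[str, float] = {"small": 0.0005, "large": 0.01}

# Hypothetical average tokens consumed by each phase in one cycle.
PHASE_TOKENS: Dict[str, int] = {
    "plan": 800,
    "execute": 1500,
    "verify": 600,
    "reflect": 700,
}

# Planning and verification go to the small tier; execution stays large.
# Routing "reflect" to the large tier is an assumption of this sketch.
ROUTED: Dict[str, str] = {
    "plan": "small",
    "execute": "large",
    "verify": "small",
    "reflect": "large",
}
ALL_LARGE: Dict[str, str] = {phase: "large" for phase in PHASE_TOKENS}


def cycle_cost(routing: Dict[str, str]) -> float:
    """Cost in USD of one plan/execute/verify/reflect cycle."""
    return sum(
        PHASE_TOKENS[phase] / 1000 * PRICE_PER_1K[routing[phase]]
        for phase in PHASE_TOKENS
    )


baseline = cycle_cost(ALL_LARGE)
routed = cycle_cost(ROUTED)
savings_pct = 100 * (baseline - routed) / baseline
print(f"all-large: ${baseline:.4f}  routed: ${routed:.4f}  saved: {savings_pct:.1f}%")
```

Under these made-up inputs the routed cycle costs roughly 37% less than running every phase on the large tier. The real figure depends entirely on your own token distribution and prices, which is why per-phase instrumentation comes first.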

Implementation steps

Define explicit termination conditions per agent: Set a maximum iteration count, a confidence threshold, and a per-session token budget, and enforce them with Cedar policies in Amazon Bedrock AgentCore Policy at the AgentCore Gateway boundary so enforcement happens at the traffic layer rather than in application code.
Instrument per-cycle token consumption: Enable Amazon Bedrock AgentCore Observability to capture per-session token counts through OpenTelemetry, and configure Amazon CloudWatch alarms on anomalous per-cycle patterns.
Establish objective confidence thresholds: Configure Amazon Bedrock AgentCore Evaluations to score decision quality through LLM-as-a-Judge, and anchor early-termination thresholds to measured quality rather than self-reported confidence.
Gate reflection on initial output quality: Score each initial output with a lightweight rubric and trigger the full reflection pass only when the score falls below a configurable threshold, keeping reflection overhead off the straightforward cases.
Recalibrate thresholds on a cadence: Review cost-quality baselines monthly (or quarterly for stable workloads) and adjust confidence thresholds, iteration limits, and reflection triggers based on the distribution of observed outcomes.
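
The recalibration step above can be sketched as a small Python routine that picks the lowest confidence threshold whose accepted outputs still meet a quality target. It is a simplified illustration under stated assumptions: recalibrate_threshold is a hypothetical helper, each record is assumed to pair the confidence at termination with a pass/fail verdict from your evaluation process, and the candidate thresholds, target pass rate, and minimum sample size are placeholders.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def recalibrate_threshold(
    records: Sequence[Tuple[float, bool]],
    candidates: Iterable[float],
    target_pass_rate: float = 0.95,  # hypothetical quality target
    min_samples: int = 20,           # hypothetical minimum evidence
) -> Optional[float]:
    """Return the lowest candidate threshold that still meets the target.

    A lower threshold ends more loops early and so saves more tokens,
    which is why candidates are tried in ascending order.
    """
    for threshold in sorted(candidates):
        accepted = [ok for confidence, ok in records if confidence >= threshold]
        if len(accepted) < min_samples:
            continue  # not enough evidence to trust this threshold
        if sum(accepted) / len(accepted) >= target_pass_rate:
            return threshold
    return None  # no candidate qualifies; keep the current threshold


# Synthetic observations: (confidence at termination, judged acceptable?)
observed: List[Tuple[float, bool]] = (
    [(0.9, True)] * 10
    + [(0.7, True)] * 8 + [(0.7, False)] * 2
    + [(0.5, True)] * 5 + [(0.5, False)] * 5
)

chosen = recalibrate_threshold(
    observed, candidates=[0.5, 0.7, 0.9], target_pass_rate=0.85, min_samples=5
)
print(chosen)
```

Returning None when no candidate qualifies keeps the routine from loosening a threshold on thin evidence; the caller leaves the existing setting in place until the next review.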

Resources

Related examples:

GitHub: awslabs/amazon-bedrock-agentcore-samples - Runtime tutorials

Related workshops:

Diving Deep into Bedrock AgentCore - Evaluations
