AGENTPERF03-BP05 Implement agentic retrieval patterns for dynamic, agent-driven knowledge access

Complex questions often require information from multiple sources, iterative refinement, or real-time data that a single retrieval pass can't provide. In agentic retrieval, the agent actively controls the retrieval process as part of its reasoning loop: it decides when to retrieve, what to retrieve, which retrieval tool to use, and whether the retrieved context is sufficient before proceeding. Each iteration adds overhead for embedding generation, vector search, re-ranking, and context injection, so the retrieval loop needs explicit termination conditions.

Desired outcome:

You have agents retrieving the right information in the minimum number of iterations required.
You have simple questions answered with a single retrieval and complex questions handled through structured multi-hop retrieval with explicit termination conditions.
You have the agent selecting the most appropriate retrieval tool for each query type.
You have retrieval iteration counts, per-iteration latency, and sufficiency rates tracked and optimized.

Common anti-patterns:

Treating all retrieval as a single-shot preprocessing step, forcing the agent to work with whatever context was retrieved on the first attempt regardless of sufficiency.
Allowing agents to retrieve iteratively without retrieval budgets or termination conditions, producing unbounded retrieval loops that consume tokens and latency without converging.
Routing all retrieval through a single pipeline regardless of query type, missing opportunities to use faster or more appropriate retrieval tools for different information needs.

Benefits of establishing this best practice:

Parallel sub-query execution and retrieval-tool routing reduce end-to-end latency by running independent lookups concurrently and by selecting the fastest appropriate source for each query.
Explicit budgets that cap iterations and total tokens keep retrieval costs under control.

Level of risk exposed if this best practice is not established: High

Implementation guidance

Design retrieval as a set of agent tools rather than a monolithic pipeline. Distinct retrieval tools for different knowledge access patterns let the agent route to the right source:

A semantic search tool backed by Amazon Bedrock Knowledge Bases for conceptual questions
A structured query tool for exact lookups by identifier
A real-time data tool for information requiring current values
A web search tool for questions beyond the organization's knowledge base
A document processing tool backed by Amazon Bedrock Data Automation for extracting structured data from images, forms, and tables

Amazon Bedrock AgentCore Gateway exposes retrieval tools as MCP-compatible endpoints. Register each tool with a clear description that states what question types it handles, what data sources it accesses, and its expected latency. These descriptions guide the agent's tool selection.
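The tool descriptions above can be kept in a small, structured form so that every tool states the same three facts. The following is a minimal sketch only: `RetrievalToolSpec`, `TOOLS`, `catalog`, the source names, and the latency figures are all hypothetical, and the code does not call Amazon Bedrock AgentCore Gateway or any AWS API. It only shows one way to assemble the description text that would be supplied at registration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalToolSpec:
    """Hypothetical descriptor for one retrieval tool (not an AWS API type)."""
    name: str
    handles: str              # question types the tool answers
    data_sources: tuple       # where the tool reads from
    expected_latency_ms: int  # assumed typical latency, for illustration

    def description(self) -> str:
        """Build the description text the agent reads when choosing a tool."""
        sources = ", ".join(self.data_sources)
        return (
            f"Use for: {self.handles}. "
            f"Data sources: {sources}. "
            f"Expected latency: about {self.expected_latency_ms} ms."
        )


# Illustrative values only; source names and latencies are assumptions.
TOOLS = [
    RetrievalToolSpec("semantic_search", "conceptual questions",
                      ("knowledge base",), 400),
    RetrievalToolSpec("structured_query", "exact lookups by identifier",
                      ("relational database",), 50),
    RetrievalToolSpec("realtime_data", "questions that need current values",
                      ("live metrics API",), 150),
]


def catalog(tools):
    """Map each tool name to its description text."""
    return {tool.name: tool.description() for tool in tools}
```

Generating descriptions from one template keeps them comparable across tools, which is the property the agent relies on when it weighs one tool against another.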

Retrieval sufficiency evaluation is a lightweight assessment after each retrieval iteration, typically run by a smaller, faster model. The evaluator judges whether the retrieved context is sufficient, identifies gaps, and formulates refined queries. A maximum retrieval iteration limit (typically 2-3 iterations) helps prevent unbounded loops. If the agent has not retrieved sufficient context within the budget, it proceeds with the best available context and communicates uncertainty.
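The loop described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `retrieve` and `evaluate` are hypothetical callables standing in for a retrieval tool and for the smaller evaluator model, and `Assessment` and `RetrievalResult` are names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Assessment:
    """Hypothetical evaluator output."""
    sufficient: bool
    refined_query: Optional[str] = None  # next query to try when not sufficient


@dataclass
class RetrievalResult:
    context: List[str]
    iterations: int
    sufficient: bool  # False means the caller should communicate uncertainty


def retrieve_with_budget(
    question: str,
    retrieve: Callable[[str], List[str]],
    evaluate: Callable[[str, List[str]], Assessment],
    max_iterations: int = 3,
) -> RetrievalResult:
    """Retrieve iteratively until context is sufficient or the budget runs out."""
    context: List[str] = []
    query = question
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        context.extend(retrieve(query))
        assessment = evaluate(question, context)
        if assessment.sufficient:
            return RetrievalResult(context, iterations, True)
        if not assessment.refined_query:
            break  # evaluator has no better query, so stop early
        query = assessment.refined_query
    # Budget exhausted: return the best available context, flagged as insufficient.
    return RetrievalResult(context, iterations, False)
```

When `sufficient` is `False`, the agent proceeds with the context it has and says so in its answer, rather than looping again.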

For complex questions requiring multiple sources, query decomposition breaks the question into focused sub-queries and runs independent sub-queries concurrently. Per-task retrieval performance budgets, derived from the task's overall latency SLO, keep the iterative pattern inside the workload's target.
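One possible shape for the concurrent fan-out, using only Python's standard `asyncio` module, is sketched below. The `fan_out` function and its `retrieve` parameter are hypothetical; `retrieve` stands in for an async call to whichever retrieval tool serves a sub-query. Sub-queries that miss the time budget are cancelled and left out of the result.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fan_out(
    sub_queries: List[str],
    retrieve: Callable[[str], Awaitable[List[str]]],
    budget_seconds: float,
) -> Dict[str, List[str]]:
    """Run independent sub-queries concurrently within a shared time budget."""
    if not sub_queries:
        return {}
    tasks = [asyncio.create_task(retrieve(q)) for q in sub_queries]
    done, pending = await asyncio.wait(tasks, timeout=budget_seconds)
    # Cancel anything that did not finish in time and let it unwind.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    results: Dict[str, List[str]] = {}
    for query, task in zip(sub_queries, tasks):
        # Keep only sub-queries that finished without raising.
        if task in done and task.exception() is None:
            results[query] = task.result()
    return results
```

Because the sub-queries run concurrently, the wall-clock cost is bounded by the slowest sub-query or the budget, whichever is smaller, instead of the sum of all sub-query latencies. A caller would run this with, for example, `asyncio.run(fan_out(queries, retrieve, 0.5))`.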

Implementation steps

Implement distinct retrieval tools for different knowledge access patterns: Register a semantic search tool, a structured-query tool, a real-time data tool, a web search tool, and a document processing tool through Amazon Bedrock AgentCore Gateway with clear descriptions that guide the agent's tool selection.
Implement retrieval sufficiency evaluation as a lightweight post-retrieval assessment: Use a small, fast model to judge whether retrieved context is sufficient, identify gaps, and formulate refined queries for the next iteration.
Configure maximum retrieval iteration limits with graceful fallback to best-available context: Cap iterations at 2-3 for most tasks, and when the budget is exhausted proceed with the best context obtained and communicate uncertainty rather than looping without bounds.
Implement query decomposition for complex questions, running independent sub-queries concurrently: Break multi-source questions into focused sub-queries and fan them out in parallel so sub-query latency doesn't accumulate serially.
Define per-task retrieval performance budgets based on the overall latency SLO: Allocate an explicit portion of the task's latency SLO to retrieval so the iterative pattern can't silently consume the budget reserved for inference or downstream tool calls.
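The last step is simple arithmetic, and a short sketch makes the allocation concrete. The function names and all of the numbers below (a 4,000 ms SLO, a 40 percent retrieval share, a 700 ms observed iteration) are assumptions chosen for illustration; they are not recommended values.

```python
def retrieval_budget(slo_ms: int, retrieval_share_pct: int, max_iterations: int):
    """Split a task's latency SLO into a total and per-iteration retrieval budget."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    total_ms = slo_ms * retrieval_share_pct // 100
    per_iteration_ms = total_ms // max_iterations
    return total_ms, per_iteration_ms


def iterations_affordable(total_ms: int, observed_iteration_ms: int,
                          max_iterations: int) -> int:
    """How many iterations fit in the budget at the observed per-iteration latency."""
    if observed_iteration_ms <= 0:
        raise ValueError("observed_iteration_ms must be positive")
    return min(max_iterations, total_ms // observed_iteration_ms)


# Assumed example: 4,000 ms SLO with 40% reserved for retrieval, up to 3 iterations.
total, per_iteration = retrieval_budget(4000, 40, 3)   # 1600 ms total, 533 ms each
allowed = iterations_affordable(total, 700, 3)         # 2 iterations at 700 ms each
```

In this assumed example, a measured per-iteration latency of 700 ms means only two iterations fit inside the 1,600 ms retrieval allocation, so the effective iteration cap is lower than the configured maximum of three.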

Resources

Related examples:

GitHub: Advanced RAG using Bedrock and SageMaker AI
