AGENTCOST04-BP03 Implement intelligent caching and failure handling for tool results
Tool costs can be unpredictable when agents repeat identical or equivalent calls, and they can spike sharply when retries run unbounded through a service outage. Two-layer caching, schema validation, and automatic cutoffs convert those failure modes into predictable, bounded costs.
Desired outcome:
-
You have session-scoped and cross-session semantic caches reducing redundant tool invocations.
-
You validate tool inputs against JSON Schema before invocation to help prevent wasted calls on malformed requests.
-
You have automatic cutoffs that halt retries when failure rates exceed thresholds, with automatic fallback to alternative tools.
-
You track cache hit rates and retry costs as distinct metrics.
Common anti-patterns:
-
Not caching frequently used tool results, making repeated identical calls within the same session that waste compute and external API costs.
-
Using only exact-match caching when agents phrase the same request differently, missing cache hits for semantically identical calls.
-
Retrying failed tool invocations indefinitely without automatic cutoffs, multiplying cost during service degradation without resolving the underlying issue.
-
Not validating tool input schemas before invocation, allowing malformed calls to waste invocation cost without producing usable results.
Benefits of establishing this best practice:
-
Two-layer caching reduces redundant tool invocations and external API charges.
-
Automatic cutoffs halt retries when failure rates exceed thresholds, helping prevent expensive retry storms.
-
Event-driven cache invalidation supports aggressive caching of volatile data by purging stale results promptly when source data changes.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Tool caching has to work at two scopes to cover both obvious and
non-obvious repetition. The session-scoped layer works through
Amazon
Bedrock AgentCore Runtime and catches duplicate calls
within a single agent session, which is a common failure mode when
agents revisit a reasoning branch. The cross-session layer uses
Amazon OpenSearch Service Serverless
Schema validation can help prevent waste. Agents sometimes generate tool calls with incorrect parameter types, missing required fields, or invalid enum values, and those calls pay tool-serving and external API costs for a response that can't be used. JSON schema validation in the action group Lambda function rejects malformed requests before they reach external APIs and returns a validation error to the agent for correction.
Cache invalidation can help make aggressive caching safer. Event-driven invalidation listens for source-data changes and purges affected cache entries immediately, so volatile data can still be cached without returning stale results. Without event-driven invalidation, teams end up choosing between aggressive TTLs (stale results) or short TTLs (low hit rates), and both options leave cost on the table.
For failure handling, Amazon Bedrock AgentCore Policy Cedar policies enforce automatic cutoffs when failure rates exceed thresholds, halting retry storms during service degradation. Automatic fallback to alternative tools maintains agent functionality during outages, and retry budgets per reasoning session cap total retry attempts using exponential backoff with jitter. Cache and retry telemetry is exposed through Amazon Bedrock AgentCore Observability and Amazon CloudWatch: hit rates per layer, cutoff state transitions, and retry cost as a percentage of total tool cost. For caching that extends beyond tool results into model invocations, see AGENTCOST02-BP03 Use intelligent caching to reduce redundant model invocations.
Implementation steps
-
Deploy two-layer caching: Implement a session-scoped in-process cache on Amazon Bedrock AgentCore Runtime and an Amazon OpenSearch Service Serverless
semantic cache for cross-session reuse, with TTLs calibrated per tool (short for volatile data, long for static reference data). -
Deploy semantic caching: Generate parameter embeddings and query OpenSearch Serverless for similar prior calls above a cosine similarity threshold before invoking the tool.
-
Validate tool inputs: Implement JSON Schema validation in action group Lambda functions to reject malformed requests before they reach external APIs, returning validation errors for the agent to correct.
-
Enforce cutoffs and fallback tools: Configure Amazon Bedrock AgentCore Policy Cedar policies for automatic cutoffs, wire automatic fallback to alternative tools when cutoffs activate, and set retry budgets per reasoning session.
-
Monitor cache and retry metrics: Create Amazon CloudWatch metrics for cache hit rates, cutoff transitions, and retry costs using Amazon Bedrock AgentCore Observability, with alarms for degraded performance.
Resources
Related best practices:
-
AGENTCOST02-BP03 Use intelligent caching to reduce redundant model invocations
-
AGENTCOST04-BP01 Design cost effective tool selection to minimize unnecessary invocations
-
AGENTCOST04-BP02 Cost optimize tool serving through serverless and resource sharing
-
AGENTCOST05-BP01 Establish agent-level reasoning cost tracking and attribution
Related documents:
Related videos:
Related examples:
Related services: