AGENTCOST04-BP03 Implement intelligent caching and failure handling for tool results - Agentic AI Lens

Tool costs can be unpredictable when agents repeat identical or equivalent calls, and they can spike sharply when retries run unbounded through a service outage. Two-layer caching, schema validation, and automatic cutoffs convert those failure modes into predictable, bounded costs.

Desired outcome:

  • You have session-scoped and cross-session semantic caches reducing redundant tool invocations.

  • You validate tool inputs against JSON Schema before invocation to help prevent wasted calls on malformed requests.

  • You have automatic cutoffs that halt retries when failure rates exceed thresholds, with automatic fallback to alternative tools.

  • You track cache hit rates and retry costs as distinct metrics.

Common anti-patterns:

  • Not caching frequently used tool results, making repeated identical calls within the same session that waste compute and external API costs.

  • Using only exact-match caching when agents phrase the same request differently, missing cache hits for semantically identical calls.

  • Retrying failed tool invocations indefinitely without automatic cutoffs, multiplying cost during service degradation without resolving the underlying issue.

  • Not validating tool input schemas before invocation, allowing malformed calls to waste invocation cost without producing usable results.

Benefits of establishing this best practice:

  • Two-layer caching reduces redundant tool invocations and external API charges.

  • Automatic cutoffs halt retries when failure rates exceed thresholds, helping prevent expensive retry storms.

  • Event-driven cache invalidation supports aggressive caching of volatile data by purging stale results promptly when source data changes.

Level of risk exposed if this best practice is not established: High

Implementation guidance

Tool caching has to work at two scopes to cover both obvious and non-obvious repetition. The session-scoped layer works through Amazon Bedrock AgentCore Runtime and catches duplicate calls within a single agent session, which is a common failure mode when agents revisit a reasoning branch. The cross-session layer uses Amazon OpenSearch Service Serverless for semantic caching: generate embeddings of tool parameters and query for similar prior calls above a cosine similarity threshold before invoking the tool. Each cache entry's TTL should be calibrated to the underlying data's volatility. For example, a weather API's freshness requirement is minutes, while a static reference knowledge base tolerates hours or days.
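The two scopes described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not an AgentCore or OpenSearch integration: `TwoLayerToolCache`, `toy_embed`, and the 0.95 threshold are hypothetical, the in-memory list stands in for a vector index, and a real deployment would call an embedding model instead of counting letters.

```python
import json
import math
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


def toy_embed(params: dict) -> list[float]:
    """Hypothetical stand-in for an embedding model: letter counts of the values."""
    text = " ".join(str(v) for v in params.values()).lower()
    return [float(text.count(chr(ord("a") + i))) for i in range(26)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    tool: str
    vector: list[float]
    result: Any
    expires_at: float


class TwoLayerToolCache:
    def __init__(self, embed: Callable[[dict], list[float]],
                 threshold: float = 0.95,
                 clock: Callable[[], float] = time.monotonic):
        self.embed = embed
        self.threshold = threshold
        self.clock = clock
        self.session: dict[str, tuple[Any, float]] = {}  # exact-match layer
        self.shared: list[Entry] = []                    # cross-session layer

    @staticmethod
    def _key(tool: str, params: dict) -> str:
        return tool + ":" + json.dumps(params, sort_keys=True)

    def new_session(self) -> None:
        """Drop session-scoped entries; the shared layer survives."""
        self.session.clear()

    def put(self, tool: str, params: dict, result: Any, ttl_seconds: float) -> None:
        expires_at = self.clock() + ttl_seconds  # TTL is chosen per tool
        self.session[self._key(tool, params)] = (result, expires_at)
        self.shared.append(Entry(tool, self.embed(params), result, expires_at))

    def get(self, tool: str, params: dict) -> Optional[tuple[str, Any]]:
        """Return (layer, result) on a hit, or None on a miss."""
        now = self.clock()
        hit = self.session.get(self._key(tool, params))
        if hit is not None and hit[1] > now:
            return "session", hit[0]
        vector = self.embed(params)
        best, best_score = None, self.threshold
        for entry in self.shared:
            if entry.tool != tool or entry.expires_at <= now:
                continue  # wrong tool or expired
            score = cosine(vector, entry.vector)
            if score >= best_score:
                best, best_score = entry, score
        return ("semantic", best.result) if best else None
```

Checking the exact-match layer first keeps the common case cheap, since the similarity search only runs on a session miss. The threshold trades hit rate against the risk of returning a result for a call that is similar but not equivalent, so it is worth tuning per tool.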

Schema validation can help prevent waste. Agents sometimes generate tool calls with incorrect parameter types, missing required fields, or invalid enum values, and those calls incur tool-serving and external API costs for a response that can't be used. JSON Schema validation in the action group Lambda function rejects malformed requests before they reach external APIs and returns a validation error to the agent for correction.
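As a rough sketch of that gate, the following checks only three JSON Schema keywords (`required`, `type`, `enum`) plus `additionalProperties: false`, using the standard library. It is a deliberately small subset written for illustration; a real handler would more likely delegate to a full validator such as the `jsonschema` package. The function names and the response shape are assumptions.

```python
from typing import Any, Callable

# Mapping from JSON Schema type names to Python types (subset only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_tool_input(params: dict, schema: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required field '{name}'")
    for name, value in params.items():
        rule = properties.get(name)
        if rule is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field '{name}'")
            continue
        type_name = rule.get("type")
        expected = _TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if expected and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"field '{name}' must be of type {type_name}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"field '{name}' must be one of {rule['enum']}")
    return errors


def handle_tool_call(params: dict, schema: dict,
                     invoke: Callable[[dict], Any]) -> dict:
    """Validate first; only call the (billable) tool when the input is well formed."""
    errors = validate_tool_input(params, schema)
    if errors:
        # Returned to the agent so it can correct the call and try again.
        return {"status": "validation_error", "errors": errors}
    return {"status": "ok", "result": invoke(params)}
```

Returning every error at once, rather than stopping at the first, gives the agent a chance to fix the call in a single correction round.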

Cache invalidation can help make aggressive caching safer. Event-driven invalidation listens for source-data changes and purges affected cache entries immediately, so volatile data can still be cached without returning stale results. Without event-driven invalidation, teams end up choosing between long TTLs (stale results) and short TTLs (low hit rates), and both options leave cost on the table.
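One way to picture event-driven invalidation is a cache that records which source records each entry was derived from, so a change event can purge exactly the affected entries. The sketch below is illustrative only: `InvalidatingCache`, the `source_id` field, and the event shape are hypothetical, and how change events are delivered is left out.

```python
from collections import defaultdict
from typing import Any, Iterable, Optional


class InvalidatingCache:
    def __init__(self) -> None:
        self._entries: dict[str, Any] = {}
        # Reverse index: source record id -> cache keys derived from it.
        self._keys_by_source: dict[str, set[str]] = defaultdict(set)

    def put(self, key: str, result: Any, sources: Iterable[str]) -> None:
        self._entries[key] = result
        for source_id in sources:
            self._keys_by_source[source_id].add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._entries.get(key)

    def on_source_changed(self, event: dict) -> int:
        """Purge every entry derived from the changed record; return the count."""
        keys = self._keys_by_source.pop(event["source_id"], set())
        purged = 0
        for key in keys:
            if key in self._entries:
                del self._entries[key]
                purged += 1
        return purged
```

Because staleness is handled by the purge, the TTL on these entries can act as a safety net for missed events rather than as the primary freshness control.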

For failure handling, Cedar policies in Amazon Bedrock AgentCore Policy enforce automatic cutoffs when failure rates exceed thresholds, halting retry storms during service degradation. Automatic fallback to alternative tools maintains agent functionality during outages. Retry budgets per reasoning session cap the total number of retry attempts, and exponential backoff with jitter spaces out the attempts that remain within budget. Cache and retry telemetry is exposed through Amazon Bedrock AgentCore Observability and Amazon CloudWatch: hit rates per layer, cutoff state transitions, and retry cost as a percentage of total tool cost. For caching that extends beyond tool results into model invocations, see AGENTCOST02-BP03 Use intelligent caching to reduce redundant model invocations.
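The control flow behind cutoffs, fallback, and retry budgets can be illustrated in plain Python. This sketch does not use AgentCore Policy or Cedar; it only shows the logic such policies would enforce. `Cutoff`, `RetryBudget`, `call_with_cutoff`, and all default values are assumptions, and recovery (a cool-down followed by a probe call) is omitted for brevity.

```python
import random
import time
from collections import deque
from typing import Any, Callable


class Cutoff:
    """Opens when the failure rate over the last `window` calls exceeds the limit."""

    def __init__(self, window: int = 10, max_failure_rate: float = 0.5,
                 min_calls: int = 4):
        self.outcomes: deque = deque(maxlen=window)
        self.max_failure_rate = max_failure_rate
        self.min_calls = min_calls  # avoid tripping on one early failure

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    @property
    def open(self) -> bool:
        if len(self.outcomes) < self.min_calls:
            return False
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) > self.max_failure_rate


class RetryBudget:
    """Caps the total number of retries across one reasoning session."""

    def __init__(self, max_retries: int):
        self.remaining = max_retries

    def take(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def backoff_delay(attempt: int, base: float = 0.2, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_cutoff(primary: Callable[[], Any], fallback: Callable[[], Any],
                     cutoff: Cutoff, budget: RetryBudget,
                     sleep: Callable[[float], None] = time.sleep) -> tuple[str, Any]:
    attempt = 0
    while not cutoff.open:
        try:
            result = primary()
            cutoff.record(True)
            return "primary", result
        except Exception:  # a real handler would only retry transient errors
            cutoff.record(False)
            if cutoff.open or not budget.take():
                break
            sleep(backoff_delay(attempt))
            attempt += 1
    # Cutoff is open or the budget is spent: stop paying for the failing tool.
    return "fallback", fallback()
```

Sharing one `Cutoff` per tool and one `RetryBudget` per session means that once the cutoff opens, later calls skip the failing tool entirely and go straight to the fallback.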

Implementation steps

  1. Deploy two-layer caching: Implement a session-scoped in-process cache on Amazon Bedrock AgentCore Runtime and an Amazon OpenSearch Service Serverless semantic cache for cross-session reuse, with TTLs calibrated per tool (short for volatile data, long for static reference data).

  2. Deploy semantic caching: Generate parameter embeddings and query OpenSearch Serverless for similar prior calls above a cosine similarity threshold before invoking the tool.

  3. Validate tool inputs: Implement JSON Schema validation in action group Lambda functions to reject malformed requests before they reach external APIs, returning validation errors for the agent to correct.

  4. Enforce cutoffs and fallback tools: Configure Cedar policies in Amazon Bedrock AgentCore Policy for automatic cutoffs, wire automatic fallback to alternative tools when cutoffs activate, and set retry budgets per reasoning session.

  5. Monitor cache and retry metrics: Create Amazon CloudWatch metrics for cache hit rates, cutoff transitions, and retry costs using Amazon Bedrock AgentCore Observability, with alarms for degraded performance.
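For the monitoring step, the metrics named above reduce to a few ratios. The sketch below computes hit rate per layer and retry cost as a percentage of total tool cost, then shapes them as a list of metric records. `ToolTelemetry` and the metric and dimension names are hypothetical; the record layout follows the `MetricData` structure accepted by the CloudWatch `PutMetricData` API, but nothing is published here.

```python
from dataclasses import dataclass


@dataclass
class ToolTelemetry:
    session_hits: int = 0
    semantic_hits: int = 0
    misses: int = 0
    total_tool_cost: float = 0.0  # includes the cost of retries
    retry_cost: float = 0.0

    def hit_rate_pct(self, layer: str) -> float:
        """Share of all lookups served by the given layer ('session' or 'semantic')."""
        lookups = self.session_hits + self.semantic_hits + self.misses
        hits = {"session": self.session_hits, "semantic": self.semantic_hits}[layer]
        return 100.0 * hits / lookups if lookups else 0.0

    def retry_cost_pct(self) -> float:
        if not self.total_tool_cost:
            return 0.0
        return 100.0 * self.retry_cost / self.total_tool_cost

    def to_metric_data(self) -> list[dict]:
        """Metric records; names and dimensions here are illustrative."""
        data = [
            {
                "MetricName": "CacheHitRate",
                "Dimensions": [{"Name": "Layer", "Value": layer}],
                "Value": self.hit_rate_pct(layer),
                "Unit": "Percent",
            }
            for layer in ("session", "semantic")
        ]
        data.append({
            "MetricName": "RetryCostShare",
            "Value": self.retry_cost_pct(),
            "Unit": "Percent",
        })
        return data
```

Keeping hit rate and retry cost as separate metrics makes it possible to alarm on each independently, for example on a falling semantic hit rate or a rising retry cost share.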
