AGENTREL06-BP02 Establish fallback mechanisms for legacy system degradation
Legacy systems carry lower availability SLAs than cloud-based services, and their failure modes are often unpredictable. Health-aware fallback paths (caches for reference data, queues for transactions, and graceful degradation for real-time data) keep agent workflows running through legacy outages.
Desired outcome:
- You have health monitoring on every legacy integration with alarms that trigger fallback activation.
- You have cache-based fallbacks for reference data and queue-based fallbacks for transactional operations.
- You recover automatically when legacy systems return, with periodic probes deactivating the cutoff.
Common anti-patterns:
- Assuming legacy systems match cloud-based reliability, without implementing fallbacks for their actual SLAs.
- Deploying fallbacks that silently return stale or incorrect data without informing agents or users.
- Skipping legacy health monitoring, so outages become visible only when agent tasks fail.
Benefits of establishing this best practice:
- Agent functionality stays available during legacy outages through pre-defined fallback paths.
- Proactive fallback activation through health monitoring replaces reactive failure detection.
- Users and downstream systems see transparent indications of degraded capability.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Health monitoring is the prerequisite for any automatic fallback. Without a health signal, the only way to know the legacy system is down is to watch user-visible failures accumulate. Periodic probes through Amazon Bedrock AgentCore Observability and Amazon CloudWatch check endpoint availability and response time. Alarms on those probes trigger fallback activation before the first user-facing failure happens.
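As a minimal sketch of the health signal described above, the following Python keeps a sliding window of probe results and reports degradation once enough recent probes fail or exceed a latency budget. The names `ProbeResult` and `HealthMonitor` and the threshold values are hypothetical, chosen for illustration. The sketch calls no AWS API; in practice the probe results and the alarm evaluation would come from your monitoring service.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class ProbeResult:
    """Outcome of one availability probe against a legacy endpoint."""
    ok: bool            # True if the endpoint answered successfully
    latency_ms: float   # Observed response time in milliseconds


@dataclass
class HealthMonitor:
    """Sliding-window health check (illustrative thresholds, not recommendations)."""
    window: int = 5                 # number of most recent probes considered
    max_failures: int = 2           # bad probes in the window that signal degradation
    max_latency_ms: float = 2000.0  # slower responses count as bad probes
    _results: Deque[ProbeResult] = field(default_factory=deque)

    def record(self, result: ProbeResult) -> None:
        """Store a probe result and drop results that fall outside the window."""
        self._results.append(result)
        while len(self._results) > self.window:
            self._results.popleft()

    def is_degraded(self) -> bool:
        """Return True when enough recent probes failed or were too slow."""
        bad = sum(
            1 for r in self._results
            if not r.ok or r.latency_ms > self.max_latency_ms
        )
        return bad >= self.max_failures
```

A caller would record one `ProbeResult` per probe interval and activate the fallback path when `is_degraded()` first returns `True`, rather than waiting for agent tasks to fail.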
Fallback shape depends on the operation type. For reference data (product catalogs, configuration values, mostly-static lookups), cache-based fallbacks serve recently retrieved data during outages. The accuracy cost is low because the data doesn't change often. For transactional operations, queue-based fallbacks buffer requests for replay when the system recovers, preserving the intent of each operation without attempting it against an unreachable system. For real-time data that can't be cached or queued (live prices, current inventory, instantaneous system state), graceful degradation means informing the user that the information is temporarily unavailable rather than returning a plausible-but-wrong answer.
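The three fallback shapes can be sketched as a single dispatcher keyed on operation type. This is an illustrative in-memory version: `OperationType`, `FallbackResult`, `FallbackRouter`, and the one-hour cache age limit are all hypothetical, and a real deployment would likely back the cache and the queue with durable services. Every result carries a `degraded` flag and a message so that stale or missing data is never returned silently.

```python
import time
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Any, Deque, Dict, Optional, Tuple


class OperationType(Enum):
    REFERENCE = "reference"          # mostly-static lookups
    TRANSACTIONAL = "transactional"  # writes that must eventually be applied
    REAL_TIME = "real_time"          # data that cannot be cached or queued


@dataclass
class FallbackResult:
    status: str            # "stale", "queued", or "unavailable"
    value: Any = None      # cached value, when one is served
    degraded: bool = True  # always True here: the caller must surface this
    message: str = ""


class FallbackRouter:
    """Chooses a fallback while the legacy system is unreachable."""

    def __init__(self, max_cache_age_s: float = 3600.0) -> None:
        self.max_cache_age_s = max_cache_age_s
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self.pending: Deque[dict] = deque()  # requests awaiting replay

    def remember(self, key: str, value: Any, now: Optional[float] = None) -> None:
        """Record a value fetched while the legacy system was healthy."""
        self._cache[key] = (time.time() if now is None else now, value)

    def handle(self, op_type: OperationType, key: str,
               payload: Any = None, now: Optional[float] = None) -> FallbackResult:
        now = time.time() if now is None else now
        if op_type is OperationType.REFERENCE:
            entry = self._cache.get(key)
            if entry is not None and now - entry[0] <= self.max_cache_age_s:
                age = int(now - entry[0])
                return FallbackResult("stale", entry[1],
                                      message=f"served from cache, {age}s old")
            return FallbackResult("unavailable", message="no usable cached copy")
        if op_type is OperationType.TRANSACTIONAL:
            # Preserve intent for replay instead of calling the dead system.
            self.pending.append({"key": key, "payload": payload, "queued_at": now})
            return FallbackResult("queued", message="will be replayed on recovery")
        # Real-time data: say so rather than guess.
        return FallbackResult("unavailable",
                              message="real-time data temporarily unavailable")
```

The agent-facing layer can pass `message` through to the user, which addresses the anti-pattern of fallbacks that return stale data without saying so.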
Automatic cutoffs need to unwind themselves, or every outage turns into a permanent downgrade. Use CloudWatch alarms to detect when error rates cross a threshold and trigger automated responses, such as updating Amazon Bedrock AgentCore Policy Cedar policies to deny tool access or activating Lambda-based circuit breaker logic. Configure periodic probes that test availability and re-enable access when the system recovers. Monitor fallback activation frequency through AgentCore Observability to identify legacy systems that cause disproportionate reliability issues. Those systems are the candidates for modernization investment.
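The self-unwinding cutoff can be illustrated with a small circuit breaker. This is a sketch under stated assumptions, not the logic of any AWS service: the `CircuitBreaker` class, its thresholds, and the injected `probe` callable are hypothetical. In a deployed system the open and close transitions would be driven by alarms and would apply the policy update or Lambda logic described above.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after repeated failures and closes again when a probe succeeds."""

    def __init__(self, failure_threshold: int = 3, probe_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.probe_interval_s = probe_interval_s
        self.clock = clock                      # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        """Requests reach the legacy system only while the circuit is closed."""
        return self.opened_at is None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold and self.opened_at is None:
            self.opened_at = self.clock()       # cut off access

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                   # restore access

    def maybe_probe(self, probe: Callable[[], bool]) -> bool:
        """Probe the system if the interval has elapsed.

        Returns True if the circuit is closed after the call.
        """
        if self.opened_at is None:
            return True
        now = self.clock()
        if now - self.opened_at < self.probe_interval_s:
            return False                        # too soon to probe again
        if probe():
            self.record_success()
            return True
        self.opened_at = now                    # still down: restart the wait
        return False
```

Calling `maybe_probe` on a schedule is what keeps an outage from becoming a permanent downgrade: the breaker reopens access on the first successful probe without operator action.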
Implementation steps
- Implement health monitoring for each legacy integration: Use Amazon Bedrock AgentCore Observability and Amazon CloudWatch probes to check endpoint availability and response time.
- Configure alarms that trigger fallback activation: Alarm on health degradation so fallbacks activate before user-visible failures accumulate.
- Implement operation-appropriate fallbacks: Cache-based fallbacks for reference data, queue-based fallbacks for transactional operations, and graceful degradation messages for real-time data.
- Deploy automatic cutoffs with recovery detection: Use CloudWatch alarms to trigger Amazon Bedrock AgentCore Policy updates or circuit breaker logic, and run periodic probes to re-enable access when the system recovers.
- Monitor fallback activation frequency: Use AgentCore Observability to identify legacy systems that consistently degrade, so modernization effort can be prioritized.
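For the last step, a simple way to reason about activation frequency is to count fallback events per legacy system and rank them. The sketch below assumes you have already exported one system name per fallback activation from your observability tooling. The function name `rank_by_fallback_activations` and the sample system names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_fallback_activations(events: Iterable[str],
                                 min_count: int = 1) -> List[Tuple[str, int]]:
    """Rank legacy systems by how often their fallback was activated.

    `events` holds one system name per activation. Systems with fewer than
    `min_count` activations are omitted from the result.
    """
    counts = Counter(events)
    # most_common() sorts by count, highest first.
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Hypothetical activation log for one reporting period.
activations = ["erp", "crm", "erp", "erp", "mainframe", "crm"]
candidates = rank_by_fallback_activations(activations, min_count=2)
```

Systems at the top of the ranking are the ones degrading most often, which makes them the first candidates for modernization effort.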
Resources
Related best practices:
Related documents:
Related services: