

# AGENTSEC05-BP02 Implement distributed tracing for agent interactions
<a name="agentsec05-bp02"></a>

 A request that hops across agents, queues, and event buses is only investigable if a single identifier follows it end-to-end. Tracing with both a trace ID for instrumented services and an application-level correlation ID for asynchronous boundaries makes cross-agent incidents reconstructable. 

 **Desired outcome:** 
+  You trace every request that flows through a multi-agent system end-to-end with a single correlation identifier, so security teams can reconstruct the complete chain of agent interactions for traced operations. 
+  Service maps give real-time visibility into agent dependencies and communication patterns. 

 **Common anti-patterns:** 
+  Generating new trace IDs at each agent boundary rather than propagating the original, breaking the correlation chain and making it impossible to link related actions. 
+  Tracing only synchronous agent interactions and omitting asynchronous operations (Amazon SQS messages, Amazon EventBridge events), creating gaps that obscure the full execution path. 
+  Not instrumenting tool invocations within agent traces, losing visibility into which external services were called and what data was exchanged during execution. 

 **Benefits of establishing this best practice:** 
+  Correlated traces across agent boundaries make every agent execution reconstructable after the fact. 
+  Service maps and trace analysis surface unexpected communication patterns, such as agents interacting with services outside their normal dependency graph. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Two identifiers do the work together. A *trace ID* is generated by the tracing system (Amazon CloudWatch Application Signals, AWS X-Ray, OpenTelemetry) and follows a request through instrumented services. It is the identifier the tracing backend uses to assemble spans into a trace tree. A *correlation ID* is generated once by the application and propagated end-to-end, so it survives boundaries where trace context is regenerated. The most common such boundary is an asynchronous messaging channel, where the consumer frequently starts a new trace. Trace IDs give you automatic correlation wherever instrumentation is continuous. Correlation IDs give you reliable linkage across the boundaries that instrumentation can't traverse. 
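
 The following is a minimal sketch of how the two identifiers behave across hops, not a tracing implementation. The names `RequestContext`, `start_request`, `continue_sync`, and `continue_async` are hypothetical, and `new_trace_id` is only a stand-in for whatever the tracing system generates.

```python
import secrets
import uuid
from dataclasses import dataclass


def new_trace_id() -> str:
    """Stand-in for the tracing system: a random 32-hex-character ID."""
    return secrets.token_hex(16)


@dataclass(frozen=True)
class RequestContext:
    correlation_id: str  # minted once by the application, never regenerated
    trace_id: str        # owned by the tracing system, may change at async hops


def start_request() -> RequestContext:
    """Entry point: mint the correlation ID exactly once."""
    return RequestContext(correlation_id=str(uuid.uuid4()), trace_id=new_trace_id())


def continue_sync(ctx: RequestContext) -> RequestContext:
    """Instrumented synchronous hop: both identifiers carry over unchanged."""
    return ctx


def continue_async(ctx: RequestContext) -> RequestContext:
    """Asynchronous hop where the consumer starts a new trace.

    The trace ID is regenerated, but the correlation ID is copied through.
    """
    return RequestContext(correlation_id=ctx.correlation_id, trace_id=new_trace_id())


origin = start_request()
after_sync = continue_sync(origin)
after_queue = continue_async(after_sync)
```

 In this sketch, `after_queue` has a different `trace_id` than `origin` but the same `correlation_id`, which is the property an investigator relies on to join the two traces.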

 [Amazon Bedrock AgentCore Observability](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-telemetry.html) provides built-in tracing for agents deployed on AgentCore Runtime. Sessions represent complete user interactions, traces represent individual request-response cycles, and spans represent specific operations within a trace. AgentCore outputs span data for memory resources by default, and session-level metrics are viewable on the Amazon CloudWatch generative AI observability page. The built-in instrumentation captures the agent execution loop and propagates trace context across agent boundaries without custom code. 

 For deeper visibility, or for agents not running on AgentCore Runtime, instrument agent code with AWS Distro for OpenTelemetry (ADOT) to generate traces compatible with AWS X-Ray and third-party observability platforms. Create a span for each significant operation within an agent: model invocations, tool calls, memory reads and writes, and inter-agent communications. Configure X-Ray sampling rules to capture 100% of traces for security-critical operations, and use statistical sampling for high-volume routine operations. 

 Correlation ID propagation is the primary concern in all agent-to-agent communications. Include both the correlation ID and the current trace ID in inter-agent messages, API calls, and event payloads, so the full execution chain can be reconstructed from any point. For asynchronous operations, propagate both IDs through Amazon SQS message attributes and through the Amazon EventBridge event detail. The correlation ID preserves end-to-end linkage even when the tracing system starts a new trace on the consumer side. 
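
 As a rough sketch of that propagation, the helpers below build an SQS `MessageAttributes` map and an EventBridge `put_events` entry that carry both identifiers, and read them back on the consumer side. The attribute keys `correlation_id` and `trace_id`, the `Source` and `DetailType` values, and all function names are assumptions for illustration. No AWS calls are made here.

```python
import json


def sqs_message_attributes(correlation_id: str, trace_id: str) -> dict:
    """Attributes in the shape accepted by the SQS SendMessage API."""
    return {
        "correlation_id": {"DataType": "String", "StringValue": correlation_id},
        "trace_id": {"DataType": "String", "StringValue": trace_id},
    }


def eventbridge_entry(correlation_id: str, trace_id: str, payload: dict) -> dict:
    """One entry for the EventBridge PutEvents API, with both IDs inside the detail."""
    detail = {
        "correlation_id": correlation_id,
        "trace_id": trace_id,
        "payload": payload,
    }
    return {
        "Source": "example.agents",           # hypothetical source name
        "DetailType": "AgentTaskRequested",   # hypothetical detail type
        "Detail": json.dumps(detail),         # Detail must be a JSON string
    }


def ids_from_sqs_message(message: dict) -> tuple[str, str]:
    """Consumer side: read both IDs from a message returned by ReceiveMessage."""
    attrs = message["MessageAttributes"]
    return attrs["correlation_id"]["StringValue"], attrs["trace_id"]["StringValue"]


def ids_from_eventbridge_event(event: dict) -> tuple[str, str]:
    """Consumer side: read both IDs from a delivered event's parsed detail."""
    detail = event["detail"]
    return detail["correlation_id"], detail["trace_id"]
```

 A producer would pass the first result as `MessageAttributes` to `send_message` and the second inside `Entries` to `put_events`. An SQS consumer needs to request the attributes (for example through `MessageAttributeNames`) for them to appear on received messages, and should log the upstream trace ID alongside any new trace it starts.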

 The Amazon CloudWatch generative AI observability page provides agent-specific session and trace metrics for AgentCore Runtime. For cross-service visualization that covers non-agent components in the same request path (databases, queues, downstream services), Amazon CloudWatch ServiceLens renders a service map of agent interactions and surfaces unexpected communication patterns. Amazon CloudWatch Logs Insights queries identify traces with unusual patterns: agents calling unexpected services, traces with abnormally high tool invocation counts, or traces that span unexpected geographic regions. 
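
 The kinds of checks such queries perform can be sketched in plain Python over exported span records. This is an illustration of the detection logic only, not a Logs Insights query. The record fields (`trace_id`, `agent`, `kind`, `target`), the agent names, the dependency allowlist, and the threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical allowlist: the services each agent is expected to call.
EXPECTED_DEPENDENCIES = {
    "planner-agent": {"bedrock", "sqs"},
    "billing-agent": {"bedrock", "dynamodb"},
}
MAX_TOOL_CALLS_PER_TRACE = 3  # illustrative threshold


def find_anomalies(spans: list) -> list:
    """Flag spans that call unexpected services and traces with too many tool calls."""
    findings = []
    tool_calls = defaultdict(int)
    for span in spans:
        allowed = EXPECTED_DEPENDENCIES.get(span["agent"], set())
        if span["target"] not in allowed:
            findings.append(("unexpected_dependency", span["trace_id"],
                             f'{span["agent"]} -> {span["target"]}'))
        if span["kind"] == "tool":
            tool_calls[span["trace_id"]] += 1
    for trace_id, count in sorted(tool_calls.items()):
        if count > MAX_TOOL_CALLS_PER_TRACE:
            findings.append(("excessive_tool_calls", trace_id, str(count)))
    return findings


sample_spans = [
    {"trace_id": "t1", "agent": "planner-agent", "kind": "model", "target": "bedrock"},
    {"trace_id": "t1", "agent": "billing-agent", "kind": "tool", "target": "s3"},
] + [
    {"trace_id": "t2", "agent": "billing-agent", "kind": "tool", "target": "dynamodb"}
    for _ in range(4)
]
findings = find_anomalies(sample_spans)
```

 Here the sample data yields one unexpected dependency (the billing agent calling `s3`) and one trace that exceeds the tool-call threshold. In practice the allowlist would be derived from the observed service map rather than hard-coded.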

### Implementation steps
<a name="implementation-steps"></a>

1.  **Verify built-in AgentCore tracing:** For agents on AgentCore Runtime, confirm [Amazon Bedrock AgentCore Observability](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-telemetry.html) is capturing session, trace, and span data and review the default metrics on the Amazon CloudWatch generative AI observability page. 

1.  **Instrument custom agents with ADOT:** For custom agents or deeper visibility, instrument agent code with AWS Distro for OpenTelemetry to generate spans for model invocations, tool calls, memory operations, and inter-agent communications. 

1.  **Propagate correlation IDs through async boundaries:** Include the correlation ID and current trace ID in all inter-agent messages, API calls, and event payloads, and propagate them through Amazon SQS message attributes and Amazon EventBridge event detail for asynchronous operations. 

1.  **Configure X-Ray sampling by risk:** Use AWS X-Ray sampling rules to capture 100% of traces for security-critical operations and to apply statistical sampling to routine operations. 

1.  **Visualize service maps and detect anomalies:** Use Amazon CloudWatch ServiceLens to visualize agent service maps and build Amazon CloudWatch Logs Insights queries to detect anomalous trace patterns. 

1.  **Set trace retention:** Configure trace retention policies that match your incident investigation and compliance requirements. 

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [AGENTSEC05-BP01 Implement comprehensive logging and decision artifact storage](agentsec05-bp01.html) 
+  [AGENTSEC06-BP04 Monitor and detect coordination anomalies](agentsec06-bp04.html) 
+  [AGENTREL07-BP03 Implement distributed tracing to track system dependencies and facilitate recovery](agentrel07-bp03.html) 
+  [AGENTPERF01-BP03 Profile end-to-end agent latency and identify optimization targets](agentperf01-bp03.html) 

 **Related documents:** 
+  [AgentCore Observability: Sessions, traces, and spans](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-telemetry.html) 
+  [Build trustworthy AI agents with Amazon Bedrock AgentCore Observability](https://aws.amazon.com/blogs/machine-learning/build-trustworthy-ai-agents-with-amazon-bedrock-agentcore-observability/) 
+  [AWS X-Ray documentation](https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html) 
+  [AWS Distro for OpenTelemetry](https://aws-otel.github.io/) 
+  [Amazon CloudWatch ServiceLens](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ServiceLens.html) 

 **Related services:** 
+  [Amazon Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) 
+  [AWS X-Ray](https://aws.amazon.com/xray/) 
+  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) 