

# Architecture details
<a name="architecture-details"></a>

This section describes the components and AWS services that make up this solution, and how these components work together.

## AWS services in this solution
<a name="aws-services-in-this-solution"></a>


| AWS service | Description | 
| --- | --- | 
|   [Amazon API Gateway](https://aws.amazon.com/api-gateway/)   |   **Core**. This service provides the REST APIs for the Deployment dashboard and the WebSocket API for the use case.  | 
|   [AWS CloudFormation](https://aws.amazon.com/cloudformation/)   |   **Core**. This solution is distributed as a CloudFormation template, and CloudFormation deploys the AWS resources for the solution.  | 
|   [Amazon CloudFront](https://aws.amazon.com/cloudfront/)   |   **Core**. CloudFront serves the web content hosted in Amazon S3.  | 
|   [Amazon Cognito](https://aws.amazon.com/cognito/)   |   **Core**. This service handles user management and authentication for the API.  | 
|   [Amazon DynamoDB](https://aws.amazon.com/dynamodb/)   |   **Core**. DynamoDB stores deployment information and configuration details for the Deployment dashboard. It stores chat history and conversation IDs in the Text use case to enable conversation history and query disambiguation.  | 
|   [AWS Lambda](https://aws.amazon.com/lambda/)   |   **Core**. The solution uses Lambda functions to back the REST and WebSocket API endpoints, handle the core logic of each use case orchestrator, and implement custom resources during CloudFormation deployment.  | 
|   [Amazon S3](https://aws.amazon.com/s3/)   |   **Core**. Amazon S3 hosts the static web content.  | 
|   [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/)   |   **Supporting**. This solution publishes logs from solution resources to [CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html), and publishes metrics to [CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html). The solution also creates a [CloudWatch dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html) to view this data.  | 
|   [AWS Systems Manager](https://aws.amazon.com/systems-manager/)   |   **Supporting**. Systems Manager provides application-level resource monitoring and visualization of resource operations and cost data. It is also used to store configuration data in Parameter Store.  | 
|   [AWS WAF](https://aws.amazon.com/waf/)   |   **Supporting.** AWS WAF is deployed in front of the API Gateway deployment to protect it.  | 
|   [Amazon Bedrock](https://aws.amazon.com/bedrock/)   |   **Optional**. The solution leverages Amazon Bedrock to access foundation or customized models, Amazon Bedrock Agents, and Amazon Bedrock Knowledge Bases. Amazon Bedrock is the recommended integration to keep your data from leaving the AWS network.  | 
|   [Amazon Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/)   |   **Optional**. The solution leverages Amazon Bedrock AgentCore to run and support MCP server connections, as well as Agent Builder and Workflow use cases.  | 
|   [Amazon Elastic Container Registry (Amazon ECR)](https://aws.amazon.com/ecr/)   |   **Optional**. For Agent Builder deployments, ECR stores and distributes agent container images. The solution uses ECR Pull-Through Cache to automatically retrieve pre-built agent images from the GAAB team’s public ECR repository.  | 
|   [AWS Distro for OpenTelemetry (ADOT)](https://aws.amazon.com/otel/)   |   **Optional**. For Agent Builder deployments, ADOT provides automatic instrumentation for agent observability, enabling distributed tracing and structured logging for agent operations.  | 
|   [Amazon Kendra](https://aws.amazon.com/kendra/)   |   **Optional**. In the Text use case, admin users can optionally decide to connect an Amazon Kendra index to use as a knowledge base for the conversation with the LLM. This can be used to inject new information into the LLM giving it the ability to use that information in its responses.  | 
|   [Amazon SageMaker AI](https://aws.amazon.com/sagemaker/)   |   **Optional**. The solution can integrate with an Amazon SageMaker AI inference endpoint to access FMs hosted within your AWS account and Region; this is a preferred integration to keep your data from leaving the AWS network. You must deploy the solution in the same Region where the inference endpoint is available.  | 
|   [Amazon Virtual Private Cloud](https://aws.amazon.com/vpc/)   |   **Optional**. The solution provides the option to deploy components with a VPC-enabled configuration. When deploying with a VPC-enabled configuration, you can either let the solution create a VPC for you, or use an existing VPC in the same account and Region where the solution will be deployed (Bring Your Own VPC). If the solution creates the VPC, it creates the necessary network components, including subnets, security groups and their rules, route tables, network ACLs, NAT gateways, internet gateways, and VPC endpoints and their policies.  | 

# Deployment dashboard
<a name="deployment-dashboard-1"></a>

## API Gateway custom authorizers
<a name="api-gateway-custom-authorizers"></a>

Beneath the surface, Lambda custom authorizers for API Gateway are used for all API calls (both RESTful and WebSocket-based) to validate whether a given user has permission to perform an action based on the group(s) they belong to. This custom authorizer is backed by a DynamoDB table containing the policies for each group. When an API is invoked, API Gateway invokes the custom authorizer Lambda function, which decodes the provided Amazon Cognito access token to determine which groups the user belongs to. The policy table is then queried by group name to return the relevant policy for that group.

On every new use case deployment, the admin policy is updated to store a new statement allowing the **execute-api:Invoke** action on that use case’s API. When use cases are deleted, the corresponding statement is removed from the policy.

For the groups created for an individual use case, only a single statement is present in the policy, allowing the **execute-api:Invoke** action on only that use case’s API.

Due to this structure, any user belonging to a use case’s group can access that use case’s API. A single user can also be manually added to multiple groups to allow that user to use multiple use cases.
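The group-to-policy flow above can be sketched as a Lambda authorizer. This is a minimal illustration, not the solution's actual implementation: the table name, attribute names, and the `decode_groups_from_token` helper are assumptions, and a production authorizer must verify the JWT signature against the Cognito JWKS rather than merely decoding the token.

```python
import json

def merge_group_policies(policies):
    """Merge per-group policy documents (as stored in the DynamoDB
    policy table) into a single IAM policy document for the caller.
    Each document is assumed to hold a list of IAM statements."""
    statements = []
    for doc in policies:
        statements.extend(doc.get("Statement", []))
    return {"Version": "2012-10-17", "Statement": statements}

def handler(event, context):
    """Hypothetical authorizer flow; names are illustrative."""
    import boto3  # deferred so merge_group_policies stays unit-testable
    # decode_groups_from_token is a hypothetical helper that must
    # validate the Cognito access token, not just decode it
    groups = decode_groups_from_token(event["authorizationToken"])
    table = boto3.resource("dynamodb").Table("group-policy-table")  # assumed name
    docs = []
    for group in groups:
        item = table.get_item(Key={"group": group}).get("Item")
        if item:
            docs.append(json.loads(item["policy"]))
    return {"principalId": "user", "policyDocument": merge_group_policies(docs)}
```

Because each group's policy contains only `execute-api:Invoke` statements for its own use case APIs, merging the caller's group policies yields exactly the set of APIs that user may invoke.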

**Warning**  
You can also manually edit the policies for a given group in the policy table if you want to grant access to a new use case to an existing group of users. The use case group is deleted when the use case is deleted (even if you have made manual edits), so proceed with caution when deleting a use case.

In the case where a use case stack is deployed standalone (without the use of the Deployment dashboard), an [Amazon Cognito user pool](https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-identity-pools.html) is created for that deployment containing a single user with access to that use case’s API. This user pool belongs only to this use case and is not shared across other standalone deployments.

# Text use case
<a name="chat-use-case-1"></a>

## Streaming support
<a name="streaming-support"></a>

In a chat application, latency is an important metric for a responsive user experience. Because LLM inferences can take anywhere from seconds to minutes, serving content to customers promptly is a challenge. For this reason, several LLM providers support streaming responses back to the caller: instead of waiting for the entire inference to complete before returning a response, each token is returned as it becomes available.

To support the use of this feature, the Text use case has been designed to use a WebSocket API to back the chat experience. This WebSocket is deployed through API Gateway. The use of a WebSocket API enables a connection to be created at the beginning of a chat session and for responses to be streamed through that socket. This allows frontend applications to provide a better user experience.

**Note**  
Even if a model supports streaming, this does not necessarily mean that the solution can stream responses back through the WebSocket API. The solution must implement custom streaming logic for each model provider. If streaming is available, admin users can enable or disable this feature at deployment time.
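A sketch of how streamed tokens might be pushed through the WebSocket connection using the real API Gateway Management API (`post_to_connection`); the message field names are illustrative, not the solution's actual wire format.

```python
import json

def token_frame(conversation_id, token, is_last=False):
    """Frame a single streamed token as a JSON message for the
    WebSocket client (field names are illustrative)."""
    return json.dumps({
        "conversationId": conversation_id,
        "data": token,
        "endOfResponse": is_last,
    })

def stream_tokens(endpoint_url, connection_id, conversation_id, tokens):
    """Push tokens to a connected client as they arrive, using the
    API Gateway Management API."""
    import boto3  # deferred so token_frame stays unit-testable
    client = boto3.client("apigatewaymanagementapi", endpoint_url=endpoint_url)
    for i, tok in enumerate(tokens):
        client.post_to_connection(
            ConnectionId=connection_id,
            Data=token_frame(conversation_id, tok, is_last=(i == len(tokens) - 1)),
        )
```

Each call delivers one token to the client over the open socket, which is what allows the frontend to render a response incrementally instead of waiting for the full inference.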

# How the Generative AI Application Builder on AWS solution works
<a name="how-the-generative-ai-application-builder-on-aws-solution-works"></a>

The admin user primarily interfaces with the Deployment dashboard to view, create, and manage new and existing use case deployments. Through this dashboard, the admin user has access to the following actions:
+ View list of deployments
+ Create new deployments
+ Edit existing deployments
+ Clone a deployment’s configuration to create a new deployment
+ Delete a deployment (deprovision the resources through a CloudFormation delete)
+ Permanently delete the configuration details of a deployment

 **Use case diagram for the admin user of the Deployment dashboard** 

![\[image4\]](http://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/images/image4.png)


**Note**  
The admin user might not have direct access to the AWS console. In that case, the admin user must work with the DevOps user to support actions such as ingesting data into a Kendra knowledge base.

For the Text use case, the business user gets access to a user interface enabling them to chat with the LLM. The specifics of this configuration are controlled by the deployment settings configured by the admin user. In the Text use case, the business user has access to the following actions:
+ Send messages through the chat interface
+ View conversation history
+ Clear the conversation history
+ View prompt
+ Edit prompt

 **Use case diagram for the business user of the Text use case** 

![\[image5\]](http://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/images/image5.png)


With the Bedrock Agent use case, the business user can access a UI for chatting with the configured Amazon Bedrock Agent. The admin user can configure these specifics in the deployment settings. In the Bedrock Agent use case, the business user has access to the following actions:
+ Send messages through the chat interface
+ View conversation history
+ Clear the conversation history

 **Use case diagram for the business user of the Bedrock Agent use case** 

![\[agent use case user diagram\]](http://docs.aws.amazon.com/solutions/latest/generative-ai-application-builder-on-aws/images/agent-use-case-user-diagram.png)


# Agent Builder
<a name="agent-builder-1"></a>

The Agent Builder provides a platform for creating, deploying, and managing production-ready AI agents on Amazon Bedrock AgentCore. This section describes the technical components and implementation details.

## AgentCore integration
<a name="agentcore-integration"></a>

Agent Builder uses a configuration-based deployment approach with pre-built agent images to enable fast, secure, and scalable agent deployments.

 **Pre-built agent images** 

Agent container images are built by the GAAB team during the CI/CD pipeline and published to a public ECR repository. Each image version is tied to the GAAB solution version (for example, v4.0.0 → gaab-strands-agent:v4.0.0). Images are based on the Strands SDK and include:
+ Agent runtime environment
+ MCP client integration
+ Memory management capabilities
+ OpenTelemetry instrumentation

 **ECR Pull-Through Cache** 

The solution uses ECR Pull-Through Cache to automatically distribute agent images from the public ECR repository to the customer’s private ECR. This AWS-managed service:
+ Caches images on first pull (2-5 minute delay)
+ Eliminates custom image copying logic
+ Provides local image availability for subsequent deployments
+ Creates unique cache rules per deployment to avoid conflicts
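The cache behavior above can be illustrated with the real ECR `create_pull_through_cache_rule` API and the URI convention for cached images; the prefix, account, and repository values shown are examples, not the solution's actual values.

```python
def cached_image_uri(account_id, region, cache_prefix, repository, tag):
    """Build the private-registry URI that resolves through an ECR
    pull-through cache rule. On first pull, ECR fetches the image
    from the upstream public registry and caches it locally."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{cache_prefix}/{repository}:{tag}"

def create_cache_rule(cache_prefix, upstream_registry_url):
    """Create the pull-through cache rule (real ECR API call).
    The prefix is deployment-specific to avoid conflicts."""
    import boto3  # deferred so cached_image_uri stays unit-testable
    ecr = boto3.client("ecr")
    return ecr.create_pull_through_cache_rule(
        ecrRepositoryPrefix=cache_prefix,
        upstreamRegistryUrl=upstream_registry_url,  # e.g. "public.ecr.aws"
    )
```

After the rule exists, pulling `cached_image_uri(...)` transparently populates the private repository; subsequent deployments pull the locally cached copy.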

 **Configuration storage** 

Agent configurations are stored in DynamoDB alongside existing use case configurations. Each configuration includes:
+ System prompt template
+ Model provider and model ID
+ Model parameters (temperature, max_tokens)
+ MCP server references and endpoints
+ Memory settings (long-term memory toggle)
+ Deployment metadata
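A stored configuration might look like the following illustrative record. The attribute names and schema here are assumptions for illustration only, not the solution's documented DynamoDB schema.

```python
def build_agent_config_item(use_case_id, version="v4.0.0"):
    """Shape of an agent configuration record covering the elements
    listed above (system prompt, model settings, MCP references,
    memory toggle, metadata). Attribute names are illustrative."""
    return {
        "key": use_case_id,
        "config": {
            "SystemPrompt": "You are a helpful assistant.",
            "LlmParams": {
                "ModelProvider": "Bedrock",
                "ModelId": "anthropic.claude-3-haiku-20240307-v1:0",
                "Temperature": 0.2,
                "MaxTokens": 1024,
            },
            "MCPServers": [],               # server references and endpoints
            "LongTermMemoryEnabled": False,  # memory toggle
            "SolutionVersion": version,      # deployment metadata
        },
    }
```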

 **Image version registry** 

A DynamoDB table tracks available agent image versions and their cache URIs, enabling version management and backward compatibility.

## Agent configuration
<a name="agent-configuration"></a>

 **System prompts** 

System prompts define agent behavior, personality, and capabilities. Admin users can:
+ Edit the default template through the Agent Builder UI
+ Include instructions for tool usage and response formatting
+ Reset to default template at any time

 **Model selection** 

Agent Builder supports Amazon Bedrock models in v4.0.0:
+ Model provider: Amazon Bedrock (only option in v4.0.0)
+ Model selection: Claude, Nova, and other Bedrock models
+ Model parameters: Temperature, max_tokens, top_p, and model-specific settings

 **MCP server integration** 

Model Context Protocol servers provide agents with access to enterprise tools and data:
+ Server discovery through GET /mcp API endpoint
+ Dynamic configuration without code changes
+ Authentication and endpoint management
+ Tool capability exposure to agents
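Server discovery could be consumed as in this sketch; the response shape for `GET /mcp` shown here is an assumption, not the documented API contract.

```python
import json

def parse_mcp_servers(response_body):
    """Parse a hypothetical GET /mcp response into (name, endpoint)
    pairs that an agent can use to open MCP connections. The
    {"servers": [...]} shape is an illustrative assumption."""
    servers = json.loads(response_body).get("servers", [])
    return [(s["name"], s["endpoint"]) for s in servers]
```

Because servers are discovered at runtime, adding or removing a tool source is a configuration change, with no agent code change required.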

## Streaming and processing
<a name="streaming-and-processing"></a>

 **Real-time streaming** 

Agent Builder uses Server-Sent Events (SSE) from AgentCore bridged to WebSocket for real-time response streaming:
+ Lambda function establishes SSE connection to AgentCore Runtime
+ Streams are bridged to API Gateway WebSocket
+ Enables token-by-token response delivery to clients
+ Maintains connection for long-running requests
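The SSE side of the bridge can be sketched with a minimal parser for the standard `data:` field framing; the Lambda bridge would forward each yielded payload over the WebSocket via `post_to_connection`.

```python
def iter_sse_data(lines):
    """Minimal Server-Sent Events parser: yields the data payload of
    each event. Follows the SSE convention that 'data:' lines carry
    the payload and a blank line terminates an event; multi-line
    data fields are joined with newlines."""
    buf = []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []
    if buf:  # flush a trailing event with no terminating blank line
        yield "\n".join(buf)
```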

 **Processing constraints** 

Agent processing in v4.0.0 is limited to Lambda execution timeout:
+ Maximum processing time: 15 minutes
+ Synchronous processing model
+ Suitable for conversational agents and moderate workflows
+ Extended async support planned for v4.1

## Memory management
<a name="memory-management"></a>

 **Short-term memory** 

Enabled by default for all agents using a custom MemoryHookProvider:
+ Captures conversation events through Strands callback handlers
+ Organizes by actorId and sessionId for context isolation
+ Maintains conversation context within sessions
+ Automatic integration with AgentCore Memory

 **Long-term memory** 

Optional feature using AgentCore Memory Tool from strands_tools:
+ Simple toggle in Agent Builder UI
+ Semantic memory strategy with default settings
+ Agent-controlled access through natural tool invocation
+ Stores extracted insights across sessions
+ Uses conversationId as sessionId
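The actorId/sessionId isolation described above can be illustrated with a toy in-memory store; this is a stand-in for the AgentCore Memory integration, not its API.

```python
from collections import defaultdict

class SessionMemory:
    """Toy illustration of context isolation: events are keyed by
    (actorId, sessionId), so one actor's session never sees another's
    conversation context."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, actor_id, session_id, role, text):
        """Append a conversation event to the scoped session."""
        self._events[(actor_id, session_id)].append({"role": role, "text": text})

    def context(self, actor_id, session_id):
        """Return the conversation context for one actor/session scope."""
        return list(self._events[(actor_id, session_id)])
```

In the long-term memory case described above, the conversationId plays the role of the sessionId, so extracted insights are scoped the same way.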

## Observability
<a name="observability"></a>

 **AWS Distro for OpenTelemetry (ADOT)** 

Agents are automatically instrumented during container build:
+ Automatic trace generation for agent operations
+ Distributed tracing across service boundaries
+ Structured logging with correlation IDs
+ Integration with CloudWatch Transaction Search

 **Authentication flow** 

Users authenticate through Amazon Cognito with JWT tokens validated by custom Lambda authorizers that retrieve IAM policies from DynamoDB based on user groups.

## Workflow Builder
<a name="workflow-builder-1"></a>

Workflow Builder enables multi-agent orchestration by creating a supervisor agent that coordinates multiple Agent Builder agents using the Agents as Tools delegation pattern.

### Workflow architecture
<a name="workflow-architecture"></a>

 **Key components** 
+  **Supervisor Agent**: Entrypoint agent that receives user requests and delegates to specialized agents
+  **Specialized Agents**: Agent Builder use cases registered as tools for the supervisor
+  **Agent Registry**: DynamoDB table storing agent configurations and metadata
+  **Orchestration Layer**: Strands SDK implementation of Agents as Tools pattern

### Agent instantiation
<a name="agent-instantiation"></a>

 **Local agent creation** 

All specialized agents are instantiated locally within the same AgentCore Runtime:

1. Retrieves agent configurations from DynamoDB

1. Creates local instances of each Agent Builder agent

1. Each agent maintains its own MCP server connections

1. Supervisor agent registers specialized agents as tools

1. Strands SDK manages agent selection and delegation
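The registration steps above can be sketched with the Strands SDK's Agents-as-Tools pattern. This is a hedged illustration: the configuration field names and the tool-naming helper are assumptions, each wrapped tool needs a distinct name in practice, and the exact `@tool` decorator options should be checked against the Strands documentation.

```python
import re

def tool_name_from_agent(agent_name):
    """Derive a valid tool identifier from an agent's display name
    (an illustrative convention, not the solution's exact scheme)."""
    return re.sub(r"[^a-z0-9_]", "_", agent_name.strip().lower())

def build_supervisor(agent_configs):
    """Sketch of the Agents-as-Tools pattern with the Strands SDK
    (imports deferred so the helper above stays unit-testable).
    Config field names like "SystemPrompt" are illustrative."""
    from strands import Agent, tool

    def as_tool(cfg):
        specialist = Agent(system_prompt=cfg["SystemPrompt"])

        @tool
        def delegate(query: str) -> str:
            """Delegate a sub-task to this specialized agent."""
            return str(specialist(query))

        return delegate

    return Agent(
        system_prompt="Route each request to the best specialist tool.",
        tools=[as_tool(cfg) for cfg in agent_configs],
    )
```

The supervisor never runs a specialized agent directly; it selects and invokes the wrapped tools, and the Strands SDK handles the delegation loop.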