AGENTPERF07-BP01 Design efficient multitenant agent deployment models
Organizations that serve multiple tenants from a shared agent service get better resource efficiency and faster tenant onboarding when the deployment model delivers consistent performance for every tenant. Siloed deployments provide strong isolation at higher cost. Pooled deployments maximize efficiency but need mechanisms to help prevent noisy neighbor effects. Hybrid models combine both, using pooled resources for standard tenants and dedicated resources for premium tenants.
Desired outcome:
- You have multitenant agent deployments that use deployment models matched to tenant requirements.
- You have deployment models documented with clear performance characteristics and SLA commitments for each tier.
- You have a deployment model that supports efficient tenant onboarding through configuration rather than infrastructure provisioning.
Common anti-patterns:
- Deploying fully siloed infrastructure for every tenant regardless of their performance requirements, creating resource waste for tenants that would be well-served by pooled resources.
- Using a single pooled deployment without isolation mechanisms, allowing high-volume tenants to consume disproportionate resources and degrade performance for others.
- Skipping tenant tiers with different performance SLAs, treating all tenants identically regardless of business value.
Benefits of establishing this best practice:
- Resource investment stays proportional to tenant value and performance requirements.
- Appropriate isolation mechanisms deliver consistent performance for every tenant.
- Configuration-driven provisioning speeds tenant onboarding.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Define tenant tiers based on performance requirements and business value:
- A standard tier using pooled resources with best-effort performance
- A premium tier using dedicated resources with stronger SLA targets
- (Optional) An enterprise tier with fully isolated infrastructure
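The tiers above can be captured as data so that onboarding a tenant becomes a configuration lookup. The following is a minimal sketch, not a prescribed schema: the names `Tier`, `Deployment`, `TIERS`, and `onboard_tenant`, the `needs_provisioning` flag, and every numeric value are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Deployment(Enum):
    POOLED = "pooled"        # shared resources
    DEDICATED = "dedicated"  # dedicated resources for one tenant
    ISOLATED = "isolated"    # fully isolated infrastructure


@dataclass(frozen=True)
class Tier:
    name: str
    deployment: Deployment
    p95_latency_target_ms: Optional[int]  # None means best effort, no committed target
    rate_limit_rps: float
    burst_limit: int


# Illustrative values only (assumptions); real targets come from your SLA commitments.
TIERS = {
    "standard": Tier("standard", Deployment.POOLED, None, 10.0, 20),
    "premium": Tier("premium", Deployment.DEDICATED, 2000, 50.0, 100),
    "enterprise": Tier("enterprise", Deployment.ISOLATED, 1000, 200.0, 400),
}


def onboard_tenant(tenant_id: str, tier_name: str) -> dict:
    """Return a configuration record for a new tenant. No infrastructure is created here."""
    tier = TIERS.get(tier_name)
    if tier is None:
        raise ValueError(f"unknown tier: {tier_name!r}")
    return {
        "tenant_id": tenant_id,
        "tier": tier.name,
        "deployment": tier.deployment.value,
        # Hypothetical flag: only non-pooled tiers would trigger a provisioning workflow.
        "needs_provisioning": tier.deployment is not Deployment.POOLED,
    }
```

Keeping tier definitions in one table like this means the routing, throttling, and monitoring layers can all read the same source of truth for a tenant's limits and targets.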
For pooled deployments, Amazon Bedrock AgentCore Runtime provides session-isolated execution that naturally helps prevent cross-tenant interference at the agent execution level. For tenants that need custom container environments, deploy on Amazon EKS. Tenant-aware routing at the API layer uses Amazon API Gateway.
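To make the effect of per-tenant rate and burst limits concrete, here is a small token-bucket sketch in which each tenant draws from its own bucket, so one tenant exhausting its allowance does not affect another. This is an illustration of the rate and burst semantics only, not API Gateway's implementation; the class name `TenantThrottle`, the injectable `clock`, and the limits used are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TenantThrottle:
    """Per-tenant token bucket: `rate` tokens refill per second, up to `burst` tokens."""

    def __init__(
        self,
        limits: Dict[str, Tuple[float, int]],  # tenant_id -> (rate per second, burst)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = limits
        self._clock = clock
        # tenant_id -> (tokens currently available, timestamp of last update)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        """Return True and consume one token if the tenant is within its limits."""
        rate, burst = self._limits[tenant_id]
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._state.get(tenant_id, (float(burst), now))
        # Refill for the elapsed time, capped at the burst capacity.
        tokens = min(float(burst), tokens + (now - last) * rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._state[tenant_id] = (tokens, now)
        return allowed
```

With limits of `(1.0, 2)` for a tenant, two back-to-back requests are allowed, a third is rejected, and one more request is allowed after one second has passed. A request from a different tenant is unaffected throughout.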
Implementation steps
- Define tenant tiers with specific performance SLAs, isolation requirements, and pricing: Define standard, premium, and optional enterprise tiers with explicit SLAs and isolation expectations.
- Design the deployment architecture for each tier: Architect pooled (shared AgentCore Runtime, on-demand Amazon Bedrock), premium (dedicated resources, provisioned throughput), and enterprise (fully isolated) deployments.
- Implement tenant-aware routing using API Gateway with per-tenant usage plans and rate limits: Use Amazon API Gateway usage plans with per-tenant rate and burst limits.
- Configure data isolation using DynamoDB tenant partition keys (pooled) or separate tables (siloed): Enforce data isolation at the data layer with partition keys or separate tables as appropriate for the tier.
- Automate tenant onboarding for each tier using CDK/CloudFormation templates: Template onboarding so new tenants are provisioned through configuration rather than manual infrastructure work.
- Monitor per-tenant performance metrics to validate SLA compliance: Publish per-tenant metrics to CloudWatch and alert when observed performance drifts outside SLA.
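Two of the steps above, tenant-keyed data isolation and per-tenant SLA monitoring, can be sketched with plain helper functions. This is a hedged example under stated assumptions: the `PK`/`SK` attribute names, the `TENANT#`/`SESSION#` key prefixes, the metric name `AgentLatency`, and the dimension names `TenantId` and `Tier` are hypothetical. The metric dictionary follows the shape of one `MetricData` entry accepted by the CloudWatch `PutMetricData` API.

```python
from datetime import datetime, timezone
from typing import List


def tenant_item_key(tenant_id: str, session_id: str) -> dict:
    """Pooled model: prefix the partition key with the tenant so reads are tenant-scoped."""
    if not tenant_id or "#" in tenant_id:
        raise ValueError("tenant_id must be non-empty and must not contain '#'")
    # PK/SK and the prefixes are illustrative naming choices, not a required schema.
    return {"PK": f"TENANT#{tenant_id}", "SK": f"SESSION#{session_id}"}


def latency_metric(tenant_id: str, tier: str, latency_ms: float) -> dict:
    """Build one MetricData entry with tenant and tier as dimensions."""
    return {
        "MetricName": "AgentLatency",  # hypothetical metric name
        "Dimensions": [
            {"Name": "TenantId", "Value": tenant_id},
            {"Name": "Tier", "Value": tier},
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": latency_ms,
        "Unit": "Milliseconds",
    }


def breaches_sla(latencies_ms: List[float], target_ms: float, percentile: int = 95) -> bool:
    """Nearest-rank percentile check: True if the observed percentile exceeds the target."""
    if not latencies_ms:
        return False
    ordered = sorted(latencies_ms)
    # Ceiling division gives the nearest-rank position (1-based).
    rank = max(1, -(-percentile * len(ordered) // 100))
    return ordered[rank - 1] > target_ms
```

A caller with AWS credentials could publish an entry with `boto3.client("cloudwatch").put_metric_data(Namespace="AgentService", MetricData=[latency_metric("acme", "premium", 120.0)])`, where the namespace is again a placeholder. In production, a CloudWatch alarm on the per-tenant metric would typically replace the local `breaches_sla` check, which is included here only to show the comparison being made.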