AGENTCOST06-BP02 Cost optimize versioning and deployment through efficient artifact management
Agent runtimes that create an immutable version on every configuration change accumulate hundreds of versions during normal development, each referencing container images that are hard to track and clean up. Layered base images, endpoint-based traffic routing, and automated cleanup policies keep deployment cost proportional to real usage rather than to the number of past configurations.
Desired outcome:
-
You have container layer deduplication storing shared dependencies once across agent versions.
-
You use endpoint-based traffic routing for blue/green and canary deployments without duplicate running infrastructure.
-
You have automated lifecycle policies that delete unused versions, helping prevent indefinite storage accumulation.
-
You monitor version inventory and catch unused versions before they become a material cost.
Common anti-patterns:
-
Retaining all agent versions indefinitely without lifecycle policies, accumulating storage costs for versions that receive zero invocations.
-
Creating separate container images without sharing common base layers, multiplying storage costs across agent versions.
-
Running full parallel environments for blue/green deployments instead of routing a small percentage of traffic to the new version.
-
Allowing unused agent versions to accumulate without monitoring, which makes automated cleanup riskier and steadily increases storage overhead, with no visibility into version invocation patterns.
Benefits of establishing this best practice:
-
Container layer deduplication stores shared dependencies once, reducing storage costs proportionally to version reuse. Verify deduplication by comparing total repository storage in ECR (reported in CloudWatch metrics under RepositorySize) against the sum of individual image manifest sizes. Effective deduplication shows repository storage significantly smaller than the sum of manifests.
-
Endpoint-based routing enables deployment transitions without duplicate running infrastructure.
-
Automated cleanup deletes unused versions, helping prevent indefinite storage cost growth.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Amazon Bedrock AgentCore Runtime creates an immutable version on every configuration update. The version itself is lightweight metadata (container image reference, protocol settings, and network configuration), but each version holds a reference to container images that Amazon ECR continues to store.
The first mitigation is layer structure. Build agent containers with a common base layer containing shared dependencies (runtime, SDK, common tools) and agent-specific layers on top. ECR automatically deduplicates identical layers across images, so agents that share most dependencies share most storage.
The second mitigation is traffic routing. The AgentCore Runtime endpoint system supports blue/green and canary deployments without parallel infrastructure: create a production endpoint pointing to the stable version and use weighted routing to send a small traffic percentage to new versions during validation. Amazon Bedrock AgentCore Observability provides the per-version metrics (error rates, latency percentiles, cost-per-task-completion) that drive the promotion decision.
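The storage effect of shared base layers can be estimated with a small calculation. The sketch below is illustrative only: the `Layer` type, the image names, and the byte sizes are made-up assumptions, and it models deduplication simply as "a layer digest is stored once", which is how content-addressed registries such as ECR treat identical layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    digest: str       # content hash; identical layers share a digest
    size_bytes: int


def storage_summary(images: dict[str, list[Layer]]) -> dict[str, float]:
    """Compare the sum of per-image sizes with deduplicated layer storage."""
    sum_of_images = sum(
        layer.size_bytes for layers in images.values() for layer in layers
    )
    # Keep one entry per digest: a shared layer is counted only once.
    unique_layers = {
        layer.digest: layer.size_bytes
        for layers in images.values()
        for layer in layers
    }
    deduplicated = sum(unique_layers.values())
    ratio = deduplicated / sum_of_images if sum_of_images else 1.0
    return {
        "sum_of_images": sum_of_images,
        "deduplicated": deduplicated,
        "ratio": ratio,
    }


# Hypothetical images: a shared base (runtime + SDK) plus one small
# agent-specific layer per image.
base = [Layer("sha256:runtime", 400_000_000), Layer("sha256:sdk", 150_000_000)]
images = {
    "agent-a:v1": base + [Layer("sha256:a1", 20_000_000)],
    "agent-a:v2": base + [Layer("sha256:a2", 21_000_000)],
    "agent-b:v1": base + [Layer("sha256:b1", 35_000_000)],
}

summary = storage_summary(images)
print(summary)
```

In this made-up example the deduplicated storage is a little over a third of the naive per-image total. The gap widens as more versions reuse the same base.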
Traditional canary promotion criteria look at error rates and latency, but agent versions can differ significantly on reasoning cost (a new prompt might produce correct answers that take 30% more tokens). Including cost-per-task-completion in the promotion criteria helps prevent a cost regression from slipping into production behind good quality metrics.
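A promotion gate that treats cost as a first-class criterion might look like the following minimal sketch. The `VersionMetrics` fields, the threshold defaults, and the sample numbers are all assumptions chosen for illustration; in practice the values would come from the per-version observability metrics.

```python
from dataclasses import dataclass


@dataclass
class VersionMetrics:
    error_rate: float          # fraction of failed invocations
    p95_latency_ms: float
    cost_per_task_usd: float   # total spend divided by completed tasks


def should_promote(
    stable: VersionMetrics,
    canary: VersionMetrics,
    max_error_delta: float = 0.005,    # assumed tolerance, tune per workload
    max_latency_ratio: float = 1.10,   # assumed tolerance
    max_cost_ratio: float = 1.10,      # assumed tolerance
) -> tuple[bool, list[str]]:
    """Return (promote?, reasons for blocking) for a canary version."""
    reasons = []
    if canary.error_rate > stable.error_rate + max_error_delta:
        reasons.append("error rate regression")
    if canary.p95_latency_ms > stable.p95_latency_ms * max_latency_ratio:
        reasons.append("latency regression")
    if canary.cost_per_task_usd > stable.cost_per_task_usd * max_cost_ratio:
        reasons.append("cost-per-task regression")
    return (not reasons, reasons)


# The canary is slightly better on errors and similar on latency, but each
# completed task costs 30% more.
stable = VersionMetrics(error_rate=0.010, p95_latency_ms=2000, cost_per_task_usd=0.040)
canary = VersionMetrics(error_rate=0.009, p95_latency_ms=2050, cost_per_task_usd=0.052)

promote, reasons = should_promote(stable, canary)
print(promote, reasons)
```

With these sample numbers the canary passes the error and latency checks and is blocked only by the cost check, which is the case a quality-only gate would miss.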
Define a maximum version retention (a reasonable starting point is the last five versions plus any version currently serving traffic through an endpoint) and configure Amazon ECR lifecycle policies to delete untagged images and images older than 90 days that active versions no longer reference. Gate automated deletion of versions beyond the retention limit on two checks: no endpoint references the version, and the version has had zero invocations during the retention window. Together, these controls keep ECR from turning into a graveyard of development iterations.
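As a rough sketch, the lifecycle policy described above could be generated like this. ECR lifecycle rules select images by tag status, tag prefix, age, or count; they have no view of which images an agent version references. This example therefore assumes a tagging convention in which only development images carry a `dev-` prefix and images backing retained versions are tagged differently. The prefix, the one-day window for untagged images, and the function name are assumptions.

```python
import json


def build_lifecycle_policy(
    untagged_days: int = 1,                   # assumed grace period
    stale_days: int = 90,                     # from the retention guidance
    stale_tag_prefixes: tuple = ("dev-",),    # hypothetical tag convention
) -> str:
    """Build ECR lifecycle policy text as a JSON string."""
    rules = [
        {
            "rulePriority": 1,
            "description": f"Expire untagged images after {untagged_days} day(s)",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": untagged_days,
            },
            "action": {"type": "expire"},
        },
        {
            "rulePriority": 2,
            "description": f"Expire development images older than {stale_days} days",
            "selection": {
                "tagStatus": "tagged",
                "tagPrefixList": list(stale_tag_prefixes),
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": stale_days,
            },
            "action": {"type": "expire"},
        },
    ]
    return json.dumps({"rules": rules}, indent=2)


policy_text = build_lifecycle_policy()
print(policy_text)
```

The resulting text can be attached to a repository with `aws ecr put-lifecycle-policy --repository-name <repo> --lifecycle-policy-text file://policy.json`. Running a lifecycle policy preview first shows which images the rules would expire before anything is deleted.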
Implementation steps
-
Structure containers for layer sharing: Build agent containers with common base layers and agent-specific layers so Amazon ECR layer deduplication is effective.
-
Use endpoint-based traffic routing: Deploy agents to Amazon Bedrock AgentCore Runtime and configure custom endpoints for production traffic with weighted routing for blue/green and canary deployments.
-
Include cost in promotion criteria: Monitor deployment quality using Amazon Bedrock AgentCore Observability metrics before updating production endpoints, including cost-per-task-completion alongside error rates and latency.
-
Set version retention policies: Define a retention policy such as the last five versions plus any version currently serving traffic through an endpoint, and configure Amazon ECR lifecycle policies to delete unused images automatically.
-
Monitor version inventory weekly: Deploy a weekly version inventory function that queries AgentCore Runtime APIs for all agent versions, identifies versions with zero invocations through Amazon CloudWatch metrics, and stores the usage metadata for historical analysis before the ECR lifecycle policy deletes the images.
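The selection logic at the core of the retention and inventory steps can be sketched as follows. This is a minimal illustration, not a full inventory function: the `VersionRecord` type is hypothetical, and the invocation counts and endpoint references are hard-coded here, whereas a real job would fetch them from the AgentCore Runtime APIs and Amazon CloudWatch.

```python
from dataclasses import dataclass, field


@dataclass
class VersionRecord:
    version: int
    invocations_in_window: int
    # Endpoints currently pointing at this version (empty if none).
    endpoint_names: list = field(default_factory=list)


def deletion_candidates(records: list, keep_latest: int = 5) -> list:
    """Return version numbers that are safe to propose for deletion.

    A version is a candidate only if it is outside the newest `keep_latest`
    versions, no endpoint references it, and it had zero invocations in the
    retention window.
    """
    newest = sorted(records, key=lambda r: r.version, reverse=True)[:keep_latest]
    protected = {r.version for r in newest}
    return sorted(
        r.version
        for r in records
        if r.version not in protected
        and not r.endpoint_names
        and r.invocations_in_window == 0
    )


# Hypothetical inventory: nine versions, version 3 still serves production,
# version 2 received some traffic during the window.
records = [VersionRecord(v, 0) for v in range(1, 10)]
records[1].invocations_in_window = 12            # version 2
records[2].endpoint_names.append("production")   # version 3

candidates = deletion_candidates(records)
print(candidates)  # [1, 4]
```

Storing the candidate list together with the usage data before anything is deleted keeps an audit trail of why each version was removed.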