

# AWS infrastructure
<a name="aws-infrastructure"></a>

This section summarizes the AWS global infrastructure and the fault isolation boundaries it provides. It also introduces the concepts of control planes and data planes, a distinction that is critical to how AWS designs its services. Together, this information provides the baseline for understanding how fault isolation boundaries and a service’s control plane and data plane apply to the AWS service types discussed in the next section. 

**Topics**
+ [Availability Zones](availability-zones.md)
+ [Regions](regions.md)
+ [AWS Local Zones](aws-local-zones.md)
+ [AWS Outposts](aws-outposts.md)
+ [Points of presence](points-of-presence.md)
+ [Partitions](partitions.md)
+ [Control planes and data planes](control-planes-and-data-planes.md)
+ [Static stability](static-stability.md)
+ [Summary](summary.md)

# Availability Zones
<a name="availability-zones"></a>

AWS operates over 100 Availability Zones within several Regions around the world (current numbers can be found at [AWS Global Infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/)). An Availability Zone is one or more discrete data centers with separate and redundant power infrastructure, networking, and connectivity in an AWS Region. Availability Zones in a Region are meaningfully distant from each other, up to 60 miles (approximately 100 km), to prevent correlated failures, but close enough to use synchronous replication with single-digit millisecond latency. They are designed not to be simultaneously impacted by a shared fate scenario like utility power or water disruption, fiber isolation, earthquakes, fires, tornadoes, or floods. Common points of failure, like generators and cooling equipment, are not shared across Availability Zones, and Availability Zones are designed to be supplied by different power substations. When AWS deploys updates to its services, deployments to Availability Zones in the same Region are separated in time to prevent correlated failure. 
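The distance and latency figures above can be sanity-checked with a back-of-envelope calculation. The speed-of-light and fiber constants in the sketch below are textbook approximations, not AWS measurements:

```python
# Rough propagation-delay estimate for synchronous replication between
# Availability Zones. Constants are textbook approximations, not AWS data.
SPEED_OF_LIGHT_KM_PER_MS = 300.0  # ~300,000 km/s in vacuum
FIBER_SLOWDOWN = 2 / 3            # light in fiber travels at roughly 2/3 c

def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, ignoring switching overhead."""
    one_way_ms = distance_km / (SPEED_OF_LIGHT_KM_PER_MS * FIBER_SLOWDOWN)
    return 2 * one_way_ms

# Availability Zones ~100 km apart stay comfortably in single-digit
# millisecond round-trip territory, which is what makes synchronous
# replication across Availability Zones practical.
print(f"{fiber_rtt_ms(100):.1f} ms")  # 1.0 ms
```

Real round-trip times are somewhat higher once switching and protocol overhead are included, but the propagation budget shows why this distance range works.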

All Availability Zones in a Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber. Each Availability Zone in a Region connects to the internet through two transit centers where AWS peers with multiple [tier-1 internet providers](https://en.wikipedia.org/wiki/Tier_1_network) (for more information, refer to [Overview of Amazon Web Services](https://docs.aws.amazon.com/whitepapers/latest/aws-overview/introduction.html?did=wp_card&trk=wp_card)). 

 These features provide strong isolation of Availability Zones from each other, which we refer to as Availability Zone Independence (AZI). The logical construct of Availability Zones and their connectivity to the internet is depicted in the following figure. 

![This image shows how Availability Zones consist of one or more physical data centers that are redundantly connected to each other and the internet](http://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/images/availability-zones.png)


# Regions
<a name="regions"></a>

Each AWS Region consists of multiple independent and physically separate Availability Zones within a geographic area. All Regions currently have three or more Availability Zones. Regions themselves are isolated and independent from other Regions, with a few exceptions noted later in this document (refer to [Global single-Region operations](global-services.md#global-single-region-operations)). This separation limits the impact of a service failure, when one occurs, to the single Region where it happens; other Regions continue to operate normally. Additionally, the resources and data that you create in one Region do not exist in any other Region unless you explicitly use a replication or copy feature offered by an AWS service or replicate the resource yourself. 

![This image illustrates current and planned AWS Regions as of December 2022](http://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/images/current-and-planned-aws-regions.png)


# AWS Local Zones
<a name="aws-local-zones"></a>

 [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/) are a type of infrastructure deployment that places compute, storage, database, and other [select AWS services](https://aws.amazon.com/about-aws/global-infrastructure/localzones/features/) close to large population and industry centers. You can use AWS services, like compute and storage services, in the Local Zone to run low-latency applications at the edge or simplify hybrid cloud migrations. Local Zones have local internet ingress and egress to reduce latency, but are also connected to their parent Region through Amazon’s redundant and high-bandwidth private network, giving applications running in AWS Local Zones fast, secure, and seamless access to the full range of services. 

# AWS Outposts
<a name="aws-outposts"></a>

 [AWS Outposts](https://aws.amazon.com/outposts/) is a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. Outposts solutions allow you to extend and run native AWS services on-premises, and are available in a variety of form factors, from 1U and 2U Outposts servers to 42U Outposts racks, and multiple rack deployments. 

 With AWS Outposts, you can run [select AWS services](https://docs.aws.amazon.com/outposts/latest/userguide/what-is-outposts.html#services) locally and connect to a broad range of services available in the parent AWS Region. AWS Outposts are fully managed and configurable compute and storage racks built with AWS-designed hardware that allows customers to run compute and storage on-premises, while seamlessly connecting to AWS’s broad array of services in the cloud. 

# Points of presence
<a name="points-of-presence"></a>

In addition to AWS Regions and Availability Zones, AWS operates a globally distributed point of presence (PoP) network. These PoPs host Amazon CloudFront, a content delivery network (CDN); Amazon Route 53, a public Domain Name System (DNS) resolution service; and AWS Global Accelerator (AGA), an edge networking optimization service. The global edge network currently consists of over 410 PoPs (more than 400 edge locations and 13 regional mid-tier caches) in over 90 cities across 48 countries (current status can be found at [Amazon CloudFront Key Features](https://aws.amazon.com/cloudfront/features/)). 

![This image shows the Amazon CloudFront global edge network](http://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/images/amazon-cloudfront.png)


Each PoP is isolated from the others, which means a failure affecting a single PoP or metropolitan area does not impact the rest of the global network. The AWS network peers with thousands of tier 1, 2, and 3 telecom carriers globally, is well connected with all major access networks for optimal performance, and has hundreds of terabits of deployed capacity. Edge locations are connected to the AWS Regions through the AWS network backbone: fully redundant, multiple 100 GbE parallel fiber that circles the globe and links with tens of thousands of networks for improved origin fetches and dynamic content acceleration. 

# Partitions
<a name="partitions"></a>

 AWS groups Regions into [partitions](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html). Every Region is in exactly one partition, and each partition has one or more Regions. Partitions have independent instances of AWS Identity and Access Management (IAM) and provide a hard boundary between Regions in different partitions. AWS commercial Regions are in the `aws` partition, Regions in China are in the `aws-cn` partition, and AWS GovCloud Regions are in the `aws-us-gov` partition. Some AWS services are designed to provide cross-Region functionality, such as [Amazon S3 Cross-Region Replication](https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html#crr-scenario) or [AWS Transit Gateway Inter-Region peering](https://docs.aws.amazon.com/vpc/latest/tgw/tgw-peering.html). These types of capabilities are only supported between Regions in the same partition. You cannot use IAM credentials from one partition to interact with resources in a different partition. 
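The partition is encoded as the second colon-delimited field of every Amazon Resource Name (ARN), which makes the boundary easy to check programmatically. A minimal sketch (the bucket and instance names below are hypothetical examples):

```python
# Minimal sketch: the partition is the second field of an ARN,
# arn:partition:service:region:account-id:resource.
def arn_partition(arn: str) -> str:
    """Extract the partition identifier from an ARN string."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[1]

def same_partition(arn_a: str, arn_b: str) -> bool:
    """Cross-Region features only work between Regions in the same partition."""
    return arn_partition(arn_a) == arn_partition(arn_b)

commercial = "arn:aws:s3:::example-bucket"        # aws partition
govcloud = "arn:aws-us-gov:s3:::example-bucket"   # aws-us-gov partition
print(same_partition(commercial, govcloud))  # False
```

A check like this is a useful guard in automation that replicates resources: if two ARNs fall in different partitions, no cross-Region feature (and no shared IAM credential) can bridge them.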

# Control planes and data planes
<a name="control-planes-and-data-planes"></a>

 AWS separates most services into the concepts of *control plane* and *data plane*. These terms come from the world of networking, specifically routers. The router’s data plane, which is its main functionality, is moving packets around based on rules. But the routing policies have to be created and distributed from somewhere, and that’s where the control plane comes in. 

 Control planes provide the administrative APIs used to create, read/describe, update, delete, and list (CRUDL) resources. For example, the following are all control plane actions: launching a new [Amazon Elastic Compute Cloud](https://aws.amazon.com/ec2/) (Amazon EC2) instance, creating an [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) bucket, and describing an [Amazon Simple Queue Service](https://aws.amazon.com/sqs/) (Amazon SQS) queue. When you launch an EC2 instance, the control plane has to perform multiple tasks like finding a physical host with capacity, allocating the network interface(s), preparing an [Amazon Elastic Block Store](https://aws.amazon.com/ebs/) (Amazon EBS) volume, generating IAM credentials, adding the Security Group rules, and more. Control planes tend to be complicated orchestration and aggregation systems. 
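The orchestration described above can be sketched as follows. This is a deliberately toy model: the step functions are illustrative stubs mirroring the tasks listed in the text, not AWS internals.

```python
# Toy sketch of control-plane orchestration for an instance launch.
# All names and steps are illustrative stubs, not AWS implementation details.
def find_host_with_capacity(request):
    return "host-example"

def allocate_network_interface(instance):
    instance["eni"] = "eni-example"

def prepare_ebs_volume(instance):
    instance["volume"] = "vol-example"

def generate_iam_credentials(instance):
    instance["credentials"] = "temporary-credentials"

def apply_security_group_rules(instance):
    instance["security_groups"] = ["sg-example"]

def launch_instance(request):
    """Aggregate many subsystems -- the source of control-plane complexity."""
    instance = {"id": "i-example", "state": "pending",
                "host": find_host_with_capacity(request)}
    for step in (allocate_network_interface, prepare_ebs_volume,
                 generate_iam_credentials, apply_security_group_rules):
        step(instance)  # each step may call a separate backing service
    instance["state"] = "running"
    return instance

print(launch_instance({"instance_type": "m5.large"})["state"])  # running
```

Every step is a dependency on another subsystem, which is why control planes are, statistically, the more failure-prone half of a service.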

 The data plane is what provides the primary function of the service. For example, the following are all parts of the data plane for each of the services involved: the running EC2 instance itself, reading and writing to an EBS volume, getting and putting objects in an S3 bucket, and Route 53 answering DNS queries and performing health checks. 

 Data planes are intentionally less complicated, with fewer moving parts compared to control planes, which usually implement a complex system of workflows, business logic, and databases. This makes failure events statistically less likely to occur in the data plane versus the control plane. While both the data and control plane contribute to the overall operation and success of the service, AWS considers them to be distinct components. This separation has both performance and availability benefits. 

# Static stability
<a name="static-stability"></a>

One of the most important resilience characteristics of AWS services is what AWS calls static stability: systems operate in a static state and continue to operate as normal, without needing to make changes, during the failure or unavailability of their dependencies. One way we achieve this is by preventing circular dependencies in our services that could stop one of those services from successfully recovering. Another way is by maintaining existing state. We take into account that control planes are statistically more likely to fail than data planes. Although the data plane typically depends on data that arrives from the control plane, it maintains its existing state and continues working even in the face of control plane impairment. Data plane access to resources, once provisioned, has no dependency on the control plane, and therefore is not affected by a control plane impairment. In other words, even if the ability to create, modify, or delete resources is impaired, existing resources remain available. This makes AWS data planes statically stable with respect to an impairment in the control plane. You can implement different patterns to be statically stable against different types of dependency failures. 

 An example of static stability can be found in Amazon EC2. Once an EC2 instance has been launched, it is just as available as the physical server in a data center. It does not depend on any control plane APIs in order to stay running, or to start running again after a reboot. The same property holds for other AWS resources like VPCs, Amazon S3 buckets and objects, and Amazon EBS volumes. 

 Static stability is a concept that is deeply ingrained in how AWS designs its services, but it is also a pattern that can be used by customers. In fact, a majority of the best practice guidance for using the different types of AWS services in a resilient way is to implement static stability for production environments. The most reliable recovery and mitigation mechanisms are the ones that require the fewest changes to achieve recovery. Instead of relying on the EC2 control plane to launch new EC2 instances to recover from a failed Availability Zone, having that extra capacity pre-provisioned helps achieve static stability. Thus, eliminating dependencies on control planes (the APIs that implement changes to resources) in your recovery path helps produce more resilient workloads. For more details on static stability, control planes, and data planes, refer to the Amazon Builders’ Library article [Static stability using Availability Zones](https://aws.amazon.com/builders-library/static-stability-using-availability-zones). 
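The relationship between the two planes can be illustrated with a toy model. The class and resource names below are invented for illustration and do not reflect how any AWS service is implemented: the data plane serves only from state it has already synced, so a control plane outage blocks changes without interrupting ongoing operation.

```python
# Toy model of static stability: a control plane outage stops changes,
# but the data plane keeps serving from previously synced local state.
class ControlPlane:
    """Creates and modifies resources (CRUDL). Complex; fails more often."""
    def __init__(self):
        self.available = True
        self.config = {}

    def create_resource(self, name, value):
        if not self.available:
            raise RuntimeError("control plane impaired: cannot create or modify")
        self.config[name] = value

class DataPlane:
    """Serves requests from locally held state only -- no control plane calls."""
    def __init__(self):
        self._state = {}

    def sync_from(self, control):
        # Periodically pull configuration while the control plane is healthy.
        self._state.update(control.config)

    def serve(self, name):
        # Statically stable: depends only on previously synced local state.
        return self._state[name]

control, data = ControlPlane(), DataPlane()
control.create_resource("route", "10.0.0.0/16 -> local")
data.sync_from(control)

control.available = False     # simulate a control plane impairment
print(data.serve("route"))    # existing resources keep working
```

The same shape appears in the EC2 example above: once launched, an instance (the data plane) keeps running regardless of whether the launch APIs (the control plane) are available.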

# Summary
<a name="summary"></a>

AWS uses different fault containers in its infrastructure to create fault isolation. The core infrastructure fault containers are partitions, Regions, Availability Zones, control planes, and data planes. Next, we’ll examine the different types of AWS services, how these fault containers are used in their design, and how you should architect workloads with them to be resilient. 