

# Disaster recovery with AWS
<a name="disaster-recovery"></a>

You can use the backup and restore approaches, together with their supporting services and technologies, to implement your disaster recovery (DR) solution. Many enterprises use the AWS Cloud for backup and restore and as a DR site. AWS provides a number of services and features that support DR and business continuity.

**Topics**
+ [On-premises DR to AWS](on-prem-dr-to-aws.md)
+ [DR for cloud-native workloads](dr-cloud-native.md)

# On-premises DR to AWS
<a name="on-prem-dr-to-aws"></a>

Using AWS as an offsite disaster recovery (DR) environment for on-premises workloads is a common hybrid scenario. Define your DR objectives, including the required recovery time and recovery point objectives, before selecting technologies to use. To help with this definition, you can use the [DR plan checklist](https://pages.awscloud.com/rs/112-TZM-766/images/GEN_disaster-recovery-plan-checklist_May-2020.pdf).

There are a number of options available to help you quickly set up and provision a DR environment on AWS. Be sure that you account for all your workload dependencies, and test your DR plan and solution thoroughly and regularly to verify its integrity.

AWS provides [AWS Elastic Disaster Recovery](https://docs.aws.amazon.com/drs/latest/userguide/what-is-drs.html) for creating a full replica of your on-premises servers, including the root volume and operating system, on AWS. Elastic Disaster Recovery continuously replicates your machines into a low-cost staging area in your target AWS account and preferred AWS Region. Block-level replication produces an exact replica of your servers’ storage, including the operating system, system state configuration, databases, applications, and files. If there is a disaster, you can instruct Elastic Disaster Recovery to quickly launch thousands of your machines in their fully provisioned state within minutes.

Elastic Disaster Recovery uses an agent installed on each of your on-premises servers. The agents synchronize the state of your on-premises servers with lower-powered Amazon EC2 equivalents running on AWS. You can also automate your DR failover and failback process with Elastic Disaster Recovery. Automating your failover and failback process can help you achieve a lower and more consistent recovery time objective (RTO).

![\[Data center and an AWS environment with Elastic Disaster Recovery, recovery instances, EBS volumes.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/backup-recovery/images/elastic-disaster-recovery.png)


1. Replication server status reporting

1. Staging area resources automatically created and terminated

1. Recovery instances launched with RTO of minutes and RPO of seconds

1. Continuous block-level replication (compressed and encrypted)

It’s important to test the DR process and to verify that the live staging environment doesn’t create conflicts with the on-premises environment. For example, confirm that the appropriate licenses are available and functioning in your on-premises, staging, and initiated DR environments. Also confirm that any worker-type processes that might poll and pull work from a central database are configured appropriately to avoid overlaps or conflicts. In your DR process, include any necessary steps that must be performed before your recovery server instances come online. Also include the steps to perform after the recovery server instances are online and available. You can use solutions such as the [AWS Elastic Disaster Recovery Plan Automation solution](https://github.com/aws-samples/drs-tools/tree/main/drs-plan-automation) or another approach to help you automate your DR plans.
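The ordering of pre-launch and post-launch steps can be sketched as a small runbook runner. This is a minimal illustration, not the Plan Automation solution: the function name `run_dr_plan` and the step names in the usage note are hypothetical, and each step is assumed to be a plain Python callable that raises an exception on failure.

```python
def run_dr_plan(pre_steps, launch, post_steps):
    """Run pre-launch steps, the launch, and post-launch steps in order.

    Each step is a (name, callable) pair. The run stops at the first step
    that raises, so post-launch steps never run after a failed launch.
    Returns (completed_step_names, failed_step_name_or_None).
    """
    steps = [*pre_steps, ("launch recovery instances", launch), *post_steps]
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception:
            # Stop here so that later steps don't act on a broken environment.
            return completed, name
        completed.append(name)
    return completed, None
```

For example, a pre-launch step might pause on-premises worker processes, and a post-launch step might verify licenses on the recovery instances. A non-`None` second element in the result tells you which step to investigate.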

You can use a [Storage Gateway volume gateway](https://docs.aws.amazon.com/storagegateway/latest/userguide/managing-volumes.html) to provide your on-premises servers with cloud-based volumes. These volumes can also be quickly provisioned for use with Amazon EC2 using Amazon EBS snapshots. In particular, stored volume gateways provide your on-premises applications with low-latency access to their entire datasets. The volume gateways also provide durable snapshot-based backups that can be restored for on-premises use or for use with Amazon EC2. You can schedule point-in-time snapshots based on the recovery point objective (RPO) for your workload.

**Important**  
Volume gateway volumes are intended to be used as data volumes and not as boot volumes.
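To relate a snapshot schedule to an RPO, you can reason about it with simple date arithmetic. The following sketch is illustrative only: the function names and the 50 percent safety factor are assumptions, and the snapshot timestamps are assumed to come from your own inventory of volume gateway snapshots.

```python
from datetime import datetime, timedelta, timezone


def snapshot_interval_for_rpo(rpo, safety_factor=0.5):
    """Pick a snapshot interval inside the RPO (safety factor is an assumption)."""
    return rpo * safety_factor


def meets_rpo(snapshot_times, rpo, now=None):
    """Return True if the newest snapshot is no older than the RPO."""
    if not snapshot_times:
        return False  # no snapshot means no recovery point at all
    now = now or datetime.now(timezone.utc)
    return now - max(snapshot_times) <= rpo
```

With a 4-hour RPO, `snapshot_interval_for_rpo(timedelta(hours=4))` suggests a snapshot every 2 hours, which leaves room for a snapshot that starts late or takes time to complete.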

You can use an Amazon Machine Image (AMI) with a configuration that matches your on-premises servers and specifies your data volumes separately. After you configure and test the AMI, provision the EC2 instances from the AMI along with the data volumes based on the volume gateway snapshots. This approach requires you to test your environment thoroughly to verify that your EC2 instance is operating properly, especially for Windows workloads.
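As a rough sketch of this approach, the following function builds the request parameters for the Amazon EC2 `RunInstances` API so that each data volume is created from a snapshot. The function name, the device names, and the choice to keep data volumes after termination are assumptions for illustration. All IDs are placeholders that you supply.

```python
def build_launch_request(ami_id, instance_type, data_snapshots):
    """Build keyword arguments for the EC2 RunInstances API.

    data_snapshots maps a device name (for example "/dev/sdf") to the ID of an
    EBS snapshot that was taken from a volume gateway volume.
    """
    mappings = [
        {
            "DeviceName": device,
            # Assumption: keep recovered data volumes if the instance is terminated.
            "Ebs": {"SnapshotId": snapshot_id, "DeleteOnTermination": False},
        }
        for device, snapshot_id in sorted(data_snapshots.items())
    ]
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": mappings,
    }
```

If you use the AWS SDK for Python (Boto3), you could pass the result to `boto3.client("ec2").run_instances(**request)`. The boot volume still comes from the AMI, which is consistent with using volume gateway volumes only as data volumes.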

# DR for cloud-native workloads
<a name="dr-cloud-native"></a>

Consider how your cloud-native workloads align to your DR objectives. AWS provides multiple Availability Zones in Regions around the world. Many enterprises using the AWS Cloud align their workload architectures and DR objectives to withstand the loss of an Availability Zone. The [Reliability Pillar](https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf) in the AWS Well-Architected Framework supports this best practice. You can architect your workloads and their service and application dependencies to use multiple Availability Zones. You can then automate your DR and achieve your DR objectives with minimal to no intervention.

In practice, however, you might find that you are unable to establish a redundant, active, and automated architecture for all of your components. Examine every layer of your architecture to determine the necessary DR processes to achieve your objectives. This might vary from workload to workload, with different architectural and service requirements. This guide covers considerations and options for Amazon EC2. For other AWS services, you can refer to the [AWS documentation](https://docs.aws.amazon.com/) to determine high availability and DR options.

## DR for Amazon EC2 in a single Availability Zone
<a name="single-az"></a>

Try to architect your workloads to actively support and service clients from multiple Availability Zones. You can use Amazon EC2 Auto Scaling and Elastic Load Balancing to achieve a Multi-AZ server architecture for Amazon EC2 and other services.

If your architecture has EC2 instances that can’t be load balanced and can have only a single instance running at any given moment, you can use either of the following options.
+ Create an Auto Scaling group that has a minimum, maximum, and desired size of 1 and is configured for multiple Availability Zones. Create an AMI that can be used to replace the instance if it fails. Make sure that you define the proper automation and configuration so that a newly provisioned instance from the AMI can be automatically configured and provide service. Create a load balancer that points to the Auto Scaling group and is configured for multiple Availability Zones. Optionally, create an Amazon Route 53 alias that points to the load balancer endpoint.
+ Create a Route 53 record for your active instance and have your clients connect using this record. Create a script that creates a new AMI of your active instance and uses the AMI to provision a new EC2 instance in the stopped state in a separate Availability Zone. Configure the script to run periodically and to terminate the previous stopped instance. If there is an Availability Zone failure, start your backup instance in your alternative Availability Zone. Then update the Route 53 record to point to this new instance.
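The failover step of the second option might look like the following sketch. It assumes that `ec2` and `route53` are Boto3 clients that you create yourself (for example, `boto3.client("ec2")`), that the record lives in a private hosted zone so the instance's private IP address is the right target, and that the function name and default TTL are illustrative choices.

```python
def fail_over(ec2, route53, standby_instance_id, hosted_zone_id, record_name, ttl=60):
    """Start the stopped standby instance and point the DNS record at it."""
    ec2.start_instances(InstanceIds=[standby_instance_id])
    # Block until EC2 reports the instance as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[standby_instance_id])

    reservations = ec2.describe_instances(InstanceIds=[standby_instance_id])["Reservations"]
    # Assumption: clients resolve the record inside the VPC, so use the private IP.
    ip_address = reservations[0]["Instances"][0]["PrivateIpAddress"]

    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={
            "Comment": "Availability Zone failover to standby instance",
            "Changes": [{
                "Action": "UPSERT",  # create the record or overwrite the old target
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }],
        },
    )
    return ip_address
```

A short TTL limits how long clients keep resolving the record to the failed instance. Passing the clients in as parameters also makes it possible to rehearse the script against stub clients before you run a real failover test.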

Test your solution thoroughly by simulating the failure that the solution was designed to protect against. Also consider the updates that your DR solution will need as your workload architecture changes.

## DR for Amazon EC2 in a regional failure
<a name="dr-regional-failure"></a>

Customers with very high availability requirements (for example, mission-critical applications that cannot tolerate any downtime) can use AWS across multiple Regions to provide further resiliency against issues at the Region level. Customers must carefully weigh the complexity, cost, and effort required to establish and maintain a multi-Region DR plan against the benefit. AWS provides features that support multi-Region architectures for global availability, failover, and DR. This guide covers a few of the available features that are specific to backup and recovery for Amazon EC2.

AMIs and Amazon EBS snapshots are regional resources that can be used to provision new instances within a single Region. However, you can copy your snapshots and AMIs to another Region and use them to provision new instances in that Region. To support a regional failure DR plan, you can automate the process of copying AMIs and snapshots to other Regions. AWS Backup and Amazon Data Lifecycle Manager support cross-Region copying as a part of your backup configuration.
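If you script the copy yourself instead of relying on AWS Backup or Amazon Data Lifecycle Manager, a minimal sketch could look like the following. It assumes that `dr_ec2` is a Boto3 EC2 client created for the destination (DR) Region, because the `CopyImage` API is called in the Region that receives the copy. The function name and the description text are illustrative.

```python
def copy_ami_to_dr_region(dr_ec2, source_region, source_image_id, name):
    """Copy an AMI into the Region of dr_ec2 and return the new AMI ID."""
    response = dr_ec2.copy_image(
        Name=name,
        SourceImageId=source_image_id,
        SourceRegion=source_region,
        Description=f"DR copy of {source_image_id} from {source_region}",
    )
    # The copy finishes asynchronously; the new AMI starts in the pending state.
    return response["ImageId"]
```

For example, with `dr_ec2 = boto3.client("ec2", region_name="us-west-2")`, calling the function with a source Region of `us-east-1` starts a copy into `us-west-2`. Wait for the new AMI to become available before you rely on it in a DR test.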

[AWS Elastic Disaster Recovery](https://docs.aws.amazon.com/drs/latest/userguide/what-is-drs.html) can continuously replicate your Amazon EC2 servers in one Region to an alternate DR Region. It can simplify your multi-Region DR approach and help you to regularly test your cross-Region Amazon EC2 DR plan by using drills. Consider Elastic Disaster Recovery when backup and recovery cannot meet your RTO and RPO requirements. It can help you lower your RTO to minutes and your RPO into the sub-second range.
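A drill can also be started programmatically. The following sketch assumes that `drs` is a Boto3 client for Elastic Disaster Recovery created in the DR Region (for example, `boto3.client("drs")`) and that the `StartRecovery` API accepts the `isDrill` and `sourceServers` parameters as shown. Verify the request and response shapes against the current API reference before you rely on this. The function name is hypothetical.

```python
def start_drill(drs, source_server_ids):
    """Launch drill instances for the given source servers and return the job ID."""
    if not source_server_ids:
        raise ValueError("at least one source server ID is required")
    response = drs.start_recovery(
        isDrill=True,  # mark the launch as a drill, not an actual recovery
        sourceServers=[{"sourceServerID": server_id} for server_id in source_server_ids],
    )
    return response["job"]["jobID"]
```

Running a function like this on a schedule is one way to keep drills regular. You can record the returned job ID and check its outcome as part of your test evidence.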

Whichever solution you use, you must determine the provisioning, failover, and failback process to use in the event of an outage. You can use Route 53 with health checks and Domain Name System failover to help support your solution.
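As an illustration of DNS failover, the following sketch builds a Route 53 change batch with a primary record that is tied to a health check and a secondary record for the DR Region. The function names, the `SetIdentifier` format, and the TTL are assumptions. The health check itself is assumed to exist already.

```python
def failover_record(name, ip_address, role, health_check_id=None, ttl=60):
    """Build one UPSERT change for a failover routing record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",  # must be unique per record
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id:
        # Route 53 answers with the secondary record when this check is unhealthy.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id):
    """Build a change batch with a health-checked primary and a secondary record."""
    return {
        "Comment": "Active-passive DNS failover between Regions",
        "Changes": [
            failover_record(name, primary_ip, "PRIMARY", primary_health_check_id),
            failover_record(name, secondary_ip, "SECONDARY"),
        ],
    }
```

With Boto3, you could apply the result by calling `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)` on a Route 53 client. DNS failover redirects traffic only. Your process still has to provision capacity in the DR Region and cover failback.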