# Getting started with AWS Elastic Disaster Recovery

**Topics**
+ [Disaster recovery overview](#recovery-workflow-gs)
+ [Elastic Disaster Recovery initialization and permissions](getting-started-initializing.md)
+ [Accessing the AWS Elastic Disaster Recovery Console](accessing-console.md)
+ [AWS Elastic Disaster Recovery supported AWS Regions](supported-regions.md)
+ [Using the AWS Elastic Disaster Recovery Console](drs-console.md)
+ [Best practices for Elastic Disaster Recovery](best_practices_drs.md)
+ [Disaster recovery at scale](drs-at-scale.md)
+ [Elastic Disaster Recovery quick start guide](quick-start-guide-gs.md)

## Disaster recovery overview


The general process is:

1. Initialize AWS Elastic Disaster Recovery in the target AWS Region. You can initialize through the [console or API](getting-started-initializing.md). See the [list of supported AWS Regions](supported-regions.md).

1. [Install the AWS Replication Agent](agent-installation.md) on the source server.

1. Wait until initial sync is finished. After installing the agent, the initial synchronization process performs block-level replication from the source server to the replication server in the staging area.

1. Launch drill instances. Perform acceptance drills on the servers. After the drill is tested successfully, finalize the drill and delete the instance.

1. Configure [post-launch actions](post-launch-action-settings-overview.md) if needed.

1. Confirm that there is no replication lag.

1. Initiate a failover by redirecting traffic.

1. Confirm that the Recovery instance was launched successfully.

1. To recover your data, initiate a [failback](failback-performing.md).

1. Complete the failback.

1. Return to normal operations.

For service quotas and limits, see [AWS Elastic Disaster Recovery endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/drs.html).

### Resources


The following free technical training is available for DRS:
+ [AWS Elastic Disaster Recovery - A Technical Introduction](https://explore.skillbuilder.aws/learn/course/external/view/elearning/11123/aws-elastic-disaster-recovery-a-technical-introduction)

# Elastic Disaster Recovery initialization and permissions

Before you can use AWS Elastic Disaster Recovery, you must initialize the service in each AWS Region in which you plan to use it.

## Initializing AWS Elastic Disaster Recovery


AWS Elastic Disaster Recovery must be initialized on first use from within the AWS Elastic Disaster Recovery Console. Initialization starts automatically when a user first opens the console: the user is directed to create the default replication settings, and when the template is saved, the service is initialized by creating the IAM roles that it requires. [Learn more about creating the default replication settings as part of the quick start guide.](quick-start-guide-gs.md#first-time-setup-gs)

**Important**  
AWS Elastic Disaster Recovery **is not** compatible with CloudEndure Disaster Recovery. 

AWS Elastic Disaster Recovery can only be initialized by the Admin user of your AWS Account. During initialization, the following IAM roles are created: 
+ **AWSServiceRoleForElasticDisasterRecovery**
+ **AWSElasticDisasterRecoveryReplicationServerRole**
+ **AWSElasticDisasterRecoveryConversionServerRole**
+ **AWSElasticDisasterRecoveryRecoveryInstanceRole**
+ **AWSElasticDisasterRecoveryAgentRole**
+ **AWSElasticDisasterRecoveryFailbackRole**
+ **AWSElasticDisasterRecoveryRecoveryInstanceWithLaunchActionsRole**

## Additional policies


You can create roles with granular permission for AWS Elastic Disaster Recovery. The service comes with the following predefined managed IAM policies: 


+ AWSElasticDisasterRecoveryConsoleFullAccess
+ AWSElasticDisasterRecoveryReadOnlyAccess
+ AWSElasticDisasterRecoveryAgentPolicy
+ AWSElasticDisasterRecoveryAgentInstallationPolicy
+ AWSElasticDisasterRecoveryFailbackPolicy
+ AWSElasticDisasterRecoveryFailbackInstallationPolicy
+ AWSElasticDisasterRecoveryInstancePolicy
+ AWSElasticDisasterRecoveryServiceRolePolicy
+ AWSElasticDisasterRecoveryLaunchActionsPolicy

Learn more about [AWS Elastic Disaster Recovery roles and managed policies](security-iam-awsmanpol.md). 

## Initializing DRS through the API


You can initialize AWS Elastic Disaster Recovery through the API. This can help you automate service initialization by script when initializing multiple accounts. 

**Note**  
You need to [create the replication settings template](https://docs.aws.amazon.com/drs/latest/APIReference/API_CreateReplicationConfigurationTemplate.html) after initializing the service.

To initialize AWS Elastic Disaster Recovery manually, create the following IAM roles through the [IAM CreateRole API](https://docs.aws.amazon.com/IAM/latest/APIReference/API_CreateRole.html). Learn more about [creating IAM roles in the AWS IAM documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html).

Creation of each role must include the following parameters:



| Role name | Path | Trusted Entity | 
| --- | --- | --- | 
|   **AWSElasticDisasterRecoveryAgentRole**   |  /service-role/  | drs.amazonaws.com | 
|   **AWSElasticDisasterRecoveryFailbackRole**   |  /service-role/  | drs.amazonaws.com | 
|   **AWSElasticDisasterRecoveryConversionServerRole**   |  /service-role/  | ec2.amazonaws.com | 
|   **AWSElasticDisasterRecoveryRecoveryInstanceRole**   |  /service-role/  | ec2.amazonaws.com | 
|   **AWSElasticDisasterRecoveryReplicationServerRole**   |  /service-role/  | ec2.amazonaws.com | 
|   **AWSElasticDisasterRecoveryRecoveryInstanceWithLaunchActionsRole**   |  /service-role/  | ec2.amazonaws.com | 

Example using the AWS CLI: `aws iam create-role --path "/service-role/" --role-name AWSElasticDisasterRecoveryReplicationServerRole --assume-role-policy-document '{"Version": "2012-10-17", "Statement":[{"Effect":"Allow","Principal": {"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'`
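
The table above can be turned into the six `aws iam create-role` invocations mechanically. The following Python sketch only builds the argument lists and does not call AWS. The helper names (`ROLE_TRUSTED_ENTITIES`, `trust_policy`, `create_role_args`) are illustrative, and the trust policy is the minimal form from the CLI example, without the source identity and source account conditions that the note later in this section asks for.

```python
import json

# Role name -> trusted service, copied from the table above.
ROLE_TRUSTED_ENTITIES = {
    "AWSElasticDisasterRecoveryAgentRole": "drs.amazonaws.com",
    "AWSElasticDisasterRecoveryFailbackRole": "drs.amazonaws.com",
    "AWSElasticDisasterRecoveryConversionServerRole": "ec2.amazonaws.com",
    "AWSElasticDisasterRecoveryRecoveryInstanceRole": "ec2.amazonaws.com",
    "AWSElasticDisasterRecoveryReplicationServerRole": "ec2.amazonaws.com",
    "AWSElasticDisasterRecoveryRecoveryInstanceWithLaunchActionsRole": "ec2.amazonaws.com",
}


def trust_policy(service: str) -> dict:
    """Minimal trust policy that lets `service` assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role_args(role_name: str) -> list[str]:
    """Argument list for `aws iam create-role`, suitable for subprocess.run."""
    service = ROLE_TRUSTED_ENTITIES[role_name]
    return [
        "aws", "iam", "create-role",
        "--path", "/service-role/",
        "--role-name", role_name,
        "--assume-role-policy-document", json.dumps(trust_policy(service)),
    ]
```

Passing the list to `subprocess.run(create_role_args(name), check=True)` avoids shell quoting problems with the inline JSON document; this assumes the AWS CLI is installed and configured.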

After the roles have been created, attach the following AWS managed policies to the roles through the [IAM AttachRolePolicy API](https://docs.aws.amazon.com/IAM/latest/APIReference/API_AttachRolePolicy.html). Learn more about [adding and removing IAM identity permissions in the AWS IAM documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html). 

1. Attach Managed Policy **AWSElasticDisasterRecoveryAgentPolicy** to Role **AWSElasticDisasterRecoveryAgentRole**

1. Attach Managed Policy **AWSElasticDisasterRecoveryFailbackPolicy** to Role **AWSElasticDisasterRecoveryFailbackRole**

1. Attach Managed Policy **AWSElasticDisasterRecoveryConversionServerPolicy** to Role **AWSElasticDisasterRecoveryConversionServerRole**

1. Attach Managed Policy **AWSElasticDisasterRecoveryRecoveryInstancePolicy** to Role **AWSElasticDisasterRecoveryRecoveryInstanceRole**

1. Attach Managed Policy **AWSElasticDisasterRecoveryReplicationServerPolicy** to Role **AWSElasticDisasterRecoveryReplicationServerRole**

1. Attach Managed Policies **AWSElasticDisasterRecoveryRecoveryInstancePolicy** and **AmazonSSMManagedInstanceCore** to Role **AWSElasticDisasterRecoveryRecoveryInstanceWithLaunchActionsRole**
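
The attachments listed above can be expressed as a role-to-policy mapping. This Python sketch builds the matching `aws iam attach-role-policy` argument lists without calling AWS. The `service-role/` ARN prefix is taken from the IAM policy shown later on this page; the names `ROLE_POLICIES` and `attach_policy_commands` are illustrative.

```python
# ARN prefix used by the DRS managed policies referenced on this page.
DRS_POLICY_PREFIX = "arn:aws:iam::aws:policy/service-role/"

# Role name -> managed policy ARNs, following the numbered steps above.
ROLE_POLICIES = {
    "AWSElasticDisasterRecoveryAgentRole": [
        DRS_POLICY_PREFIX + "AWSElasticDisasterRecoveryAgentPolicy",
    ],
    "AWSElasticDisasterRecoveryFailbackRole": [
        DRS_POLICY_PREFIX + "AWSElasticDisasterRecoveryFailbackPolicy",
    ],
    "AWSElasticDisasterRecoveryConversionServerRole": [
        DRS_POLICY_PREFIX + "AWSElasticDisasterRecoveryConversionServerPolicy",
    ],
    "AWSElasticDisasterRecoveryRecoveryInstanceRole": [
        DRS_POLICY_PREFIX + "AWSElasticDisasterRecoveryRecoveryInstancePolicy",
    ],
    "AWSElasticDisasterRecoveryReplicationServerRole": [
        DRS_POLICY_PREFIX + "AWSElasticDisasterRecoveryReplicationServerPolicy",
    ],
    "AWSElasticDisasterRecoveryRecoveryInstanceWithLaunchActionsRole": [
        DRS_POLICY_PREFIX + "AWSElasticDisasterRecoveryRecoveryInstancePolicy",
        # General AWS managed policy, so it has no service-role/ prefix.
        "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    ],
}


def attach_policy_commands() -> list[list[str]]:
    """One `aws iam attach-role-policy` argument list per (role, policy) pair."""
    return [
        ["aws", "iam", "attach-role-policy",
         "--role-name", role, "--policy-arn", arn]
        for role, arns in ROLE_POLICIES.items()
        for arn in arns
    ]
```

The mapping yields seven commands in total, because the launch actions role receives two policies.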

**Note**  
Roles must also have a trust policy defined. For security reasons, the trust policy needs to define the source identity and source account, and allow the service to call SetSourceIdentity and AssumeRole. See the following policy examples.
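
As a rough sketch of the requirements in this note, the following Python builds a trust policy that allows `drs.amazonaws.com` to call `sts:AssumeRole` and `sts:SetSourceIdentity`, restricted by the standard `aws:SourceAccount` and `sts:SourceIdentity` condition keys. The `StringLike` operator, the `s-*` identity pattern, and the function name are assumptions made for illustration, not the documented policy; check the roles and managed policies page for the exact conditions.

```python
import json


def drs_trust_policy(source_account_id: str,
                     source_identity_pattern: str = "s-*") -> dict:
    """Trust policy sketch for roles assumed by drs.amazonaws.com.

    `source_identity_pattern` is an assumed placeholder value.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "drs.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
                "Condition": {
                    "StringLike": {
                        # Only requests made on behalf of this account.
                        "aws:SourceAccount": source_account_id,
                        # Only sessions whose source identity matches.
                        "sts:SourceIdentity": source_identity_pattern,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID.
    print(json.dumps(drs_trust_policy("111122223333"), indent=4))
```

The printed JSON can be saved to a file and passed to `aws iam create-role` with `--assume-role-policy-document file://...`.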


Example 1: creating a role for the **AWSElasticDisasterRecoveryAgentRole** with trusted entity relationships via the CreateRole API: 

 **Role: AWSElasticDisasterRecoveryAgentRole** 

```
$ aws iam create-role --path "/service-role/" \
    --role-name AWSElasticDisasterRecoveryAgentRole \
    --assume-role-policy-document file://agent-source-drs-trust-policy.json
```

 **agent-source-drs-trust-policy.json** 

Example 2: creating a role for the **AWSElasticDisasterRecoveryFailbackRole** with trusted entity relationships via the CreateRole API: 

 **Role: AWSElasticDisasterRecoveryFailbackRole** 

```
$ aws iam create-role --path "/service-role/" \
    --role-name AWSElasticDisasterRecoveryFailbackRole \
    --assume-role-policy-document file://failback-source-drs-trust-policy.json
```

 **failback-source-drs-trust-policy.json** 

Example 3: creating roles for the **AWSElasticDisasterRecoveryConversionServerRole**, **AWSElasticDisasterRecoveryRecoveryInstanceRole**, and **AWSElasticDisasterRecoveryReplicationServerRole** with trusted entity relationships via the CreateRole API:

 **Role: AWSElasticDisasterRecoveryConversionServerRole** 

```
$ aws iam create-role --path "/service-role/" \
    --role-name AWSElasticDisasterRecoveryConversionServerRole \
    --assume-role-policy-document file://source-drs-trust-policy.json
```

 **Role: AWSElasticDisasterRecoveryRecoveryInstanceRole** 

```
$ aws iam create-role --path "/service-role/" \
    --role-name AWSElasticDisasterRecoveryRecoveryInstanceRole \
    --assume-role-policy-document file://source-drs-trust-policy.json
```

 **Role: AWSElasticDisasterRecoveryReplicationServerRole** 

```
$ aws iam create-role --path "/service-role/" \
    --role-name AWSElasticDisasterRecoveryReplicationServerRole \
    --assume-role-policy-document file://source-drs-trust-policy.json
```

 **source-drs-trust-policy.json** 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

Once the policies are attached to the roles, run the `aws drs initialize-service` command. This automatically creates the service-linked role (**AWSServiceRoleForElasticDisasterRecovery**), creates instance profiles, adds roles to instance profiles, and finishes service initialization.
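
The order of operations described in this section (create the roles, attach the managed policies, then initialize the service) can be captured as an explicit plan. The sketch below assumes boto3-style clients whose `create_role`, `attach_role_policy`, and `initialize_service` methods correspond to the IAM and DRS API operations named above. The `initialization_plan` and `run_plan` helpers and the shape of the `roles` input are hypothetical.

```python
import json


def initialization_plan(roles: list[dict]) -> list[tuple[str, str, dict]]:
    """Ordered (service, method, kwargs) steps for manual initialization.

    Each role is a dict with the keys "name", "trust_policy" (a dict),
    and "policy_arns" (a list of managed policy ARNs).
    """
    plan = []
    # 1. Create every role under the /service-role/ path.
    for role in roles:
        plan.append(("iam", "create_role", {
            "Path": "/service-role/",
            "RoleName": role["name"],
            "AssumeRolePolicyDocument": json.dumps(role["trust_policy"]),
        }))
    # 2. Attach the managed policies to the roles.
    for role in roles:
        for arn in role["policy_arns"]:
            plan.append(("iam", "attach_role_policy", {
                "RoleName": role["name"],
                "PolicyArn": arn,
            }))
    # 3. Finish by initializing the service.
    plan.append(("drs", "initialize_service", {}))
    return plan


def run_plan(plan, clients: dict) -> None:
    """Execute each step against the matching client, in order."""
    for service, method, kwargs in plan:
        getattr(clients[service], method)(**kwargs)
```

With boto3 installed, the plan could be executed with `run_plan(plan, {"iam": boto3.client("iam"), "drs": boto3.client("drs", region_name="us-east-1")})`, where the Region is only an example. Separating the plan from its execution makes it possible to review or log the steps before anything changes in the account.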

Learn more about [AWS Elastic Disaster Recovery roles and managed policies](security-iam-awsmanpol.md). 

## Programmatically initializing DRS


 To programmatically initialize the service, create an IAM role with the following IAM policy: 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:AttachRolePolicy",
            "Resource": "*",
            "Condition": {
                "ForAnyValue:ArnEquals": {
                    "iam:PolicyARN": [
                        "arn:aws:iam::aws:policy/service-role/AWSElasticDisasterRecoveryAgentPolicy",
                        "arn:aws:iam::aws:policy/service-role/AWSElasticDisasterRecoveryFailbackPolicy",
                        "arn:aws:iam::aws:policy/service-role/AWSElasticDisasterRecoveryConversionServerPolicy",
                        "arn:aws:iam::aws:policy/service-role/AWSElasticDisasterRecoveryRecoveryInstancePolicy",
                        "arn:aws:iam::aws:policy/service-role/AWSElasticDisasterRecoveryReplicationServerPolicy"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "ForAnyValue:StringLike": {
                    "iam:PassedToService": [
                        "ec2.amazonaws.com",
                        "drs.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "drs:InitializeService",
                "drs:ListTagsForResource",
                "drs:GetReplicationConfiguration",
                "drs:CreateLaunchConfigurationTemplate",
                "drs:GetLaunchConfiguration",
                "drs:CreateReplicationConfigurationTemplate",
                "drs:*ReplicationConfigurationTemplate*",
                "iam:TagRole",
                "iam:CreateRole",
                "iam:GetServiceLinkedRoleDeletionStatus",
                "iam:ListAttachedRolePolicies",
                "iam:ListRolePolicies",
                "iam:GetRole",
                "iam:DeleteRole",
                "iam:DeleteServiceLinkedRole",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "sts:DecodeAuthorizationMessage",
                "ec2:DescribeSecurityGroups",
                "ec2:Get*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/drs.amazonaws.com/AWSServiceRoleForElasticDisasterRecovery"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateInstanceProfile",
                "iam:ListInstanceProfilesForRole",
                "iam:GetInstanceProfile",
                "iam:ListInstanceProfiles",
                "iam:AddRoleToInstanceProfile"
            ],
            "Resource": [
                "arn:aws:iam::*:instance-profile/*",
                "arn:aws:iam::*:role/*"
            ]
        }
    ]
}
```


Once the policies are attached to the roles, run the `aws drs initialize-service` command. This automatically creates the service-linked role (**AWSServiceRoleForElasticDisasterRecovery**), creates instance profiles, adds roles to instance profiles, and finishes service initialization.

Learn more about [AWS Elastic Disaster Recovery roles and managed policies](security-iam-awsmanpol.md). 

# Accessing the AWS Elastic Disaster Recovery Console

You can access AWS Elastic Disaster Recovery directly through the AWS Console or through the following links: 
+ Commercial AWS Regions: [https://console.aws.amazon.com/drs/home](https://console.aws.amazon.com/drs/home)
+ AWS GovCloud Regions: [https://console.amazonaws-us-gov.com/drs/home](https://console.amazonaws-us-gov.com/drs/home)

# AWS Elastic Disaster Recovery supported AWS Regions

The following AWS Regions are supported by AWS Elastic Disaster Recovery:



| Region name | Region identity | Support in AWS Elastic Disaster Recovery | 
| --- | --- | --- | 
| AWS GovCloud (US-West) | us-gov-west-1 | Yes | 
| AWS GovCloud (US-East) | us-gov-east-1 | Yes | 
| US East (Ohio) | us-east-2 | Yes | 
| US East (N. Virginia) | us-east-1 | Yes | 
| US West (N. California) | us-west-1 | Yes | 
| US West (Oregon) | us-west-2 | Yes | 
| Africa (Cape Town) | af-south-1 | Yes | 
| Asia Pacific (Hong Kong) | ap-east-1 | Yes | 
| Asia Pacific (Mumbai) | ap-south-1 | Yes | 
| Asia Pacific (Hyderabad) | ap-south-2 | Yes | 
| Asia Pacific (Osaka) | ap-northeast-3 | Yes | 
| Asia Pacific (Seoul) | ap-northeast-2 | Yes | 
| Asia Pacific (Singapore) | ap-southeast-1 | Yes | 
| Asia Pacific (Sydney) | ap-southeast-2 | Yes | 
| Asia Pacific (Jakarta) | ap-southeast-3 | Yes | 
| Asia Pacific (Melbourne) | ap-southeast-4 | Yes | 
| Asia Pacific (Tokyo) | ap-northeast-1 | Yes | 
| Canada (Central) | ca-central-1 | Yes | 
| Europe (Frankfurt) | eu-central-1 | Yes | 
| Europe (Zurich) | eu-central-2 | Yes | 
| Europe (Ireland) | eu-west-1 | Yes | 
| Europe (London) | eu-west-2 | Yes | 
| Europe (Milan) | eu-south-1 | Yes | 
| Europe (Spain) | eu-south-2 | Yes | 
| Europe (Paris) | eu-west-3 | Yes | 
| Europe (Stockholm) | eu-north-1 | Yes | 
| Middle East (UAE) | me-central-1 | Yes | 
| Middle East (Bahrain) | me-south-1 | Yes | 
| Israel (Tel Aviv) | il-central-1 | Yes | 
| South America (São Paulo) | sa-east-1 | Yes | 

Learn more about [AWS Services by Region](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).

AWS Elastic Disaster Recovery regional support includes [AWS Local Zones](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-local-zones) associated with the supported Regions listed above.
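
When scripting against several Regions, it can help to validate a Region code before using it. This Python sketch hard-codes the Region table above as it appears on this page (so it goes stale if the table changes) and picks the matching console link from the previous section. The helper names are illustrative.

```python
# Region codes copied from the table above.
SUPPORTED_REGIONS = frozenset({
    "us-gov-west-1", "us-gov-east-1",
    "us-east-2", "us-east-1", "us-west-1", "us-west-2",
    "af-south-1",
    "ap-east-1", "ap-south-1", "ap-south-2",
    "ap-northeast-3", "ap-northeast-2", "ap-northeast-1",
    "ap-southeast-1", "ap-southeast-2", "ap-southeast-3", "ap-southeast-4",
    "ca-central-1",
    "eu-central-1", "eu-central-2",
    "eu-west-1", "eu-west-2", "eu-west-3",
    "eu-south-1", "eu-south-2", "eu-north-1",
    "me-central-1", "me-south-1",
    "il-central-1",
    "sa-east-1",
})


def is_supported(region: str) -> bool:
    """True if the Region code appears in the table above."""
    return region in SUPPORTED_REGIONS


def console_url(region: str) -> str:
    """Console entry point for a supported Region (GovCloud or commercial)."""
    if not is_supported(region):
        raise ValueError(f"{region} is not in the supported Regions table")
    if region.startswith("us-gov-"):
        return "https://console.amazonaws-us-gov.com/drs/home"
    return "https://console.aws.amazon.com/drs/home"
```

For example, `console_url("us-gov-east-1")` returns the GovCloud link, while an unlisted code such as `cn-north-1` raises `ValueError`.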

# Using the AWS Elastic Disaster Recovery Console

AWS Elastic Disaster Recovery is AWS Region-specific. Make sure that you select the correct Region from the **Select a Region** menu when using AWS Elastic Disaster Recovery, just like you would with other AWS Region-specific services such as Amazon EC2.

AWS Elastic Disaster Recovery is divided into several primary pages. Each page contains additional tabs and actions. The default view for the AWS Elastic Disaster Recovery Console is the **Source servers** page. This page automatically opens every time you open AWS Elastic Disaster Recovery. You can navigate to other AWS Elastic Disaster Recovery pages through the left pane **AWS Elastic Disaster Recovery** navigation menu. 

Each Elastic Disaster Recovery page opens in the right pane. 

## Source servers page


The Source servers page lists all of the source servers you added to AWS Elastic Disaster Recovery and allows you to interact with your servers and perform actions. [Learn more about the Source servers page.](source-servers.md)

Control your source servers in the AWS Elastic Disaster Recovery console through the **Actions**, **Replication**, and **Initiate recovery job** menus.

Review the progress of commands through the **Recovery job history** tab. [Learn more about recovery job history. ](recovery-job.md)

The commands in the **Actions** and **Initiate recovery job** menus influence the specific source servers you selected. You can select a single source server or multiple source servers for any command. 

Use the **Filter source servers by property or value** field to filter servers. 

AWS Elastic Disaster Recovery color codes the state of each source server. Use the **Alerts** column to easily determine the state of your server. 
+ A server that is ready to launch Drill or Recovery instances displays the green checkmark and states **Ready**.  
![\[Green checkmark icon indicating a server is ready for Drill or Recovery instances.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-sourceservers-ready.png)

+ A server that is ready to launch Drill or Recovery instances but is experiencing a non-critical issue, such as lag, displays the blue info sign, states **Ready**, and shows the lag duration to the right. You may need to take action to fix the lag.  
![\[Blue info icon with "Ready" status and "lag 2 hr" indication.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-new-ss5-lag.png)

+ A server that is still undergoing initial sync displays a gray circle with three dots and states **Initial sync**.  
![\[Gray circle with three dots indicating a server undergoing initial synchronization.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-sourceservers-initialsync.png)

+ A server that is disconnected displays the gray warning sign and states **Disconnected**.  
![\[Gray warning icon with "Disconnected" text indicating server status.\]](http://docs.aws.amazon.com/drs/latest/userguide/images/drs-new-ss5-disc.png)

+ A server that is not ready due to a significant error, such as a stall, displays a red **X** and states **Not ready**. The **Not ready** state is only shown for servers that are not replicating and do not have any previously created Points in Time. You must take action to fix the issue.

When some commands are initiated, AWS Elastic Disaster Recovery displays information messages at the top of the **Source servers** page. AWS Elastic Disaster Recovery color codes these messages for clarity. A green message means that a command was completed successfully. A red message means that a command was not completed successfully. Each message provides details and links to supplemental information.

AWS Elastic Disaster Recovery allows you to interact with and manage each server. Choose the server hostname to be redirected to the server details view. 

The **Server details** view tab shows specific details for an individual server. From here, you can see an overview of the server's recovery state, as well as various technical details, manage tags, manage disks, edit the server's replication settings, and edit the server's launch settings through the various tabs. [Learn more about the Server Details view](server-details.md). 

Certain Elastic Disaster Recovery commands, such as **Edit replication settings**, allow you to interact with multiple source servers at once. When multiple source servers are selected and the **Replication > Edit replication settings** option is chosen, AWS Elastic Disaster Recovery indicates which servers are being edited.

For settings changes you make in the AWS Elastic Disaster Recovery Console to take effect, choose **Save** at the bottom of each Settings page.

# Best practices for Elastic Disaster Recovery

For a more complete discussion of best practices for planning, implementing, and maintaining disaster recovery for on-premises applications using AWS, [see this white paper](https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/disaster-recovery-of-on-premises-applications-to-aws.html). 

## Planning


1. Being ready for a real recovery event requires pre-planning. Simply having your servers replicating to AWS, and even having launched them once is not enough. You should have a written recovery plan of what to do in the event of a real recovery event. To learn more, read this [Checklist for your IT disaster recovery plan](https://pages.awscloud.com/GLOBAL-aware-PT-disaster-recovery-plan-checklist-2019-learn.html). 

1. Once your source servers have reached the Healthy state (after initial sync has completed), launch Drill instances for each of your applications and ensure that each application as a whole works as expected when running in your recovery AWS Region. As you go through this process, you will likely create the necessary network resources (together with security groups and other related resources). While you can keep these recovery networks (and related resources) up and running even when not in use, we recommend that once you have set them up properly, you create a CloudFormation template that can create them on demand, should the need arise. Also discover the order in which servers and applications need to be launched, and record it in the recovery plan.
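
One way to record the launch order mentioned in the last planning step is as a dependency map from which launch "waves" are derived. The sketch below uses Python's standard `graphlib` module. The server names and the `launch_waves` helper are hypothetical, and DRS itself does not consume this structure; it is only a way to document the order in your recovery plan.

```python
from graphlib import TopologicalSorter


def launch_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group servers into waves that can be launched together.

    `dependencies` maps each server to the servers that must be up
    before it. Raises graphlib.CycleError if the map contains a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its prerequisites launched.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    plan = {
        "db": set(),
        "cache": set(),
        "app": {"db", "cache"},
        "web": {"app"},
    }
    for number, wave in enumerate(launch_waves(plan), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

For the example map, the database and cache launch first, then the application tier, then the web tier. A cycle in the recorded dependencies is reported as an error, which is itself a useful finding during a drill.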

## Drilling


Regular drills are an integral part of any Disaster Recovery solution. With DRS, drilling is simple and nondisruptive (both to the servers at the source, and to the replication process itself). We recommend drilling as often as is practical, and at least several times a year, and updating the recovery plan with any findings and required changes. Testing and [understanding failback](failback.md#failback-overview) is also important. Be sure to include it in your initial drill, and in at least some of your regular drills. 

Regular testing can help ensure that your resources are properly prepared for both disasters and scheduled drills. Before conducting large-scale scheduled drills, make sure you meet all the prerequisites and run the required tests. To allow our support team to assist you in case of misconfiguration or other issues, conduct the preliminary testing one to two weeks before the scheduled drill.

**Note**  
While your drill instances are up and running, you are paying for them as per your standard Amazon EC2 rates. Make sure to terminate the drill instances when the drill is done, and include this as a step in your recovery plan. 
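
To make the cleanup step harder to forget, a script can list the drill instances that are still present. This sketch assumes the item shape returned by the DRS `DescribeRecoveryInstances` API, in particular the `isDrill` and `recoveryInstanceID` fields; verify the field names against the API reference before relying on them. The helper name is illustrative.

```python
def drill_instance_ids(recovery_instances: list[dict]) -> list[str]:
    """IDs of recovery instances that were launched as drills."""
    return [
        item["recoveryInstanceID"]
        for item in recovery_instances
        if item.get("isDrill")
    ]


if __name__ == "__main__":
    # Hand-written sample items; real ones come from the API.
    sample = [
        {"recoveryInstanceID": "i-0aaa", "isDrill": True},
        {"recoveryInstanceID": "i-0bbb", "isDrill": False},
    ]
    print(drill_instance_ids(sample))
```

With boto3, the input would come from `drs.describe_recovery_instances()["items"]`, and the resulting IDs could be passed to `drs.terminate_recovery_instances(recoveryInstanceIDs=ids)` once the drill has been signed off.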

## Monitoring


You can monitor the health of the ongoing replication using the DRS console or programmatically. In the AWS DRS console, go to the **Servers list** page and look at the **Ready for recovery** column. Any server that does not show **Ready** with a green checkmark may require attention. Servers that show **stalled** in the **Data replication status** column require your intervention to resolve. Servers that show **Lag** may resolve themselves (unless they are also stalled), but you should monitor them to determine whether the lag is a persistent problem (for example, due to insufficient network bandwidth). You can use a scripted solution and the [DRS API](https://docs.aws.amazon.com/drs/latest/APIReference/Welcome.html) to respond to servers becoming stalled or going into lag, or you can use [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/) and the [EventBridge events generated by AWS DRS](https://docs.aws.amazon.com/drs/latest/userguide/monitoring-event-bridge-sample.html).
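
A scripted check along these lines could sort source servers into stalled and lagging groups. The sketch assumes the item shape of the DRS `DescribeSourceServers` API (`sourceServerID`, `dataReplicationInfo.dataReplicationState`, `dataReplicationInfo.lagDuration`). It also assumes that `lagDuration` is absent or empty when a server has no lag, which you should verify against real responses. The helper name is illustrative.

```python
def servers_needing_attention(source_servers: list[dict]) -> dict[str, list[str]]:
    """Split source server IDs into stalled and lagging groups.

    A stalled server is reported only as stalled, even if it also lags,
    because a stall always needs intervention.
    """
    stalled, lagging = [], []
    for server in source_servers:
        info = server.get("dataReplicationInfo", {})
        if info.get("dataReplicationState") == "STALLED":
            stalled.append(server["sourceServerID"])
        elif info.get("lagDuration"):
            lagging.append(server["sourceServerID"])
    return {"stalled": stalled, "lagging": lagging}
```

With boto3, the input would come from paginating `drs.describe_source_servers()`. The stalled list is a natural trigger for an alert, while the lagging list is better tracked over time to spot persistent lag.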

## Limits


Due to Amazon EBS limits on the rate at which EBS snapshots can be taken, DRS can replicate at most 300 servers in a single AWS account. To replicate more servers, use multiple AWS accounts or multiple target AWS Regions (you need to set up DRS separately for each account and Region).

You can also use multiple staging or target accounts, as described in [Using multiple staging accounts with AWS DRS](https://docs.aws.amazon.com/drs/latest/userguide/multi-account.html).
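
For capacity planning, the 300-server limit quoted above translates into a simple calculation. This Python sketch estimates how many account and Region combinations are needed and splits a server list into batches. The constant reflects the figure on this page, and the helper names are illustrative.

```python
import math

MAX_SERVERS_PER_ACCOUNT = 300  # limit quoted in this section


def targets_needed(server_count: int, limit: int = MAX_SERVERS_PER_ACCOUNT) -> int:
    """Number of account/Region combinations needed for `server_count` servers."""
    if server_count < 0:
        raise ValueError("server_count must not be negative")
    return math.ceil(server_count / limit)


def split_into_batches(servers: list[str], limit: int = MAX_SERVERS_PER_ACCOUNT) -> list[list[str]]:
    """Split servers into consecutive batches of at most `limit` each."""
    return [servers[i:i + limit] for i in range(0, len(servers), limit)]
```

For example, 650 servers need three targets, with batches of 300, 300, and 50 servers. In practice you would likely group servers by application rather than splitting them arbitrarily, so that each application recovers within one account.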

## Protecting Point-In-Time snapshots


DRS uses EBS snapshots to maintain [recovery Points-In-Time](https://docs.aws.amazon.com/drs/latest/userguide/failback-overview.html#point-in-time-faq). If these are deleted, you can only recover from the latest state, as maintained on the replication server (and if it is terminated, you can no longer recover at all). If a breach involves not just corruption of your data at the source but also access to your AWS account, a malicious actor could delete your Point-In-Time snapshots unless you take extra measures to protect them.

## Controlling agent installation permissions


You should control who can install the AWS Replication Agent in your account. Once an agent is installed, you immediately begin accruing charges for DRS and for replication resources (such as Amazon EBS). The agent installation permissions should be as limited as is practical. The recommended way to control who can install agents is to create an IAM role and to [allow users to assume the role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html).

1. Create an IAM role ([IAM docs link](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html) | [IAM console link](https://console.aws.amazon.com/iamv2/home#/roles/create?step=selectEntities&trustedEntityType=AWS_ACCOUNT)), based on the [DRS managed permission for agent installation](https://docs.aws.amazon.com/drs/latest/userguide/security-iam-awsmanpol-AWSElasticDisasterRecoveryAgentInstallationPolicy.html). If this role is to be used by someone outside of your AWS account, make sure to use [the external ID functionality](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html). Send the role ARN to the users who need to install agents (the ARN is not secret and can be sent via email). Use [permission boundaries](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_boundaries.html) to further limit what can be done using that role. For example, you can control which AWS Region it can be used for, how long the temporary credentials created with the role are good for, specify tags that must be provided (or may not be provided) during agent installation, and more.

1. Users who install agents [assume that role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html) (each must be a user of an AWS account, either yours or another; you configure who the role is for in step 1). This creates temporary IAM credentials for that user, which are used for [agent installation](https://docs.aws.amazon.com/drs/latest/userguide/agent-installation.html). These credentials are limited to only the permissions required for agent installation (and further limited by the permission boundaries you defined), yet are associated with the user (for example, so their usage can be tracked using CloudTrail).
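
The two steps above can be sketched as a trust policy for the installer role plus the parameters a user passes to the STS `AssumeRole` operation. The account ID, role name, session name, and external ID below are placeholders, and the helper names are hypothetical. The `arn:aws:` partition is assumed, so GovCloud accounts would need `aws-us-gov` instead.

```python
from typing import Optional


def agent_installer_trust_policy(trusted_account_id: str,
                                 external_id: Optional[str] = None) -> dict:
    """Trust policy letting principals in `trusted_account_id` assume the role.

    If `external_id` is given, callers must present it (recommended when
    the role is used by someone outside your own account).
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def assume_role_kwargs(role_arn: str, session_name: str,
                       external_id: Optional[str] = None,
                       duration_seconds: int = 3600) -> dict:
    """Keyword arguments for the STS AssumeRole call made by the installer."""
    kwargs = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # shows up in CloudTrail
        "DurationSeconds": duration_seconds,
    }
    if external_id is not None:
        kwargs["ExternalId"] = external_id
    return kwargs
```

With boto3, `boto3.client("sts").assume_role(**assume_role_kwargs(...))` returns the temporary credentials used for agent installation. Using a per-user session name keeps the CloudTrail records attributable to the person who installed the agent.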

## Recovery best practices


1. **Overview:** DRS makes successful recovery possible, by handling ongoing replication, and the on-demand launching of actual Recovery instances. The re-routing of traffic (failover) is not done via DRS, and should be done using your preferred DNS routing service, such as [Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html). Your recovery plan should include details of which service to use, who in your organization owns this service, and what conditions must be met to perform the re-routing (for example: launch Recovery instances using DRS, perform successful launch-validation test, wait for system X, Y, and Z to also launch and pass test, then re-route). 

1. **Termination protection for recovery instances:** When you launch recovery instances in case of a real event, you should prevent them from being inadvertently terminated. Do this after you have performed the launch-validation test and before re-routing traffic. You can turn on termination protection directly from the [Amazon EC2 console](https://console.aws.amazon.com/ec2/) by selecting the instances, choosing **Instance settings**, **Change termination protection** from the **Actions** menu, and then choosing **Yes, Enable**. You should document this step in your recovery plan. [Learn more about termination protection](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/terminating-instances.html#Using_ChangingDisableAPITermination).

1. **Understanding failover costs:** Your EC2 recovery instances are created according to the [launch settings](https://docs.aws.amazon.com/drs/latest/userguide/launch-settings.html) you have configured for each source server. Recovery instances accrue EC2 and EBS charges as per AWS rates for your account in the target AWS Region. While you use the Recovery instances, you also continue paying for DRS, and the replication resources it created. 

1. **Recovery dos and don’ts:** Do not use the **Disconnect from AWS** action in the DRS console for servers for which you launched Recovery instances, even in the case of a real recovery event. Performing a disconnect terminates all replication resources related to these source servers, including your Point-In-Time (PIT) recovery points. You may need these PITs while you are in failover state, for regulatory reasons, or to relaunch a Recovery instance for any reason (for instance, if you discover that the PIT from which you launched includes corrupt or malicious data, and you want to relaunch from an earlier PIT). While you use your Recovery instances as your primary servers, and new data is presumably written to them, these recovery instances are not themselves being replicated, and you are not creating any new PITs for these changes. It is possible to configure the recovery instances as new source servers and [replicate them cross-Region](https://docs.aws.amazon.com/drs/latest/userguide/failback-failover-region-region.html), to have disaster recovery for your recovery site. This carries with it additional costs, as noted in [Performing a cross-Region failback](https://docs.aws.amazon.com/drs/latest/userguide/failback-failover-region-region.html).

1. **Using recovery for migration:** Once you launch and use recovery instances on AWS for a real event, you may wish to go on using them permanently, instead of your original servers. The primary additional steps you need to do are: 

   1. Set up cross-Region replication, so that these recovery instances become new source servers;

   1. Wait for these new source servers to have the full number of daily PITs that you need to maintain;

   1. Perform the **Disconnect from AWS** action on the original source servers, so as to avoid confusion, and to stop paying for DRS and related replication resources for these original source servers. You can also then choose **Delete** from the **Actions** menu. This causes DRS to forget everything it knows about these source servers, and they no longer appear in the Elastic Disaster Recovery console.

1. **Recover into existing instance:** Use if you want to recover into an instance that already exists instead of launching a new one for recovery, drill or failback. The instance to recover into must be of the same operating system platform (Linux or Windows) as the source instance, it must be stopped and it must have the tag key *AWSDRS* and tag value *AllowLaunchingIntoThisInstance*. [Learn more about recover into existing instance](launch-general-settings.md#server-launch-settings-parameters). 
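
The three eligibility conditions for recovering into an existing instance can be checked before a launch. The following is a minimal sketch, not a DRS API: the `instance` dictionary and its keys (`platform`, `state`, `tags`) are hypothetical stand-ins for whatever inventory data you have, and only the tag key and value come from the text above.

```python
REQUIRED_TAG_KEY = "AWSDRS"
REQUIRED_TAG_VALUE = "AllowLaunchingIntoThisInstance"


def can_recover_into(source_platform, instance):
    """Return the reasons an instance is not eligible (empty list if eligible).

    `instance` is a hypothetical dict with the keys "platform", "state", and "tags".
    """
    problems = []
    # The target must run the same OS platform (Linux or Windows) as the source.
    if instance.get("platform", "").lower() != source_platform.lower():
        problems.append("operating system platform differs from the source")
    # The target must be stopped.
    if instance.get("state") != "stopped":
        problems.append("instance is not stopped")
    # The target must carry the opt-in tag.
    if instance.get("tags", {}).get(REQUIRED_TAG_KEY) != REQUIRED_TAG_VALUE:
        problems.append("missing tag AWSDRS=AllowLaunchingIntoThisInstance")
    return problems


candidate = {
    "platform": "Linux",
    "state": "stopped",
    "tags": {"AWSDRS": "AllowLaunchingIntoThisInstance"},
}
print(can_recover_into("Linux", candidate))  # prints []
```

Returning a list of reasons rather than a boolean makes it easier to report every problem for an instance in one pass.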

## Failback best practices


1. **Mass failback:** If you are failing back more than several servers, and your source environment is VMware vCenter, then consider using [DRS Mass Failback Automation client](https://docs.aws.amazon.com/drs/latest/userguide/failback-failover-drsfa.html). 

1. **Return to normal operation:** Make sure that the failed-back servers at the source are replicating back to AWS and appear as source servers in the DRS console. If they appear in the DRS console but are not replicating, investigate the reason (such as firewall settings). If they do not appear in the DRS console, you may need to install or reinstall the AWS Replication Agent on them. Make sure that you do not end up with two source server entities in the DRS console, one representing the original server and one representing the failed-back server. 

1. **Cleanup after return to normal operation:** Once you have completed failback, there may be multiple AWS resources left behind that you no longer need and that are costly to maintain: 

    After performing a failback to an on-premises environment, perform the following steps: 
   + Clean Recovery instances: Terminate these instances from the **Recovery instances** page of the DRS Console. 
   + Source servers: These appear on the **Source servers** page of the DRS console. Make sure that you have only one source server in the DRS console for each actual server at the source. Source servers are billed by DRS and consume replication resources (billed by other AWS services) until you perform the **Disconnect from AWS** action. If you do have duplicate source servers, do not disconnect or delete the original ones until the new ones have accumulated all the Point-In-Time recovery points (PITs) you need. Performing the **Disconnect from AWS** action causes the PITs from the original source servers to be discarded. If your source is also in AWS, then you have more resources that need to be cleaned up. [Learn more about cleaning up these resources](https://docs.aws.amazon.com/drs/latest/userguide/failback-failover-region-region.html).
**Note**  
The cleanup process following a cross-region failback is different. [Learn how to perform a cleanup following a cross-region failback](failback-failover-region-region.md).
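
Duplicate source server entries can be spotted by grouping records by hostname. The sketch below works on records you have already retrieved. It assumes each record carries `sourceServerID` and a hostname under `sourceProperties.identificationHints.hostname`, which is my understanding of the `describe-source-servers` response shape and should be verified against the API reference. The sample data is invented.

```python
from collections import defaultdict


def find_duplicate_source_servers(source_servers):
    """Group source server IDs by hostname and return hostnames seen more than once."""
    by_hostname = defaultdict(list)
    for server in source_servers:
        # Assumed field path; records without a hostname are skipped.
        hints = server.get("sourceProperties", {}).get("identificationHints", {})
        hostname = hints.get("hostname")
        if hostname:
            by_hostname[hostname.lower()].append(server["sourceServerID"])
    return {name: ids for name, ids in by_hostname.items() if len(ids) > 1}


servers = [
    {"sourceServerID": "s-111", "sourceProperties": {"identificationHints": {"hostname": "db01"}}},
    {"sourceServerID": "s-222", "sourceProperties": {"identificationHints": {"hostname": "DB01"}}},
    {"sourceServerID": "s-333", "sourceProperties": {"identificationHints": {"hostname": "web01"}}},
]
print(find_duplicate_source_servers(servers))  # prints {'db01': ['s-111', 's-222']}
```

A hostname match is only a hint. Confirm which entry is the original and which is the failed-back server before disconnecting either one.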

## Security best practices


You can review security best practices in the [Security chapter](security.md). 

# Disaster recovery at scale
Disaster recovery at scale

When protecting a large number of servers (100+) with AWS Elastic Disaster Recovery, additional planning is required to ensure reliable replication, successful recovery, and manageable operations. This section provides guidance for operating Elastic Disaster Recovery at scale.

**Topics**
+ [Account and Region planning](#at-scale-account-planning)
+ [Network planning and benchmarking](#at-scale-network-planning)
+ [Storage benchmarking](#at-scale-storage-benchmarking)
+ [Agent deployment at scale](#at-scale-agent-deployment)
+ [DR readiness and compliance monitoring](#at-scale-dr-readiness)
+ [Service quotas and API limits](#at-scale-service-quotas)
+ [Recovery planning at scale](#at-scale-recovery-planning)

## Account and Region planning


A single AWS account supports up to 300 concurrently replicating source servers. For larger environments, distribute source servers across multiple staging accounts or target AWS Regions.
+ Use [multiple staging accounts](multi-account.md) to scale beyond the 300-server limit per account.
+ Plan your account structure early, because moving source servers between accounts requires reinstalling the agent.
+ When using multiple accounts, ensure that EBS encryption keys (KMS) are shared across accounts if you use custom encryption.
+ Establish a consistent IAM policy management strategy across all accounts. Use AWS Organizations and Service Control Policies (SCPs) to enforce guardrails.
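
As a rough sizing aid, the number of staging accounts follows from the 300-server limit by ceiling division. This is a minimal sketch: the `growth_headroom` parameter is an illustrative assumption, not a DRS setting.

```python
import math

MAX_REPLICATING_SERVERS_PER_ACCOUNT = 300  # limit stated in this section


def staging_accounts_needed(total_servers, growth_headroom=0.0):
    """Return how many staging accounts the planned server count requires.

    `growth_headroom` is a hypothetical fraction (0.25 = plan for 25% growth).
    """
    planned = math.ceil(total_servers * (1 + growth_headroom))
    return max(1, math.ceil(planned / MAX_REPLICATING_SERVERS_PER_ACCOUNT))


print(staging_accounts_needed(1000))        # prints 4
print(staging_accounts_needed(1000, 0.25))  # prints 5
```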

## Network planning and benchmarking


Network bandwidth is a critical factor for replication performance at scale. Before deploying agents, benchmark your network to ensure it can sustain the required throughput.

1. **Benchmark network bandwidth:** Test the bandwidth between your source environment and the staging area subnet using the SSL connectivity and bandwidth test AMI. This test uses encryption, accurately simulating the replication agent's behavior. Instructions are available for [performing the bandwidth test](https://docs.aws.amazon.com/drs/latest/userguide/Replication-Related-FAQ.html#perform-connectivity-bandwidth-test).

1. **Plan for aggregate bandwidth:** Calculate the total write throughput across all source servers (see [calculating required bandwidth](Troubleshooting-Communication-Errors.md#Calculating-Bandwidth)). Ensure your network connection (Direct Connect, VPN, or internet) can sustain this aggregate throughput with headroom for spikes.

1. **IP planning:** Plan your staging area and recovery VPC CIDR ranges to accommodate the number of replication servers, recovery instances, and any network infrastructure (NAT gateways, transit gateways, load balancers). Ensure there is no IP overlap between source and recovery environments if using VPN or Direct Connect.
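
Two of the checks above are easy to script: summing per-server write throughput into a required link speed, and detecting CIDR overlap between source and recovery networks. The sketch below is illustrative only. The server names, throughput figures, CIDR ranges, and the `headroom` default are assumptions, and the conversion simply multiplies MB/s by 8 to get Mbps.

```python
import ipaddress


def required_mbps(write_mb_per_sec_by_server, headroom=0.3):
    """Sum write throughput (MB/s), convert to Mbps, and add headroom for spikes."""
    total_mb_per_sec = sum(write_mb_per_sec_by_server.values())
    return total_mb_per_sec * 8 * (1 + headroom)


def overlapping_cidrs(source_cidrs, recovery_cidrs):
    """Return (source, recovery) CIDR pairs whose address ranges overlap."""
    pairs = []
    for src in source_cidrs:
        for dst in recovery_cidrs:
            if ipaddress.ip_network(src).overlaps(ipaddress.ip_network(dst)):
                pairs.append((src, dst))
    return pairs


# Hypothetical measurements and address plans.
print(required_mbps({"db01": 10.0, "web01": 2.5}, headroom=0.2))
print(overlapping_cidrs(["10.0.0.0/16"], ["10.0.128.0/20", "10.1.0.0/16"]))
# prints [('10.0.0.0/16', '10.0.128.0/20')]
```

Compare the computed figure against the result of the bandwidth test described above, not against the nominal link speed.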

## Storage benchmarking


Understanding the storage write patterns of your source servers helps you provision appropriate replication resources and avoid replication lag.

1. Capture storage performance metrics on your source servers using `iostat` on Linux or Performance Monitor on Windows. Focus on write IOPS and write throughput (MB/s) per disk.

1. Servers with high write rates (such as database servers) may require [dedicated replication servers](https://docs.aws.amazon.com/drs/latest/userguide/replication-server-settings.html#dedicated-replication-server) with an instance type that can handle the required EBS IOPS and throughput.

1. Consider excluding high-churn volumes that are not needed for disaster recovery (such as database tempdb or backup disks) using the `--devices` installer parameter to reduce replication load.
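
Once per-disk write metrics are collected, a short script can flag servers that may warrant a dedicated replication server. The following is a sketch under stated assumptions: the metric dictionaries are a hypothetical format (for example, hand-copied from `iostat` or Performance Monitor output), and the threshold defaults are placeholders to replace with figures appropriate to your replication server instance type.

```python
def dedicated_server_candidates(disk_metrics, max_write_mb_per_sec=50.0, max_write_iops=3000):
    """Return names of servers whose summed disk writes exceed either threshold."""
    totals = {}
    for metric in disk_metrics:
        total = totals.setdefault(metric["server"], {"mb_per_sec": 0.0, "iops": 0})
        total["mb_per_sec"] += metric["write_mb_per_sec"]
        total["iops"] += metric["write_iops"]
    return sorted(
        name
        for name, total in totals.items()
        if total["mb_per_sec"] > max_write_mb_per_sec or total["iops"] > max_write_iops
    )


# Hypothetical per-disk samples.
samples = [
    {"server": "db01", "disk": "sdb", "write_mb_per_sec": 40.0, "write_iops": 2500},
    {"server": "db01", "disk": "sdc", "write_mb_per_sec": 35.0, "write_iops": 1200},
    {"server": "web01", "disk": "sda", "write_mb_per_sec": 3.0, "write_iops": 150},
]
print(dedicated_server_candidates(samples))  # prints ['db01']
```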

## Agent deployment at scale


Deploying the AWS Replication Agent across hundreds of servers requires automation. Consider the following approaches:
+ Use configuration management tools (such as AWS Systems Manager Run Command, Ansible, or Chef) to deploy the agent across multiple servers simultaneously.
+ Use the `--no-prompt` installer parameter for unattended installation. Combine with `--devices` to specify disks explicitly when automatic detection is not suitable.
+ Deploy agents in batches rather than all at once to avoid overwhelming the staging area network and to stay within API limits.
+ Verify that all source servers meet the [installation prerequisites](agent-installation-instructions.md) before beginning deployment.
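
Batching the rollout can be as simple as slicing the host list. This sketch shows only the batching logic. The host names and batch size are invented, and the actual installation step (for example, through AWS Systems Manager Run Command) is left out.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical host inventory.
hosts = [f"server-{n:03d}" for n in range(1, 8)]
for number, batch in enumerate(batches(hosts, 3), start=1):
    # Deploy the agent to this batch, then wait for initial sync to progress
    # before starting the next one.
    print(number, batch)
```

Waiting for each batch to make progress on initial sync before starting the next keeps the load on the staging area network predictable.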

## DR readiness and compliance monitoring


At scale, manually monitoring replication health and drill compliance is impractical. Implement automated monitoring to maintain DR readiness.

1. **Replication health:** Use the Elastic Disaster Recovery API (`describe-source-servers`) or [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/) to monitor replication state across all source servers. Alert on servers in **Stalled**, **Disconnected**, or **Lag** states.

1. **Drill compliance:** Track when each source server was last tested. Use the `describe-source-servers` API to retrieve the last launch date and type (drill or recovery) from the `lifeCycle.lastLaunch` field. Flag servers that have not been drilled within your organization's required interval.

1. **Dashboard:** For multi-account environments, consider building a centralized dashboard using AWS Organizations and cross-account IAM roles to aggregate replication status across all accounts.
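
The health and drill checks can be combined into one report over records returned by `describe-source-servers`. The sketch below processes records that have already been fetched. The field paths (`dataReplicationInfo.dataReplicationState`, `lifeCycle.lastLaunch.initiated.apiCallDateTime`) and the state names reflect my understanding of the response shape and should be confirmed against the API reference. The sample records are invented, and lag detection is omitted.

```python
from datetime import datetime, timedelta, timezone

UNHEALTHY_STATES = {"STALLED", "DISCONNECTED"}  # assumed enum values


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def readiness_report(source_servers, max_drill_age_days, now=None):
    """Return (unhealthy_ids, overdue_ids) for the given source server records."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_drill_age_days)
    unhealthy, overdue = [], []
    for server in source_servers:
        server_id = server["sourceServerID"]
        state = server.get("dataReplicationInfo", {}).get("dataReplicationState")
        if state in UNHEALTHY_STATES:
            unhealthy.append(server_id)
        # Any launch (drill or recovery) counts as a test in this sketch.
        initiated = server.get("lifeCycle", {}).get("lastLaunch", {}).get("initiated", {})
        launched_at = initiated.get("apiCallDateTime")
        if launched_at is None or parse_timestamp(launched_at) < cutoff:
            overdue.append(server_id)
    return unhealthy, overdue


records = [
    {
        "sourceServerID": "s-aaa",
        "dataReplicationInfo": {"dataReplicationState": "CONTINUOUS"},
        "lifeCycle": {"lastLaunch": {"initiated": {"apiCallDateTime": "2024-05-20T08:00:00Z", "type": "DRILL"}}},
    },
    {
        "sourceServerID": "s-bbb",
        "dataReplicationInfo": {"dataReplicationState": "STALLED"},
        "lifeCycle": {},
    },
]
fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(readiness_report(records, max_drill_age_days=90, now=fixed_now))
# prints (['s-bbb'], ['s-bbb'])
```

Timestamps are assumed to carry a UTC offset. A timestamp without one would need to be normalized before the comparison.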

## Service quotas and API limits


Large-scale deployments can encounter service quota limits. Review and plan for the following:
+ **Elastic Disaster Recovery quotas:** Maximum 300 concurrently replicating source servers per account, 100 source servers per recovery job, 500 source servers across all active jobs, and 20 concurrent jobs. See [Elastic Disaster Recovery service quotas](https://docs.aws.amazon.com/general/latest/gr/drs.html#limits_drs).
+ **Amazon EC2 quotas:** Plan for the number of replication server instances, recovery instances, and associated EBS volumes that will run concurrently. Request quota increases in advance.
+ **EBS snapshot limits:** Elastic Disaster Recovery creates EBS snapshots for point-in-time recovery. At scale, the number of snapshots can grow significantly based on your retention policy.
+ **API throttling:** When using automation to manage large numbers of servers, implement exponential backoff and retry logic to handle API throttling gracefully.
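
Exponential backoff with retry can be sketched in a few lines. The example below uses capped exponential delays with random jitter. `ThrottledError` is a hypothetical stand-in for whatever throttling exception your API client raises, and the delay values are illustrative defaults.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by an API client."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call `func`, retrying on ThrottledError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


# Demonstration with a function that is throttled twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError()
    return "ok"


print(call_with_backoff(flaky_call, base_delay=0.01))  # prints ok
```

The jitter spreads retries from many concurrent workers so they do not all hit the API again at the same moment. Many AWS SDKs also ship configurable retry behavior, which is usually preferable to hand-rolled logic where available.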

## Recovery planning at scale


Recovering hundreds of servers simultaneously requires careful orchestration:
+ **Group servers by application:** Identify dependencies between servers and group them so that dependent servers are recovered together in the correct order.
+ **Stagger recovery jobs:** Launch recovery instances in batches to stay within the concurrent job limits (20 concurrent jobs, 100 servers per job, 500 servers across all active jobs) and to avoid overwhelming the target environment.
+ **Automate recovery orchestration:** Use the Elastic Disaster Recovery API and AWS Step Functions or similar orchestration tools to automate the recovery sequence, including post-launch validation.
+ **Plan VPC capacity:** Ensure your recovery VPCs have sufficient IP addresses, subnets, and network resources for all recovery instances.
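
The job limits above can be turned into a simple wave planner. This is a minimal sketch: it takes application groups in recovery order (group names and server IDs are invented), splits each group into jobs of at most 100 servers, and packs jobs into waves that respect the 20-job and 500-server limits. It keeps groups in order but does not enforce strict sequencing between groups that land in the same wave.

```python
MAX_SERVERS_PER_JOB = 100
MAX_CONCURRENT_JOBS = 20
MAX_SERVERS_IN_ACTIVE_JOBS = 500


def plan_waves(groups):
    """Return waves of jobs; each job is a list of server IDs from a single group.

    `groups` is a list of (group_name, [server_ids]) tuples in recovery order.
    """
    waves, current, current_servers = [], [], 0
    for _name, servers in groups:
        for start in range(0, len(servers), MAX_SERVERS_PER_JOB):
            job = servers[start:start + MAX_SERVERS_PER_JOB]
            too_many_jobs = len(current) >= MAX_CONCURRENT_JOBS
            too_many_servers = current_servers + len(job) > MAX_SERVERS_IN_ACTIVE_JOBS
            if current and (too_many_jobs or too_many_servers):
                waves.append(current)  # Close the wave and start a new one.
                current, current_servers = [], 0
            current.append(job)
            current_servers += len(job)
    if current:
        waves.append(current)
    return waves


# Hypothetical groups: 250 database servers, then 300 application servers.
groups = [
    ("database", [f"db-{n}" for n in range(250)]),
    ("application", [f"app-{n}" for n in range(300)]),
]
waves = plan_waves(groups)
print([(len(wave), sum(len(job) for job in wave)) for wave in waves])
# prints [(5, 450), (1, 100)]
```

If one group must be fully recovered before the next starts, give each group its own wave and wait for its jobs to complete before launching the following wave.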

# Elastic Disaster Recovery quick start guide
Quick start guide

This section guides you through your initial Elastic Disaster Recovery setup, including: 

**Topics**
+ [First time setup](#first-time-setup-gs)
+ [Adding source servers](#adding-servers-gs)
+ [Configuring launch settings](#configuring-target-gs)
+ [Launching a drill instance](#launching-test-gs)
+ [Launching a recovery instance](#launch-recovery-gs)
+ [Performing a failback](#failback-gs)

## First time setup


In order to use AWS Elastic Disaster Recovery (AWS DRS), you first need to set it up in each AWS Region in which you want to use it (the Region into which you will be replicating, and where you will launch Recovery instances). Setting up the service consists of defining default replication settings and creating the roles and permissions required for the service to operate. 

**Note**  
You need to be the admin user of the AWS account, or have a role with the AWSElasticDisasterRecoveryConsoleFullAccess permission, in order to set up the service. 

The first setup step for AWS DRS is setting the default replication settings. Choose **Set default replication settings** on the AWS Elastic Disaster Recovery landing page. You are guided through the steps of setting up your default replication settings, default launch settings, and EC2 template. These default settings are applied to every source server that is added to AWS Elastic Disaster Recovery. You can change both the default settings and individual source server settings for one or more source servers at any time. Learn more about editing [your replication settings](default-replication-settings.md) and [launch settings](launch-settings-overview.md). To learn more about each setting, select the **Info** links next to each section.

**Important**  
Before configuring your default settings, ensure that you meet the [Network requirements for running AWS Elastic Disaster Recovery](preparing-environments.md). 

On the first page of the wizard, you are asked to **Set up replication servers**. Replication servers are lightweight Amazon EC2 instances that are used to replicate data between your source servers and AWS. Replication servers are automatically launched and terminated as needed. You can start using AWS Elastic Disaster Recovery with the default replication server settings or you can configure your own settings. [Learn more about replication server settings.](individual-replication-settings.md#replication-server-settings) 
+ Configurable replication server settings include:
  + The subnet within which the replication server will be launched
  + Replication server instance type

During this step you can review the service linked role and additional policies created during Elastic Disaster Recovery initialization. Choose **View details** to learn more.

On the second page of the wizard you are asked to **Specify volumes and security groups**. For each disk on an added source server there is an identically-sized EBS volume attached to a replication server, and each replication server can handle replication of disks from multiple source servers. [Learn more about volumes.](volumes-drs.md) 

A security group acts as a virtual firewall, which controls the inbound and outbound traffic of the staging area. The best practice is to have AWS Elastic Disaster Recovery automatically attach to and monitor the default AWS Elastic Disaster Recovery security group. This group opens inbound TCP Port 1500 for receiving the transferred replicated data. [Learn more about security groups.](drs-security-group.md) 

Configurable volumes and security groups settings include:
+ EBS volume type
+ EBS encryption
+ Always use AWS Elastic Disaster Recovery security group

On the third page of the wizard you can **Configure additional replication settings**. These include **Data routing and throttling**, **Point in time (PIT) policy**, and **Tags**. 
+  **Data routing and throttling** controls how data flows from the external server to the replication servers. If you choose not to use a private IP, your replication servers are automatically assigned a public IP and data flows over the public internet. [Learn more about data routing and throttling.](data-routing.md) 
+ Point in Time (PIT) is a disaster recovery feature that allows launching an instance from a snapshot captured at a specific point in time. As source servers are replicated, snapshots are taken over time. The **Point in time (PIT) policy** section lets you configure a retention policy that determines which snapshots are no longer required after a defined duration. 
+ The **Tags** section allows you to add custom tags to resources created by AWS Elastic Disaster Recovery in your AWS account. 

Additional configurable settings include:
+ Use private IP for data replication
+ Create public IP
+ Throttle network bandwidth
+ Snapshot retention
+ Tags

On the fourth page of the wizard you can **Set default DRS launch settings**.

Default launch settings define how drill or recovery instances are launched in AWS. You can start using AWS Elastic Disaster Recovery with the default launch settings or configure your own. [Learn more about default DRS launch settings.](default-drs-launch-settings.md)

Configurable options include:
+ Instance type right sizing
+ Start instance upon launch
+ Copy private IP
+ Transfer server tags
+  OS licensing

The fifth page of the wizard: **Set default EC2 launch settings** is where you configure the default Amazon EC2 launch template which defines how instances are launched in AWS. Changes you make to the template only affect new servers, but you can edit the template for multiple servers according to your preferences. [Learn more about default EC2 launch template.](default-ec2-launch-template.md) The EC2 launch template includes basic and advanced settings.

Basic configurable options include:
+ Subnet
+ Security groups
+ Instance type
+ EBS volume type

You only need to change advanced configurable options in specific operational scenarios. They include:
+ IAM instance profile
+ Tenancy

The sixth page is where you **Review and initialize**.

Review the settings you configured. To change a specific setting, select **Edit**, which redirects you to the page in the wizard on which the setting appears. Go through the remaining pages to return to the **Review and initialize** page.

Once you have reviewed all of the settings you chose, select **Configure and initialize**. The default template is created and you return to the AWS Elastic Disaster Recovery console.

**Note**  
You can always edit the default replication or launch settings by choosing the appropriate item from the **Settings** page, which you can open from the left-hand navigation menu. Remember that changes you make are only applied to newly added servers and not to existing servers. 

## Adding source servers


Add source servers to AWS Elastic Disaster Recovery by installing the AWS Replication Agent (also referred to as "the Agent") on them. The Agent can be installed on both Linux and Windows servers. [Learn more about adding source servers.](adding-servers.md) 

Prior to adding your source servers, ensure that you meet all of the [Network requirements](preparing-environments.md). 

**Note**  
DRS agents can only be installed on instances that are in AWS Regions that are supported by Elastic Disaster Recovery.

## Configuring launch settings


After you have added your source servers to the AWS Elastic Disaster Recovery console, you need to configure the launch settings for each server. The launch settings are a set of instructions that determine how a recovery instance is launched for each source server on AWS. You must configure the launch settings prior to launching test or recovery instances. You can use the default settings or configure the settings to fit your requirements. 

**Note**  
You can change the launch settings after a drill or recovery instance has been launched. You need to launch a new Drill or Recovery instance for the new settings to take effect. 

You can access the launch settings by selecting the hostname of a source server on the **Source servers** page. 

Within the individual server view, navigate to the **Launch settings** tab. 

Here you can see your **General launch settings** and your **EC2 launch template**. Select **Edit** to edit your launch settings or your EC2 launch template. 

Launch settings include:
+  **Instance type right-sizing** – The Instance type right-sizing feature allows AWS Elastic Disaster Recovery to launch a drill or recovery instance type that best matches the hardware configuration of the source server. When activated, this feature overrides the instance type selected in the EC2 launch template. 
+ **Start instance upon launch** – Choose whether you want to start your drill or recovery instances automatically upon launch or start them manually through the Amazon EC2 console. 
+  **Copy private IP** – Choose whether you want AWS Elastic Disaster Recovery to verify that the private IP used by the drill or recovery instance matches the private IP used by the source server. 
+  **Transfer server tags** – Choose whether you want AWS Elastic Disaster Recovery to transfer any user-configured custom tags from your source servers to your drill or recovery instance.

AWS Elastic Disaster Recovery automatically creates an **EC2 launch template** for each new source server. AWS Elastic Disaster Recovery bases the majority of the instance launch settings on this template. You can edit this template to fit your needs. 

 [Learn more about Launch settings.](launching-target-servers.md) 

## Launching a drill instance


After you have added all of your source servers and configured their launch settings, you are ready to launch a drill instance. It is crucial to drill the recovery of your source servers to AWS prior to initiating a recovery in order to verify that your source servers function properly within the AWS environment.

**Important**  
When launching a drill, recovery, or an in-AWS failback, you can launch up to 100 source servers in a single operation. Additional source servers can be launched in subsequent operations.
It is a best practice to perform drills regularly. After launching drill instances, use either SSH (Linux) or RDP (Windows) to connect to your instance and ensure that everything is working correctly. 

You can drill one source server at a time, or simultaneously drill multiple source servers. For each source server, you are informed of the success or failure of the drill. You can drill your source server as many times as you want. Each new drill first deletes any previously launched drill or recovery instance and dependent resources. Then, a new Drill instance is launched, which reflects the chosen Point-in-time state of the source server. After the drill, data replication continues as before. The new and modified data on the source server is transferred to the Staging Area Subnet and not to the Recovery instances that were launched during the test. 

**Note**  
Windows source servers need to have at least 2 GB of free space to successfully launch a recovery instance.
Take into consideration that once a drill instance is launched, actual resources are used in your AWS account and you will be billed for these resources. Once you verify that the launched instances are working properly, you can terminate them without impact to data replication. 

 [Learn more about launching drill instances as part of the overall recovery and failback framework.](preparing-failover.md#recovery-drill-overview) 

## Launching a recovery instance


Once you have finalized the testing of all of your source servers, you are ready for recovery. You should perform the recovery at a set date and time. The recovery migrates your source servers to the recovery instances on AWS. 

You can recover one source server at a time, or simultaneously recover multiple source servers. For each source server you are informed of the success or failure of the Recovery. For each new recovery, AWS Elastic Disaster Recovery first deletes any previously launched recovery instance and dependent resources. Then, it launches a new Recovery instance which reflects the most up-to-date state of the source server. After the Recovery, data replication continues as before. The new and modified data on the source server is transferred to the Staging Area Subnet, and not to the recovery instances that were launched during the recovery. 

 [Learn more about launching Recovery instances as part of the overall recovery and failback framework.](failback-preparing-failover.md#failback-launching-instances) 

## Performing a failback


Once the disaster is over, you can perform a failback to your original source server or to any other server by running the AWS Elastic Disaster Recovery Failback Client on that server. In order to use the Failback Client, you need to generate Elastic Disaster Recovery-specific credentials. Once the failback is complete, you can opt to terminate, delete, or disconnect the Recovery instance. 

 [Learn more about performing a failback.](failback-performing.md)