

# Disaster recovery for SAP workloads on AWS using AWS Elastic Disaster Recovery
<a name="dr-sap"></a>

Disasters due to natural events (earthquakes, hurricanes, or floods), application failures, technical failures, or human actions cause application downtime and potential data loss, impacting revenue. To mitigate such scenarios, you can create a business continuity plan with disaster recovery as its key element. Designing, implementing, and maintaining a disaster recovery plan is critical for organizations running mission-critical applications, such as SAP. For more information, see [Business Continuity Plan (BCP)](https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/business-continuity-plan-bcp.html).

AWS Elastic Disaster Recovery enables organizations to quickly and easily implement a new disaster recovery plan on AWS, or migrate an existing one. The source servers can be hosted on AWS, in existing physical or virtual data centers, in private clouds, or with other cloud providers. We recommend using Elastic Disaster Recovery to implement a disaster recovery plan for your SAP workloads, where AWS is the disaster recovery environment, and the source environment may or may not be on AWS. You can access Elastic Disaster Recovery from the [Elastic Disaster Recovery console](https://console.aws.amazon.com/drs).

An initial setup of the AWS Replication Agent is required on the source systems for Elastic Disaster Recovery to initiate secure data replication. Your data is replicated using secure protocols, either directly over the internet, or via an encrypted and/or dedicated network connection, to any AWS Region supported by Elastic Disaster Recovery. Replicating the source systems to replication servers in a staging area optimizes the cost of disaster recovery by using affordable storage, shared servers, and minimal compute resources to maintain ongoing replication.

You can perform non-disruptive tests, known as drills, to confirm that your Elastic Disaster Recovery implementation is ready for a disaster recovery scenario. Elastic Disaster Recovery automatically converts your servers to boot and run natively on AWS when you launch instances for drills or recovery. The service also automatically creates point in time (PIT) snapshots of your server state as it replicates. If you need to recover applications, you can launch recovery instances on AWS within minutes, using the latest snapshot or an earlier PIT snapshot. Once your applications are running on AWS, you can choose to keep them there or initiate data replication back to your primary site when the issue is resolved. You can fail back to your primary site with Elastic Disaster Recovery tools, such as Failback Client.
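Launching a drill can be scripted against the Elastic Disaster Recovery API. The following minimal sketch uses the `start_recovery` operation of the boto3 `drs` client with `isDrill=True`; the Region and source server ID shown in the usage comment are placeholders, and the function takes the client as a parameter so it can be exercised without AWS credentials.

```python
def launch_drill(drs_client, source_server_ids):
    """Launch non-disruptive drill instances for the given source servers.

    Drill instances are launched from the latest (or a selected) PIT
    snapshot and do not interrupt ongoing replication.
    """
    response = drs_client.start_recovery(
        isDrill=True,  # set to False for an actual recovery
        sourceServers=[{"sourceServerID": sid} for sid in source_server_ids],
    )
    return response["job"]["jobID"]

# Example usage (requires AWS credentials and existing source servers):
# import boto3
# drs = boto3.client("drs", region_name="us-east-1")
# job_id = launch_drill(drs, ["s-1234567890abcdef0"])
```

You can then monitor the returned job ID in the Elastic Disaster Recovery console or via `describe_jobs` to confirm the drill instances launched successfully.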

For more information, see [What is Elastic Disaster Recovery](https://docs.aws.amazon.com/drs/latest/userguide/what-is-drs.html)?

**Topics**
+ [Scenarios](#scenarios-overview)
+ [References](#references)
+ [Service-level agreements and SAP licenses](slas-licenses.md)
+ [Network, storage, and compute](key-considerations.md)
+ [Disaster recovery scenarios](scenarios.md)
+ [Shared storage resiliency](file-systems-storage.md)
+ [Implementing disaster recovery on AWS cloud for SAP workloads](implementation.md)

## Scenarios
<a name="scenarios-overview"></a>

The following disaster recovery scenarios are covered in this document.
+ In-Region – the source workload runs on AWS, and the disaster recovery implementation uses a second Availability Zone in the same AWS Region.
+ Cross-Region – the source workload runs on AWS, and the disaster recovery implementation uses a different AWS Region, for example, to meet compliance requirements.
+ Outside of AWS – the source workload runs outside of AWS (on-premises, or in a public or private cloud), and disaster recovery is implemented on AWS.

## References
<a name="references"></a>

This document does not provide detailed steps for setting up and using AWS Elastic Disaster Recovery. For more information, see [What is AWS Elastic Disaster Recovery?](https://docs.aws.amazon.com/drs/latest/userguide/what-is-drs.html) in the AWS Elastic Disaster Recovery User Guide.

It is important to understand the key business requirements that guide a disaster recovery solution design and implementation, including recovery point objectives and recovery time objectives, along with the disaster recovery plan and disaster recovery drills. See the following resources for concepts related to a disaster recovery implementation on AWS.
+  [AWS Elastic Disaster Recovery Core concepts](https://aws.amazon.com/disaster-recovery/faqs/#Core_concepts) 
+  [AWS Well-Architected Framework : Best Practice 10.1](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-10-1.html) 
+  [Architecture guidance for availability and reliability of SAP on AWS](https://docs.aws.amazon.com/sap/latest/general/architecture-guidance-of-sap-on-aws.html) 

If you are new to AWS, see the following documents.
+  [Getting started with AWS](https://aws.amazon.com/getting-started) 
+  [What is Amazon EC2?](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html) 
+  [What is Amazon VPC?](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html) 
+  [Amazon Elastic Block Store (Amazon EBS)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html) 

 *To use the information provided here effectively, you must have previous experience installing, migrating, and operating SAP environments and systems on AWS, along with implementing high availability and disaster recovery solutions.* 

# Service-level agreements and SAP licenses
<a name="slas-licenses"></a>

For a disaster recovery implementation, service-level agreements (SLAs) define how resilient your system must be in avoiding data loss and reducing downtime when your workload becomes unavailable due to a disaster event.

An SAP system disaster recovery approach requires replication of the application tier, database tier, and any file shares, such as NFS mounts. The following are some of the factors to consider for your disaster recovery implementation.

**Topics**
+ [Recovery time objective (RTO)](#recovery-time)
+ [Recovery point objective (RPO)](#recovery-point)
+ [Recovery consistency objective (RCO)](#recovery-consistency)
+ [SAP licenses](#sap-licenses)

## Recovery time objective (RTO)
<a name="recovery-time"></a>

Recovery Time Objective (RTO) refers to how quickly your application can recover after an outage. In the event of a disaster, Elastic Disaster Recovery enables you to launch your replicated servers to a fully provisioned state in the target Region within minutes and continue operations. This automated approach supports a low RTO. It can be faster and more effective than a manual approach.

 *Consideration* 

Because RTO is usually evaluated by its impact on business processes, other factors also influence this target value: Domain Name System (DNS) propagation, environmental factors such as your disaster recovery team’s reaction time, your target environment’s storage architecture, operating system boot time, and application startup time.

## Recovery point objective (RPO)
<a name="recovery-point"></a>

Elastic Disaster Recovery continuously replicates changes to the disk at the block level, asynchronously, to the target site. The RPO of Elastic Disaster Recovery is typically in the sub-second range. RPO can be influenced by external factors, such as the time taken by the source system to send changes to the staging area, which in turn depends on the volume of transactions on the source system. Other factors include network throughput and latency, and source and replication server performance. Measure these factors to calculate the potential amount of data loss during a disaster recovery event.

 *Consideration* 

SAP workloads may occasionally observe more data loss than a sub-second RPO implies, due to how Elastic Disaster Recovery manages certain scenarios.

In the event of hard reboots, disk changes, and crashes, Elastic Disaster Recovery triggers a rescan of the disks. During the rescan, the Replication Agent does not replicate the changes of the source server to the target. This creates a lag between the two servers. If the primary system fails during this time, customers may experience a longer amount of data loss (measured in RPO) than expected.

The rescan time depends on multiple factors and cannot be predicted without testing. A rescan may occur after a reboot of the source server, and its duration varies with the size of the source disks, the linear read performance of those disks, the staging area disk performance, and the rate of write operations on the source server (which are sent in parallel with the rescan). The rescan is functioning normally as long as it is moving forward and is not stuck.

SAP databases can have large disk sizes and high change rates. We recommend conducting tests to ensure that your SLA requirements are met in such events. Additionally, you must ensure that the primary and target databases are in sync during peak activity cycles.
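As the preceding paragraphs note, rescan duration cannot be predicted without testing, but a lower bound follows from the fact that every replicated block must be read linearly from the source disks. The following sketch (the disk size and read throughput figures are illustrative, not benchmarks) makes that arithmetic explicit.

```python
def estimate_rescan_minutes(total_disk_gib, linear_read_mib_per_s):
    """Lower bound for a rescan: every replicated block must be read
    linearly from the source disks. Staging area performance and the
    ongoing write stream can only make the real rescan slower."""
    total_mib = total_disk_gib * 1024
    return total_mib / linear_read_mib_per_s / 60.0

# For example, a 2 TiB database volume read at 250 MiB/s needs at
# least roughly 140 minutes to rescan.
minutes = estimate_rescan_minutes(2048, 250)
```

During this window the replicated changes lag the source, so include a worst-case rescan in your RPO testing for large SAP databases.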

## Recovery consistency objective (RCO)
<a name="recovery-consistency"></a>

Many disaster recovery solutions consider only RTO and RPO as SLAs for resiliency. You must also consider Recovery Consistency Objective (RCO) for your SAP workloads. RCO is a measurement for the consistency of distributed business data within interlinked systems. In a typical customer environment, SAP systems are tightly integrated and data is frequently exchanged between these systems, like SAP ECC or SAP S/4HANA, SAP BW or SAP BW/4HANA, SAP CRM, SAP SRM, SAP GTS etc. This group of tightly integrated systems is called a *system group*. In case of disaster recovery failover, you may have zero RCO requirement within the system group. This means that in case of disaster recovery failover, all of the databases within the SAP system group must be recovered to the same point-in-time.

 *Consideration* 

Elastic Disaster Recovery does not guarantee consistency across multiple source instances. If you have zero RCO requirement, you can use database native replication technology with point-in-time recovery or backtrack with secondary time travel.

For more information, see [SAP Note 434647 - Point-in-time recovery in an SAP system group](https://me.sap.com/notes/434647) (requires SAP portal access).

## SAP licenses
<a name="sap-licenses"></a>

The SAP system is secured by a license using a hardware key. On AWS, the hardware key is based on your Amazon EC2 instance ID, so your Amazon EC2 instance must be launched before you can generate your SAP license. When you recover your SAP system at the disaster recovery site, the existing SAP license becomes invalid because the recovery instance is a new Amazon EC2 instance and the hardware key no longer matches. A temporary SAP license, valid for 28 days, is created when the recovery instance is launched, so you do not need to create a new SAP license immediately. If the disaster recovery instance must continue running after 28 days, you can request a new SAP license using the recovery Amazon EC2 instance ID.

# Network, storage, and compute
<a name="key-considerations"></a>

This section provides information about configuring network, storage, and compute for staging and target environments to achieve disaster recovery goals for your SAP workloads on AWS with Elastic Disaster Recovery.

**Topics**
+ [Network](#key-considerations-network)
+ [Storage](#key-considerations-storage)
+ [Compute](#key-considerations-compute)

## Network
<a name="key-considerations-network"></a>

Your network architecture and configuration used for disaster recovery can play a significant role in supporting an effective RTO and RPO SLA. You must consider network design and redirecting traffic to the recovery instances when disaster recovery is triggered.

The following are the four steps to design a network for disaster recovery.
+  [Connecting the source and target network](#step-connection) 
+  [Defining the staging and recovery subnets](#step-subnets) 
+  [Configuring the network security settings](#step-security) 
+  [SAP end user and integration traffic](#step-user) 

### Connecting the source and target network
<a name="step-connection"></a>

The first step is to choose and configure the network connection method from the source network to the replication servers. You can choose between private or public. For more information, see [Data routing and throttling](https://docs.aws.amazon.com/drs/latest/userguide/data-routing.html).

Regardless of the method, transferred data is always encrypted in transit. The default method is public, where data is routed over the internet to a public network interface on the replication servers. In the private method, the data is replicated over a private network. A private network selection depends on the disaster recovery scenario in use.
+  [AWS In-Region disaster recovery](scenarios.md#same-region) – Private networks are generally between VPCs, using either Amazon VPC peering or AWS Transit Gateway for connectivity. We recommend using a different AWS account, and separate Amazon VPC for disaster recovery. For more information, see [What is Amazon VPC peering?](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) and [What is a transit gateway?](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html).
+  [AWS Cross-Region disaster recovery](scenarios.md#different-regions) – We recommend using the fully redundant AWS network backbone that connects different AWS Regions together. Amazon VPC peering and AWS Transit Gateway enable connectivity between Regions. For more details, see [Introduction to Network Transformation on AWS](https://aws.amazon.com/blogs/networking-and-content-delivery/introduction-to-network-transformation-on-aws-part-1/).
+  [Outside of AWS to AWS disaster recovery](scenarios.md#other-aws) – In this scenario, the physical network between your source environment and AWS is provided through telecommunications or internet service providers (ISPs). The following solutions are available on AWS.
  +  [AWS Direct Connect](https://aws.amazon.com/directconnect/) 
  +  [AWS Site-to-Site VPN](https://aws.amazon.com/vpn/site-to-site-vpn/) 
  + SD-WAN available on [AWS marketplace](https://aws.amazon.com/marketplace) 

   AWS Direct Connect is commonly used by SAP on AWS customers. Compared with VPN or SD-WAN based solutions, it provides more predictable performance against service-level agreement (SLA) based targets such as throughput, jitter, and latency. You can work with [AWS Direct Connect Delivery Partners](https://aws.amazon.com/directconnect/partners/) for guidance on which options are the best fit for your environment.

### Defining the staging and recovery subnets
<a name="step-subnets"></a>

One subnet is recommended to host the replication servers, called the *staging area subnet*. Additional subnets, called the recovery subnets, are necessary as the target of your disaster recovery action. For scenarios where the source network is on AWS, consider how your subnets should be allocated based on your selected AWS account strategy and landing zone. Often this means the staging area subnets should be in a different Amazon VPC than your source servers. In a simplified environment, you may use different subnets in the same Amazon VPC, at the cost of reduced isolation between your production and non-production disaster recovery environments. For more information, see [AWS Well-Architected Framework : Best Practice 5.3](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-5-3.html).

Ultimately, the number and design of these subnets should follow similar concepts as your source environment. For more information, see [Network diagrams](https://docs.aws.amazon.com/drs/latest/userguide/Network-diagrams.html).

For the [AWS In-Region disaster recovery](scenarios.md#same-region) scenario, we recommend hosting the staging area subnet in a different Availability Zone than the recovery subnets. This design adds redundancy to the disaster recovery setup: the launched recovery instances are protected by a staging area in a separate Availability Zone. This follows the design principle of using multiple Availability Zones to maintain resiliency.

### Configuring the network security settings
<a name="step-security"></a>

Ensure that the required network security settings are configured. This includes enabling access through a number of ports in your on-premises firewall, network security devices, security groups, or network access control lists (network ACL), and possibly other tasks depending on the location of your source environment. For more information, see [Replication network requirements](https://docs.aws.amazon.com/drs/latest/userguide/preparing-environments.html).
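As a concrete illustration of one such rule, the AWS Replication Agent streams block-level data to the replication servers over TCP 1500, so the staging area security group needs a matching inbound rule. The following sketch builds the `IpPermissions` structure for that rule; the source CIDR is a placeholder, and outbound TCP 443 to the Elastic Disaster Recovery and Amazon S3 endpoints is also required (see the linked replication network requirements for the full list).

```python
def staging_area_ingress(source_cidr):
    """Inbound rule for the staging area security group: replication
    traffic from the AWS Replication Agent arrives on TCP 1500."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 1500,
        "ToPort": 1500,
        "IpRanges": [{
            "CidrIp": source_cidr,  # placeholder: your source network range
            "Description": "DRS replication traffic",
        }],
    }]

# Pass to ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=rules)
rules = staging_area_ingress("10.0.0.0/16")
```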

### SAP end user and integration traffic
<a name="step-user"></a>

The following factors determine how end-user and integration-related network traffic can affect your RTO and RPO.
+ DNS propagation time for clients to identify and resolve to new IP
+ Delays in network components (if any used) to reroute traffic, such as global or local load balancers, including AWS Application Load Balancers, AWS Global Accelerator, or Amazon Route 53 Public Data Plane

For more information, see [Disaster recovery options in the cloud](https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-options-in-the-cloud.html).
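Where Amazon Route 53 fronts the SAP endpoints, the DNS cutover itself can be scripted to reduce reaction time. The following sketch (the record name, IP address, and TTL are hypothetical) builds the change batch that repoints an A record at a recovery instance; keeping the TTL short limits the propagation delay described above.

```python
def dns_cutover_batch(record_name, recovery_ip, ttl=60):
    """ChangeBatch that repoints an A record at the recovery instance.
    A short TTL reduces DNS propagation delay during failover."""
    return {
        "Comment": "SAP DR failover",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": recovery_ip}],
            },
        }],
    }

# Pass to route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)
batch = dns_cutover_batch("sapgw.example.com.", "10.1.2.3")
```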

## Storage
<a name="key-considerations-storage"></a>

 AWS Elastic Disaster Recovery is designed to evaluate and define the optimal Amazon EBS volume settings for your staging environment based on the source server performance. A default performance setting is used for drill and recovery servers, with volumes sized to match the capacity needs of the source systems. Review these settings against the specific requirements of your SAP workloads to ensure an efficient environment that complies with your disaster recovery SLAs. The following server types have different storage requirements and methods of managing storage.

**Topics**
+ [Replication servers](#servers-storage)
+ [Drill and recovery instances](#instances-storage)
+ [Point in time recovery](#recovery-storage)

### Replication servers
<a name="servers-storage"></a>

The staging area requires storage to support ongoing replication from source machines. These Amazon EBS volumes are usually low-cost, hard disk drive (HDD) type storage volumes. However, if the replicated disk write throughput is high, the default Replication server settings dynamically change to a higher performance, solid state drive (SSD) storage type. The default Amazon EBS volume type setting – **Auto volume type selection** for replication servers, is the recommended setting for SAP workloads. It automatically chooses the high-performing, cost-efficient Amazon EBS volumes for your workload requirements.

You have the option to increase the performance of the staging area by selecting solid state drives (SSD). This can help SAP workloads such as databases with bursty or consistently high transaction rates, where a high rate of create, update, and delete operations must be applied to storage. For such workloads, we recommend monitoring Amazon CloudWatch metrics and checking for any persistent or increasing delays. You can use the following CloudWatch metrics for Elastic Disaster Recovery.
+  **LagDuration** – the age of the latest consistent snapshot, in seconds
+  **Backlog** – the amount of data yet to be synced, in bytes

If Amazon EBS metrics on the replication server also indicate performance issues, you can change Amazon EBS volume type. See the following resources to learn more.
+  [Amazon EBS volume performance on Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html) 
+  [Volumes](https://docs.aws.amazon.com/drs/latest/userguide/volumes-drs.html) 
+  [Disk settings](https://docs.aws.amazon.com/drs/latest/userguide/disk-settings.html) 
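The two metrics above can be polled programmatically as part of routine replication health checks. The following sketch assumes the Elastic Disaster Recovery metrics are published under the `AWS/DRS` CloudWatch namespace with a `SourceServerID` dimension; the function takes the CloudWatch client as a parameter so the query logic can be tested without AWS access.

```python
from datetime import datetime, timedelta, timezone

def max_lag_seconds(cw, source_server_id, minutes=60):
    """Maximum LagDuration over the last `minutes` for one source server.

    A persistently growing value suggests the staging area cannot keep
    up with the source change rate."""
    now = datetime.now(timezone.utc)
    stats = cw.get_metric_statistics(
        Namespace="AWS/DRS",  # assumed DRS metric namespace
        MetricName="LagDuration",
        Dimensions=[{"Name": "SourceServerID", "Value": source_server_id}],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=300,
        Statistics=["Maximum"],
    )
    return max((p["Maximum"] for p in stats["Datapoints"]), default=0.0)

# Example usage (requires AWS credentials):
# import boto3
# lag = max_lag_seconds(boto3.client("cloudwatch"), "s-1234567890abcdef0")
```

You can wire the same query into a CloudWatch alarm to be notified when lag exceeds your RPO target.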

### Drill and recovery instances
<a name="instances-storage"></a>

The `gp3` volume type meets the requirements of 90% or more of SAP use cases, including SAP applications and databases (SAP HANA and others). If you have a per-volume requirement of more than 16,000 IOPS, or a per-volume throughput requirement greater than 1,000 MiB/s, consider `io2` or `io2 Block Express` volumes.
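The decision rule above can be expressed directly. The following sketch encodes the per-volume `gp3` limits (16,000 IOPS and 1,000 MiB/s) as the cutover points to `io2`:

```python
def pick_volume_type(iops, throughput_mib_s):
    """gp3 scales to 16,000 IOPS and 1,000 MiB/s per volume; beyond
    either limit, move up to io2 (or io2 Block Express on a supported
    instance type)."""
    if iops > 16_000 or throughput_mib_s > 1_000:
        return "io2"
    return "gp3"
```

Apply this per volume: an SAP HANA log volume and data volume on the same server can legitimately end up with different types.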

When you launch drill or recovery instances, Elastic Disaster Recovery creates Amazon EBS storage volumes based on the types defined in the launch template. For more information, see [Amazon EC2 launch template](https://docs.aws.amazon.com/drs/latest/userguide/ec2-launch.html). The launch template is automatically generated by Elastic Disaster Recovery, with default values for storage performance, using general purpose SSD (volumes sized to match the source system capacity requirements). Review the launch template to confirm that your workload’s storage requirements are being met by the default allocations of the launch template.

You can modify the launch template for a different volume type or performance setting. Before modifying, confirm that your target Amazon EC2 instance type supports the higher storage performance. For more details, see [Supported instance types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html#ebs-optimization-support). For SAP HANA databases, see [Storage configuration](https://docs.aws.amazon.com/sap/latest/sap-hana/hana-ops-storage-config.html). After applying your changes, set the modified version as the default launch template for your server. We do not recommend adding or removing Amazon EBS volumes in the template when using it with Elastic Disaster Recovery.

For servers that require loading larger amounts of data before they become active, such as database servers, you can configure higher performance settings and types of storage in the launch template. For example, if your server is configured with `gp3` storage, then defining more provisioned throughput and IOPS for your storage, and/or using higher performance storage such as `io2 Block Express` (with a supported Amazon EC2 instance type), can reduce the time it takes for your drill or recovery instance to handle the expected workload. Once your drill or recovery instance is fully online, you can revert your storage settings. For more information, see [Amazon EBS Elastic Volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-modify-volume.html). You can increase the volume size, change the volume type, or adjust the performance of your Amazon EBS volumes, without detaching the volume or restarting the instance.
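A launch template change of this kind amounts to a `LaunchTemplateData` fragment with raised `gp3` performance values. The following sketch builds that fragment; the device name, IOPS, and throughput values are placeholders, and the fragment is applied with `ec2.create_launch_template_version()` (then set the new version as the default, as described above).

```python
def gp3_performance_override(device_name, iops, throughput_mib_s):
    """LaunchTemplateData fragment raising gp3 provisioned IOPS and
    throughput for one mapped device."""
    return {
        "BlockDeviceMappings": [{
            "DeviceName": device_name,  # placeholder: device mapped in the template
            "Ebs": {
                "VolumeType": "gp3",
                "Iops": iops,
                "Throughput": throughput_mib_s,
            },
        }]
    }

# Pass to ec2.create_launch_template_version(
#     LaunchTemplateId=..., SourceVersion="$Latest",
#     LaunchTemplateData=data)
data = gp3_performance_override("/dev/sdb", 10_000, 500)
```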

### Point in time recovery
<a name="recovery-storage"></a>

 AWS Elastic Disaster Recovery uses Amazon EBS snapshots to provide Point in Time (PiT) recovery options that can be used during a drill or recovery. Snapshots of the staging area volumes are taken continuously to provide the following recovery points: the latest state (sub-second RPO), 10-minute increments for the first hour, and one-hour increments for 24 hours. A daily PiT is retained for the number of days specified in your Point in Time (PiT) policy. You can specify between 1 and 365 days, with 7 days as the default. For more information, see [Understanding Point In Time states](https://docs.aws.amazon.com/drs/latest/userguide/failback-overview.html#point-in-time-faq).
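The retention schedule above implies a roughly predictable number of available recovery points, which is useful when estimating snapshot storage cost. The following sketch is a simplified approximation, not the service's exact bookkeeping:

```python
def approx_recovery_points(retention_days=7):
    """Approximate count of PiT recovery points: the latest state, six
    10-minute points in the first hour, 24 hourly points, and one daily
    point per retained day (1-365, default 7)."""
    if not 1 <= retention_days <= 365:
        raise ValueError("retention must be between 1 and 365 days")
    return 1 + 6 + 24 + retention_days
```

With the default 7-day policy this yields about 38 recovery points per server.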

## Compute
<a name="key-considerations-compute"></a>

You must choose an Amazon EC2 instance type for both the replication server and the recovery server.

**Topics**
+ [Replication servers](#servers-compute)
+ [Drill and recovery instances](#instances-compute)
+ [Source server](#source-servers-compute)

### Replication servers
<a name="servers-compute"></a>

The replication server is normally smaller than the source system. `t3.small` is the default instance type, and it can replicate up to 15 volumes. You can use a shared replication server between SAP application servers, or other servers with low change rates.

If your workload includes databases with bursty or consistently high transaction rates, where a high rate of create, update, and delete operations must be applied to storage, you may require a different staging area configuration. If you see replication lag for your workload, change the default replication server to a different instance family, for example, the general purpose Amazon EC2 instance family, or use a dedicated replication server. This change can impact cost. For more information, see [Replication server configuration](https://docs.aws.amazon.com/drs/latest/userguide/replication-server-settings.html).

### Drill and recovery instances
<a name="instances-compute"></a>

For recovery instances, configure the Amazon EC2 launch template settings to match the AWS target instances with the source servers. See the following resources for a list of SAP certified instances.
+  [SAP NetWeaver certified instances](https://docs.aws.amazon.com/sap/latest/general/sap-netweaver-aws-ec2.html) 
+  [SAP HANA certified instances](https://docs.aws.amazon.com/sap/latest/general/sap-hana-aws-ec2.html) 

The following are some of the compute-related factors impacting the RTO of your disaster recovery solutions.
+ Server startup time
+ SAP running on Microsoft Windows Server operating system
+ Large SAP HANA database that takes more than 10 minutes to start up
+ SAP application(s) installed on the server, and their startup times
+ Mismatch in the source and target server and storage configurations – configuring a lesser compute power or storage performance at the target side increases the RTO

You must consider application startup times as a factor in recovery. We recommend choosing an Amazon EC2 instance type and storage configuration that provides an effective startup time. This helps you optimize the RTO for your disaster recovery solutions. Also, performing a disaster recovery test or drill enables you to measure the RTO based on your operating system and database.

SAP systems can run on a variety of operating systems, infrastructure platforms, and processor instruction sets. If your source server is on-premises or with another cloud provider, it must be compatible with Amazon EC2 and Elastic Disaster Recovery. The source server must have a 64-bit operating system built for the x86 system architecture. Source servers, especially older models, may use x86 CPUs that differ from those available on AWS. We recommend using an SAP sizing-based approach to map the source system to an Amazon EC2 instance type. To learn more, see SAP’s [Sizing](https://www.sap.com/about/benchmark.html) information.

### Source server
<a name="source-servers-compute"></a>

While the system requirements for the Replication Agent are relatively low, consider the constraints on the source server for CPU, memory, network, storage, and other resources that can impact the performance of your disaster recovery solution. Size the source server based on these factors. For more information, see [Source server requirements](https://docs.aws.amazon.com/drs/latest/userguide/installation-requiremets.html#general-requirements2).

# Disaster recovery scenarios
<a name="scenarios"></a>

The following are the three disaster recovery scenarios.

**Topics**
+ [AWS In-Region disaster recovery](#same-region)
+ [AWS Cross-Region disaster recovery](#different-regions)
+ [Outside of AWS to AWS disaster recovery](#other-aws)

## AWS In-Region disaster recovery
<a name="same-region"></a>

In the AWS Cloud, Availability Zones are physically separated by a meaningful distance, although all are within 100 km (60 miles) of each other. This distance provides isolation from the most common disasters that could affect data centers, such as floods, fires, severe storms, and earthquakes. Many AWS customers use this design today to support the resiliency requirements of their SAP workloads. Based on your business continuity requirements, in-Region disaster recovery may be suitable for you. For more information, see [Single Region architecture patterns](https://docs.aws.amazon.com/sap/latest/general/arch-guide-single-region-architecture-patterns.html).

 *Best practices* 
+ Separate Amazon VPCs for the source and recovery areas in the same Region
+ Separate AWS accounts for source and recovery areas
+  AWS Transit Gateway or Amazon VPC peering for supporting replication traffic and end user connectivity

  For more information, see [Network](key-considerations.md#key-considerations-network).
+ Multiple Availability Zones resiliency of Amazon S3 buckets and Amazon EFS for data protection
+ Separate Availability Zones for source, staging, and recovery areas

The following two sections cover the reference architectures for this scenario.

 **Full in-Region disaster recovery implementation** 

![\[Implementation for full in-Region disaster recovery\]](http://docs.aws.amazon.com/sap/latest/general/images/in-region-full.png)


In full in-Region disaster recovery implementation, the source servers running SAP application components, such as central services instance ((A)SCS), primary application server (PAS), additional application server (AAS), and the database, are replicated using Elastic Disaster Recovery.

 **Hybrid in-Region disaster recovery implementation** 

![\[Implementation for hybrid in-Region disaster recovery\]](http://docs.aws.amazon.com/sap/latest/general/images/in-region-hybrid.png)


In hybrid in-Region disaster recovery implementation, the source servers running SAP application components, such as central services instance [(A)SCS], primary application server (PAS), and additional application server (AAS) are replicated using Elastic Disaster Recovery. The database is replicated using a database native replication method.

## AWS Cross-Region disaster recovery
<a name="different-regions"></a>

A disaster recovery scenario with multiple AWS Regions enables business continuity with your data storage in two separate geographical locations. For more information, see [Multi-Region architecture patterns](https://docs.aws.amazon.com/sap/latest/general/arch-guide-multi-region-architecture-patterns.html).

 *Best practices* 
+ Separate Amazon VPCs for the source and recovery areas in different Regions
+ A shared AWS account for source and recovery areas
+  AWS Transit Gateway or Amazon VPC peering for supporting replication traffic and end user connectivity

  For more information, see [Network](key-considerations.md#key-considerations-network).
+ Replication via Amazon EFS or other file systems to protect shared storage between Regions

  For more information, see [AWS Cross-Region disaster recovery](file-systems-storage.md#file-systems-different-regions).
**Note**  
With Amazon EFS, you can only replicate within the same AWS account.
+ Amazon S3 cross-Region replication to provide a copy of your database backups and other Amazon S3 bucket data to the disaster recovery Amazon VPC
+ Separate subnets for the source, staging, and recovery areas

The following two sections cover the reference architectures for this scenario.

 **Full cross-Region disaster recovery implementation** 

![\[Implementation for full cross-Region disaster recovery\]](http://docs.aws.amazon.com/sap/latest/general/images/cross-region-full.png)


In a full cross-Region disaster recovery implementation, the source servers running SAP application components, such as the central services instance ((A)SCS), primary application server (PAS), additional application server (AAS), and the database, are replicated using Elastic Disaster Recovery.

 **Hybrid cross-Region disaster recovery implementation** 

![\[Implementation for hybrid cross-Region disaster recovery\]](http://docs.aws.amazon.com/sap/latest/general/images/cross-region-hybrid.png)


In a hybrid cross-Region disaster recovery implementation, the source servers running SAP application components, such as the central services instance ((A)SCS), primary application server (PAS), and additional application server (AAS), are replicated using Elastic Disaster Recovery. The database is replicated using a database native replication method.

## Outside of AWS to AWS disaster recovery
<a name="other-aws"></a>

In this scenario, the source systems run in a non-AWS environment. A hybrid disaster recovery solution like this can be implemented to quickly add resiliency to your existing production environments on other platforms.

 *Best practices* 
+  AWS Direct Connect for supporting replication traffic and end user connectivity

  For more information, see [Network](key-considerations.md#key-considerations-network).
+  AWS DataSync to protect shared storage

  For more information, see [Outside of AWS to AWS disaster recovery](file-systems-storage.md#file-systems-other-aws).
+ Separate subnets for the staging and recovery areas

The following two sections cover the reference architectures for this scenario.

 **Full non-AWS to AWS disaster recovery implementation** 

![\[Implementation for full non-AWS to AWS disaster recovery\]](http://docs.aws.amazon.com/sap/latest/general/images/different-region-full.png)


In a full non-AWS to AWS disaster recovery implementation, the source servers running SAP application components, such as the central services instance ((A)SCS), primary application server (PAS), additional application server (AAS), and the database, are replicated using Elastic Disaster Recovery.

 **Hybrid non-AWS to AWS disaster recovery implementation** 

![\[Implementation for hybrid non-AWS to AWS disaster recovery\]](http://docs.aws.amazon.com/sap/latest/general/images/different-region-hybrid.png)


In a hybrid non-AWS to AWS disaster recovery implementation, the source servers running SAP application components, such as the central services instance ((A)SCS), primary application server (PAS), and additional application server (AAS), are replicated using Elastic Disaster Recovery. The database is replicated using a database native replication method.

For more disaster recovery options and information, you can reach out to [AWS Support](https://aws.amazon.com/premiumsupport/).

# Shared storage resiliency
<a name="file-systems-storage"></a>

File systems on an SAP server can be created on block storage, such as locally attached disks or enterprise Storage Area Network (SAN) devices, or can be based on shared file systems, such as SMB or NFS volumes shared from servers or Network Attached Storage (NAS) devices.

As Elastic Disaster Recovery is a block-level replication service, it only replicates disks that are presented as block storage devices. Other tools and processes must be used to provide resiliency for shared file systems. To address these requirements, we recommend using the fully managed AWS shared storage services, which make it easy and cost-effective to launch, run, and scale feature-rich, high-performance, and resilient file systems in the cloud. The choice of file system depends on the operating system in your disaster recovery scenario.
+ Linux – [Amazon Elastic File System](https://aws.amazon.com/efs/) (Amazon EFS)
+ Microsoft Windows Server – [Amazon FSx for Windows File Server](https://aws.amazon.com/fsx/windows/) (Amazon FSx)
+ Mixed – Amazon FSx for Windows File Server or [Amazon FSx for NetApp ONTAP](https://aws.amazon.com/fsx/netapp-ontap/) (FSx for ONTAP)

The following sections provide guidance on file systems based on your disaster recovery scenario.

**Topics**
+ [AWS In-Region disaster recovery](#file-systems-same-region)
+ [AWS Cross-Region disaster recovery](#file-systems-different-regions)
+ [Outside of AWS to AWS disaster recovery](#file-systems-other-aws)

## AWS In-Region disaster recovery
<a name="file-systems-same-region"></a>

When using managed services such as Amazon EFS, FSx for ONTAP or FSx for Windows File Server to host your shared file systems, the built-in resiliency offered through their multi-Availability Zone design means that your shared storage is already disaster recovery ready. For further resiliency, ensure that your shared storage is backed up regularly, to protect against potential data corruption.

If you share a file system using NFS or SMB protocols directly from one of your Amazon EC2 instances, you may not need additional steps if the file system is on Amazon EBS and attached to a server running the AWS Replication Agent. In that case, it is replicated via Elastic Disaster Recovery. If the shared file system is hosted on another Amazon EC2 instance along with other content that is not part of your SAP workload, use OS native tools, such as `rsync`, to manage the replication of this file system to the recovery area.

You can also use AWS DataSync to provide selective replication. DataSync tasks can be scheduled to run as frequently as once an hour, replicating these files to the target storage in the recovery area. You must install a DataSync agent on an Amazon EC2 instance that has access to the file system. For more information, see [How AWS DataSync works](https://docs.aws.amazon.com/datasync/latest/userguide/create-task.html#configure-scheduling-queueing).

## AWS Cross-Region disaster recovery
<a name="file-systems-different-regions"></a>

To support cross-Region disaster recovery, another shared file system must be available in the second Region. The data from the primary shared file system must be replicated on the shared file system in the second Region. Your implementation will differ based on your choice of AWS service.
+ Amazon Elastic File System – Amazon EFS native replication can support cross-Region replication within a single AWS account.
+ Amazon FSx for Windows File Server – You can use AWS DataSync to replicate data between your primary and secondary shared storage. For more information, see [How AWS DataSync works](https://docs.aws.amazon.com/datasync/latest/userguide/create-task.html#configure-scheduling-queueing).
+ Amazon FSx for NetApp ONTAP – You can use NetApp SnapMirror to copy your files between FSx for ONTAP file systems on your source and target instances, as frequently as every 5 minutes, to maintain a current copy of your shared file systems. For more information, see [Scheduled replication using NetApp SnapMirror](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/scheduled-replication.html).

## Outside of AWS to AWS disaster recovery
<a name="file-systems-other-aws"></a>

Depending on your source area design for shared storage, you must consider replicating these files on your disaster recovery instance in AWS. We recommend using [AWS DataSync](https://aws.amazon.com/datasync/). It can copy data to and from services, such as NFS and SMB shares, along with file systems using Amazon EFS, FSx for Windows File Server, and FSx for ONTAP.

In certain scenarios, you can consider using other options to protect your source area SAP shared file systems, such as if the following is being used on your source environment.
+ FSx for ONTAP – You can use NetApp SnapMirror to copy your files between FSx for ONTAP file systems on your source and target instances, as frequently as every 5 minutes, to maintain a current copy of your shared file systems. For more information, see [Scheduled replication using NetApp SnapMirror](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/scheduled-replication.html).
+ Local storage – Elastic Disaster Recovery will replicate it to your disaster recovery environment on AWS, if Replication Agent can be configured on the source server hosting the local storage.

# Implementing disaster recovery on AWS cloud for SAP workloads
<a name="implementation"></a>

Using Elastic Disaster Recovery to implement a disaster recovery solution for SAP workloads on AWS follows different considerations for different parts of a typical SAP workload, such as S/4HANA deployment. The following sections provide guidance on the differences in how to design, implement, and manage Elastic Disaster Recovery when used for the application and the database layers.

**Topics**
+ [SAP application layer](#application-layer)
+ [SAP database layer](#database-layer)

## SAP application layer
<a name="application-layer"></a>

We recommend using AWS Elastic Disaster Recovery to protect your SAP application servers, such as SAP ASCS/SCS, PAS, and AAS. Elastic Disaster Recovery supports the SAP application layer based on SAP NetWeaver and the ABAP foundation, as well as stand-alone applications, such as TREX and content servers. You can use Elastic Disaster Recovery for Amazon EBS-backed storage, such as SAP instance binaries and local files stored on Amazon EBS volumes.

The application layer also contains shared file systems, such as SAP mount, transport, and interface directories. These file systems usually need to be managed separately. For more information, see [Shared storage resiliency](file-systems-storage.md).

To set up, install the Elastic Disaster Recovery agent on the application servers. Create an IAM user with the required permissions, and provide the agent with this user's credentials to establish a connection with the Elastic Disaster Recovery APIs. Once the agent is configured, it performs an authentication handshake with the TLS 1.3-encrypted Elastic Disaster Recovery API endpoint. For each replicated source volume, the service creates an identically sized Amazon EBS volume in the staging area subnet for data synchronization. The Amazon EBS volume type can be configured in the replication server settings. Replication starts after the staging area subnet resources are created and the agent is installed. Data is transported with encryption directly from the source servers to the replication servers. The service automatically manages the staging area subnet resources, scaling them up or down based on the number of source servers and disks replicating concurrently.

## SAP database layer
<a name="database-layer"></a>

AWS Elastic Disaster Recovery is fully supported as a disaster recovery solution for SAP applications running on any database, including SAP applications running on an SAP HANA database in scale-up configuration. It is not supported for replication of multi-node SAP databases, such as an SAP HANA scale-out cluster.

The data in an SAP system is stored in a database. This data includes master data, transactional data, and ABAP artifacts. You must consider your business RPO and RTO requirements when evaluating Elastic Disaster Recovery as a disaster recovery solution. The service is not application aware; it works at the operating system layer, replicating the attached storage to the target staging environment. Based on your RTO and RPO requirements, you can select Elastic Disaster Recovery or a database native replication method, such as SAP HANA System Replication (HSR) for SAP HANA.
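As a toy illustration of this choice (the thresholds and return strings below are hypothetical and are not AWS guidance), the decision logic can be sketched as:

```python
def choose_replication(rpo_seconds: float, tightly_coupled: bool) -> str:
    """Toy decision sketch for picking a replication approach.

    rpo_seconds: the business RPO target, in seconds (hypothetical threshold).
    tightly_coupled: whether the database must be recovered consistently
    with other interdependent systems.
    """
    if tightly_coupled:
        # Tightly coupled systems favor database native replication.
        return "database native replication"
    if rpo_seconds < 1:
        # Very small RPO: test database native methods alongside the service.
        return "test database native replication alongside Elastic Disaster Recovery"
    return "Elastic Disaster Recovery (block-level replication)"

# A 5-minute RPO for a standalone database points to block-level replication.
print(choose_replication(rpo_seconds=300, tightly_coupled=False))
```

Validate any such decision with drills in your own environment; the considerations below cover the factors this sketch simplifies.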

The following are the important considerations to choose your database replication method.

**Topics**
+ [Network bandwidth](#network-bandwidth)
+ [RPO](#rpo)
+ [Change rate](#change-rate)
+ [RTO](#rto)
+ [Cost](#cost)
+ [RCO](#rco)
+ [Storage limits](#storage-limits)

### Network bandwidth
<a name="network-bandwidth"></a>

AWS Elastic Disaster Recovery works at the operating system layer, with block-level replication of attached storage devices. Depending on the change rate at the source, you may need higher network bandwidth to stay current with replication. Database-aware technologies, such as SAP HSR, require less network bandwidth, which helps systems with a high rate of change stay current.
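As a rough illustration (the numbers and overhead factor are hypothetical, not an Elastic Disaster Recovery sizing formula), the sustained bandwidth needed to keep block replication current must at least match the peak rate of changed blocks at the source:

```python
def required_bandwidth_mbps(peak_change_rate_mib_s: float,
                            overhead_factor: float = 1.2) -> float:
    """Estimate the network bandwidth (Mbps) needed to stay current.

    peak_change_rate_mib_s: peak rate of changed blocks at the source, in MiB/s.
    overhead_factor: assumed headroom for protocol overhead and bursts.
    """
    bits_per_second = peak_change_rate_mib_s * 1024 * 1024 * 8 * overhead_factor
    return bits_per_second / 1_000_000

# A source writing 20 MiB/s of changed blocks at peak needs
# roughly 200 Mbps of sustained replication bandwidth.
print(round(required_bandwidth_mbps(20)))  # → 201
```

If the available bandwidth is below this estimate for sustained periods, replication lag accumulates and the achievable RPO degrades.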

### RPO
<a name="rpo"></a>

Elastic Disaster Recovery supports sub-second RPO. For SAP workloads, ensure that your network can support peaks in change rate. If your RPO is very small, we recommend testing database native replication methods along with Elastic Disaster Recovery.

Actions that make significant changes to the data of your database delay data replication to the staging area. One example is a partial or full recovery of a backup to protected volumes on the source server. Such an action writes far more data to your storage volumes than your usual change rate. Data restored from backup to protected volumes on the source server is treated as changed blocks and is replicated by Elastic Disaster Recovery. The replication servers need additional time to receive and write this larger amount of changed data from the source system, which can impact your business RPO.

We recommend scheduling such actions, for example recovery from backups, during less critical workload times, so that longer RPO values do not impact your workload. You can track the amount of changed data still waiting to be replicated with Elastic Disaster Recovery. For more information, see [Recovery dashboard](https://docs.aws.amazon.com/drs/latest/userguide/recovery-dashboard.html).
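To see why bulk restores matter, here is a back-of-the-envelope sketch (all figures hypothetical) of how long a replication backlog takes to drain over a fixed link:

```python
def backlog_drain_minutes(backlog_gib: float, bandwidth_mbps: float) -> float:
    """Estimate minutes until replication catches up after a bulk change,
    such as restoring a backup to protected volumes (illustrative only).

    backlog_gib: amount of changed data awaiting replication, in GiB.
    bandwidth_mbps: usable replication bandwidth, in Mbps.
    """
    backlog_bits = backlog_gib * 1024**3 * 8
    seconds = backlog_bits / (bandwidth_mbps * 1_000_000)
    return seconds / 60

# Restoring 500 GiB to a protected volume over a 500 Mbps link keeps
# the replication lag (and thus the effective RPO) elevated for ~143 minutes.
print(round(backlog_drain_minutes(500, 500)))  # → 143
```

During that window, the most recent consistent point in time available for recovery lags behind production, which is why scheduling such actions outside critical hours is advised.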

### Change rate
<a name="change-rate"></a>

For databases with high change rates, you can meet performance requirements with a sufficiently fast network, along with appropriate storage and compute configuration of the replication server. If these measures are insufficient to meet the business performance requirements, you can choose a database native replication method to optimize your RPO.

### RTO
<a name="rto"></a>

With Elastic Disaster Recovery, the target disaster recovery environment is provisioned only once the disaster recovery event is triggered. The total recovery time depends on the size of your database and the chosen point in time (PIT). You must test your disaster recovery scenario before implementing it in production environments.

### Cost
<a name="cost"></a>

Because Elastic Disaster Recovery does not use a warm or hot standby approach, the compute costs of your disaster recovery environment are minimized compared to many other disaster recovery options. For more information, see [AWS Elastic Disaster Recovery pricing](https://aws.amazon.com/disaster-recovery/pricing/). With database native replication methods, costs increase with the compute resources running in the disaster recovery area.

### RCO
<a name="rco"></a>

The recovery consistency objective (RCO) measures the consistency of data across multiple interdependent systems after a recovery. If you have multiple tightly coupled systems that must be recovered to a mutually consistent state, use database native replication methods.

### Storage limits
<a name="storage-limits"></a>

In most cases, the available Amazon EBS volume types are sufficient to address storage capacity and performance needs. Depending on the source environment architecture, in some cases the storage volume on the recovery instance exceeds the capacity and/or performance limits of an individual Amazon EBS volume. This can happen in a non-AWS to AWS disaster recovery implementation with `data` and `log` volumes attached to heavily loaded database servers. For more information, see [Amazon EBS volume types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html).

When migrating servers to AWS, such storage volumes must be refactored to a new storage architecture, for instance, by creating striped volume sets. Striped volume sets are defined and maintained using logical volume manager tools in your recovery instance's operating system. For more information, see [RAID configuration on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html). These volume sets span two or more Amazon EBS volumes, up to the total needed to meet the required volume size and performance. The storage volume data is then copied to the new striped volume set. While it may be possible to automate this process through Elastic Disaster Recovery post-launch scripts or alarm events that trigger code through Amazon EventBridge event rules, the additional steps can increase recovery time.
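The sizing arithmetic behind a striped volume set can be sketched as follows; the per-volume limits below are assumptions modeled on published Amazon EBS gp3 maximums, so verify them against the current Amazon EBS documentation before sizing a real environment:

```python
import math

# Assumed per-volume limits, modeled on Amazon EBS gp3 published maximums.
GP3_MAX_SIZE_GIB = 16384        # 16 TiB per volume
GP3_MAX_IOPS = 16000            # max provisioned IOPS per volume
GP3_MAX_THROUGHPUT_MIBS = 1000  # max throughput per volume, MiB/s

def stripe_volume_count(size_gib: int, iops: int, throughput_mibs: int) -> int:
    """Return the number of equally sized volumes a stripe set needs so that
    no single volume exceeds its capacity, IOPS, or throughput limit."""
    return max(
        math.ceil(size_gib / GP3_MAX_SIZE_GIB),
        math.ceil(iops / GP3_MAX_IOPS),
        math.ceil(throughput_mibs / GP3_MAX_THROUGHPUT_MIBS),
    )

# A 24 TiB data volume needing 40,000 IOPS and 1,500 MiB/s must be
# striped across 3 volumes — here IOPS, not capacity, is the binding limit.
print(stripe_volume_count(24576, 40000, 1500))  # → 3
```

Whichever limit the workload exceeds first (capacity, IOPS, or throughput) determines the stripe width; the logical volume manager then presents the set as a single volume to the database.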

In these cases, implementing a hybrid disaster recovery solution is suitable. Most of the servers are managed by Elastic Disaster Recovery and select servers (with storage performance considerations) use alternative disaster recovery approaches, such as native database replication technologies. The storage architecture refactoring is done when the standby replication server is set up during the initial disaster recovery environment implementation. As the replication now happens at an application level, the disaster recovery server is able to write to a storage architecture that is different from what is on the source server.