

# SAP NetWeaver on AWS: high availability configuration for NetWeaver (ASCS)

This topic applies to highly available deployments of SAP NetWeaver applications on the AWS Cloud. It covers the instructions for configuring a pacemaker cluster for the ABAP SAP Central Services (ASCS) and the Enqueue Replication Server (ERS) when deployed on Amazon EC2 instances in two different Availability Zones within an AWS Region.

**Topics**
+ [SAP NetWeaver on AWS: high availability configuration for SUSE Linux Enterprise Server (SLES) for SAP applications](sap-nw-on-aws-sles-pacemaker.md)
+ [SAP NetWeaver on AWS: high availability configuration for Red Hat Enterprise Linux (RHEL) for SAP applications](sap-nw-on-aws-rhel-pacemaker.md)

# SAP NetWeaver on AWS: high availability configuration for SUSE Linux Enterprise Server (SLES) for SAP applications

This topic applies to the SUSE Linux Enterprise Server (SLES) operating system for SAP NetWeaver applications on the AWS Cloud. It covers the instructions for configuring a pacemaker cluster for the ABAP SAP Central Services (ASCS) and the Enqueue Replication Server (ERS) when deployed on Amazon EC2 instances in two different Availability Zones within an AWS Region.

This topic covers instructions for the following configuration options.

**Topics**
+ [Planning](sap-nw-pacemaker-sles-planning.md)
+ [Prerequisites](sap-nw-pacemaker-sles-prerequisites.md)
+ [SAP ASCS and Cluster Setup](sap-nw-pacemaker-sles-setup.md)
+ [Operations](sles-netweaver-ha-operations.md)
+ [Testing](testing-nw-sles.md)

# Planning


This section covers the following topics.

**Topics**
+ [Setup Overview](sap-nw-pacemaker-sles-setup-overview.md)
+ [Deployment Guidance](sap-nw-pacemaker-sles-references.md)
+ [Concepts](sap-nw-pacemaker-sles-concepts.md)
+ [Parameter Reference](sap-nw-pacemaker-sles-parameters.md)
+ [Architecture diagrams](sles-netweaver-ha-diagrams.md)

# Setup Overview


You must meet the following prerequisites before commencing setup.

**Topics**
+ [Deployed Cluster Infrastructure](#cluster-nw-sles)
+ [Supported Operating System](#supported-os-nw-sles)
+ [SAP and SUSE references](#references-nw-sles)
+ [Required Access for Setup](#access-nw-sles)
+ [Reliability Requirements Defined](#reliability-nw-sles)

## Deployed Cluster Infrastructure


Ensure that your AWS networking setup and the Amazon EC2 instances where SAP workloads are installed are correctly configured for SAP. For more information, see [SAP NetWeaver Environment Setup for Linux on AWS](https://docs.aws.amazon.com/sap/latest/sap-netweaver/std-sap-netweaver-environment-setup.html).

Review the following ASCS cluster-specific requirements.
+ Two cluster nodes created in private subnets in separate Availability Zones within the same Amazon VPC and AWS Region
+ Access to the route table(s) that are associated with the chosen subnets

  For more information, see [AWS – Overlay IP](sap-nw-pacemaker-sles-concepts.md#overlay-ip-sles).
+ Amazon EC2 instances must have connectivity to the Amazon EC2 endpoint, either through the internet or through an Amazon VPC endpoint.
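
As a quick check (a minimal sketch; it assumes the AWS CLI is already installed and that credentials are available on the node), a read-only API call from each node confirms that the Amazon EC2 endpoint is reachable:

```
$ aws ec2 describe-availability-zones --region <region_id> --query 'AvailabilityZones[].ZoneName' --output text
```

A timeout or connection error indicates that neither an internet path nor an Amazon VPC endpoint to the Amazon EC2 API is available from the instance.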

## Supported Operating System


Protecting the ABAP SAP Central Services (ASCS) with a pacemaker cluster requires packages from SUSE, including targeted cluster resource agents for SAP and AWS that may not be available in standard repositories.

For deploying SAP applications on SUSE, SAP and SUSE recommend using SUSE Linux Enterprise Server for SAP applications (SLES for SAP). SLES for SAP provides additional benefits, including Extended Service Pack Overlap Support (ESPOS), configuration and tuning packages for SAP applications, and the High Availability Extension (HAE). For more details, see [SUSE Linux Enterprise Server for SAP Applications](https://www.suse.com/products/sles-for-sap/) on the SUSE website.

SLES for SAP is available at [AWS Marketplace](https://aws.amazon.com/marketplace) with an hourly or annual subscription. You can also use the bring your own subscription (BYOS) model.

## SAP and SUSE references


In addition to this guide, see the following references for more details.
+  [SUSE documentation – SAP S/4 HANA - Enqueue Replication 2 High Availability Cluster With Simple Mount](https://documentation.suse.com/sbp/sap-15/html/SAP-S4HA10-setupguide-simplemount-sle15/index.html) 
+  [SUSE documentation – SAP S/4 HANA - Enqueue Replication 2 High Availability Cluster](https://documentation.suse.com/sbp/all/single-html/SAP-S4HA10-setupguide-sle15/#id-1) 
+  [SAP Note: 1656099 - SAP Applications on AWS: Supported DB/OS and Amazon EC2 products](https://me.sap.com/notes/1656099) 
+  [SAP Note: 1984787 - SUSE Linux Enterprise Server 12: Installation Notes](https://me.sap.com/notes/1984787) 
+  [SAP Note: 2578899 - SUSE Linux Enterprise Server 15: Installation Notes](https://me.sap.com/notes/2578899) 
+  [SAP Note: 1275776 - Linux: Preparing SLES for SAP environments](https://me.sap.com/notes/1275776) 

You must have SAP portal access to read SAP Notes.

## Required Access for Setup


The following access is required for setting up the cluster.
+ An IAM user with the following privileges.
  + modify Amazon VPC route tables
  + modify Amazon EC2 instance properties
  + create IAM policies and roles
  + create Amazon EFS file systems
+ Root access to the operating system of both cluster nodes
+ SAP administrative user access – `<sid>adm` 

  For a new installation, this user is created by the installation process.

## Reliability Requirements Defined


The SAP Lens of the AWS Well-Architected Framework, in particular the Reliability pillar, can be used to understand the reliability requirements for your SAP workload.

The ASCS is a single point of failure in a highly available SAP architecture. The impact of an outage of this component must be evaluated against factors such as recovery point objective (RPO), recovery time objective (RTO), cost, and operational complexity. For more information, see [Reliability](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/reliability.html) in SAP Lens - AWS Well-Architected Framework.

# Deployment Guidance


This section details the documentation and deployment guidance from SUSE.

**Topics**
+ [Deployment Patterns](#deployments-sles)
+ [Automated Deployment](#automation-nw-sles)
+ [Pacemaker - simple-mount and classic architecture](#simple-classic-nw-sles)

## Deployment Patterns


The following table outlines the supported SAP deployment types and their corresponding AWS configuration patterns for high availability clustering.


| SAP Deployment Type | Support Status |  AWS Configuration Patterns | Notes | 
| --- | --- | --- | --- | 
|  SAP NetWeaver ASCS/ERS (ENSA1)  |   AWS Documented & Supported  |  SAPNetweaver-Classic, SAPNetweaver-Simple-mount  |  | 
|  SAP NetWeaver ASCS/ERS (ENSA2)  |   AWS Documented & Supported  |  SAPNetweaver-Classic, SAPNetweaver-Simple-mount  |  | 
|  SAP S/4HANA ASCS/ERS  |   AWS Documented & Supported  |  SAPNetweaver-Classic, SAPNetweaver-Simple-mount  |  S/4HANA only supports ENSA2  | 
|  SAP SCS (Java)  |  Vendor Documented & Supported  |  |  Follows SAP Documentation  | 

## Automated Deployment


You can set up a cluster manually using the instructions provided here. You can also automate parts of this process to ensure consistency and repeatability.

Use AWS Launch Wizard for SAP for automated deployments of SAP NetWeaver, SAP S/4HANA, SAP BW/4HANA, and SAP Solution Manager. Launch Wizard uses AWS CloudFormation scripts to quickly provision the resources needed to deploy SAP NetWeaver and S/4HANA. The automation performs SAP enqueue replication and pacemaker setup so that only validation and testing are required. For more information, see [AWS Launch Wizard for SAP](https://docs.aws.amazon.com/launchwizard/latest/userguide/launch-wizard-sap.html).

To ensure that the behavior and operation of your cluster is well understood regardless of how your system is set up, we recommend a thorough test cycle. See [Testing](https://docs.aws.amazon.com/sap/latest/sap-netweaver/testing.html) for more details.

## Pacemaker - simple-mount and classic architecture


This guide covers two architectures for SAP cluster solutions on SLES for SAP – simple-mount and classic (previous standard). Simple-mount was certified as the SLES for SAP Applications cluster solution in late 2021. It is now the recommended architecture for both ENSA1 and ENSA2 deployments running on SLES for SAP 15 and above. For more details, see SUSE blog [Simple Mount Structure for SAP Application Platform](https://www.suse.com/c/simple-mount-structure-for-sap-application-platform/).

If you are configuring a new SAP installation, we recommend the simple-mount architecture. If you already have the classic architecture, and wish to migrate to the simple-mount architecture, see [Switching architecture to simple-mount](#switching-architecture-sles).

The following are the differences between the classic and simple-mount architectures.
+ Removal of file system resources from the cluster – a file system is still required, but it is not mounted and unmounted by the cluster. The executable directories for the ASCS and ERS can be permanently mounted on both nodes (see the example /etc/fstab entries later in this section).
+ Addition of the SAPStartSrv resource agent – it controls the matching sapstartsrv framework process.
+ sapping and sappong services – these services manage the start of the sapstartsrv services with sapinit.

See the [Architecture diagrams](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-netweaver-ha-diagrams.html) for more details.
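
For the simple-mount architecture, the permanently mounted file systems might look like the following `/etc/fstab` entries on both nodes. This is a sketch only, using the example Amazon EFS file system ID and paths from the [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) and the default `/usr/sap` mount points; your file system, paths, and mount options will differ.

```
# Example /etc/fstab entries (simple-mount) – mounted permanently on both nodes, not managed by the cluster
fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ASCS00 /usr/sap/SLX/ASCS00 nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0
fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ERS10  /usr/sap/SLX/ERS10  nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0
```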

### Switching architecture to simple-mount


Follow these steps to switch an existing cluster from the classic architecture to the recommended simple-mount architecture.

These steps must be performed in an outage window that allows stopping and starting services and basic testing.

1. Put the cluster in maintenance mode. See [Maintenance mode](planned-maintenance-nw-sles.md#maintenance-mode-nw-sles).

1. Stop SAP services, including application servers connected to the cluster as well as ASCS and ERS.

1. Install any missing operating system packages. See [Install Missing Operating System Packages](sap-nw-pacemaker-sles-os-settings.md#packages-nw-sles).

   It might be necessary to install `sapstartsrv-resource-agents`. However, all operating system prerequisites must be checked and updated to ensure that versions are compatible.

1. Add entries for the ASCS and ERS mount points on both nodes (if not already added). See [Update /etc/fstab](sap-shared-filesystems-nw-sles.md#update-fstab-nw-sles).

1. Enable the `sapping`/`sappong` services. See [Enable sapping and sappong Services (Simple-Mount Only)](sap-ascs-service-control-nw-sles.md#sapping-sappong-services-nw-sles).

1. Align and disable `systemd` services. See [Ensure ASCS and ERS SAP Services can run on either node (systemd)](sap-ascs-service-control-nw-sles.md#modify-sapservices-nw-sles).

1. Back up the configuration with the following command.

   ```
   # crm config show >> /tmp/classic_ha_setup.txt
   ```

   See [Prepare for Resource Creation](cluster-config-nw-sles.md#prepare-resource-nw-sles).

1.  *Optional* – delete the configuration. You can edit it in place, but we recommend starting with a blank configuration. This ensures that the latest timeout and priority parameters are in place. See [Reset Configuration – Optional](cluster-config-nw-sles.md#reset-config-nw-sles).

   ```
   # crm config erase
   # crm config show
   ```

1. Configure cluster resources again.

1. Check the cluster and perform some tests (see the example commands after this list).

1. Resume standard operations by starting any additional services, including application servers.
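
For the cluster check step, the following commands provide a quick view of cluster health before you resume standard operations (a minimal sketch; resource names and output depend on your configuration):

```
# crm status
# crm_mon -1r
```

Confirm that both nodes are online, that all resources are started on the expected nodes, and that there are no failed actions before starting the application servers.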

# Concepts


This section covers AWS, SAP, and SUSE concepts.

**Topics**
+ [SAP – ABAP SAP Central Services (ASCS)](#ascs-nw-sles)
+ [SAP – Enqueue Replication Server (ERS)](#ers-nw-sles)
+ [AWS – Availability Zones](#availability-zones-nw-sles)
+ [AWS – Overlay IP](#overlay-ip-sles)
+ [AWS – Shared VPC](#shared-vpc)
+ [Pacemaker - STONITH fencing agent](#stonith-nw-sles)

## SAP – ABAP SAP Central Services (ASCS)


The ABAP SAP Central Services (ASCS) is an SAP instance consisting of the following two services. It is considered a single point of failure (SPOF) in a resilient SAP architecture.
+  **Message server** – Responsible for application load distribution (GUI and RFC), communication between application servers, and centralised configuration information for web dispatchers and application servers.
+  **Enqueue server (standalone)** – Maintains a lock table in main memory (shared memory). Unlike a database lock, an enqueue lock can exist across multiple logical units of work (LUW), and is set by an SAP dialog work process. The lock mechanism prevents two transactions from changing the same data in the database simultaneously.

**Note**  
With ABAP Release 7.53 (ABAP Platform 1809), the new Standalone Enqueue Server 2 (ENSA2) is installed by default. It replaces the previous version (ENSA1), although the previous version can still be configured. See [SAP Note 2630416 - Support for Standalone Enqueue Server 2](https://me.sap.com/notes/2630416) (SAP portal access required) for more information.  
This document includes modifications to align with the correct ENSA version.

## SAP – Enqueue Replication Server (ERS)


The Enqueue Replication Server (ERS) is an SAP instance containing a replica of the lock table (replication table).

In a resilient setup, if the standalone enqueue server (EN/ENQ) fails, it can be restarted either by restart parameters or by high availability software, such as Pacemaker. The enqueue server retrieves the replication table remotely or by failing over to the host where the ERS is running.

## AWS – Availability Zones


An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. For more information, see [Regions and Availability Zones](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).

For mission critical deployments of SAP on AWS where the goal is to minimise the recovery time objective (RTO), we suggest distributing single points of failure across Availability Zones. Compared with single instance or single Availability Zone deployments, this increases resilience and isolation against a broad range of failure scenarios and issues, including natural disasters.

Each Availability Zone is physically separated by a meaningful distance (many kilometers) from the other Availability Zones. All Availability Zones in an AWS Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber. This enables synchronous replication. All traffic between Availability Zones is encrypted.

## AWS – Overlay IP


Overlay IP enables a connection to the application, regardless of which Availability Zone (and subnet) contains the active primary node.

When you deploy an Amazon EC2 instance in AWS, its IP addresses are allocated from the CIDR range of the assigned subnet. A subnet cannot span multiple Availability Zones, so an IP address from the subnet can become unreachable after faults, including network connectivity or hardware issues, that require a failover to the replication target in a different Availability Zone.

To address this, we suggest that you configure an overlay IP, and use this in the connection parameters for the application. This IP address is a non-overlapping RFC1918 private IP address from outside of the VPC CIDR block and is configured as an entry in the route table or tables. The route directs the connection to the active node and is updated during a failover by the cluster software.

You can select any one of the following RFC1918 private IP addresses for your overlay IP address.
+ 10.0.0.0 – 10.255.255.255 (10/8 prefix)
+ 172.16.0.0 – 172.31.255.255 (172.16/12 prefix)
+ 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)

If, for example, you use the 10/8 prefix in your SAP VPC, selecting a 172 or a 192 IP address may help to differentiate the overlay IP. Consider the use of an IP Address Management (IPAM) tool such as Amazon VPC IP Address Manager to plan, track, and monitor IP addresses for your AWS workloads. For more information, see [What is IPAM?](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html) 

The overlay IP agent in the cluster can also be configured to update multiple route tables which contain the Overlay IP entry if your subnet association or connectivity requires it.
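
For illustration only, multiple route tables are typically passed to the `aws-vpc-move-ip` agent as a comma-separated list. The following is a sketch of such a resource definition, using example values from the [Parameter Reference](sap-nw-pacemaker-sles-parameters.md); the resource name is illustrative and the actual resource configuration for your cluster is covered in the setup section.

```
# crm configure primitive rsc_ip_SLX_ASCS00 ocf:heartbeat:aws-vpc-move-ip \
    params ip=172.16.1.50 routing_table=rtb-xxxxxroutetable1,rtb-xxxxxroutetable2 \
           interface=eth0 profile=cluster \
    op monitor interval=20s timeout=40s
```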

 **Access to overlay IP** 

The overlay IP is outside of the range of the VPC, and therefore cannot be reached from locations that are not associated with the route table, including on-premises and other VPCs.

Use [AWS Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html) as a central hub to facilitate the network connection to an overlay IP address from multiple locations, including Amazon VPCs, other AWS Regions, and on-premises using [AWS Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html) or [AWS Client VPN](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/what-is.html).

If you do not have AWS Transit Gateway set up as a network transit hub or if it is not available in your preferred AWS Region, you can use a [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) to enable network access to an overlay IP.

For more information, see [SAP on AWS High Availability with Overlay IP Address Routing](https://docs.aws.amazon.com/sap/latest/sap-hana/sap-ha-overlay-ip.html).

## AWS – Shared VPC


An enterprise landing zone setup or security requirements may require the use of a separate cluster account to restrict the route table access required for the Overlay IP to an isolated account. For more information, see [Share your VPC with other accounts](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html).

Evaluate the operational impact against your security posture before setting up shared VPC. To set up, see [Shared VPC – optional](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-netweaver-ha-settings.html#sles-netweaver-ha-shared-vpc).

## Pacemaker - STONITH fencing agent


In a two-node cluster setup for a primary resource and its replication pair, it is important that there is only one node in the primary role with the ability to modify your data. In the event of a failure scenario where a node is unresponsive or incommunicable, ensuring data consistency requires that the faulty node is isolated by powering it down before the cluster commences other actions, such as promoting a new primary. This arbitration is the role of the fencing agent.

A two-node cluster introduces the possibility of a fence race, in which a communication failure results in a dual shoot-out, with both nodes simultaneously claiming, "I can't see you, so I am going to power you off." The fencing agent is designed to minimise this risk by providing an external witness.

SLES supports several fencing agents, including the one recommended for use with Amazon EC2 instances (external/ec2). This resource uses API commands to check its own instance status ("Is my instance state anything other than running?") before proceeding to power off its pair. If its own instance is already in a stopping or stopped state, it admits defeat and leaves the surviving node untouched.
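
For illustration, an external/ec2 STONITH resource typically references the Amazon EC2 resource tag and the AWS CLI profile described later in this guide. The following is a sketch only; the full STONITH resource definition for your cluster is covered in the setup section.

```
# crm configure primitive res_AWS_STONITH stonith:external/ec2 \
    op start interval=0 timeout=180 \
    op stop interval=0 timeout=180 \
    op monitor interval=300 timeout=60 \
    params tag=pacemaker profile=cluster
```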

# Parameter Reference


The cluster setup relies on the following parameters. Gather this information prior to configuring Pacemaker to ensure a smooth setup process.

**Topics**
+ [Global AWS parameters](#global-aws-parameters-nw-sles)
+ [Amazon EC2 instance parameters](#ec2-parameters-nw-sles)
+ [SAP Instance Parameters](#sap-pacemaker-resource-parameters-nw-sles)
+ [Pacemaker Parameters](#sles-cluster-parameters)

## Global AWS parameters



| Name | Parameter | Example | 
| --- | --- | --- | 
|   AWS account ID  |   `<account_id>`   |   `123456789100`   | 
|   AWS Region  |   `<region_id>`   |   `us-east-1`   | 
+  AWS account – For more details, see [Your AWS account ID and its alias](https://docs.aws.amazon.com/IAM/latest/UserGuide/console-account-alias.html).
+  AWS Region – For more details, see [Describe your Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#using-regions-availability-zones-describe).
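
Both values can also be retrieved from a node with the AWS CLI installed; the AWS Region is additionally available from instance metadata (IMDSv2), as sketched below.

```
$ aws sts get-caller-identity --query Account --output text
123456789100

$ TOKEN=$(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
$ curl --noproxy '*' -s -w "\n" -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region
us-east-1
```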

## Amazon EC2 instance parameters



| Name | Parameter | Host 1 | Host 2 | 
| --- | --- | --- | --- | 
|  Amazon EC2 instance ID  |   `<instance_id>`   |   `i-xxxxinstidforhost1`   |   `i-xxxxinstidforhost2`   | 
|  Hostname  |   `<hostname>`   |   `slxhost01`   |   `slxhost02`   | 
|  Host IP  |   `<host_ip>`   |   `10.1.10.1`   |   `10.1.20.1`   | 
|  Host additional IP  |   `<host_additional_ip>`   |   `10.1.10.2`   |   `10.1.20.2`   | 
|  Configured subnet  |   `<subnet_id>`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   | 
|  Associated VPC Route Table(s)  |   `<routetable_id>`   |   `rtb-xxxxxroutetable1 [,rtb-xxxxxroutetable2]`   |  | 
|  Sapmnt NFS ID or CNAME  |   `<sapmnt_nfs_id>`   |   `fs-xxxxxxxxxxxxxefs1`   |  | 
+  **Hostname** – Hostnames must comply with SAP requirements outlined in [SAP Note 611361 - Hostnames of SAP ABAP Platform servers](https://me.sap.com/notes/611361) (requires SAP portal access).

  Run the following command on your instances to retrieve the hostname.

  ```
  # hostname
  ```
+  **Amazon EC2 instance ID** – run the following command (IMDSv2 compatible) on your instances to retrieve instance metadata.

  ```
  # /usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/meta-data/instance-id
  ```

  For more details, see [Retrieve instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and [Instance identity documents](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html).
+  **Amazon EC2 subnet ID** – run the following command to retrieve the subnet ID for each of your instances.

  ```
  # INSTANCE_ID=i-xxxxinstidforhost1
  # aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[0].Instances[0].SubnetId' --output text
  ```

  For more details, see [describe-instances](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instances.html) and [VPC subnets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html).
+  **Route table(s) for subnets** – run the following AWS CLI commands to retrieve the route table(s) associated with both cluster node subnets.

  ```
  # SUBNET_ID_1=subnet-xxxxxxxxxxsubnet1
  # SUBNET_ID_2=subnet-xxxxxxxxxxsubnet2
  # aws ec2 describe-route-tables --filters "Name=association.subnet-id,Values=$SUBNET_ID_1,$SUBNET_ID_2" --query 'RouteTables[].RouteTableId' --output text
  ```

  If both cluster nodes are in subnets associated with the same route table, only one route table ID will be returned. If they are associated with different route tables, both route table IDs will be returned.

  For more details, see [describe-route-tables](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-route-tables.html) and [Route tables](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html).

## SAP Instance Parameters



| Name | Parameter | Example | 
| --- | --- | --- | 
|  SID  |   `<SID>` or `<sid>`   |   `SLX`   | 
|  ASCS Alias  |   `<ascs_virt_hostname>`   |   `slxascs`   | 
|  ASCS System Number  |   `<ascs_sys_nr>`   |   `00`   | 
|  ASCS Overlay IP  |   `<ascs_overlayip>`   |   `172.16.1.50`   | 
|  ASCS NFS Mount Point  |   `<ascs_nfs_mount_point>`   |   `/SLX_ASCS00`   | 
|  ERS Alias  |   `<ers_virt_hostname>`   |   `slxers`   | 
|  ERS System Number  |   `<ers_sys_nr>`   |   `10`   | 
|  ERS Overlay IP  |   `<ers_overlayip>`   |   `172.16.1.51`   | 
|  ERS NFS Mount Point  |   `<ers_nfs_mount_point>`   |   `/SLX_ERS10`   | 
|  ENSA Type  |   `<ensa_type>`   |   `ENSA2`   | 
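
If you are unsure of the ENSA type of an existing installation, one way to check (a sketch, assuming the default profile location and the example values above) is to look for the enqueue server executable referenced in the ASCS instance profile: `enq_server` indicates ENSA2, while `enserver` indicates ENSA1.

```
$ grep -E "enserver|enq_server" /sapmnt/SLX/profile/SLX_ASCS00_slxascs
```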

## Pacemaker Parameters



| Name | Parameter | Example | 
| --- | --- | --- | 
|  Cluster user  |   `cluster_user`   |   `hacluster`   | 
|  Cluster password  |   `cluster_password`   |  | 
|  Cluster tag  |   `cluster_tag`   |   `pacemaker`   | 
|   AWS CLI cluster profile  |   `aws_cli_cluster_profile`   |   `cluster`   | 
|  Cluster connector  |   `cluster_connector`   |   `sap-suse-cluster-connector`   | 

# Architecture diagrams


This guide covers two architectures for SAP cluster solutions on SLES for SAP – simple-mount and classic (previous standard). See the following images to learn more.

**Topics**
+ [Pacemaker - simple-mount architecture](#simple-mount-diagram-nw-sles)
+ [Pacemaker - classic architecture](#classic-diagram-nw-sles)

## Pacemaker - simple-mount architecture


See the following image for more details.

![\[Simple Mount Architecture\]](http://docs.aws.amazon.com/sap/latest/sap-netweaver/images/image-pacemaker-nw-sles-simplemount.png)


## Pacemaker - classic architecture


See the following image for more details.

![\[Classic Architecture.\]](http://docs.aws.amazon.com/sap/latest/sap-netweaver/images/image-pacemaker-nw-sles-classic.png)


# Prerequisites

**Topics**
+ [AWS Infrastructure Setup](sap-nw-pacemaker-sles-infra-setup.md)
+ [EC2 Instance Configuration](sap-nw-pacemaker-sles-ec2-configuration.md)
+ [Operating System Requirements](sap-nw-pacemaker-sles-os-settings.md)

# AWS Infrastructure Setup


This section covers the one-time setup tasks required to prepare your AWS environment for the cluster deployment:

**Note**  
We recommend using administrative privileges from an administrative workstation or AWS Console for the initial infrastructure setup instead of granting instance-based privileges, as this maintains the principle of least privilege. Infrastructure setup APIs (such as CreateRoute, ModifyInstanceAttribute, and CreateTags) are only required during initial configuration and are not needed for ongoing cluster operations.

**Topics**
+ [Create IAM Roles and Policies for Pacemaker](#iam-roles-sles)
+ [Modify Security Groups for Cluster Communication](#sg-sles)
+ [Add VPC Route Table Entries for Overlay IPs](#rt-sles)

## Create IAM Roles and Policies for Pacemaker


In addition to the permissions required for standard SAP operations, two IAM policies are required for the cluster to control AWS resources. These policies must be assigned to your Amazon EC2 instances using an IAM role. This enables the Amazon EC2 instances, and therefore the cluster, to call AWS services.

**Note**  
Create policies with least-privilege permissions, granting access to only the specific resources that are required within the cluster. For multiple clusters, you may need to create multiple policies.

For more information, see [IAM roles for Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile).

### STONITH Policy


The SLES STONITH resource agent (external/ec2) requires permission to start and stop both nodes of the cluster. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0fedcba9876543210"
      ]
    }
  ]
}
```

### AWS Overlay IP Policy


The SLES Overlay IP resource agent (aws-vpc-move-ip) requires permission to modify a routing entry in route tables. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": [
                "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
                "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeRouteTables",
            "Resource": "*"
        }
    ]
}
```
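
If you prefer the AWS CLI for this one-time setup, the two policies can be created and attached to the instance role as shown in the following sketch. The policy names, role name, and JSON file names are illustrative; adjust them to your naming standards.

```
$ aws iam create-policy --policy-name pacemaker-stonith-policy --policy-document file://stonith-policy.json
$ aws iam create-policy --policy-name pacemaker-overlay-ip-policy --policy-document file://overlay-ip-policy.json
$ aws iam attach-role-policy --role-name pacemaker-cluster-role --policy-arn arn:aws:iam::123456789012:policy/pacemaker-stonith-policy
$ aws iam attach-role-policy --role-name pacemaker-cluster-role --policy-arn arn:aws:iam::123456789012:policy/pacemaker-overlay-ip-policy
```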

### Shared VPC (optional)


**Note**  
The following directions are only required for setups which include a Shared VPC.

Amazon VPC sharing enables you to share subnets with other AWS accounts within the same organization in AWS Organizations. Amazon EC2 instances can be deployed using the subnets of the shared Amazon VPC.

In the pacemaker cluster, the aws-vpc-move-ip resource agent has been enhanced to support a shared VPC setup while maintaining backward compatibility with existing features.

The following checks and changes are required. We refer to the AWS account that owns the Amazon VPC as the sharing VPC account, and to the consumer account where the cluster nodes are going to be deployed as the cluster account.

**Minimum Version Requirements**  
The latest version of the aws-vpc-move-ip agent shipped with SLES 15 SP3 supports the shared VPC setup by default. The following are the minimum versions required to support a shared VPC setup:
+ SLES 12 SP5 - resource-agents-4.3.018.a7fb5035-3.79.1.x86_64
+ SLES 15 SP2 - resource-agents-4.4.0+git57.70549516-3.30.1.x86_64
+ SLES 15 SP3 - resource-agents-4.8.0+git30.d0077df0-8.5.1

**IAM Roles and Policies**  
Using the Overlay IP agent with a shared Amazon VPC requires a different set of IAM permissions to be granted on both AWS accounts (sharing VPC account and cluster account).

**Sharing VPC Account**  
In the sharing VPC account, create an IAM role to delegate permissions to the EC2 instances that will be part of the cluster. During the IAM role creation, select "Another AWS account" as the type of trusted entity, and enter the AWS account ID where the EC2 instances will be deployed and running.

After the IAM role has been created, create the following IAM policy in the sharing VPC account, and attach it to the IAM role. Add or remove route table entries as needed.

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
```

Next, go to the "Trust relationships" tab of the IAM role, and ensure that the AWS account you entered while creating the role has been correctly added.

In the cluster account, create the following IAM policies, and attach them to an IAM role. This is the IAM role that is attached to the EC2 instances.

 **STS Policy** 

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/sharing-vpc-account-cluster-role"
    }
  ]
}
```

 **STONITH Policy** 

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```

## Modify Security Groups for Cluster Communication


A security group controls the traffic that is allowed to reach and leave the resources that it is associated with. For more information, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html).

In addition to the standard ports required to access SAP and administrative functions, the following rules must be applied to the security groups assigned to all Amazon EC2 instances in the cluster.


| Source | Protocol | Port range | Description | 
| --- | --- | --- | --- | 
|  The security group ID (its own resource ID)  |  UDP  |  5405  |  Allows UDP traffic between cluster resources for corosync communication  | 
|  Bastion host security group or CIDR range for administration  |  TCP  |  7630  |  (optional) Used for SLES Hawk2 Interface for monitoring and administration using a Web Interface. For more details, see SUSE documentation [Configuring and Managing Cluster Resources with Hawk2](https://documentation.suse.com/sle-ha/15-SP6/html/SLE-HA-all/cha-ha-manage-resources.html#sec-conf-hawk2-manage-edit).  | 
+ Note the use of the `UDP` protocol.
+ If you are running a local firewall, such as iptables, ensure that communication on the preceding ports is allowed between the two Amazon EC2 instances.
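
For reference, the self-referencing corosync rule can also be added with the AWS CLI, as sketched below (the security group ID is illustrative).

```
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxxxxclustersg \
    --protocol udp --port 5405 --source-group sg-xxxxxxxxxxclustersg
```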

## Add VPC Route Table Entries for Overlay IPs


You need to add initial route table entries for the Overlay IP. For more information on Overlay IP, see [AWS – Overlay IP](sap-nw-pacemaker-sles-concepts.md#overlay-ip-sles).

Add entries to the VPC route table or tables associated with the subnets of your Amazon EC2 instance for the cluster. The entries for destination (Overlay IP CIDR) and target (Amazon EC2 instance or ENI) must be added manually for the ASCS and the ERS. This ensures that the cluster resource has a route to modify. It also supports the install of SAP using the virtual names associated with the Overlay IP before the configuration of the cluster.

Using either the Amazon VPC console or an AWS CLI command, add a route to the table or tables for the Overlay IP.

------
#### [  AWS Console ]

1. Identify the EC2 instance IDs for both cluster nodes and determine which route tables are associated with their subnets. For details, see [Parameter Reference](sap-nw-pacemaker-sles-parameters.md#sap-pacemaker-resource-parameters-nw-sles) 

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc

1. In the navigation pane, choose **Route Tables**, select the first route table.

1. Choose **Actions** → **Edit routes**.

1. Choose **Add route** and configure the ASCS route: set **Destination** to `<ascs_overlayip>/32` and **Target** to the instance ID (or elastic network interface) of the cluster node where the ASCS will initially run.

1. Choose **Add route** and configure the ERS route: set **Destination** to `<ers_overlayip>/32` and **Target** to the instance ID (or elastic network interface) of the cluster node where the ERS will initially run.

1. Choose **Save changes**.

1. Repeat for any additional associated route tables or route tables from the VPC which require connectivity to the ASCS.

   Your route table now includes entries for required Overlay IPs, in addition to the standard routes.

------
#### [  AWS CLI ]

Identify the EC2 instance IDs for both cluster nodes and determine which route tables are associated with their subnets. For details, see [Parameter Reference](sap-nw-pacemaker-sles-parameters.md#sap-pacemaker-resource-parameters-nw-sles).

For the ASCS:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <ascs_overlayip>/32 --instance-id <instance_id_1>
```

For the ERS:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <ers_overlayip>/32 --instance-id <instance_id_2>
```

------

# EC2 Instance Configuration


Amazon EC2 instance settings can be applied using Infrastructure as Code or manually using the AWS Command Line Interface or the AWS Console. We recommend Infrastructure as Code automation to reduce manual steps and ensure consistency.

**Topics**
+ [Assign or Review Pacemaker IAM Role](#assign-review-pacemaker-iam-role-nw-sles)
+ [Assign or Review Security Groups](#assign-review-security-groups-nw-sles)
+ [Assign Secondary IP Addresses](#assign-secondary-ip-addresses-nw-sles)
+ [Disable Source/Destination Check](#source-dest-nw-sles)
+ [Review Stop Protection](#stop-protection-nw-sles)
+ [Review Automatic Recovery](#auto-recovery-nw-sles)
+ [Create Amazon EC2 Resource Tags Used by Amazon EC2 STONITH Agent](#create-cluster-tags-nw-sles)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Assign or Review Pacemaker IAM Role


The two cluster resource IAM policies must be assigned to an IAM role associated with your Amazon EC2 instances. If an IAM role is not associated with your instance, create a new IAM role for cluster operations.

The following AWS Console or AWS CLI commands can be used to modify the IAM role assignment.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the navigation pane, choose **Actions** → **Security** → **Modify IAM role**.

1. Choose the IAM role that contains the policies created in [Create IAM Roles and Policies for Pacemaker](sap-nw-pacemaker-sles-infra-setup.md#iam-roles-sles).

1. Choose **Update IAM role**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To assign an IAM role using the AWS CLI:

```
$ aws ec2 associate-iam-instance-profile --instance-id <instance_id> --iam-instance-profile Name=<iam_instance_profile_name>
```

Repeat for all nodes in the cluster.

------

You can verify the IAM role assignment on your instances using the AWS CLI:

```
$ aws ec2 describe-instances --instance-ids <instance_id> --query 'Reservations[0].Instances[0].IamInstanceProfile' --output table
```

You can check the specific permissions granted by the roles created in [Create IAM Roles and Policies for Pacemaker](sap-nw-pacemaker-sles-infra-setup.md#iam-roles-sles) by running the following commands on both of your instances.

When `--dry-run` is used, the AWS CLI or SDK sends the request to the Amazon EC2 service with this flag set. Amazon EC2 then performs all necessary permission checks and validates the request parameters without executing the operation. If the caller has the required permissions and the request is well formed, the service returns a `DryRunOperation` error response, indicating that the operation would have succeeded.

Check that the tags are correctly set and can be queried from both instances if you are using the external/ec2 STONITH fencing agent:

```
$ aws ec2 describe-tags --filters "Name=resource-id,Values=<instance_id_1>" "Name=key,Values=<cluster_tag>" --region=<region> --output=text | cut -f5
```

Check that the fencing resource has the permission to shut down both instances:

```
$ aws ec2 stop-instances --instance-ids <instance_id_1> --dry-run
$ aws ec2 stop-instances --instance-ids <instance_id_2> --dry-run
```

Check that the overlay IP resource has the permissions to update the route tables:

```
$ aws ec2 replace-route --route-table-id <routetable_id> --destination-cidr-block <ascs_overlayip>/32 --instance-id <instance_id_1> --dry-run
```

## Assign or Review Security Groups


The security group rules created in [Modify Security Groups for Cluster Communication](sap-nw-pacemaker-sles-infra-setup.md#sg-sles) must be assigned to your Amazon EC2 instances. If a security group is not associated with your instance, or if the required rules are not present in the assigned security group, add the security group or update the rules.

The following AWS Console or AWS CLI commands can be used to modify security group assignments.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the **Security** tab, review the security groups, ports, and source of traffic.

1. If required, choose **Actions** → **Security** → **Change security groups**.

1. Under **Associated security groups**, search for and select the required groups.

1. Choose **Save**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify security groups using the AWS CLI:

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --groups <security_group_id1> <security_group_id2>
```

Repeat for all nodes in the cluster.

------

You can verify the security group rules on your instances using the AWS CLI:

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute groupSet
```

## Assign Secondary IP Addresses


Secondary IP addresses are used to create a redundant communication channel (secondary ring) in corosync for clusters. The cluster nodes can use the secondary ring to communicate in case of underlying network disruptions.

These IPs are only used in cluster configurations. The secondary IPs provide the same fault tolerance as a secondary Elastic Network Interface (ENI). For more information, see [Secondary IP addresses for your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-secondary-ip-addresses.html).

The following AWS Console or AWS CLI commands can be used to assign secondary IP addresses.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the **Networking** tab, choose the network interface ID.

1. Choose **Actions** → **Manage IP addresses**.

1. Choose **Assign new IP address**.

1. Select **Auto-assign** or specify an IP from the subnet range.

1. Choose **Yes, Update**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To assign secondary IP addresses using the AWS CLI:

```
$ ENI_ID=$(aws ec2 describe-instances --instance-id <instance_id> \
    --query 'Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId' \
    --output text)
$ aws ec2 assign-private-ip-addresses --network-interface-id $ENI_ID --secondary-private-ip-address-count 1
```

Repeat for all nodes in the cluster.

------

You can verify the secondary IP configuration on your instances using the AWS CLI:

```
$ aws ec2 describe-instances --instance-id <instance_id> \
    --query 'Reservations[*].Instances[*].NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
    --output text
```

Verify that:
+ Each instance returns two IP addresses from the same subnet
+ The primary network interface (eth0) has both IPs assigned
+ The primary and secondary IP addresses will be used later for `ring0_addr` and `ring1_addr` in `corosync.conf`
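
As a preview of how these addresses are used (a sketch only; the full corosync configuration is covered in the cluster setup section), the primary and secondary IPs of each node map to `ring0_addr` and `ring1_addr` in the `nodelist` section of `/etc/corosync/corosync.conf`. The example uses the IP addresses from the [Parameter Reference](sap-nw-pacemaker-sles-parameters.md).

```
nodelist {
        node {
                ring0_addr: 10.1.10.1
                ring1_addr: 10.1.10.2
                nodeid: 1
        }
        node {
                ring0_addr: 10.1.20.1
                ring1_addr: 10.1.20.2
                nodeid: 2
        }
}
```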

## Disable Source/Destination Check


Amazon EC2 instances perform source/destination checks by default, requiring that an instance is either the source or the destination of any traffic it sends or receives. In the pacemaker cluster, source/destination check must be disabled on both instances receiving traffic from the Overlay IP.

The following AWS Console or AWS CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the navigation pane, choose **Actions** → **Networking** → **Change source/destination check**.

1. For Source/Destination Checking, choose **Stop** to allow traffic when the source or destination is not the instance itself.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-source-dest-check
```

Repeat for all nodes in the cluster.

------

To confirm the value of an attribute for a particular instance, use the following command. The value `false` means source/destination checking is disabled.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute sourceDestCheck
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "SourceDestCheck": {
        "Value": false
    }
}
```

## Review Stop Protection


To ensure that STONITH actions can be executed, stop protection must be disabled for Amazon EC2 instances that are part of a pacemaker cluster. If the default settings have been modified, use the following AWS Console or AWS CLI steps on both instances to disable stop protection.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change stop protection**.

1. Ensure **Stop protection** is not enabled.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-disable-api-stop
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of an attribute for a particular instance, use the following command. The value `false` means it is possible to stop the instance using the AWS CLI.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute disableApiStop
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "DisableApiStop": {
        "Value": false
    }
}
```

## Review Automatic Recovery


After a failure, cluster-controlled operations must be resumed in a coordinated way. This helps ensure that the cause of failure is known and addressed, and that the status of the cluster is as expected, for example, verifying that there are no pending fencing actions. For this reason, disable Amazon EC2 automatic recovery on the cluster nodes so that recovery remains under the control of the cluster.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change auto-recovery behavior**.

1. Select **Off** to disable auto-recovery for system status check failures.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify auto-recovery settings (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-maintenance-options --instance-id <instance_id> --auto-recovery disabled
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of an attribute for a particular instance, use the following command. The value `disabled` means auto-recovery will not be attempted.

```
$ aws ec2 describe-instances --instance-ids <instance_id> --query 'Reservations[*].Instances[*].MaintenanceOptions.AutoRecovery'
```

The output:

```
[
    [
        "disabled"
    ]
]
```

## Create Amazon EC2 Resource Tags Used by Amazon EC2 STONITH Agent


The Amazon EC2 STONITH agent uses AWS resource tags to identify Amazon EC2 instances. Create a tag for the primary and secondary Amazon EC2 instances using the AWS Console or AWS CLI. For more information, see [Using Tags](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html).

Use the same tag key on all instances, with the value set to the local hostname returned by the `hostname` command. For example, a configuration with the values defined in the [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) would require the tags shown in the following table.


| Amazon EC2 | Key example | Value example | 
| --- | --- | --- | 
|   `<instance_id>`   |   `<cluster_tag>`   |   `<hostname>`   | 
|   `i-xxxxinstidforhost1`   |   `pacemaker`   |   `slxhost01`   | 
|   `i-xxxxinstidforhost2`   |   `pacemaker`   |   `slxhost02`   | 

The following AWS Console or AWS CLI commands can be used to create resource tags.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the **Tags** tab, choose **Manage tags**.

1. Choose **Add tag**.

1. For **Key**, enter the cluster tag (for example, `pacemaker`).

1. For **Value**, enter the hostname of the instance.

1. Choose **Save**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To create tags using the AWS CLI:

```
$ aws ec2 create-tags --resources <instance_id> --tags Key=<cluster_tag>,Value=<hostname>
```

Repeat for all nodes in the cluster with their respective hostnames.

------

You can run the following command locally to validate the tag values and the IAM permissions to describe the tags. Run this command on each instance in the cluster, once for each instance in the cluster.

```
$ aws ec2 describe-tags --filters "Name=resource-id,Values=<instance_id>" "Name=key,Values=<cluster_tag>" --region=<region> --output=text | cut -f5
```

# Operating System Requirements


This section outlines the required operating system configurations for SUSE Linux Enterprise Server for SAP (SLES for SAP) cluster nodes. Note that this is not a comprehensive list of configuration requirements for running SAP on AWS, but rather focuses specifically on cluster management prerequisites.

Consider using configuration management tools or automated deployment scripts to ensure accurate and repeatable setup across your cluster infrastructure.

**Topics**
+ [Root Access](#_root_access)
+ [Install Missing Operating System Packages](#packages-nw-sles)
+ [Update and Check Operating System Versions](#_update_and_check_operating_system_versions)
+ [System Logging](#_system_logging)
+ [Time Synchronization Services](#_time_synchronization_services)
+ [Install AWS CLI and Configure Profiles](#install_shared_aws_cli_and_configure_profiles)
+ [Pacemaker Proxy Settings (Optional)](#_pacemaker_proxy_settings_optional)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Root Access


Verify root access on both cluster nodes. The majority of the setup commands in this document are performed with the root user. Assume that commands should be run as root unless explicitly stated otherwise.

## Install Missing Operating System Packages


This is applicable to all cluster nodes. You must install any missing operating system packages.

The following packages and their dependencies are required for the pacemaker setup. Depending on your baseline image, for example, SLES for SAP, these packages may already be installed.


| Package | Description | Category | Required | Configuration Pattern | 
| --- | --- | --- | --- | --- | 
|  chrony  |  Time Synchronization  |  System Support  |  Mandatory  |  All  | 
|  rsyslog  |  System Logging  |  System Support  |  Mandatory  |  All  | 
|  pacemaker  |  Cluster Resource Manager  |  Core Cluster  |  Mandatory  |  All  | 
|  corosync  |  Cluster Communication Engine  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents  |  Resource Agents including SAPInstance  |  Core Cluster  |  Mandatory  |  All  | 
|  fence-agents  |  Fencing Capabilities  |  Core Cluster  |  Mandatory  |  All  | 
|  sap-suse-cluster-connector  |  SAP HA-Script Connector (≥3.1.1 for SimpleMount)  |  SAP Integration  |  Mandatory  |  All  | 
|  sapstartsrv-resource-agents  |  SAP Start Service Resource Agents  |  SAP Integration  |  Mandatory\*  |  SimpleMount  | 
|  supportutils  |  System Information Gathering  |  Support Tools  |  Recommended  |  All  | 
|  sysstat  |  Performance Monitoring Tools  |  Support Tools  |  Recommended  |  All  | 
|  zypper-lifecycle-plugin  |  Software Lifecycle Management  |  Support Tools  |  Recommended  |  All  | 
|  supportutils-plugin-ha-sap  |  HA/SAP Support Data Collection  |  Support Tools  |  Recommended  |  All  | 
|  supportutils-plugin-suse-public-cloud  |  Cloud Support Data Collection  |  Support Tools  |  Recommended  |  All  | 
|  dstat  |  System Resource Statistics  |  Monitoring  |  Recommended  |  All  | 
|  iotop  |  I/O Monitoring  |  Monitoring  |  Recommended  |  All  | 

**Note**  
Refer to [Deployment Patterns](sap-nw-pacemaker-sles-references.md#deployments-sles) for more information on Configuration Patterns. `Mandatory*` indicates that this package is mandatory based on the Configuration Pattern.

You can use the following example script to check whether these packages are installed and to optionally install any that are missing.

```
#!/bin/bash
# Mandatory core packages for SAP NetWeaver HA on AWS
mandatory_packages="corosync pacemaker resource-agents fence-agents rsyslog chrony sap-suse-cluster-connector"

# SimpleMount specific packages
simplemount_packages="sapstartsrv-resource-agents"

# Recommended monitoring and support packages
support_packages="supportutils supportutils-plugin-ha-sap supportutils-plugin-suse-public-cloud sysstat dstat iotop zypper-lifecycle-plugin"

# Default to checking all packages
packages="${mandatory_packages} ${simplemount_packages} ${support_packages}"

missingpackages=""

echo "Checking SAP NetWeaver HA package requirements..."
echo "Note: sapstartsrv-resource-agents is only required for SimpleMount architecture"

for package in ${packages}; do
    echo "Checking if ${package} is installed..."
    if ! rpm -q ${package} --quiet; then
        echo " ${package} is missing and needs to be installed"
        missingpackages="${missingpackages} ${package}"
    fi
done

if [ -z "$missingpackages" ]; then
    echo "All packages are installed."
else
    echo "Missing mandatory packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${mandatory_packages} | tr ' ' '|'))$")"
    echo "Missing SimpleMount packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${simplemount_packages} | tr ' ' '|'))$")"
    echo "Missing support packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${support_packages} | tr ' ' '|'))$")"

    echo -n "Do you want to install the missing packages (y/n)? "
    read response
    if [ "$response" = "y" ]; then
        zypper install -y $missingpackages
    fi
fi

# Check sap-suse-cluster-connector version if installed
if rpm -q sap-suse-cluster-connector --quiet; then
    version=$(rpm -q sap-suse-cluster-connector --qf '%{VERSION}')
    echo "sap-suse-cluster-connector version: $version"
    # Compare against the 3.1.1 minimum using version sort; per-field numeric checks
    # would misjudge versions such as 3.2.0 or 4.0.0
    if printf '%s\n' "3.1.1" "$version" | sort -V -C; then
        echo "sap-suse-cluster-connector version is suitable for SimpleMount architecture"
    else
        echo "WARNING: SimpleMount architecture requires sap-suse-cluster-connector version 3.1.1 or higher"
    fi
fi
```

If a package is not installed, and you are unable to install it using zypper, it may be because SUSE Linux Enterprise High Availability extension is not available as a repository in your chosen image. You can verify the availability of the extension using the following command:

```
$ sudo zypper repos
```

To install or update a package or packages with confirmation, use the following command:

```
$ sudo zypper install <package_name(s)>
```

## Update and Check Operating System Versions


You must update and confirm versions across nodes. Apply all the latest patches to your operating system versions. This ensures that bugs are addressed and new features are available.

You can update the patches individually or update all system patches using the `zypper update` command. A clean reboot is recommended prior to setting up a cluster.

```
$ sudo zypper update
$ sudo reboot
```

Compare the operating system package versions on the two cluster nodes and ensure that the versions match on both nodes.
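One way to compare, as a sketch: generate a sorted package list on each node, copy one of the files to the other node (for example with `scp`), and compare the two files with `diff`.

```
# rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' | sort > /tmp/packages_$(hostname).txt
# diff /tmp/packages_<hostname_1>.txt /tmp/packages_<hostname_2>.txt
```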

## System Logging


Both systemd-journald and rsyslog are suggested for comprehensive logging. Systemd-journald (enabled by default) provides structured, indexed logging with immediate access to events, while rsyslog is maintained for backward compatibility and traditional file-based logging. This dual approach ensures both modern logging capabilities and compatibility with existing log management tools and practices.

 **1. Enable and start rsyslog:** 

```
# systemctl enable --now rsyslog
```

**2. (Optional) Configure persistent logging for systemd-journald:**  
If you are not using a logging agent (like the AWS CloudWatch Unified Agent or Vector) to ship logs to a centralized location, you may want to configure persistent logging to retain logs after system reboots.

```
# mkdir -p /etc/systemd/journald.conf.d
```

Create `/etc/systemd/journald.conf.d/99-logstorage.conf` with:

```
[Journal]
Storage=persistent
```

Persistent logging requires careful storage management. Configure appropriate retention and rotation settings in `journald.conf` to prevent logs from consuming excessive disk space. Review `man journald.conf` for available options such as SystemMaxUse, RuntimeMaxUse, and MaxRetentionSec.
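For example, the same drop-in file can also cap journal disk usage and retention. The following values are illustrative only; size them for your environment:

```
[Journal]
Storage=persistent
SystemMaxUse=1G
MaxRetentionSec=1month
```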

To apply the changes, restart journald:

```
# systemctl restart systemd-journald
```

After enabling persistent storage, only new logs will be stored persistently. Existing logs from the current boot session will remain in volatile storage until the next reboot.

 **3. Verify services are running:** 

```
# systemctl status systemd-journald
# systemctl status rsyslog
```

## Time Synchronization Services


Time synchronization is important for cluster operation. Ensure that chrony rpm is installed, and configure appropriate time servers in the configuration file.

You can use Amazon Time Sync Service that is available on any instance running in a VPC. It does not require internet access. To ensure consistency in the handling of leap seconds, don’t mix Amazon Time Sync Service with any other ntp time sync servers or pools.

Create or check the `/etc/chrony.d/ec2.conf` file to define the server:

```
# Amazon EC2 time source config
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
```

Start the chronyd.service, using the following command:

```
# systemctl enable --now chronyd.service
# systemctl status chronyd
```

Verify time synchronization is working:

```
# chronyc tracking
```

Ensure the output shows `Reference ID : A9FEA97B (169.254.169.123)` confirming synchronization with Amazon Time Sync Service.
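You can also list the configured time sources to confirm that only the Amazon Time Sync Service address is in use and that no other NTP servers or pools remain configured:

```
# chronyc sources
```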

For more information, see [Set the time for your Linux instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html).

## Install AWS CLI and Configure Profiles


The AWS cluster resource agents require AWS Command Line Interface (AWS CLI). Check if AWS CLI is already installed, and install it if necessary.

Check if AWS CLI is installed:

```
# aws --version
```

If the command is not found, install AWS CLI v2 using the following commands:

```
# cd /tmp
# curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
# zypper install -y unzip
# unzip awscliv2.zip
# sudo ./aws/install --update
```

Create symlinks to ensure AWS CLI is in the system PATH:

```
# sudo ln -sf /usr/local/bin/aws /usr/bin/aws
```

Verify the installation:

```
# aws --version
```

The installation creates a symbolic link at `/usr/local/bin/aws` which is typically in the system PATH by default.

For more information, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

After installing AWS CLI, you need to create an AWS CLI profile for the root account.

You can either edit the configuration files under `/root/.aws` manually, or use the `aws configure` AWS CLI command.

Leave the access key and secret access key fields blank. The permissions are provided through IAM roles attached to the Amazon EC2 instances.

```
# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

The profile name is `default` unless you configure a different one with `--profile`. This example uses a profile named `cluster`, which is referenced later in the AWS resource agent definitions for pacemaker. The default AWS Region must be the AWS Region of the instance.

```
# aws configure --profile cluster
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

On the hosts, you can verify the available profiles using the following command:

```
# aws configure list-profiles
```

And review that an assumed role is associated by querying the caller identity:

```
# aws sts get-caller-identity --profile=<profile_name>
```

## Pacemaker Proxy Settings (Optional)


If your Amazon EC2 instance has been configured to access the internet and/or AWS Cloud through proxy servers, then you need to replicate the settings in the pacemaker configuration. For more information, see [Using an HTTP Proxy](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-proxy.html).

Add the following lines to `/etc/sysconfig/pacemaker`:

```
http_proxy=http://<proxyhost>:<proxyport>
https_proxy=http://<proxyhost>:<proxyport>
no_proxy=127.0.0.1,localhost,169.254.169.254,fd00:ec2::254
```
+ Modify proxyhost and proxyport to match your settings.
+ Ensure that you exempt the address used to access the instance metadata.
+ Configure `no_proxy` to include the IP address of the instance metadata service – 169.254.169.254 (IPv4) and fd00:ec2::254 (IPv6). This address does not vary.

# SAP ASCS and Cluster Setup


This section covers the following topics.

**Topics**
+ [

# SAP Shared File Systems
](sap-shared-filesystems-nw-sles.md)
+ [

# Check IP availability and resolution
](check-ip-availability-resolution-nw-sles.md)
+ [

# Install SAP
](install-sap-nw-sles.md)
+ [

# Configure SAP for Cluster Control
](sap-ascs-service-control-nw-sles.md)
+ [

# Cluster Node Setup
](cluster-node-setup-nw-sles.md)
+ [

# Cluster Configuration
](cluster-config-nw-sles.md)

# SAP Shared File Systems


**Topics**
+ [

## Select Shared Storage
](#select-storage-type-nw-sles)
+ [

## Create file systems
](#create-filesystems-nw-sles)
+ [

## Create mount point directories
](#create-mount-dirs-nw-sles)
+ [

## Update /etc/fstab
](#update-fstab-nw-sles)
+ [

## Temporarily mount ASCS and ERS directories for installation (classic only)
](#temp-mount-dirs-nw-sles)

## Select Shared Storage


SAP NetWeaver high availability deployments require shared file systems. On Linux, you can use either [Amazon Elastic File System](https://aws.amazon.com/efs/) or [Amazon FSx for NetApp ONTAP](https://aws.amazon.com/fsx/netapp-ontap/). Choose between these options based on your requirements for resilience, performance, and cost. For detailed setup information, see [Getting started with Amazon Elastic File System](https://docs.aws.amazon.com/efs/latest/ug/getting-started.html) or [Getting started with Amazon FSx for NetApp ONTAP](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/getting-started.html).

We recommend sharing a single Amazon EFS or FSx for ONTAP file system across multiple SIDs within an account.

The file system’s DNS name is the simplest mounting option. When connecting from an Amazon EC2 instance, the DNS automatically resolves to the mount target’s IP address in that instance’s Availability Zone. You can also create an alias (CNAME) to help identify the shared file system’s purpose. Throughout this document, we use `<nfs.fqdn>`.

Examples:
+  `file-system-id.efs.aws-region.amazonaws.com` 
+  `svm-id.fs-id.fsx.aws-region.amazonaws.com` 
+  `qas_sapmnt_share.example.com` 

**Note**  
Review the `enableDnsHostnames` and `enableDnsSupport` DNS attributes for your VPC. For more information, see [View and update DNS attributes for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).

## Create file systems


The following shared file systems are covered in this document:


| NFS Location Structure | NFS Location Example | File System Location Structure | File System Location Example | 
| --- | --- | --- | --- | 
|  <SID>_sapmnt  |   `SLX_sapmnt`   |  /sapmnt/<SID>  |   `/sapmnt/SLX`   | 
|  <SID>_ASCS<ascs_sys_nr>  |   `SLX_ASCS00`   |  /usr/sap/<SID>/ASCS<ascs_sys_nr>  |   `/usr/sap/SLX/ASCS00`   | 
|  <SID>_ERS<ers_sys_nr>  |   `SLX_ERS10`   |  /usr/sap/<SID>/ERS<ers_sys_nr>  |   `/usr/sap/SLX/ERS10`   | 

The following options can differ depending on how you architect and operate your systems:
+ ASCS and ERS mount points - In simple-mount architecture, you can share the entire `/usr/sap/<SID>` directory. This document uses separate mount points to simplify migration and follow SAP’s recommendation for local application server executables when co-hosting ASCS/ERS.
+ Transport directory - `/usr/sap/trans` is optional for ASCS installations. Add this shared directory if your change management processes require it.
+ Home directory - This document uses local home directories to ensure `<sid>adm` access during NFS issues. Consider a shared home directory if you need consistent user environments across nodes.
+ NFS location naming - The "NFS Location" names are arbitrary and can be chosen based on your naming conventions (e.g., `myEFSMount1`, `prod_sapmnt`, etc.). The "File system location" follows the standard SAP directory structure and should use the parameter references shown.

For more information, see [SAP System Directories on UNIX](https://help.sap.com/docs/SAP_NETWEAVER_750/ff18034f08af4d7bb33894c2047c3b71/2744f17a26a74a8abfd202c4f5dc9a0f.html).

Using the NFS ID created in the previous step, temporarily mount the root directory of the NFS. `/mnt` is available by default; it can also be substituted with another temporary location.

**Note**  
The following commands use the NFS location names from the table above. Replace `<SID>_sapmnt`, `<SID>_ASCS<ascs_sys_nr>`, and `<SID>_ERS<ers_sys_nr>` with your chosen NFS location names and parameter values.

```
# mount <nfs.fqdn>:/ /mnt
# mkdir -p /mnt/<SID>_sapmnt
# mkdir -p /mnt/<SID>_ASCS<ascs_sys_nr>
# mkdir -p /mnt/<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # mount fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/ /mnt
  # mkdir -p /mnt/SLX_sapmnt
  # mkdir -p /mnt/SLX_ASCS00
  # mkdir -p /mnt/SLX_ERS10
  ```

During SAP installation, the `<sid>adm` user and proper directory ownership will be created. Until then, we need to ensure the installation process has sufficient access. Set temporary permissions on the directories:

```
# chmod 777 /mnt/<SID>_sapmnt /mnt/<SID>_ASCS<ascs_sys_nr> /mnt/<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # chmod 777 /mnt/SLX_sapmnt /mnt/SLX_ASCS00 /mnt/SLX_ERS10
  ```

The SAP installation process will automatically set the correct ownership and permissions for operational use.

Unmount the temporary mount:

```
# umount /mnt
```

## Create mount point directories


This is applicable to both cluster nodes. Create the directories for the required mount points (permanent or cluster controlled):

```
# mkdir -p /sapmnt
# mkdir -p /usr/sap/<SID>/ASCS<ascs_sys_nr>
# mkdir -p /usr/sap/<SID>/ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # mkdir -p /sapmnt
  # mkdir -p /usr/sap/SLX/ASCS00
  # mkdir -p /usr/sap/SLX/ERS10
  ```

## Update /etc/fstab


This is applicable to both cluster nodes. `/etc/fstab` is a configuration table containing the details required for mounting and unmounting file systems to a host.

Add the file systems not managed by the cluster to `/etc/fstab`.

For both **simple-mount** and **classic** architectures, prepare and append an entry for the `sapmnt` file system to `/etc/fstab`:

```
<nfs.fqdn>:/<SID>_sapmnt    /sapmnt    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
```

 **Simple-mount only** – prepare and append entries for the ASCS and ERS file systems to `/etc/fstab`:

```
<nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr>   /usr/sap/<SID>/ASCS<ascs_sys_nr>  nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
<nfs.fqdn>:/<SID>_ERS<ers_sys_nr>     /usr/sap/<SID>/ERS<ers_sys_nr>    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_sapmnt    /sapmnt               nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
  fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ASCS00    /usr/sap/SLX/ASCS00   nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
  fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ERS10     /usr/sap/SLX/ERS10    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
  ```

Verify that your mount options are:
+ Compatible with your operating system version
+ Supported by your chosen NFS file system type (EFS or FSx for ONTAP)
+ Aligned with current SAP recommendations

Consult SAP and AWS documentation for the latest mount option recommendations.

Use the following command to mount the file systems defined in `/etc/fstab`:

```
# mount -a
```

Use the following command to check that the required file systems are available:

```
# df -h
```

## Temporarily mount ASCS and ERS directories for installation (classic only)


This is only applicable to the classic architecture. Simple-mount architecture has these directories permanently available in `/etc/fstab`.

Mount ASCS and ERS directories for installation.

Use the following command on the instance where you plan to install ASCS:

```
# mount <nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr>  /usr/sap/<SID>/ASCS<ascs_sys_nr>
```

Use the following command on the instance where you plan to install ERS:

```
# mount <nfs.fqdn>:/<SID>_ERS<ers_sys_nr>  /usr/sap/<SID>/ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # mount fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ASCS00  /usr/sap/SLX/ASCS00
  # mount fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ERS10   /usr/sap/SLX/ERS10
  ```

# Check IP availability and resolution


## Add Overlay IP for SAP Installation


SAP Installation should be done using the virtual names assigned to the overlay IP. Before adding the overlay IPs to the instances, ensure that the VPC route table entries have been created as described in [Add VPC Route Table Entries for Overlay IPs](sap-nw-pacemaker-sles-infra-setup.md#rt-sles).

To facilitate SAP installation, manually add the Overlay IPs to the instances:

1. To the instance where you intend to install the **ASCS** 

   ```
   # ip addr add <ascs_overlayip>/32 dev eth0
   ```

1. To the instance where you intend to install the **ERS** 

   ```
   # ip addr add <ers_overlayip>/32 dev eth0
   ```

Note the following:
+ Route table entries for the overlay IPs must be created first (see [Add VPC Route Table Entries for Overlay IPs](sap-nw-pacemaker-sles-infra-setup.md#rt-sles))
+ This IP configuration is temporary and will be lost after instance reboot
+ The cluster will take over management of these IPs once configured
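To confirm that an address has been added, list the addresses assigned to the interface on the relevant instance, for example on the ASCS node:

```
# ip addr show dev eth0 | grep <ascs_overlayip>
```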

## Hostname Resolution


You must ensure that all instances can resolve all hostnames in use. Add the hostnames for cluster nodes to `/etc/hosts` file on all cluster nodes. This ensures that hostnames for cluster nodes can be resolved even in case of DNS issues. Configure the `/etc/hosts` file for a two-node cluster:

```
# cat /etc/hosts
<primary_ip_1> <hostname_1>.example.com <hostname_1>
<primary_ip_2> <hostname_2>.example.com <hostname_2>
<ascs_overlayip> <ascs_virt_hostname>.example.com <ascs_virt_hostname>
<ers_overlayip> <ers_virt_hostname>.example.com <ers_virt_hostname>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # cat /etc/hosts
  10.1.10.1 slxhost01.example.com slxhost01
  10.1.20.1 slxhost02.example.com slxhost02
  172.16.30.5 slxascs.example.com slxascs
  172.16.30.6 slxers.example.com slxers
  ```

In this configuration, the secondary IPs used for the second cluster ring are not included. They are only used in the cluster configuration. You can optionally allocate virtual hostnames to them for administration and identification purposes.
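After updating `/etc/hosts`, you can verify that the entries resolve as expected on each node. `getent` uses the same resolution order as most applications:

```
# getent hosts <ascs_virt_hostname>
# getent hosts <ers_virt_hostname>
```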

**Important**  
The overlay IP is outside of the VPC CIDR range and cannot be reached from locations that are not associated with the route table, including on-premises networks.

# Install SAP


The following topics provide information about installing SAP on AWS Cloud in a highly available cluster. Review SAP Documentation for more details.

**Topics**
+ [

## Final checks for software provisioning
](#final-checks-software-provisioning-nw-sles)
+ [

## Install SAP ASCS and ERS instances
](#install-sap-instances-nw-sles)
+ [

## Kernel upgrade and ENSA2 – optional
](#kernel-ensa2-nw-sles)
+ [

## Check SAP host agent version
](#check-host-agent-nw-sles)

## Final checks for software provisioning


Before running SAP Software Provisioning Manager (SWPM), ensure that the following prerequisites are consistent across both cluster nodes:
+ Collect any missing details and populate the [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) section to ensure clarity on the specific values used in installation commands.
+  **User and Group Configuration** - If operating system groups are pre-defined, ensure matching UID and GID values for `<sid>adm` and `sapsys` across both cluster nodes (see the check after this list).
+  **Installation Software** - Download the latest version of Software Provisioning Manager (SWPM) and SAP installation media for your SAP release from [Software Provisioning Manager](https://support.sap.com/en/tools/software-logistics-tools/software-provisioning-manager.html).
+  **Network Configuration** - Verify both cluster nodes have identical configuration with all routes, overlay IPs, and virtual hostnames accessible. This ensures that either node can run ASCS or ERS roles.
+  **File Systems** - Verify all shared file systems are mounted and accessible from both nodes with consistent mount points and permissions.
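If the `<sid>adm` user and `sapsys` group have been pre-created, a quick consistency check is to compare their numeric IDs on both nodes; run the following on each node and confirm that the output matches:

```
# id <sid>adm
# getent group sapsys
```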

## Install SAP ASCS and ERS instances


Install the SAP ASCS and ERS instances using their virtual hostnames to ensure installation against the overlay IP addresses. This approach is required for proper cluster integration.

Install the ASCS instance on `<instance_id_1>` using virtual hostname `<ascs_virt_hostname>` with the `SAPINST_USE_HOSTNAME` parameter. This ensures the installation uses the overlay IP rather than the physical hostname:

 *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

```
# <swpm location>/sapinst SAPINST_USE_HOSTNAME=<ascs_virt_hostname>
```

Install the ERS instance on `<instance_id_2>` using virtual hostname `<ers_virt_hostname>` with the `SAPINST_USE_HOSTNAME` parameter. This ensures the installation uses the overlay IP rather than the physical hostname:

```
# <swpm location>/sapinst SAPINST_USE_HOSTNAME=<ers_virt_hostname>
```

Once the ASCS and ERS installations are complete, you will need to install and configure the database and SAP Primary Application Server (PAS) - these components are not covered in this cluster setup documentation. Optionally, you can also install and configure Additional Application Server (AAS). For more details on installing these SAP NetWeaver components, refer to SAP Help Portal.

For additional information on unattended installation options, see [SAP Note 2230669 – System Provisioning Using an Input Parameter File](https://me.sap.com/notes/2230669) (requires SAP portal access).

## Kernel upgrade and ENSA2 – optional


As of AS ABAP Release 7.53 (ABAP Platform 1809), the new Standalone Enqueue Server 2 (ENSA2) is installed by default. ENSA2 replaces the previous version – ENSA1.

If you have an older version of SAP NetWeaver, consider following the SAP guidance to upgrade the kernel and update the Enqueue Server configuration. An upgrade will allow you to take advantage of the features available in the latest version. For more information, see the following SAP Notes (require SAP portal access):
+  [SAP Note 2630416 – Support for Standalone Enqueue Server 2](https://me.sap.com/notes/2630416) 
+  [SAP Note 2711036 – Usage of the Standalone Enqueue Server 2 in an HA Environment](https://me.sap.com/notes/2711036) 

## Check SAP host agent version


This is applicable to both cluster nodes. The SAP host agent is used for system instance control and monitoring. This agent is used by SAP cluster resource agents and hooks. It is recommended that you have the latest version installed on both instances. For more details, see [SAP Note 2219592 – Upgrade Strategy of SAP Host Agent](https://me.sap.com/notes/2219592).

Use the following command to check the version of the host agent:

```
# /usr/sap/hostctrl/exe/saphostexec -version
```

# Configure SAP for Cluster Control


Modify SAP service configurations, user permissions, and system integration settings to enable proper cluster control of ASCS and ERS instances.

**Topics**
+ [

## Add <sid>adm to haclient group
](#add-sidadm-haclient-nw-sles)
+ [

## Modify SAP profiles for start operations and cluster hook
](#modify-sap-profiles-nw-sles)
+ [

## Enable sapping and sappong Services (Simple-Mount Only)
](#sapping-sappong-services-nw-sles)
+ [

## Ensure ASCS and ERS SAP Services can run on either node (systemd)
](#modify-sapservices-nw-sles)
+ [

## Configure dependencies for Pacemaker and SAP services (systemd)
](#configure-systemd-deps-nw-sles)
+ [

## (Alternative) Ensure ASCS and ERS SAP Services can run on either node (sysV)
](#modify-sapservices-sysv-nw-sles)

## Add <sid>adm to haclient group


This is applicable to both cluster nodes. An `haclient` operating system group is created when the cluster connector package is installed. Adding the `<sid>adm` user to this group ensures that your cluster has the necessary access. Run the following command as root:

```
# usermod -a -G haclient <sid>adm
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # usermod -a -G haclient slxadm
  ```
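You can verify that the user was added by listing the members of the `haclient` group (a fresh login may be required for the change to take effect in existing sessions):

```
# getent group haclient
```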

## Modify SAP profiles for start operations and cluster hook


This action ensures that there is compatibility between the SAP start framework and cluster actions. Modify SAP profiles to change the start behavior of the SAP instance and processes. Ensure that `sapcontrol` is aware that the system is being managed by a pacemaker cluster.
+ ASCS profile – `/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>` 
+ ERS profile – `/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>` 

The profile directory /usr/sap/<SID>/SYS/profile/ is typically a symbolic link to /sapmnt/<SID>/profile/ on the shared NFS filesystem. This means profile modifications made on one node are immediately visible on all cluster nodes. You can modify the profiles from either node.
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:
  + ASCS profile example – `/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs` 
  + ERS profile example – `/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers` 

Follow the procedure outlined below to make the necessary changes:

1.  **Program or process start behavior** – In case of failure, processes must be restarted. Where a process starts, and in what order, must be controlled by the cluster rather than by the SAP start framework behavior defined in the profiles. Enqueue locks can be lost if this parameter is not changed. In newer SAP installations, the profiles may already contain `Start_Program_XX` instead of `Restart_Program_XX`. If `Start_Program_XX` is already present, no changes are needed for this step.  
**Example**  

------
#### [ ENSA1 ]

    **ASCS** 

   ```
   #For ENSA1 (_EN)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_EN) pf=$(_PF)
   
   Start_Program_XX = local $(_EN) pf=$(_PF)
   ```

    **ERS** 

   ```
   #For ENSA1 (_ER)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_ER) pf=$(_PFL)NR=$(SCSID)
   
   Start_Program_XX = local $(_ER) pf=$(_PFL) NR=$(SCSID)
   ```

    *`XX` indicates the start-up order. This value may be different in your install; retain the unchanged value.* 

------
#### [ ENSA2 ]

    **ASCS** 

   ```
   #For ENSA2 (_ENQ)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_ENQ) pf=$(_PF)
   
   Start_Program_XX = local $(_ENQ) pf=$(_PF)
   ```

    **ERS** 

   ```
   #For ENSA2 (_ENQR)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_ENQR) pf=$(_PFL)NR=$(SCSID)
   
   Start_Program_XX = local $(_ENQR) pf=$(_PFL) NR=$(SCSID)
   ```

    *`XX` indicates the start order. This value may be different in your install; retain the unchanged value.* 

------

1.  **Disable instance auto start in both profiles** – When an instance restarts, SAP start framework should not start ASCS and ERS automatically. Add the following parameter on both profiles to prevent an auto start:

   ```
   # Disable instance auto start
   Autostart = 0
   ```

1.  **Add cluster connector details in both profiles** – The connector integrates the SAP start and control frameworks of SAP NetWeaver with SUSE cluster to assist with maintenance and awareness of state. Add the following parameters on both profiles:

   ```
   # Added for Cluster Connectivity
   service/halib = $(DIR_EXECUTABLE)/saphascriptco.so
   service/halib_cluster_connector = /usr/bin/sap_suse_cluster_connector
   ```
**Important**  
RPM package `sap-suse-cluster-connector` has *dashes*. The executable `/usr/bin/sap_suse_cluster_connector` available after installation has *underscores*. Ensure that the correct name, that is executable `/usr/bin/sap_suse_cluster_connector`, is used in both profiles.

1.  **Restart services** – Restart SAP services for ASCS and ERS to ensure that the preceding settings take effect. Adjust the system number to match the service.

    **ASCS** 

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ascs_sys_nr> -function RestartService
   ```

    **ERS** 

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ers_sys_nr> -function RestartService
   ```
   +  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

      **ASCS** 

     ```
     # /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function RestartService
     ```

      **ERS** 

     ```
     # /usr/sap/hostctrl/exe/sapcontrol -nr 10 -function RestartService
     ```

1.  **Check integration using `sapcontrol` ** – `sapcontrol` includes functions: `HACheckConfig` and `HACheckFailoverConfig`. These functions can be used to check configuration, including awareness of the cluster connector. These checks have limited value before the cluster is configured, but you can run HACheckFailoverConfig to ensure the base configuration is in place.

    **ASCS** 

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ascs_sys_nr> -function HACheckFailoverConfig
   ```
   +  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

      **ASCS** 

     ```
     # /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function HACheckFailoverConfig
     
     10.10.2025 01:23:55
     HACheckFailoverConfig
     OK
     state, category, description, comment
     SUCCESS, SAP CONFIGURATION, SAPInstance RA sufficient version, SAPInstance includes is-ers patch
     ```

## Enable sapping and sappong Services (Simple-Mount Only)


For simple-mount architecture, enable the sapping and sappong systemd services on both cluster nodes. These services ensure proper SAP instance startup coordination between systemd and the cluster.

The sapping service runs before sapinit during boot and temporarily hides the `/usr/sap/sapservices` file to prevent automatic SAP instance startup. The sappong service runs after sapinit and restores the sapservices file, making it available for cluster management while maintaining compatibility with SAP management tools.

```
# systemctl enable sapping
# systemctl enable sappong
```

Verify the services are enabled:

```
# systemctl status sapping
# systemctl status sappong
```

**Note**  
Both services will show "inactive (dead)" status, which is normal for one-shot services that only run during system boot.

## Ensure ASCS and ERS SAP Services can run on either node (systemd)


This is applicable to both cluster nodes.

To ensure that the cluster can orchestrate availability by starting and stopping instances on either cluster node, the SAP services must be registered on both nodes and auto-start must be disabled.

In recent Operating System and SAP kernel versions, SAP offers systemd integration for sapstartsrv which controls how SAP instances are stopped and started. This is the recommended configuration and a requirement for Simple Mount Configuration.

For more details, see the following SAP Notes (require SAP portal access):
+  [SAP Note 3139184 – Linux: systemd integration for sapstartsrv and SAP Host Agent](https://me.sap.com/notes/3139184) 
+  [SAP Note 3115048 – sapstartsrv with native Linux systemd support](https://me.sap.com/notes/3115048) 

You can confirm whether systemd is in place by running the following command. Systemd is in place if SAP services (for example, SAPSLX_00.service and SAPSLX_10.service) are listed.

```
# systemctl list-unit-files SAP*
```

If you have installed an ASCS or ERS instance on this host but no SAP services are returned, the classic SysV init may be in use. In that case, skip to [(Alternative) Ensure ASCS and ERS SAP Services can run on either node (sysV)](#modify-sapservices-sysv-nw-sles).

1.  **On the instance where the ASCS was installed** 

   Register the missing ERS service on the node where you have installed ASCS.

   1. Temporarily mount the ERS directory (classic only):

      ```
      # mount <nfs.fqdn>:/<SID>_ERS<ers_sys_nr>  /usr/sap/<SID>/ERS<ers_sys_nr>
      ```

   1. Register the ERS service:

      ```
      # export LD_LIBRARY_PATH=/usr/sap/<SID>/ERS<ers_sys_nr>/exe
      # /usr/sap/<SID>/ERS<ers_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname> -reg
      # systemctl start SAP<SID>_<ers_sys_nr>
      ```

   1. Check the existence and state of SAP services (example):

      ```
      # systemctl list-unit-files SAP*
      UNIT FILE                    STATE   VENDOR PRESET
      SAPSLX_00.service         disabled disabled
      SAPSLX_10.service         disabled disabled
      SAP.slice                   static   -
      3 unit files listed.
      ```

   1. If the state is not disabled, run the following command to disable `sapservices` integration for `SAP<SID>_<ascs_sys_nr>` and `SAP<SID>_<ers_sys_nr>` on both nodes:
**Important**  
Stopping these services also stops the associated SAP instances.

      ```
      # systemctl stop SAP<SID>_<ascs_sys_nr>.service
      # systemctl disable SAP<SID>_<ascs_sys_nr>.service
      # systemctl stop SAP<SID>_<ers_sys_nr>.service
      # systemctl disable SAP<SID>_<ers_sys_nr>.service
      ```

   1. Unmount the ERS directory (classic only):

      ```
      # umount /usr/sap/<SID>/ERS<ers_sys_nr>
      ```
      +  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

        ```
        # mount <nfs.fqdn>:/SLX_ERS10  /usr/sap/SLX/ERS10
        # export LD_LIBRARY_PATH=/usr/sap/SLX/ERS10/exe
        # /usr/sap/SLX/ERS10/exe/sapstartsrv pf=/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers -reg
        # systemctl start SAPSLX_10
        # systemctl stop SAPSLX_00.service
        # systemctl disable SAPSLX_00.service
        # systemctl stop SAPSLX_10.service
        # systemctl disable SAPSLX_10.service
        # umount /usr/sap/SLX/ERS10
        ```

1.  **On the instance where the ERS was installed** 

   Register the missing ASCS service on the node where you have installed ERS.

   1. Temporarily mount the ASCS directory (classic only):

      ```
      # mount <nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr> /usr/sap/<SID>/ASCS<ascs_sys_nr>
      ```

   1. Register the ASCS service:

      ```
      # export LD_LIBRARY_PATH=/usr/sap/<SID>/ASCS<ascs_sys_nr>/exe
      # /usr/sap/<SID>/ASCS<ascs_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname> -reg
      # systemctl start SAP<SID>_<ascs_sys_nr>
      ```

   1. Check the existence and state of SAP services (example):

      ```
      # systemctl list-unit-files SAP*
      UNIT FILE                    STATE   VENDOR PRESET
      SAPSLX_00.service          disabled disabled
      SAPSLX_10.service          disabled disabled
      SAP.slice                   static   -
      3 unit files listed.
      ```

   1. If the state is not disabled, run the following command to disable `sapservices` integration for `SAP<SID>_<ascs_sys_nr>` and `SAP<SID>_<ers_sys_nr>` on both nodes:
**Important**  
Stopping these services also stops the associated SAP instances.

      ```
      # systemctl stop SAP<SID>_<ascs_sys_nr>.service
      # systemctl disable SAP<SID>_<ascs_sys_nr>.service
      # systemctl stop SAP<SID>_<ers_sys_nr>.service
      # systemctl disable SAP<SID>_<ers_sys_nr>.service
      ```

   1. Unmount the ASCS directory (classic only):

      ```
      # umount /usr/sap/<SID>/ASCS<ascs_sys_nr>
      ```
      +  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

        ```
        # mount <nfs.fqdn>:/SLX_ASCS00 /usr/sap/SLX/ASCS00
        # export LD_LIBRARY_PATH=/usr/sap/SLX/ASCS00/exe
        # /usr/sap/SLX/ASCS00/exe/sapstartsrv pf=/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs -reg
        # systemctl start SAPSLX_00
        # systemctl stop SAPSLX_00.service
        # systemctl disable SAPSLX_00.service
        # systemctl stop SAPSLX_10.service
        # systemctl disable SAPSLX_10.service
        # umount /usr/sap/SLX/ASCS00
        ```

## Configure dependencies for Pacemaker and SAP services (systemd)


This step is required on both cluster nodes when using systemd integration.

When an EC2 instance shuts down unexpectedly, Pacemaker (the cluster resource manager) may trigger unnecessary fencing actions because it cannot distinguish between planned SAP service shutdowns and system failures. To prevent this, configure systemd dependencies that inform Pacemaker about the relationship between SAP services and cluster operations.

Create a systemd drop-in configuration for the `resource-agents-deps.target`, which is a systemd target that Pacemaker uses to understand external service dependencies:

```
# mkdir -p /etc/systemd/system/resource-agents-deps.target.d/
# cd /etc/systemd/system/resource-agents-deps.target.d/

# cat > sap_systemd_<sid>.conf <<_EOF
[Unit]
Requires=sapinit.service
After=sapinit.service
After=SAP<SID>_<ascs_sys_nr>.service
After=SAP<SID>_<ers_sys_nr>.service
_EOF

# systemctl daemon-reload
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # cat > sap_systemd_slx.conf <<_EOF
  [Unit]
  Requires=sapinit.service
  After=sapinit.service
  After=SAPSLX_00.service
  After=SAPSLX_10.service
  _EOF
  
  # systemctl daemon-reload
  ```
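To confirm that systemd has picked up the drop-in, you can display the target together with its drop-in files (assuming the `resource-agents-deps.target` unit is present, as shipped with the cluster packages):

```
# systemctl cat resource-agents-deps.target
```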

## (Alternative) Ensure ASCS and ERS SAP Services can run on either node (sysV)


This is only applicable if systemd integration is not in place.

To ensure that the SAP instances can be managed by the cluster, and also manually during planned maintenance activities, add the missing entries for the ASCS and ERS `sapstartsrv` services to the `/usr/sap/sapservices` file on both cluster nodes (the ASCS and ERS hosts). Copy the missing entry from the other host so that both files contain both entries. After the modifications, the `/usr/sap/sapservices` file looks as follows on both hosts:

```
#!/bin/sh
LD_LIBRARY_PATH=/usr/sap/<SID>/ASCS<ascs_sys_nr>/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/<SID>/ASCS<ascs_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname> -D -u <sid>adm
LD_LIBRARY_PATH=/usr/sap/<SID>/ERS<ers_sys_nr>/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/<SID>/ERS<ers_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname> -D -u <sid>adm
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  #!/bin/sh
  LD_LIBRARY_PATH=/usr/sap/SLX/ASCS00/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/SLX/ASCS00/exe/sapstartsrv pf=/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs -D -u slxadm
  LD_LIBRARY_PATH=/usr/sap/SLX/ERS10/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/SLX/ERS10/exe/sapstartsrv pf=/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers -D -u slxadm
  ```

# Cluster Node Setup


Establish cluster communication between nodes using Corosync and configure required authentication.

**Topics**
+ [

## Change the hacluster Password
](#change-hacluster-password-nw-sles)
+ [

## Setup Passwordless Authentication
](#setup-passwordless-auth-nw-sles)
+ [

## Configure the Cluster Nodes
](#configure-cluster-nodes-nw-sles)
+ [

## Modify Generated Corosync Configuration
](#modify-corosync-config-nw-sles)
+ [

## Verify Corosync Configuration
](#verify-corosync-config-nw-sles)
+ [

## Configure Cluster Services
](#configure-cluster-services-nw-sles)
+ [

## Verify Cluster Status
](#verify-cluster-status-nw-sles)

## Change the hacluster Password


On all cluster nodes, change the password of the operating system user hacluster:

```
# passwd hacluster
```

## Setup Passwordless Authentication


SUSE cluster tools provide comprehensive reporting and troubleshooting capabilities for cluster activity. Many of these tools require passwordless SSH access between nodes to collect cluster-wide information effectively. SUSE recommends configuring passwordless SSH for the root user to enable seamless cluster diagnostics and reporting.

EC2 instances typically have no root password set. Use the shared `/sapmnt` filesystem to exchange SSH keys:

 **On the primary node (<hostname1>):** 

```
# ssh-keygen -t rsa -b 4096 -f /root/.ssh/id_rsa -N ''
# cp /root/.ssh/id_rsa.pub /sapmnt/node1_key.pub
```

 **On the secondary node (<hostname2>):** 

```
# ssh-keygen -t rsa -b 4096 -f /root/.ssh/id_rsa -N ''
# cp /root/.ssh/id_rsa.pub /sapmnt/node2_key.pub
# cat /sapmnt/node1_key.pub >> /root/.ssh/authorized_keys
# chmod 600 /root/.ssh/authorized_keys
```

 **Back on the primary node (<hostname1>):** 

```
# cat /sapmnt/node2_key.pub >> /root/.ssh/authorized_keys
# chmod 600 /root/.ssh/authorized_keys
```

 **Test connectivity from both nodes:** 

```
# ssh root@<opposite_hostname> 'hostname'
```

 **Clean up temporary files (from either node):** 

```
# rm /sapmnt/node1_key.pub /sapmnt/node2_key.pub
```

An alternative is to review the SUSE documentation for [Running cluster reports without root access](https://documentation.suse.com/sle-ha/15-SP7/html/SLE-HA-all/app-crmreport-nonroot.html).

**Warning**  
Review the security implications for your organization, including root access controls and network segmentation, before implementing this configuration.

## Configure the Cluster Nodes


Initialize the cluster framework on the first node so that it recognizes both cluster nodes.

On the primary node as root, run:

```
# crm cluster init -u -y -n <cluster_name> -N <hostname_1> -N <hostname_2>
```

 *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

```
# crm cluster init -u -y -n slx-sap-cluster -N slxhost01 -N slxhost02
INFO: Detected "amazon-web-services" platform
INFO: Loading "default" profile from /etc/crm/profiles.yml
INFO: "amazon-web-services" profile does not exist in /etc/crm/profiles.yml

INFO: Configuring csync2
INFO: Starting csync2.socket service on slxhost01
INFO: BEGIN csync2 checking files
INFO: END csync2 checking files
INFO: Configuring corosync (unicast)
WARNING: Not configuring SBD - STONITH will be disabled.
INFO: Hawk cluster interface is now running. To see cluster status, open:
INFO:   https://10.2.10.1:7630/
INFO: Log in with username 'hacluster'
INFO: Starting pacemaker.service on slxhost01
INFO: BEGIN Waiting for cluster
...........
INFO: END Waiting for cluster
INFO: Loading initial cluster configuration
INFO: Done (log saved to /var/log/crmsh/crmsh.log on slxhost01)
INFO: Adding node slxhost02 to cluster
INFO: Running command on slxhost02: crm cluster join -y  -c root@slxhost01
INFO: Configuring csync2
INFO: Starting csync2.socket service
INFO: BEGIN csync2 syncing files in cluster
INFO: END csync2 syncing files in cluster
INFO: Merging known_hosts
INFO: BEGIN Probing for new partitions
INFO: END Probing for new partitions
INFO: Hawk cluster interface is now running. To see cluster status, open:
INFO:   https://10.1.20.7:7630/
INFO: Log in with username 'hacluster'
INFO: Starting pacemaker.service on slxhost02
INFO: BEGIN Waiting for cluster
INFO: END Waiting for cluster
INFO: Set property "priority" in rsc_defaults to 1
INFO: BEGIN Reloading cluster configuration
INFO: END Reloading cluster configuration
INFO: Done (log saved to /var/log/crmsh/crmsh.log on slxhost02)
```

This command:
+ Initializes a two-node cluster with the name provided through `-n` 
+ Configures unicast communication (-u)
+ Sets up the basic corosync configuration
+ Automatically joins the second node to the cluster
+ Does not configure SBD. An AWS fencing agent will be used for STONITH in AWS environments.
+ QDevice configuration is possible but not covered in this document. Refer to [SUSE Linux Enterprise High Availability Documentation - QDevice and QNetD](https://documentation.suse.com/en-us/sle-ha/15-SP7/html/SLE-HA-all/cha-ha-qdevice.html).

## Modify Generated Corosync Configuration


After initializing the cluster, the generated corosync configuration requires some modification to be optimized for cloud environments.

 **1. Edit the corosync configuration:** 

```
# vi /etc/corosync/corosync.conf
```

The generated file typically looks like this:

```
# Please read the corosync.conf.5 manual page
totem {
        version: 2
        cluster_name: myCluster
        clear_node_high_bit: yes
        interface {
                ringnumber: 0
                mcastport: 5405
                ttl: 1
        }

        transport: udpu
        crypto_hash: sha1
        crypto_cipher: aes256
        token: 5000     # This needs to be changed
        join: 60
        max_messages: 20
        token_retransmits_before_loss_const: 10
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }

}

nodelist {
    node {
        ring0_addr: <node1_primary_ip>    # Only single ring configured
        nodeid: 1
    }
    node {
        ring0_addr: <node2_primary_ip>    # Only single ring configured
        nodeid: 2
    }
}

quorum {

        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}
```

 **2. Modify the configuration to add the second ring and optimize settings:** 

```
totem {
    token: 15000           # Changed from 5000 to 15000
    rrp_mode: passive      # Added for dual ring support
}

nodelist {
    node {
        ring0_addr: <node1_primary_ip>     # Primary network
        ring1_addr: <node1_secondary_ip>   # Added secondary network
        nodeid: 1
    }
    node {
        ring0_addr: <node2_primary_ip>     # Primary network
        ring1_addr: <node2_secondary_ip>   # Added secondary network
        nodeid: 2
    }
}
```

 *Example IP configuration:* 


| Network Interface | Node 1 | Node 2 | 
| --- | --- | --- | 
|  ring0_addr  |  10.2.10.1  |  10.2.20.1  | 
|  ring1_addr  |  10.2.10.2  |  10.2.20.2  | 

 **3. Synchronize the modified configuration to all nodes:** 

```
# csync2 -xvF /etc/corosync/corosync.conf
```

 **4. Restart the cluster** 

```
# crm cluster restart
# ssh root@<hostname2> 'crm cluster restart'
```

## Verify Corosync Configuration


Verify network rings are active:

```
# corosync-cfgtool -s
```

 *Example output*:

```
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.2.10.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.2.10.2
        status  = ring 1 active with no faults
```

Both network rings should report "active with no faults". If either ring is missing, review the corosync configuration and check that `/etc/corosync/corosync.conf` changes have been synced to the secondary node. You may need to do this manually. Restart the cluster if needed.

## Configure Cluster Services


Enable pacemaker to start automatically after reboot:

```
# systemctl enable pacemaker
```

Enabling pacemaker also handles corosync through service dependencies. The cluster will start automatically after reboot. For troubleshooting scenarios, you can choose to manually start services after boot instead.
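You can confirm the setting with the following command, which should report `enabled`:

```
# systemctl is-enabled pacemaker
```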

## Verify Cluster Status


 **1. Check pacemaker service status:** 

```
# systemctl status pacemaker
```

 **2. Verify cluster status:** 

```
# crm_mon -1
```

 *Example output*:

```
Cluster Summary:
  * Stack: corosync
  * Current DC: slxhost01 (version 2.1.5+20221208.a3f44794f) - partition with quorum
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ slxhost01 slxhost02 ]

Active Resources:
  * No active resources
```

# Cluster Configuration


The following sections provide details on the resources, groups and constraints necessary to ensure high availability of SAP Central Services.

**Topics**
+ [

## Prepare for Resource Creation
](#prepare-resource-nw-sles)
+ [

## Cluster Bootstrap
](#cluster-bootstrap-nw-sles)
+ [

## Create STONITH (external/ec2) resource
](#create-stonith-ec2-nw-sles)
+ [

## Create Filesystem resources (classic only)
](#filesystem-resources-nw-sles)
+ [

## Create Overlay IP (aws-vpc-move-ip) resources
](#overlay-ip-resources-nw-sles)
+ [

## Create SAPStartSrv resources (simple-mount only)
](#sapstartsrv-resources-nw-sles)
+ [

## Create SAPInstance resources (simple-mount only)
](#sap-resources-simple-nw-sles)
+ [

## Create SAPInstance resources (classic only)
](#sap-resources-classic-nw-sles)
+ [

## Create resource groups for aws-vpc-move-ip / SAPStartSrv / SAPInstance (simple-mount only)
](#resource-groups-simple-nw-sles)
+ [

## Create resource groups for Filesystem / aws-vpc-move-ip / SAPInstance (classic only)
](#resource-groups-classic-nw-sles)
+ [

## Create resource constraints
](#resource-constraints-nw-sles)
+ [

## Reset Configuration – Optional
](#reset-config-nw-sles)

## Prepare for Resource Creation


To ensure that the cluster does not perform unexpected actions during setup of resources and configuration, set the maintenance mode to true.

Run the following command to put the cluster in maintenance mode:

```
# crm maintenance on
```

To verify the current maintenance state:

```
# crm status
```

**Note**  
There are two types of maintenance mode:  
Cluster-wide maintenance (set with `crm maintenance on`)
Node-specific maintenance (set with `crm node maintenance nodename`)
Always use cluster-wide maintenance mode when making configuration changes. For node-specific operations like hardware maintenance, refer to the Operations section for the proper procedures.  
To disable maintenance mode after configuration is complete:  

```
# crm maintenance off
```

## Cluster Bootstrap


### Configure Cluster Properties


Configure cluster properties to establish fencing behavior and resource failover settings:

```
# crm configure property stonith-enabled="true"
# crm configure property stonith-timeout="600"
# crm configure property priority-fencing-delay="20"
```
+ The **priority-fencing-delay** is recommended for protecting the SAP ASCS node during network partitioning events. When a cluster partition occurs, this delay gives preference to nodes hosting higher priority resources, with the ASCS receiving additional priority weighting over the ERS. This helps ensure the ASCS node survives in split-brain scenarios. The recommended 20 second priority-fencing-delay works in conjunction with the pcmk_delay_max (10 seconds) configured in the stonith resource, providing a total potential delay of up to 30 seconds before fencing occurs.

To verify your cluster property settings:

```
# crm configure show property
```

### Configure Resource Defaults


Configure resource default behaviors:

```
# crm configure rsc_defaults resource-stickiness="1"
# crm configure rsc_defaults migration-threshold="3"
# crm configure rsc_defaults failure-timeout="600s"
```
+ The **resource-stickiness** value of 1 encourages the ASCS resource to stay on its current node, avoiding unnecessary resource movement.
+ The **migration-threshold** of 3 causes a resource to move to a different node after 3 consecutive failures, ensuring timely failover when issues persist.
+ The **failure-timeout** automatically removes a failure count after 10 minutes, preventing individual historical failures from accumulating and affecting long-term resource behavior. If testing failover scenarios in quick succession, it may be necessary to manually query and clear accumulated failure counts between tests. Use `crm resource failcount <resource_name> show <hostname>` and `crm resource refresh`.

Individual resources may override these defaults with their own defined values.
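For example, using values from the [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) and the ASCS resource name that is created later in this section (`rsc_sap_SLX_ASCS00`), querying and clearing the failure count might look like this:

```
# crm resource failcount rsc_sap_SLX_ASCS00 show slxhost01
# crm resource refresh rsc_sap_SLX_ASCS00
```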

To verify your resource default settings:

```
# crm configure show rsc_defaults
```

### Configure Operation Defaults


Configure operation timeout defaults:

```
# crm configure op_defaults timeout="600"
```
+ The **op_defaults timeout** ensures all cluster operations have a reasonable default timeout of 600 seconds. Individual resources may override this with their own timeout values.

To verify your operation default settings:

```
# crm configure show op_defaults
```

## Create STONITH (external/ec2) resource


Create the STONITH or Fencing resource using resource agent ** `external/ec2` **:

```
# crm configure primitive <stonith_resource_name> stonith:external/ec2 \
params tag="<cluster_tag>" profile="<cli_cluster_profile>" pcmk_delay_max="<delay_value>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="300" timeout="60"
```

Details:
+  **tag** - EC2 instance tag key name that associates instances with this cluster configuration. This tag key must be unique within the AWS account and have a value which matches the instance hostname. See [Create Amazon EC2 Resource Tags Used by Amazon EC2 STONITH Agent](sap-nw-pacemaker-sles-ec2-configuration.md#create-cluster-tags-nw-sles) for EC2 instance tagging configuration.
+  **profile** - (optional) AWS CLI profile name for API authentication. Verify profile exists with `aws configure list-profiles`. If a profile is not explicitly configured the default profile will be used.
+  **pcmk_delay_max** - Random delay before fencing operations. Works in conjunction with the cluster property `priority-fencing-delay` to prevent simultaneous fencing in 2-node clusters. For ENSA1 use 30 seconds, for ENSA2 use 10 seconds (a lower value is sufficient because `priority-fencing-delay` handles primary node protection). Omit in clusters with real quorum (3+ nodes) to avoid unnecessary delay.

**Example**  
 *Example for ENSA1 using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:  

```
# crm configure primitive res_stonith_ec2 stonith:external/ec2 \
params tag="pacemaker" profile="cluster" \
pcmk_delay_max="30" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="300" timeout="60"
```
 *Example for ENSA2 using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:  

```
# crm configure primitive res_stonith_ec2 stonith:external/ec2 \
params tag="pacemaker" profile="cluster" \
pcmk_delay_max="10" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="300" timeout="60"
```

## Create Filesystem resources (classic only)


In the classic configuration, cluster resources mount and unmount the file systems so that they follow the location of the SAP services.

Create **ASCS** file system resources:

```
# crm configure primitive rsc_fs_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:Filesystem \
params \
device="<nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr>" \
directory="/usr/sap/<SID>/ASCS<ascs_sys_nr>" \
fstype="nfs4" \
options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
op start timeout="60" interval="0" \
op stop timeout="60" interval="0" \
op monitor interval="20" timeout="40"
```

Create **ERS** file system resources:

```
# crm configure primitive rsc_fs_<SID>_ERS<ers_sys_nr> ocf:heartbeat:Filesystem \
params \
device="<nfs.fqdn>:/<SID>_ERS<ers_sys_nr>" \
directory="/usr/sap/<SID>/ERS<ers_sys_nr>" \
fstype="nfs4" \
options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
op start timeout="60" interval="0" \
op stop timeout="60" interval="0" \
op monitor interval="20" timeout="40"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_fs_SLX_ASCS00 ocf:heartbeat:Filesystem \
  params \
  device="fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ASCS00" \
  directory="/usr/sap/SLX/ASCS00" \
  fstype="nfs4" \
  options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
  op start timeout="60" interval="0" \
  op stop timeout="60" interval="0" \
  op monitor interval="20" timeout="40"
  
  # crm configure primitive rsc_fs_SLX_ERS10 ocf:heartbeat:Filesystem \
  params \
  device="fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/SLX_ERS10" \
  directory="/usr/sap/SLX/ERS10" \
  fstype="nfs4" \
  options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
  op start timeout="60" interval="0" \
  op stop timeout="60" interval="0" \
  op monitor interval="20" timeout="40"
  ```

 **Notes** 
+ Review the mount options to ensure that they match with your operating system, NFS file system type, and the latest recommendations from SAP.
+ `<nfs.fqdn>` can either be an alias or the default DNS name of the Amazon EFS or FSx for ONTAP file system. For example, `fs-xxxxxx.efs.xxxxxx.amazonaws.com`.

## Create Overlay IP (aws-vpc-move-ip) resources


The IP resource provides the details necessary to update the route table entry for overlay IP.

Create **ASCS** IP Resource:

```
# crm configure primitive rsc_ip_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
params \
ip="<ascs_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="20" timeout="40"
```

Create **ERS** IP Resource:

```
# crm configure primitive rsc_ip_<SID>_ERS<ers_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
params \
ip="<ers_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="20" timeout="40"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_ip_SLX_ASCS00 ocf:heartbeat:aws-vpc-move-ip \
  params \
  ip="172.16.30.5" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="20" timeout="40"
  
  # crm configure primitive rsc_ip_SLX_ERS10 ocf:heartbeat:aws-vpc-move-ip \
  params \
  ip="172.16.30.6" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="20" timeout="40"
  ```

 **Notes** 
+ If more than one route table is required for connectivity or because of subnet associations, the `routing_table` parameter can have multiple values separated by a comma. For example, `routing_table=rtb-xxxxxroutetable1,rtb-xxxxxroutetable2`.
+ Additional parameters – `lookup_type` and `routing_table_role` are required for shared VPC. For more information, see [Shared VPC – optional](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-netweaver-ha-settings.html#sles-netweaver-ha-shared-vpc).

## Create SAPStartSrv resources (simple-mount only)


In the simple-mount architecture, the `sapstartsrv` process, which controls the start, stop, and monitoring of an SAP instance, is itself managed by a cluster resource. This additional control removes the requirement for file system resources to be restricted to a single node.

Modify and run the following commands to create the `sapstartsrv` resources.

Create **ASCS** SAPStartSrv Resource

Use the following command to create an ASCS SAPStartSrv resource.

```
# crm configure primitive rsc_sapstart_<SID>_ASCS<ascs_sys_nr> ocf:suse:SAPStartSrv \
params \
InstanceName=<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>
```

Create **ERS** SAPStartSrv Resource

Use the following command to create an ERS SAPStartSrv resource.

```
# crm configure primitive rsc_sapstart_<SID>_ERS<ers_sys_nr> ocf:suse:SAPStartSrv \
params  \
InstanceName=<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_sapstart_SLX_ASCS00 ocf:suse:SAPStartSrv \
  params \
  InstanceName=SLX_ASCS00_slxascs
  
  # crm configure primitive rsc_sapstart_SLX_ERS10 ocf:suse:SAPStartSrv \
  params \
  InstanceName=SLX_ERS10_slxers
  ```

## Create SAPInstance resources (simple-mount only)


The minor difference in creating SAP instance resources between the classic and simple-mount configurations is the addition of the `MINIMAL_PROBE="true"` parameter.

The SAP instance is started and stopped using cluster resources.

**ENSA1**  
Create an **ASCS** SAP instance resource:  

```
# crm configure primitive rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
operations \$id="rsc_sap_<SID>_ASCS<ascs_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta \
resource-stickiness="5000" \
failure-timeout="60" \
migration-threshold="1" \
priority="10"
```
Create an **ERS** SAP instance resource:  

```
# crm configure primitive rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
IS_ERS="true" \
operations \$id="rsc_sap_<SID>_ERS<ers_sys_nr>-operations" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta \
priority="1000"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_sap_SLX_ASCS00 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ASCS00_slxascs" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  operations \$id="rsc_sap_SLX_ASCS00-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta \
  resource-stickiness="5000" \
  failure-timeout="60" \
  migration-threshold="1" \
  priority="10"
  
  # crm configure primitive rsc_sap_SLX_ERS10 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ERS10_slxers" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  IS_ERS="true" \
  operations \$id="rsc_sap_SLX_ERS10-operations" \
  op start interval="0" timeout="240" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta \
  priority="1000"
  ```
**ENSA2**  
Create an **ASCS** SAP instance resource:  

```
# crm configure primitive rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
operations \$id="rsc_sap_<SID>_ASCS<ascs_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta \
resource-stickiness="5000" \
priority="1000"
```
Create an **ERS** SAP instance resource:  

```
# crm configure primitive rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
IS_ERS="true" \
operations \$id="rsc_sap_<SID>_ERS<ers_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_sap_SLX_ASCS00 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ASCS00_slxascs" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  operations \$id="rsc_sap_SLX_ASCS00-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta \
  resource-stickiness="5000" \
  priority="1000"
  
  # crm configure primitive rsc_sap_SLX_ERS10 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ERS10_slxers" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  IS_ERS="true" \
  operations \$id="rsc_sap_SLX_ERS10-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart"
  ```

The difference between ENSA1 and ENSA2 is that ENSA2 allows the lock table to be retrieved remotely, which means that for ENSA2, ASCS can restart in its current location (assuming the node is still available). This change impacts the stickiness, migration, and priority parameters. Ensure that you use the commands that match your enqueue version.
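If you are unsure which enqueue version is in use, one way to check is to look at the name of the running enqueue process on the ASCS node: ENSA1 runs an `en.sap<SID>_ASCS<ascs_sys_nr>` process, whereas ENSA2 runs `enq.sap<SID>_ASCS<ascs_sys_nr>`. For example:

```
# ps -ef | grep -E "(en|enq)\.sap" | grep -v grep
```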

## Create SAPInstance resources (classic only)


The SAP instance is started and stopped using cluster resources.

**ENSA1**  
Create an **ASCS** SAPInstance resource:  

```
# crm configure primitive rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
operations \$id="rsc_sap_<SID>_ASCS<ascs_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta \
resource-stickiness="5000" \
failure-timeout="60" \
migration-threshold="1" \
priority="10"
```
Create an **ERS** SAPInstance resource:  

```
# crm configure primitive rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
IS_ERS="true" \
operations \$id="rsc_sap_<SID>_ERS<ers_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta \
priority="1000"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_sap_SLX_ASCS00 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ASCS00_slxascs" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs" \
  AUTOMATIC_RECOVER="false" \
  operations \$id="rsc_sap_SLX_ASCS00-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta \
  resource-stickiness="5000" \
  failure-timeout="60" \
  migration-threshold="1" \
  priority="10"
  
  # crm configure primitive rsc_sap_SLX_ERS10 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ERS10_slxers" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers" \
  AUTOMATIC_RECOVER="false" \
  IS_ERS="true" \
  operations \$id="rsc_sap_SLX_ERS10-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta \
  priority="1000"
  ```
**ENSA2**  
Create an **ASCS** SAPInstance resource:  

```
# crm configure primitive rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
operations \$id="rsc_sap_<SID>_ASCS<ascs_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta \
resource-stickiness="5000" \
priority="1000"
```
Create an **ERS** SAPInstance resource:  

```
# crm configure primitive rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
params \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
IS_ERS="true" \
operations \$id="rsc_sap_<SID>_ERS<ers_sys_nr>-operations" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure primitive rsc_sap_SLX_ASCS00 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ASCS00_slxascs" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs" \
  AUTOMATIC_RECOVER="false" \
  operations \$id="rsc_sap_SLX_ASCS00-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta \
  resource-stickiness="5000" \
  priority="1000"
  
  # crm configure primitive rsc_sap_SLX_ERS10 ocf:heartbeat:SAPInstance \
  params \
  InstanceName="SLX_ERS10_slxers" \
  START_PROFILE="/usr/sap/SLX/SYS/profile/SLX_ERS10_slxers" \
  AUTOMATIC_RECOVER="false" \
  IS_ERS="true" \
  operations \$id="rsc_sap_SLX_ERS10-operations" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart"
  ```

The change between ENSA1 and ENSA2 allows the lock table to be retrieved remotely. For ENSA2, if the node is still available, ASCS can restart in its current location. This impacts the stickiness, migration, and priority parameters. Make sure to use the commands that match your enqueue version.

## Create resource groups for aws-vpc-move-ip / SAPStartSrv / SAPInstance (simple-mount only)


A cluster resource group is a set of resources that must be located together, started sequentially, and stopped in the reverse order.

In the simple-mount architecture, the overlay IP must be available first, and the SAP start service must be running before the SAP instance can start. Define the group in the order shown here.

Create an **ASCS** cluster resource group:

```
# crm configure group grp_<SID>_ASCS<ascs_sys_nr> \
rsc_ip_<SID>_ASCS<ascs_sys_nr> \
rsc_sapstart_<SID>_ASCS<ascs_sys_nr> \
rsc_sap_<SID>_ASCS<ascs_sys_nr> \
meta resource-stickiness="3000"
```

Create an **ERS** cluster resource group:

```
# crm configure group grp_<SID>_ERS<ers_sys_nr> \
rsc_ip_<SID>_ERS<ers_sys_nr> \
rsc_sapstart_<SID>_ERS<ers_sys_nr> \
rsc_sap_<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure group grp_SLX_ASCS00 \
  rsc_ip_SLX_ASCS00 \
  rsc_sapstart_SLX_ASCS00 \
  rsc_sap_SLX_ASCS00 \
  meta resource-stickiness="3000"
  
  # crm configure group grp_SLX_ERS10 \
  rsc_ip_SLX_ERS10 \
  rsc_sapstart_SLX_ERS10 \
  rsc_sap_SLX_ERS10
  ```

## Create resource groups for Filesystem / aws-vpc-move-ip / SAPInstance (classic only)


A cluster resource group is a set of resources that must be located together, started sequentially, and stopped in the reverse order.

In the classic architecture, the file system is mounted first, and the overlay IP must be available before the SAP instance can start.

Create an **ASCS** cluster resource group:

```
# crm configure group grp_<SID>_ASCS<ascs_sys_nr> \
rsc_fs_<SID>_ASCS<ascs_sys_nr> \
rsc_ip_<SID>_ASCS<ascs_sys_nr> \
rsc_sap_<SID>_ASCS<ascs_sys_nr> \
meta resource-stickiness="3000"
```

Create an **ERS** cluster resource group:

```
# crm configure group grp_<SID>_ERS<ers_sys_nr> \
rsc_fs_<SID>_ERS<ers_sys_nr> \
rsc_ip_<SID>_ERS<ers_sys_nr> \
rsc_sap_<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure group grp_SLX_ASCS00 \
  rsc_fs_SLX_ASCS00 \
  rsc_ip_SLX_ASCS00 \
  rsc_sap_SLX_ASCS00 \
  meta resource-stickiness="3000"
  
  # crm configure group grp_SLX_ERS10 \
  rsc_fs_SLX_ERS10 \
  rsc_ip_SLX_ERS10 \
  rsc_sap_SLX_ERS10
  ```

## Create resource constraints


Resource constraints determine where resources can run, based on defined conditions. Constraints for SAP NetWeaver ensure that ASCS and ERS are started on separate nodes and that locks are preserved in case of failures. The following are the different types of constraints.

### Colocation constraint


The negative score ensures that ASCS and ERS are run on separate nodes, wherever possible.

```
# crm configure colocation col_sap_<SID>_ascs_ers_separate_nodes \
-5000: grp_<SID>_ERS<ers_sys_nr> grp_<SID>_ASCS<ascs_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure colocation col_sap_SLX_ascs_ers_separate_nodes \
  -5000: grp_SLX_ERS10 grp_SLX_ASCS00
  ```

### Order constraint


This constraint ensures that the ASCS instance is started before the ERS instance is stopped. This is necessary so that the ASCS can take over the lock table from the ERS.

```
# crm configure order ord_sap_<SID>_ascs_start_before_ers_stop \
Optional: rsc_sap_<SID>_ASCS<ascs_sys_nr>:start rsc_sap_<SID>_ERS<ers_sys_nr>:stop \
symmetrical="false"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure order ord_sap_SLX_ascs_start_before_ers_stop \
  Optional: rsc_sap_SLX_ASCS00:start rsc_sap_SLX_ERS10:stop \
  symmetrical="false"
  ```

### Location constraint (ENSA1 only)


This constraint is only required for ENSA1. With ENSA2, the lock table can be retrieved remotely, so ASCS doesn't need to fail over to the node where ERS is running.

```
# crm configure location loc_sap_<SID>_ascs_follows_ers \
rsc_sap_<SID>_ASCS<ascs_sys_nr> rule 2000: runs_ers_<SID> eq 1
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-sles-parameters.md) *:

  ```
  # crm configure location loc_sap_SLX_ascs_follows_ers \
  rsc_sap_SLX_ASCS00 rule 2000: runs_ers_SLX eq 1
  ```

## Reset Configuration – Optional


**Important**  
The following instructions help you reset the complete configuration. Run these commands only if you want to start setup from the beginning. You can make minor changes with the `crm configure edit` command.

Run the following command to back up the current configuration for reference:

```
# crm configure show > /tmp/crmconfig_backup.txt
```

Run the following command to clear the current configuration:

```
# crm configure erase
```

The preceding erase command removes all of the cluster resources from the Cluster Information Base (CIB) and disconnects the cluster from corosync. Before starting the resource configuration again, run `crm cluster restart` so that the cluster re-establishes communication with corosync and retrieves the configuration. The restart also clears maintenance mode; reapply it before commencing additional configuration and resource setup, as shown in the following example.
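For example, after erasing the configuration, the sequence might look like the following:

```
# crm cluster restart
# crm maintenance on
```

With maintenance mode reapplied, you can recreate the resources described in the preceding sections.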

# Operations


This section covers the following topics.

**Topics**
+ [

# Viewing the cluster state
](cluster-state-nw-sles.md)
+ [

# Performing planned maintenance
](planned-maintenance-nw-sles.md)
+ [

# Post-failure analysis and reset
](analysis-reset-nw-sles.md)
+ [

# Alerting and monitoring
](alerting-monitoring-nw-sles.md)

# Viewing the cluster state


You can view the state of the cluster in two ways: with operating system commands or with a web-based console provided by SUSE.

**Topics**
+ [

## Operating system based
](#os-based-nw-sles)
+ [

## SUSE Hawk2
](#suse-hawk)

## Operating system based


There are multiple operating system commands that can be run as root or as a user with appropriate permissions. The commands enable you to get an overview of the status of the cluster and its services. See the following commands for more details.

```
# crm status
```

Sample output:

```
slxhost01:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: slxhost01 (version 2.0.5+20201202.ba59be712-150300.4.24.1-2.0.5+20201202.ba59be712) - partition with quorum
  * Last updated: Tue Nov  1 13:41:58 2022
  * Last change:  Fri Oct 28 08:55:43 2022 by root via crm_attribute on slxhost02
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ slxhost01 slxhost02 ]

Full List of Resources:
  * Resource Group: grp_SLX_ASCS00:
    * rsc_ip_SLX_ASCS00 (ocf::heartbeat:aws-vpc-move-ip):        Started slxhost01
    * rsc_sapstart_SLX_ASCS00   (ocf::suse:SAPStartSrv):         Started slxhost01
    * rsc_sap_SLX_ASCS00        (ocf::heartbeat:SAPInstance):    Started slxhost01
  * res_AWS_STONITH     (stonith:external/ec2):  Started slxhost02
  * Resource Group: grp_SLX_ERS10:
    * rsc_ip_SLX_ERS10  (ocf::heartbeat:aws-vpc-move-ip):        Started slxhost02
    * rsc_sapstart_SLX_ERS10    (ocf::suse:SAPStartSrv):         Started slxhost02
    * rsc_sap_SLX_ERS10 (ocf::heartbeat:SAPInstance):    Started slxhost02
```

The following table provides a list of useful commands.


| Command | Description | 
| --- | --- | 
|   `crm_mon`   |  Display cluster status on the console with updates as they occur  | 
|   `crm_mon -1`   |  Display cluster status on the console just once, and exit  | 
|   `crm_mon -Arnf`   |  `-A` display node attributes, `-n` group resources by node, `-r` display inactive resources, `-f` display resource fail counts  | 
|   `crm help`   |  View more options  | 
|   `crm_mon --help-all`   |  View more options  | 

## SUSE Hawk2


Hawk2 is a web-based graphical user interface for managing and monitoring pacemaker high availability clusters. Enable it on every node in the cluster so that you can point your web browser at any node to access it. Use the following commands to enable Hawk2.

```
# systemctl enable --now hawk
# systemctl status hawk
```

Use the following URL to access Hawk2. Check that your security groups allow access on port 7630 from your administrative host.

```
https://your-server:7630/

e.g https://slxhost01:7630
```

For more information, see [Configuring and Managing Cluster Resources with Hawk2](https://documentation.suse.com/sle-ha/12-SP5/html/SLE-HA-all/cha-conf-hawk2.html) in the SUSE Documentation.

# Performing planned maintenance


The cluster connector is designed to integrate the cluster with SAP start framework (`sapstartsrv`), including the rolling kernel switch (RKS) awareness. Stopping and starting the SAP system using `sapcontrol` should not result in any cluster remediation activities as these actions are not interpreted as failures. Validate this scenario when testing your cluster.
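For example, a controlled stop and start with `sapcontrol`, run as the `<sid>adm` user with the example ASCS instance number 00, should complete without the cluster taking any recovery action:

```
sapcontrol -nr 00 -function Stop
sapcontrol -nr 00 -function Start
```

After the start completes, `crm status` should show no failed actions or unexpected resource movements.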

There are different options to perform planned maintenance on nodes, resources, and the cluster.

**Topics**
+ [

## Maintenance mode
](#maintenance-mode-nw-sles)
+ [

## Placing a node in standby mode
](#node-standby-nw-sles)
+ [

## Moving a resource
](#moving-resource-nw-sles)

## Maintenance mode


Use maintenance mode if you want to make any changes to the configuration or take control of the resources and nodes in the cluster. In most cases, this is the safest option for administrative tasks.

**Example**  
Use one of the following commands to turn on maintenance mode.  

```
# crm maintenance on
```

```
# crm configure property maintenance-mode="true"
```
Use one of the following commands to turn off maintenance mode.  

```
# crm maintenance off
```

```
# crm configure property maintenance-mode="false"
```

## Placing a node in standby mode


To perform maintenance on the cluster without system outage, the recommended method for moving active resources is to place the node you want to remove from the cluster in standby mode.

```
# crm node standby <hostname>
```

The cluster will cleanly relocate resources, and you can perform activities, including reboots on the node in standby mode. When maintenance activities are complete, you can re-introduce the node with the following command.

```
# crm node online <hostname>
```

## Moving a resource


Moving individual resources is not recommended because of the migration or move constraints that are created to lock the resource in its new location. These can be cleared as described in the info messages, but this introduces an additional administrative step.

```
slxhost01:~ # crm resource move grp_SLX_ASCS00 slxhost02
INFO: Move constraint created for grp_SLX_ASCS00 to slxhost02
INFO: Use `crm resource clear grp_SLX_ASCS00` to remove this constraint
```

Use the following command once the resources have relocated to their target location.

```
# crm resource clear grp_SLX_ASCS00
```

# Post-failure analysis and reset


A review must be conducted after each failure to understand the source of the failure as well as the reaction of the cluster. In most scenarios, the cluster prevents an application outage. However, a manual action is often required to reset the cluster to a protective state for any subsequent failures.

**Topics**
+ [

## Checking the logs
](#checking-logs-nw-sles)
+ [

## Cleanup crm status
](#cleanup-crm-nw-sles)
+ [

## Restart failed nodes or pacemaker
](#restart-nodes-nw-sles)
+ [

## Further Analysis
](#_further_analysis)

## Checking the logs

+ For troubleshooting cluster issues, use journalctl to examine both pacemaker and corosync logs:

  ```
  # journalctl -u pacemaker -u corosync --since "1 hour ago"
  ```
  + Use `--since` to specify time periods (e.g., "2 hours ago", "today")
  + Add `-f` to follow logs in real-time
  + Combine with `grep` for specific searches; see the example after this list
+ System messages and resource agent activity can be found in `/var/log/messages`.
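For example, to narrow the pacemaker and corosync journal output to fencing or failure events, you can combine `journalctl` with `grep` (adjust the search terms to your scenario):

```
# journalctl -u pacemaker -u corosync --since "2 hours ago" | grep -iE "fence|error|failed"
```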

Application-based failures can be investigated in the SAP work directory.

## Cleanup crm status


If failed actions are reported using the `crm status` command, and if they have already been investigated, then you can clear the reports with the following command.

```
# crm resource cleanup <resource> <hostname>
```

## Restart failed nodes or pacemaker


It is recommended that failed (or fenced) nodes are not automatically restarted. This gives operators a chance to investigate the failure and ensures that the cluster doesn't make assumptions about the state of resources.

Depending on your approach, you need to restart the instance or the pacemaker service, for example as follows.
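The following is a sketch of bringing a fenced node back into the cluster; the instance ID is a placeholder, and the cluster start command is run on the recovered node after it has booted:

```
# aws ec2 start-instances --instance-ids i-xxxxinstidforhost1
# crm cluster start
```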

## Further Analysis


For cluster-specific issues, use `hb_report` to generate a targeted analysis of cluster components across all nodes:

```
# hb_report -f "YYYY-MM-DD HH:MM:SS" -t "YYYY-MM-DD HH:MM:SS" /tmp/hb_report
```

For quick analysis of recent events, you can use:

```
# crm history events
# crm history log
```
+ Both `hb_report` and `crm history` commands require passwordless SSH between nodes
+ For more information, see SUSE Documentation - [Usage of hb_report for SLES HAE](https://www.suse.com/support/kb/doc/?id=000017501) 

# Alerting and monitoring


This section covers the following topics.

**Topics**
+ [

## Using Amazon CloudWatch Application Insights
](#application-insights-nw-sles)
+ [

## Using the cluster alert agents
](#cluster-alert-nw-sles)

## Using Amazon CloudWatch Application Insights


For monitoring and visibility of cluster state and actions, Application Insights includes metrics for monitoring enqueue replication state, cluster metrics, and SAP and high availability checks. Additional metrics, such as EFS and CPU monitoring can also help with root cause analysis.

For more information, see [Get started with Amazon CloudWatch Application Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/appinsights-getting-started.html) and [SAP NetWeaver High Availability on Amazon EC2](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/component-configuration-examples-netweaver-ha.html).

## Using the cluster alert agents


Within the cluster configuration, you can call an external program (an alert agent) to handle alerts. This is a *push* notification. It passes information about the event via environment variables.

The agents can then be configured to send emails, log to a file, update a monitoring system, etc. For example, the following script can be used to access Amazon SNS.

```
#!/bin/sh

# alert_sns.sh
# modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample

##############################################################################
# SETUP
# * Create an SNS Topic and subscribe email or chatbot
# * Note down the ARN for the SNS topic
# * Give the IAM Role attached to both Instances permission to publish to the SNS Topic
# * Ensure the aws cli is installed
# * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes
# * Ensure the permissions allow for hacluster and root to execute the script
# * Run the following as root (modify file location if necessary and replace SNS ARN):
#
# SLES:
# crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to <{ arn:aws:sns:region:account-id:myPacemakerAlerts  }>
#
# RHEL:
# pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S"
# pcs alert recipient add aws_sns_alert value=arn:aws:sns:region:account-id:myPacemakerAlerts
##############################################################################

# Additional information to send with the alerts
node_name=`uname -n`
sns_body=`env | grep CRM_alert_`

# Required for SNS
TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get metadata
REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')

sns_subscription_arn=${CRM_alert_recipient}

# Format depending on alert type
case ${CRM_alert_kind} in
   node)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
   ;;
   fencing)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}"
   ;;
   resource)
     if [ ${CRM_alert_interval} = "0" ]; then
         CRM_alert_interval=""
     else
         CRM_alert_interval=" (${CRM_alert_interval})"
     fi
     if [ ${CRM_alert_target_rc} = "0" ]; then
         CRM_alert_target_rc=""
     else
         CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})"
     fi
     case ${CRM_alert_desc} in
         Cancelled)
           ;;
         *)
           sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}"
           ;;
     esac
     ;;
   attribute)
     sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated in '${CRM_alert_attribute_value}'"
     ;;
   *)
     sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert"
     ;;
esac

# Use this information to send the email.
aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region ${REGION}
```

# Testing


We recommend scheduling regular fault scenario recovery testing at least annually, and as part of the operating system or SAP kernel updates that may impact operations. For more details on best practices for regular testing, see SAP Lens – [Best Practice 4.3 – Regularly test business continuity plans and fault recovery](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-4-3.html).

The tests described here simulate failures. These can help you understand the behavior and operational requirements of your cluster.

In addition to checking the state of cluster resources, ensure that the service you are trying to protect is in the required state. Can you still connect to SAP? Are locks still available in SM12?

Define the recovery time to ensure that it aligns with your business objectives. Record recovery actions in runbooks.
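In addition to transaction SM12, you can check the enqueue lock statistics from the command line with `sapcontrol`, run as the `<sid>adm` user against the ASCS instance number. Comparing the output before and after a test helps confirm that locks were preserved. For example:

```
sapcontrol -nr <00> -function EnqGetStatistic
```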

**Topics**
+ [

## Test 1: Stop ASCS on the primary node using `sapcontrol`
](#test1-nw-sles)
+ [

## Test 2: Stop ERS on the secondary node using `sapcontrol`
](#test2-nw-sles)
+ [

## Test 3: Kill the message server process on the primary node
](#test3-nw-sles)
+ [

## Test 4: Kill the enqueue server process on the primary node
](#test4-nw-sles)
+ [

## Test 5: Kill the ER process
](#test5-nw-sles)
+ [

## Test 6: Simulate hardware failure of an individual node, and repeat for other node
](#test6-nw-sles)
+ [

## Test 7: Simulate a network failure
](#test7-nw-sles)
+ [

## Test 8: Simulate an NFS failure
](#test8-nw-sles)
+ [

## Test 9: Accidental shutdown
](#test9-nw-sles)

## Test 1: Stop ASCS on the primary node using `sapcontrol`


 **Notes** – Ensure that the connector has been installed and the parameters have been updated.

 **Simulate failure** – On `slxhost01` as `slxadm`:

```
sapcontrol -nr <00> -function Stop
```

 **Expected behavior** – ASCS should be stopped on `slxhost01`, and the cluster should not perform any activity.

 **Recovery action** – Start ASCS manually.
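For example, on `slxhost01` as `slxadm`, using the same instance number as the stop command:

```
sapcontrol -nr <00> -function Start
```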

## Test 2: Stop ERS on the secondary node using `sapcontrol`


 **Notes** – Ensure that the connector has been installed, and the parameters are updated.

 **Simulate failure** – On `slxhost02` as `slxadm`:

```
sapcontrol -nr <10> -function Stop
```

 **Expected behavior** – ERS should be stopped on `slxhost02`, and the cluster should not perform any activity.

 **Recovery action** – Start ERS manually.

## Test 3: Kill the message server process on the primary node


 **Simulate failure** – On `slxhost01` as `slxadm`:

```
kill -9 $(pgrep -f "ms.sap<SLX>_ASCS<00>")
```

 **Expected behavior** – The message server should immediately respawn based on the Restart parameter.

 **Recovery action** – No action required.
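To see which programs `sapstart` restarts automatically, you can inspect the ASCS instance profile. The message server is typically configured with a `Restart_Program_xx` entry; the exact numbering varies by installation. For example:

```
grep -i Restart_Program /usr/sap/SLX/SYS/profile/SLX_ASCS00_slxascs
```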

## Test 4: Kill the enqueue server process on the primary node


 **Notes** – Check that locks have persisted, and review the location constraints that only exist for ENSA1.

 **Simulate failure** – On `slxhost01` as `slxadm`:

```
kill -9 $(pgrep -f "[en|enq].sap<SLX>_ASCS<00>")
```

 **Expected behavior** – ENSA2: the cluster restarts the ENQ process, which retrieves the locks remotely. ENSA1: the cluster fails over the ASCS resource to the node where the ERS is running.

 **Recovery action** – No action required.

## Test 5: Kill the ER process


 **Simulate failure** – On `slxhost02` as `slxadm`:

```
kill -9 $(pgrep -f "[er|enqr].sap<SLX>_ERS<10>")
```

 **Expected behavior** – Cluster will restart the ERS on the same node.

 **Recovery action** – No action required.

## Test 6: Simulate hardware failure of an individual node, and repeat for other node


 **Notes** – To simulate a system crash, you must first ensure that `/proc/sys/kernel/sysrq` is set to 1.
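For example, you can enable it temporarily with the following command (this setting does not persist across reboots):

```
# echo 1 > /proc/sys/kernel/sysrq
```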

 **Simulate failure** – On one node at a time as root (repeat for the other node):

```
echo 'b' > /proc/sysrq-trigger
```

 **Expected behavior** – The node which has been killed fails. The cluster will move the resources (ASCS/ERS) which were running on the failed node to the surviving node.

 **Recovery action** – Start the EC2 node and pacemaker service. The cluster will detect that the node is online and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

## Test 7: Simulate a network failure


 **Notes** – See the following list.
+ Iptables must be installed.
+ Use a subnet in this command because of the secondary ring.
+ Check for any existing iptables rules because `iptables -F` flushes all rules.
+ Review the `pcmk_delay_max` and priority parameters if neither node survives the fence race.

 **Simulate failure** – On either node as root:

```
iptables -A INPUT -s <CIDR_of_other_subnet> -j DROP; iptables -A OUTPUT -d <CIDR_of_other_subnet> -j DROP
```

 **Expected behavior** – The cluster detects the network failure, and fences one of the nodes to avoid a split-brain situation.

 **Recovery action** – If the node where the command was run survives, run `iptables -F` to clear the network failure. Start the EC2 node and pacemaker service. The cluster will detect that the node is online and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

## Test 8: Simulate an NFS failure


 **Notes** – See the following list.
+ Iptables must be installed.
+ Check for any existing iptables rules because `iptables -F` flushes all rules.
+ Although rare, this is an important scenario to test. Depending on the activity, it may take some time (10 minutes or more) to notice that I/O to EFS is not occurring and to fail either the Filesystem or SAP resources.

 **Simulate failure** – On either node as root:

```
iptables -A OUTPUT -p tcp --dport 2049 -m state --state NEW,ESTABLISHED,RELATED -j DROP; iptables -A INPUT -p tcp --sport 2049 -m state --state ESTABLISHED -j DROP
```

 **Expected behavior** – The cluster detects that NFS is not available, and the SAP instance resource agent fails and moves to the FAILED state. Because of the `on-fail="restart"` configuration, the cluster tries a local restart before eventually fencing the node and failing over.

 **Recovery action** – If the node where the command was run survives, run `iptables -F` to clear the network failure. Start the EC2 node and pacemaker service. The cluster will detect that the node is online and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

## Test 9: Accidental shutdown


 **Notes** – See the following list.
+ Avoid shutdowns without cluster awareness.
+ We recommend the use of systemd to ensure predictable behavior.
+ Ensure the resource dependencies are in place.

 **Simulate failure** – Log in to the AWS Management Console and stop the instance, or issue a shutdown command from the operating system.

 **Expected behavior** – The node which has been shut down fails. The cluster will move the resources (ASCS/ERS) which were running on the failed node to the surviving node. If systemd and resource dependencies are not configured, you may notice that while the EC2 instance is shutting down gracefully, the cluster will detect an unclean stop of cluster services on the node and will fence the EC2 instance being shut down. For more information, see [SUSE documentation – Stopping the Cluster Services on a Node](https://documentation.suse.com/sle-ha/15-SP1/html/SLE-HA-all/cha-ha-maintenance.html#sec-ha-maint-shutdown-node).

 **Recovery action** – Start the EC2 node and pacemaker service. The cluster will detect that the node is online, and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

# SAP NetWeaver on AWS: high availability configuration for Red Hat Enterprise Linux (RHEL) for SAP applications
RHEL Pacemaker

This topic applies to Red Hat Enterprise Linux (RHEL) operating system for SAP NetWeaver applications on AWS cloud. It covers the instructions for configuration of a pacemaker cluster for the ABAP SAP Central Service (ASCS) and the Enqueue Replication Server (ERS) when deployed on Amazon EC2 instances in two different Availability Zones within an AWS Region.

This topic covers instructions for the following configuration options.

**Topics**
+ [

# Planning
](sap-nw-pacemaker-rhel-planning.md)
+ [

# Prerequisites
](sap-nw-pacemaker-rhel-prerequisites.md)
+ [

# SAP ASCS and Cluster Setup
](sap-nw-pacemaker-rhel-setup.md)
+ [

# Operations
](rhel-netweaver-ha-operations.md)
+ [

# Testing
](testing-nw-rhel.md)

# Planning


This section covers the following topics.

**Topics**
+ [

# Setup Overview
](sap-nw-pacemaker-rhel-setup-overview.md)
+ [

# Vendor Support
](sap-nw-pacemaker-rhel-references.md)
+ [

# Concepts
](sap-nw-pacemaker-rhel-concepts.md)
+ [

# Parameter Reference
](sap-nw-pacemaker-rhel-parameters.md)
+ [

# Architecture diagrams
](rhel-netweaver-ha-diagrams.md)

# Setup Overview


You must meet the following prerequisites before commencing setup.

**Topics**
+ [

## Deployed Cluster Infrastructure
](#cluster-nw-rhel)
+ [

## Supported Operating System
](#supported-os-nw-rhel)
+ [

## SAP and Red Hat references
](#references-nw-rhel)
+ [

## Required Access for Setup
](#access-nw-rhel)
+ [

## Reliability Requirements Defined
](#reliability-nw-rhel)

## Deployed Cluster Infrastructure


Ensure that your AWS networking requirements and Amazon EC2 instances where SAP workloads are installed, are correctly configured for SAP. For more information, see [SAP NetWeaver Environment Setup for Linux on AWS](https://docs.aws.amazon.com/sap/latest/sap-netweaver/std-sap-netweaver-environment-setup.html).

See the following ASCS cluster specific requirements.
+ Two cluster nodes created in private subnets in separate Availability Zones within the same Amazon VPC and AWS Region
+ Access to the route table(s) that are associated with the chosen subnets

  For more information, see [AWS – Overlay IP](sap-nw-pacemaker-rhel-concepts.md#overlay-ip-nw-rhel).
+ Amazon EC2 instances must have connectivity to the Amazon EC2 endpoint via either internet or an Amazon VPC endpoint.

## Supported Operating System


Protecting the ABAP SAP Central Services (ASCS) with a pacemaker cluster requires packages from Red Hat, including targeted cluster resource agents for SAP and AWS that may not be available in standard repositories.

For deploying SAP applications on Red Hat, SAP and Red Hat recommend using Red Hat Enterprise Linux for SAP Solutions (RHEL for SAP). RHEL for SAP provides additional benefits, including Extended Update Support (EUS), configuration and tuning packages for SAP applications, and High Availability Add-On. For more details, see Red Hat website at [Red Hat Enterprise Linux for SAP Solutions](https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux/sap).

RHEL for SAP is available at [AWS Marketplace](https://aws.amazon.com/marketplace) with an hourly or annual subscription. You can also use the bring your own subscription (BYOS) model.

## SAP and Red Hat references


In addition to this guide, see the following references for more details.

 **RHEL 9 Documentation (Recommended):** 
+  [Red Hat documentation – Deploying SAP NetWeaver or S/4HANA Application Server High Availability with Simple Mount (RHEL 9)](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/9/html/deploying_sap_netweaver_or_s4hana_application_server_high_availability_with_simple_mount/index) 
+  [Red Hat documentation – Configuring HA clusters to manage SAP NetWeaver or SAP S/4HANA Application server instances using the RHEL HA Add-On (RHEL 9)](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/9/html/configuring_ha_clusters_to_manage_sap_netweaver_or_sap_s4hana_application_server_instances_using_the_rhel_ha_add-on) 
+  [SAP Note: 3108316 - Red Hat Enterprise Linux 9.x: Installation and Configuration](https://me.sap.com/notes/3108316) 

 **RHEL 8 Documentation:** 
+  [Red Hat documentation – Configuring HA clusters to manage SAP NetWeaver or SAP S/4HANA Application server instances using the RHEL HA Add-On (RHEL 8)](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/8/html/configuring_ha_clusters_to_manage_sap_netweaver_or_sap_s4hana_application_server_instances_using_the_rhel_ha_add-on) 
+  [SAP Note: 2772999 - Red Hat Enterprise Linux 8.x: Installation and Configuration](https://me.sap.com/notes/2772999) 
+  [SAP Note: 2777782 - SAP HANA DB: Recommended OS settings for RHEL 8](https://me.sap.com/notes/2777782) 

 **RHEL 7 Documentation (Extended Life Phase - Not recommended for new installations):** 
+  [Red Hat documentation – RHEL Guidelines for Configuring SAP S/4HANA ASCS/ERS with Standalone Enqueue Server 2 (ENSA2) in Pacemaker (RHEL 7)](https://access.redhat.com/articles/3974941) 
+  [SAP Note: 2002167 - Red Hat Enterprise Linux 7.x: Installation and Upgrade](https://me.sap.com/notes/2002167) 

 **General SAP Notes:** 
+  [SAP Note: 1656099 - SAP Applications on AWS: Supported DB/OS and Amazon EC2 products](https://me.sap.com/notes/1656099) 

You must have SAP portal access for reading all SAP Notes.

## Required Access for Setup


The following access is required for setting up the cluster.
+ An IAM user with the following privileges.
  + modify Amazon VPC route tables
  + modify Amazon EC2 instance properties
  + create IAM policies and roles
  + create Amazon EFS file systems
+ Root access to the operating system of both cluster nodes
+ SAP administrative user access – `<sid>adm` 

  In case of a new install, this user is created by the install process.

## Reliability Requirements Defined


The SAP Lens of the Well-Architected framework, in particular the Reliability pillar, can be used to understand the reliability requirements for your SAP workload.

The ASCS is a single point of failure in a highly available SAP architecture. The impact of an outage of this component must be evaluated against factors such as recovery point objective (RPO), recovery time objective (RTO), cost, and operational complexity. For more information, see [Reliability](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/reliability.html) in SAP Lens - AWS Well-Architected Framework.

# Vendor Support


The following section details the documentation and deployment guidance from Red Hat.

**Topics**
+ [

## Deployment Patterns
](#deployments-rhel)
+ [

## Automated Deployment
](#automation-nw-rhel)
+ [

## Pacemaker - simple-mount and classic architecture
](#simple-classic-nw-rhel)

## Deployment Patterns


The following table outlines the supported SAP deployment types and their corresponding AWS configuration patterns for high availability clustering.


| SAP Deployment Type | Support Status |  AWS Configuration Patterns | Notes | 
| --- | --- | --- | --- | 
|  SAP NetWeaver ASCS/ERS (ENSA1)  |   AWS Documented & Supported  |  SAPNetweaver-Classic, SAPNetweaver-Simple-mount  |  | 
|  SAP NetWeaver ASCS/ERS (ENSA2)  |   AWS Documented & Supported  |  SAPNetweaver-Classic, SAPNetweaver-Simple-mount  |  | 
|  SAP S/4HANA ASCS/ERS  |   AWS Documented & Supported  |  SAPNetweaver-Classic, SAPNetweaver-Simple-mount  |  S/4HANA only supports ENSA2  | 
|  SAP SCS (Java)  |  Vendor Documented & Supported  |  |  Follows SAP Documentation  | 

## Automated Deployment


You can set up a cluster manually using the instructions provided here. You can also automate parts of this process to ensure consistency and repeatability.

Use AWS Launch Wizard for SAP for automated deployments of SAP NetWeaver, SAP S/4HANA, SAP BW/4HANA, and SAP Solution Manager. Launch Wizard uses AWS CloudFormation scripts to quickly provision the resources needed to deploy SAP NetWeaver and S/4HANA. The automation performs SAP enqueue replication and pacemaker setup so that only validation and testing are required. For more information, see [AWS Launch Wizard for SAP](https://docs.aws.amazon.com/launchwizard/latest/userguide/launch-wizard-sap.html).

To ensure that the behavior and operation of your cluster is well understood regardless of how your system is set up, we recommend a thorough test cycle. See [Testing](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sap-nw-pacemaker-rhel-testing.html) for more details.

## Pacemaker - simple-mount and classic architecture


This guide covers two architectures for SAP cluster solutions on RHEL for SAP – simple-mount and classic (previous standard). Simple-mount was certified as the RHEL for SAP Applications cluster solution in 2025. It is now the recommended architecture for both ENSA1 and ENSA2 deployments running on RHEL for SAP 9 and above. For more details, see [Red Hat documentation – Deploying SAP NetWeaver or S/4HANA Application Server High Availability with Simple Mount](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/9/html/deploying_sap_netweaver_or_s4hana_application_server_high_availability_with_simple_mount/index).

If you are configuring a new SAP installation, we recommend the simple-mount architecture. If you already have the classic architecture, and wish to migrate to the simple-mount architecture, see [Switching architecture to simple-mount](#switching-architecture-rhel).

The following are the differences between the classic and simple-mount architectures.
+ Removing file system resources from cluster – a file system is required but it is not mounted and unmounted by the cluster. The executable directory for the ASCS and ERS can be permanently mounted on both nodes.
+ Addition of SAPStartSrv – the SAPStartSrv resource agent controls the matching `sapstartsrv` framework process.
+ Sapping and sappong services – these services manage the start of SAPStartSrv services with sapinit.

See the [Architecture diagrams](https://docs.aws.amazon.com/sap/latest/sap-netweaver/rhel-netweaver-ha-diagrams.html) for more details.

### Switching architecture to simple-mount



Follow these steps if you want to switch an existing cluster with the classic architecture to the recommended simple-mount architecture.

These steps must be performed in an outage window, allowing stop/start of services and basic testing.

1. Put the cluster in maintenance mode. See [Maintenance mode](planned-maintenance-nw-rhel.md#maintenance-mode-nw-rhel) 

1. Stop SAP services, including application servers connected to the cluster as well as ASCS and ERS.

1. Install any missing operating system packages. See [Install Missing Operating System Packages](sap-nw-pacemaker-rhel-os-settings.md#packages-nw-rhel).

   It might be necessary to install `sapstartsrv-resource-agents`. However, all operating system prerequisites must be checked and updated to ensure that versions are compatible.

1. Add entries for ASCS and ERS mount point on both nodes (if not already added). See [Update /etc/fstab](sap-shared-filesystems-nw-rhel.md#update-fstab-nw-rhel) 

1. Enable `sapping`/`sappong` services. See [Enable sapping and sappong Services (Simple-Mount Only)](sap-ascs-service-control-nw-rhel.md#sapping-sappong-services-nw-rhel) 

1. Align and disable `systemd` services. See [Ensure ASCS and ERS SAP Services can run on either node (systemd)](sap-ascs-service-control-nw-rhel.md#modify-sapservices-nw-rhel) 

1. Backup the configuration with the following command.

   ```
   # pcs config show >> /tmp/classic_ha_setup.txt
   ```

   See [Prepare for Resource Creation](cluster-config-nw-rhel.md#prepare-resource-nw-rhel) 

1.  *Optional* – delete the configuration. You can edit in place, but we recommend starting with a blank configuration. This ensures that the latest timeout and priority parameters are in place. See [Reset Configuration](cluster-config-nw-rhel.md#reset-config-nw-rhel) 

   ```
   # pcs resource cleanup
   # pcs config show
   ```

1. Configure cluster resources again.

1. Check the cluster and perform some tests.

1. Resume standard operations by starting any additional services, including application servers.

# Concepts


This section covers AWS, SAP, and Red Hat concepts.

**Topics**
+ [

## SAP – ABAP SAP Central Services (ASCS)
](#ascs-nw-rhel)
+ [

## SAP – Enqueue Replication Server (ERS)
](#ers-nw-rhel)
+ [

## AWS – Availability Zones
](#availability-zones-nw-rhel)
+ [

## AWS – Overlay IP
](#overlay-ip-nw-rhel)
+ [

## AWS – Shared VPC
](#shared-vpc)
+ [

## Pacemaker - STONITH fencing agent
](#stonith-nw-rhel)

## SAP – ABAP SAP Central Services (ASCS)


The ABAP SAP Central Services (ASCS) is an SAP instance consisting of the following two services. It is considered a single point of failure (SPOF) in a resilient SAP architecture.
+  **Message server** – Responsible for application load distribution (GUI and RFC), communication between application servers, and centralised configuration information for web dispatchers and application servers.
+  **Enqueue server (standalone)** – Maintains a lock table in main memory (shared memory). Unlike a database lock, an enqueue lock can exist across multiple logical units of work (LUW), and is set by a SAP Dialog work process. The lock mechanism prevents two transactions from changing the same data in the database simultaneously.

**Note**  
With ABAP Release 7.53 (ABAP Platform 1809), the new Standalone Enqueue Server 2 (ENSA2) is installed by default. It replaces the previous version (ENSA1), but the previous version can still be configured if required. See [SAP Note 2630416 - Support for Standalone Enqueue Server 2](https://me.sap.com/notes/2630416) (SAP portal access required) for more information.  
This document includes modifications to align with the correct ENSA version.

## SAP – Enqueue Replication Server (ERS)


The Enqueue Replication Server (ERS) is an SAP instance containing a replica of the lock table (replication table).

In a resilient setup, if the standalone enqueue server (EN/ENQ) fails, it can be restarted either by restart parameters or by high availability software, such as Pacemaker. The enqueue server retrieves the replication table remotely or by failing over to the host where the ERS is running.

## AWS – Availability Zones


An Availability Zone consists of one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. For more information, see [Regions and Availability Zones](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).

For mission critical deployments of SAP on AWS where the goal is to minimise the recovery time objective (RTO), we suggest distributing single points of failure across Availability Zones. Compared with single instance or single Availability Zone deployments, this increases resilience and isolation against a broad range of failure scenarios and issues, including natural disasters.

Each Availability Zone is physically separated by a meaningful distance (many kilometers) from the others. All Availability Zones in an AWS Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber. This enables synchronous replication. All traffic between Availability Zones is encrypted.

## AWS – Overlay IP


Overlay IP enables a connection to the application, regardless of which Availability Zone (and subnet) contains the active primary node.

When deploying an Amazon EC2 instance in AWS, IP addresses are allocated from the CIDR range of the allocated subnet. The subnet cannot span across multiple Availability Zones, and therefore the subnet IP addresses may be unavailable after faults, including network connectivity or hardware issues which require a failover to the replication target in a different Availability Zone.

To address this, we suggest that you configure an overlay IP, and use this in the connection parameters for the application. This IP address is a non-overlapping RFC1918 private IP address from outside of the VPC CIDR block, and is configured as an entry in the route table or tables. The route directs the connection to the active node and is updated during a failover by the cluster software.

You can select any one of the following RFC1918 private IP addresses for your overlay IP address.
+ 10.0.0.0 – 10.255.255.255 (10/8 prefix)
+ 172.16.0.0 – 172.31.255.255 (172.16/12 prefix)
+ 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)

If, for example, you use the 10/8 prefix in your SAP VPC, selecting a 172 or a 192 IP address may help to differentiate the overlay IP. Consider the use of an IP Address Management (IPAM) tool such as Amazon VPC IP Address Manager to plan, track, and monitor IP addresses for your AWS workloads. For more information, see [What is IPAM?](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html) 

The overlay IP agent in the cluster can also be configured to update multiple route tables which contain the Overlay IP entry if your subnet association or connectivity requires it.
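For illustration only, an initial route entry for an overlay IP can be created with the AWS CLI, similar to the following sketch. The route table ID, overlay IP, and instance ID are placeholders; the cluster's overlay IP resource agent then updates this entry during a failover.

```
# aws ec2 create-route --route-table-id rtb-xxxxxroutetable1 \
    --destination-cidr-block 192.168.10.1/32 \
    --instance-id i-xxxxinstidforhost1
```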

 **Access to overlay IP** 

The overlay IP is outside of the range of the VPC, and therefore cannot be reached from locations that are not associated with the route table, including on-premises and other VPCs.

Use [AWS Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html) as a central hub to facilitate the network connection to an overlay IP address from multiple locations, including Amazon VPCs, other AWS Regions, and on-premises using [AWS Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html) or [AWS Client VPN](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/what-is.html).

If you do not have AWS Transit Gateway set up as a network transit hub or if it is not available in your preferred AWS Region, you can use a [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) to enable network access to an overlay IP.

For more information, see [SAP on AWS High Availability with Overlay IP Address Routing](https://docs.aws.amazon.com/sap/latest/sap-hana/sap-ha-overlay-ip.html).

## AWS – Shared VPC


An enterprise landing zone setup or security requirements may require the use of a separate cluster account to restrict the route table access required for the Overlay IP to an isolated account. For more information, see [Share your VPC with other accounts](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html).

Evaluate the operational impact against your security posture before setting up shared VPC. To set up, see [Shared VPC – optional](https://docs.aws.amazon.com/sap/latest/sap-netweaver/rhel-netweaver-ha-settings.html#rhel-netweaver-ha-shared-vpc).

## Pacemaker - STONITH fencing agent


In a two-node cluster setup for a primary resource and its replication pair, it is important that there is only one node in the primary role with the ability to modify your data. In the event of a failure scenario where a node is unresponsive or incommunicable, ensuring data consistency requires that the faulty node is isolated by powering it down before the cluster commences other actions, such as promoting a new primary. This arbitration is the role of the fencing agent.

A two-node cluster introduces the possibility of a fence race (a "dual shoot out"), in which a communication failure results in both nodes simultaneously deciding, "I can't see you, so I am going to power you off." The fencing agent is designed to minimize this risk by consulting an external witness.

RHEL supports several fencing agents, including the one recommended for use with Amazon EC2 instances (fence_aws). This resource uses Amazon EC2 API commands to check its own instance status ("Is my instance state anything other than running?") before proceeding to power off its pair. If its own instance is already in a stopping or stopped state, it admits defeat and leaves the surviving node untouched.

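As an illustration only, the following sketch shows the kind of self-check this relies on: before fencing its peer, the agent verifies that its own instance is still running. The instance ID is a placeholder from the parameter reference that follows.

```
$ aws ec2 describe-instances --instance-ids i-xxxxinstidforhost1 \
    --query 'Reservations[0].Instances[0].State.Name' --output text
running
```
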
# Parameter Reference


The cluster setup relies on the following parameters.

**Topics**
+ [

## Global AWS parameters
](#global-aws-parameters-nw-rhel)
+ [

## Amazon EC2 instance parameters
](#ec2-parameters-nw-rhel)
+ [

## SAP Instance Parameters
](#sap-pacemaker-resource-parameters-nw-rhel)
+ [

## Pacemaker Parameters
](#rhel-cluster-parameters)

## Global AWS parameters



| Name | Parameter | Example | 
| --- | --- | --- | 
|   AWS account ID  |   `<account_id>`   |   `123456789100`   | 
|   AWS Region  |   `<region_id>`   |   `us-east-1`   | 
+  AWS account – For more details, see [Your AWS account ID and its alias](https://docs.aws.amazon.com/IAM/latest/UserGuide/console-account-alias.html).
+  AWS Region – For more details, see [Describe your Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#using-regions-availability-zones-describe).

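If needed, you can retrieve both values from a running instance. The following sketch uses the AWS CLI and IMDSv2; the output lines show the example values from the preceding table.

```
$ aws sts get-caller-identity --query 'Account' --output text
123456789100
$ TOKEN=$(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
$ curl --noproxy '*' -s -w "\n" -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/placement/region
us-east-1
```
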
## Amazon EC2 instance parameters



| Name | Parameter | Host 1 | Host 2 | 
| --- | --- | --- | --- | 
|  Amazon EC2 instance ID  |   `<instance_id>`   |   `i-xxxxinstidforhost1`   |   `i-xxxxinstidforhost2`   | 
|  Hostname  |   `<hostname>`   |   `rhxhost01`   |   `rhxhost02`   | 
|  Host IP  |   `<host_ip>`   |   `10.1.10.1`   |   `10.1.20.1`   | 
|  Host additional IP  |   `<host_additional_ip>`   |   `10.1.10.2`   |   `10.1.20.2`   | 
|  Configured subnet  |   `<subnet_id>`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   | 
|  Associated VPC Route Table(s)  |   `<routetable_id>`   |   `rtb-xxxxxroutetable1 [,rtb-xxxxxroutetable2]`   |  | 
|  Sapmnt NFS ID or CNAME  |   `<sapmnt_nfs_id>`   |   `fs-xxxxxxxxxxxxxefs1`   |  | 
+ Hostname – Hostnames must comply with SAP requirements outlined in [SAP Note 611361 - Hostnames of SAP ABAP Platform servers](https://me.sap.com/notes/611361) (requires SAP portal access).

  Run the following command on your instances to retrieve the hostname.

  ```
  # hostname
  ```
+  **Amazon EC2 instance ID** – run the following command (IMDSv2 compatible) on your instances to retrieve instance metadata.

  ```
  # /usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/meta-data/instance-id
  ```

  For more details, see [Retrieve instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and [Instance identity documents](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html).
+  **Amazon EC2 subnet ID** – run the following command to retrieve the subnet ID for each of your instances.

  ```
  # INSTANCE_ID=i-xxxxinstidforhost1
  # aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[0].Instances[0].SubnetId' --output text
  ```

  For more details, see [describe-instances](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instances.html) and [VPC subnets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html).
+  **Route table(s) for subnets** – run the following AWS CLI commands to retrieve the route table(s) associated with both cluster node subnets.

  ```
  # SUBNET_ID_1=subnet-xxxxxxxxxxsubnet1
  # SUBNET_ID_2=subnet-xxxxxxxxxxsubnet2
  # aws ec2 describe-route-tables --filters "Name=association.subnet-id,Values=$SUBNET_ID_1,$SUBNET_ID_2" --query 'RouteTables[].RouteTableId' --output text
  ```

  If both cluster nodes are in subnets associated with the same route table, only one route table ID will be returned. If they are associated with different route tables, both route table IDs will be returned.

  For more details, see [describe-route-tables](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-route-tables.html) and [Route tables](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html).

## SAP Instance Parameters



| Name | Parameter | Example | 
| --- | --- | --- | 
|  SID  |   `<SID>` or `<sid>`   |   `RHX`   | 
|  ASCS Alias  |   `<ascs_virt_hostname>`   |   `rhxascs`   | 
|  ASCS System Number  |   `<ascs_sys_nr>`   |   `00`   | 
|  ASCS Overlay IP  |   `<ascs_oip>`   |   `172.16.1.50`   | 
|  ASCS NFS Mount Point  |   `<ascs_nfs_mount_point>`   |   `/RHX_ASCS00`   | 
|  ERS Alias  |   `<ers_virt_hostname>`   |   `rhxers`   | 
|  ERS System Number  |   `<ers_sys_nr>`   |   `10`   | 
|  ERS Overlay IP  |   `<ers_oip>`   |   `172.16.1.51`   | 
|  ERS NFS Mount Point  |   `<ers_nfs_mount_point>`   |   `/RHX_ERS10`   | 
|  ENSA Type  |   `<ensa_type>`   |   `ENSA2`   | 

## Pacemaker Parameters



| Name | Parameter | Example | 
| --- | --- | --- | 
|  Cluster user  |   `cluster_user`   |   `hacluster`   | 
|  Cluster password  |   `cluster_password`   |  | 
|  Cluster tag  |   `cluster_tag`   |   `pacemaker`   | 
|   AWS CLI cluster profile  |   `aws_cli_cluster_profile`   |   `cluster`   | 
|  Cluster connector  |   `cluster_connector`   |   `sap-redhat-cluster-connector`   | 

# Architecture diagrams


This guide covers two architectures for SAP cluster solutions on RHEL for SAP – simple-mount and classic (previous standard). See the following images to learn more.

**Topics**
+ [

## Pacemaker - simple-mount architecture
](#simple-mount-diagram-nw-rhel)
+ [

## Pacemaker - classic architecture
](#classic-diagram-nw-rhel)

## Pacemaker - simple-mount architecture


See the following image for more details.

![\[Simple Mount Architecture\]](http://docs.aws.amazon.com/sap/latest/sap-netweaver/images/image-pacemaker-nw-rhel-simplemount.png)


## Pacemaker - classic architecture


See the following image for more details.

![\[Classic Architecture.\]](http://docs.aws.amazon.com/sap/latest/sap-netweaver/images/image-pacemaker-nw-rhel-classic.png)


# Prerequisites
Prerequisites

**Topics**
+ [

# AWS Infrastructure Setup
](sap-nw-pacemaker-rhel-infra-setup.md)
+ [

# EC2 Instance Configuration
](sap-nw-pacemaker-rhel-ec2-configuration.md)
+ [

# Operating System Requirements
](sap-nw-pacemaker-rhel-os-settings.md)

# AWS Infrastructure Setup


This section covers the one-time setup tasks required to prepare your AWS environment for the cluster deployment:

**Note**  
We recommend using administrative privileges from an administrative workstation or AWS Console for the initial infrastructure setup instead of granting instance-based privileges, as this maintains the principle of least privilege. Infrastructure setup APIs (such as CreateRoute, ModifyInstanceAttribute, and CreateTags) are only required during initial configuration and are not needed for ongoing cluster operations.

**Topics**
+ [

## Create IAM Roles and Policies for Pacemaker
](#iam-roles-rhel)
+ [

## Modify Security Groups for Cluster Communication
](#sg-rhel)
+ [

## Add VPC Route Table Entries for Overlay IPs
](#rt-rhel)

## Create IAM Roles and Policies for Pacemaker


In addition to the permissions required for standard SAP operations, two IAM policies are required for the cluster to control AWS resources. These policies must be assigned to your Amazon EC2 instances using an IAM role. This enables the Amazon EC2 instances, and therefore the cluster, to call AWS services.

**Note**  
Create policies with least-privilege permissions, granting access to only the specific resources that are required within the cluster. For multiple clusters, you may need to create multiple policies.

For more information, see [IAM roles for Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile).

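How you create the role is up to you. The following AWS CLI sketch is one possible approach; the role name, instance profile name, policy names, and policy file names are illustrative placeholders, and the policy documents are the ones described in the following sections.

```
$ aws iam create-role --role-name pacemaker-cluster-role \
    --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
$ aws iam put-role-policy --role-name pacemaker-cluster-role \
    --policy-name pacemaker-stonith-policy --policy-document file://stonith-policy.json
$ aws iam put-role-policy --role-name pacemaker-cluster-role \
    --policy-name pacemaker-overlay-ip-policy --policy-document file://overlay-ip-policy.json
$ aws iam create-instance-profile --instance-profile-name pacemaker-cluster-profile
$ aws iam add-role-to-instance-profile --instance-profile-name pacemaker-cluster-profile \
    --role-name pacemaker-cluster-role
```
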
### STONITH Policy


The RHEL STONITH resource agent (fence_aws) requires permission to start and stop both nodes of the cluster. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
      ]
    }
  ]
}
```

### AWS Overlay IP Policy


The RHEL Overlay IP resource agent (aws-vpc-move-ip) requires permission to modify a routing entry in route tables. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": [
                 "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
                 "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0"
                        ]
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeRouteTables",
            "Resource": "*"
        }
    ]
}
```

### Shared VPC (optional)


**Note**  
The following directions are only required for setups which include a Shared VPC.

Amazon VPC sharing enables you to share subnets with other AWS accounts within the same organization in AWS Organizations. Amazon EC2 instances can be deployed using the subnets of the shared Amazon VPC.

In the pacemaker cluster, the aws-vpc-move-ip resource agent has been enhanced to support a shared VPC setup while maintaining backward compatibility with existing features.

The following checks and changes are required. We refer to the AWS account that owns the Amazon VPC as the sharing VPC account, and to the consumer account where the cluster nodes are going to be deployed as the cluster account.

**Minimum Version Requirements**  
The latest version of the aws-vpc-move-ip agent shipped with RHEL 8 and RHEL 9 supports the shared VPC setup by default. The following are the minimum versions required to support a shared VPC setup:
+ RHEL 8 - resource-agents-4.1.1-90.el8_4.7.x86_64
+ RHEL 9 - resource-agents-4.9.0-16.el9_0.6.x86_64

**IAM Roles and Policies**  
Using the Overlay IP agent with a shared Amazon VPC requires a different set of IAM permissions to be granted on both AWS accounts (sharing VPC account and cluster account).

**Sharing VPC Account**  
In the sharing VPC account, create an IAM role to delegate permissions to the EC2 instances that will be part of the cluster. During the IAM role creation, select "Another AWS account" as the type of trusted entity, and enter the AWS account ID where the EC2 instances will be deployed and run.

After the IAM role has been created, create the following IAM policy on the sharing VPC account, and attach it to the IAM role. Add or remove route table entries as needed.

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
```

Next, go to the **Trust relationships** tab of the IAM role, and ensure that the AWS account you entered while creating the role has been added correctly.

In the cluster account, create the following IAM policy, and attach it to an IAM role. This is the IAM role that will be attached to the EC2 instances.

 **STS Policy** 

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/sharing-vpc-account-cluster-role"
    }
  ]
}
```

 **STONITH Policy** 

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```

## Modify Security Groups for Cluster Communication


A security group controls the traffic that is allowed to reach and leave the resources that it is associated with. For more information, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html).

In addition to the standard ports required to access SAP and administrative functions, the following rules must be applied to the security groups assigned to all Amazon EC2 instances in the cluster.


| Source | Protocol | Port range | Description | 
| --- | --- | --- | --- | 
|  The security group ID (its own resource ID)  |  UDP  |  5405  |  Allows UDP traffic between cluster resources for corosync communication  | 
+ Note the use of the `UDP` protocol.
+ If you are running a local firewall, such as iptables, ensure that communication on the preceding port is allowed between the two Amazon EC2 instances.

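If you manage security groups with the AWS CLI, the self-referencing rule can be added as shown in the following sketch; the security group ID is a placeholder.

```
$ SG_ID=sg-xxxxxxxxxxclustersg
$ aws ec2 authorize-security-group-ingress --group-id $SG_ID \
    --protocol udp --port 5405 --source-group $SG_ID
```
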
## Add VPC Route Table Entries for Overlay IPs


You need to add initial route table entries for the Overlay IP. For more information on Overlay IP, see [AWS – Overlay IP](sap-nw-pacemaker-rhel-concepts.md#overlay-ip-nw-rhel).

Add entries to the VPC route table or tables associated with the subnets of your Amazon EC2 instances for the cluster. The entries for destination (overlay IP CIDR) and target (Amazon EC2 instance or ENI) must be added manually for the ASCS and the ERS. This ensures that the cluster resource has a route to modify. It also supports installing SAP using the virtual hostnames associated with the overlay IPs before the cluster is configured.

Using either the Amazon VPC console or an AWS CLI command, add a route for each overlay IP to the table or tables.

------
#### [  AWS Console ]

1. Identify the EC2 instance IDs for both cluster nodes and determine which route tables are associated with their subnets. For details, see [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md#sap-pacemaker-resource-parameters-nw-rhel) 

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc

1. In the navigation pane, choose **Route Tables**, select the first route table.

1. Choose **Actions** → **Edit routes**.

1. Choose **Add route** and configure the ASCS route: set **Destination** to `<ascs_oip>/32` and **Target** to the instance (or its elastic network interface) where you plan to first run the ASCS, `<instance_id_1>`.

1. Choose **Add route** and configure the ERS route: set **Destination** to `<ers_oip>/32` and **Target** to the instance (or its elastic network interface) where you plan to first run the ERS, `<instance_id_2>`.

1. Choose **Save changes**.

1. Repeat for any additional associated route tables or route tables from the VPC which require connectivity to the ASCS.

   Your route table now includes entries for required Overlay IPs, in addition to the standard routes.

------
#### [  AWS CLI ]

Identify the EC2 instance IDs for both cluster nodes and determine which route tables are associated with their subnets. For details, see [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md#sap-pacemaker-resource-parameters-nw-rhel).

For the ASCS:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <ascs_overlayip>/32 --instance-id <instance_id_1>
```

For the ERS:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <ers_overlayip>/32 --instance-id <instance_id_2>
```

------

# EC2 Instance Configuration


Amazon EC2 instance settings can be applied using infrastructure as code, or manually using the AWS Command Line Interface or AWS Console. We recommend infrastructure as code automation to reduce manual steps and ensure consistency.

**Topics**
+ [

## Assign or Review Pacemaker IAM Role
](#assign-review-pacemaker-iam-role-nw-rhel)
+ [

## Assign or Review Security Groups
](#assign-review-security-groups-nw-sles)
+ [

## Assign Secondary IP Addresses
](#assign-secondary-ip-addresses-nw-rhel)
+ [

## Disable Source/Destination Check
](#source-dest-nw-rhel)
+ [

## Review Stop Protection
](#stop_protection-nw-rhel)
+ [

## Review Automatic Recovery
](#auto_recovery-nw-rhel)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Assign or Review Pacemaker IAM Role


The two cluster resource IAM policies must be assigned to an IAM role associated with your Amazon EC2 instance. If an IAM role is not associated with your instance, create a new IAM role for cluster operations.

The following AWS Console or AWS CLI commands can be used to modify the IAM role assignment.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the navigation pane, choose **Actions** → **Security** → **Modify IAM role**.

1. Choose the IAM role that contains the policies created in [Create IAM Roles and Policies for Pacemaker](sap-nw-pacemaker-rhel-infra-setup.md#iam-roles-rhel).

1. Choose **Update IAM role**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To assign an IAM role using the AWS CLI:

```
$ aws ec2 associate-iam-instance-profile --instance-id <instance_id> --iam-instance-profile Name=<iam_instance_profile_name>
```

Repeat for all nodes in the cluster.

------

You can verify the IAM role assignment on your instances using the AWS CLI:

```
$ aws ec2 describe-instances --instance-ids <instance_id> --query 'Reservations[0].Instances[0].IamInstanceProfile' --output table
```

You can check the specific permissions of the roles created for pacemaker in [Create IAM Roles and Policies for Pacemaker](sap-nw-pacemaker-rhel-infra-setup.md#iam-roles-rhel) by running the following on both your instances.

When `--dry-run` is used, the AWS CLI or SDK sends the request to the Amazon EC2 service with this flag. Amazon EC2 then performs all necessary permission checks and validates the request parameters. If the caller has the required permissions and the request is well-formed, the service returns a `DryRunOperation` error response, indicating that the operation would have succeeded.

Check that the fencing resource has the permission to shut down both instances:

```
$ aws ec2 stop-instances --instance-ids <instance_id_1> --dry-run
$ aws ec2 stop-instances --instance-ids <instance_id_2> --dry-run
```

Check that the overlay IP resource has the permissions to update the route tables:

```
$ aws ec2 replace-route --route-table-id <routetable_id> --destination-cidr-block <ascs_overlayip>/32 --instance-id <instance_id_1> --dry-run
```

## Assign or Review Security Groups


The security group rules created in the [Modify Security Groups for Cluster Communication](sap-nw-pacemaker-rhel-infra-setup.md#sg-rhel) section must be assigned to your Amazon EC2 instances. If a security group is not associated with your instance, or if the required rules are not present in the assigned security group, add the security group or update the rules.

The following AWS Console or AWS CLI commands can be used to modify security group assignments.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the **Security** tab, review the security groups, ports, and source of traffic.

1. If required, choose **Actions** → **Security** → **Change security groups**.

1. Under **Associated security groups**, search for and select the required groups.

1. Choose **Save**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify security groups using the AWS CLI:

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --groups <security_group_id1> <security_group_id2>
```

Repeat for all nodes in the cluster.

------

You can verify the security group rules on your instances using the AWS CLI:

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute groupSet
```

## Assign Secondary IP Addresses


Secondary IP addresses are used to create a redundant communication channel (secondary ring) in corosync for clusters. The cluster nodes can use the secondary ring to communicate in case of underlying network disruptions.

These IPs are only used in cluster configurations. The secondary IPs provide the same fault tolerance as a secondary Elastic Network Interface (ENI). For more information, see [Secondary IP addresses for your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-secondary-ip-addresses.html).

The following AWS Console or AWS CLI commands can be used to assign secondary IP addresses.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the **Networking** tab, choose the network interface ID.

1. Choose **Actions** → **Manage IP addresses**.

1. Choose **Assign new IP address**.

1. Select **Auto-assign** or specify an IP from the subnet range.

1. Choose **Yes, Update**.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To assign secondary IP addresses using the AWS CLI:

```
$ ENI_ID=$(aws ec2 describe-instances --instance-id <instance_id> \
    --query 'Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId' \
    --output text)
$ aws ec2 assign-private-ip-addresses --network-interface-id $ENI_ID --secondary-private-ip-address-count 1
```

Repeat for all nodes in the cluster.

------

You can verify the secondary IP configuration on your instances using the AWS CLI:

```
$ aws ec2 describe-instances --instance-id <instance_id> \
    --query 'Reservations[*].Instances[*].NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
    --output text
```

Verify that:
+ Each instance returns two IP addresses from the same subnet
+ The primary network interface (eth0) has both IPs assigned
+ The secondary IPs will be used later for `ring0_addr` and `ring1_addr` in `corosync.conf`

## Disable Source/Destination Check


Amazon EC2 instances perform source/destination checks by default, requiring that an instance is either the source or the destination of any traffic it sends or receives. In the pacemaker cluster, the source/destination check must be disabled on both instances so that they can receive traffic destined for the overlay IP.

The following AWS Console or AWS CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. In the navigation pane, choose **Actions** → **Networking** → **Change source/destination check**.

1. For Source/Destination Checking, choose **Stop** to allow traffic when the source or destination is not the instance itself.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-source-dest-check
```

Repeat for all nodes in the cluster.

------

To confirm the value of an attribute for a particular instance, use the following command. The value `false` means that source/destination checking is disabled.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute sourceDestCheck
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "SourceDestCheck": {
        "Value": false
    }
}
```

## Review Stop Protection


To allow STONITH actions to be executed, ensure that stop protection is disabled for the Amazon EC2 instances that are part of a pacemaker cluster. If the default settings have been modified, disable stop protection on both instances.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change stop protection**.

1. Ensure **Stop protection** is not enabled.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-disable-api-stop
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of an attribute for a particular instance, use the following command. The value `false` means that it is possible to stop the instance using the AWS CLI.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute disableApiStop
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "DisableApiStop": {
        "Value": false
    }
}
```

## Review Automatic Recovery


After a failure, cluster-controlled operations must be resumed in a coordinated way. This helps ensure that the cause of failure is known and addressed, and that the status of the cluster is as expected, for example, by verifying that there are no pending fencing actions. For this reason, disable the Amazon EC2 automatic recovery feature on the cluster nodes so that recovery remains under the cluster's control.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change auto-recovery behavior**.

1. Select **Off** to disable auto-recovery for system status check failures.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify auto-recovery settings (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-maintenance-options --instance-id <instance_id> --auto-recovery disabled
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of an attribute for a particular instance, use the following command. The value `disabled` means that auto-recovery will not be attempted.

```
$ aws ec2 describe-instances --instance-ids <instance_id> --query 'Reservations[*].Instances[*].MaintenanceOptions.AutoRecovery'
```

The output:

```
[
    [
        "disabled"
    ]
]
```

# Operating System Requirements


This section outlines the required operating system configurations for Red Hat Enterprise Linux for SAP (RHEL for SAP) cluster nodes. Note that this is not a comprehensive list of configuration requirements for running SAP on AWS, but rather focuses specifically on cluster management prerequisites.

Consider using configuration management tools or automated deployment scripts to ensure accurate and repeatable setup across your cluster infrastructure.

**Topics**
+ [

## Root Access
](#_root_access)
+ [

## Install Missing Operating System Packages
](#packages-nw-rhel)
+ [

## Update and Check Operating System Versions
](#_update_and_check_operating_system_versions)
+ [

## System Logging
](#_system_logging)
+ [

## Disable NetworkManager Cloud Services
](#_disable_networkmanager_cloud_services)
+ [

## Disable kdump
](#_disable_kdump)
+ [

## Time Synchronization Services
](#_time_synchronization_services)
+ [

## Install AWS CLI and Configure Profiles
](#install_shared_aws_cli_and_configure_profiles)
+ [

## Pacemaker Proxy Settings (Optional)
](#_pacemaker_proxy_settings_optional)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Root Access


Verify root access on both cluster nodes. The majority of the setup commands in this document are performed with the root user. Assume that commands should be run as root unless explicitly stated otherwise.

## Install Missing Operating System Packages


This is applicable to all cluster nodes. You must install any missing operating system packages.

The following packages and their dependencies are required for the pacemaker setup. Depending on your baseline image, for example, RHEL for SAP, these packages may already be installed.


| Package | Description | Category | Required | Configuration Pattern | 
| --- | --- | --- | --- | --- | 
|  chrony  |  Time Synchronization  |  System Support  |  Mandatory  |  All  | 
|  rsyslog  |  System Logging  |  System Support  |  Mandatory  |  All  | 
|  pacemaker  |  Cluster Resource Manager  |  Core Cluster  |  Mandatory  |  All  | 
|  corosync  |  Cluster Communication Engine  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents  |  Resource Agents including SAPInstance  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents-cloud  |  Cloud Resource agents including aws-vpc-move-ip  |  Core Cluster  |  Mandatory  |  RHEL 9 and above  | 
|  fence-agents-aws  |   AWS Fencing Capabilities  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents-sap  |  SAP Resource Agents  |  SAP Integration  |  Mandatory  |  resource-agents-sap-4.15.1 required for SimpleMount  | 
|  sap-cluster-connector  |  SAP HA-Script Connector  |  SAP Integration  |  Mandatory  |  All  | 
|  pcs  |  Pacemaker Configuration System  |  Core Cluster  |  Mandatory  |  All  | 
|  sysstat  |  Performance Monitoring Tools  |  Support Tools  |  Recommended  |  All  | 
|  dstat  |  System Resource Statistics  |  Monitoring  |  Recommended  |  All  | 
|  iotop  |  I/O Monitoring  |  Monitoring  |  Recommended  |  All  | 

**Note**  
Refer to [Vendor Support of Deployment Types](sap-nw-pacemaker-rhel-references.md#deployments-rhel) for more information on Configuration Patterns. `Mandatory*` indicates that this package is mandatory based on the Configuration Pattern.

You can use the following script to check for missing packages and optionally install them:

```
#!/bin/bash
# Mandatory core packages for SAP NetWeaver HA on AWS
mandatory_packages="corosync pacemaker resource-agents resource-agents-cloud fence-agents-aws rsyslog chrony sap-cluster-connector pcs resource-agents-sap"

# Recommended monitoring and support packages
support_packages="sysstat dstat iotop"

# Default to checking all packages
packages="${mandatory_packages} ${support_packages}"

missingpackages=""

echo "Checking SAP NetWeaver HA package requirements..."

for package in ${packages}; do
    echo "Checking if ${package} is installed..."
    if ! rpm -q ${package} --quiet; then
        echo " ${package} is missing and needs to be installed"
        missingpackages="${missingpackages} ${package}"
    fi
done

if [ -z "$missingpackages" ]; then
    echo "All packages are installed."
else
    echo "Missing mandatory packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${mandatory_packages} | tr ' ' '|'))$")"
    echo "Missing support packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${support_packages} | tr ' ' '|'))$")"

    echo -n "Do you want to install the missing packages (y/n)? "
    read response
    if [ "$response" = "y" ]; then
        dnf install -y $missingpackages
    fi
fi
```

If a package is not installed, and you are unable to install it using dnf, it may be because the Red Hat Enterprise Linux High Availability Add-On repository is not available in your chosen image. You can verify the availability of the add-on using the following command:

```
$ sudo dnf repolist
```

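If the High Availability repository is not listed, enable it before installing the packages. The exact repository ID depends on your subscription type and image; the following command is only an illustration for a subscription-manager-registered RHEL 9 system.

```
$ sudo subscription-manager repos --enable=rhel-9-for-x86_64-highavailability-rpms
```
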
To install or update a package or packages with confirmation, use the following command:

```
$ sudo dnf install <package_name(s)>
```

## Update and Check Operating System Versions


You must update and confirm operating system versions across nodes. Apply the latest patches to your operating system. This ensures that bugs are addressed and new features are available.

You can update the patches individually or update all system patches using the `dnf update` command. A clean reboot is recommended prior to setting up a cluster.

```
$ sudo dnf update
$ sudo reboot
```

Compare the operating system package versions on the two cluster nodes and ensure that the versions match on both nodes.

## System Logging


Both systemd-journald and rsyslog are suggested for comprehensive logging. Systemd-journald (enabled by default) provides structured, indexed logging with immediate access to events, while rsyslog is maintained for backward compatibility and traditional file-based logging. This dual approach ensures both modern logging capabilities and compatibility with existing log management tools and practices.

 **1. Enable and start rsyslog:** 

```
# systemctl enable --now rsyslog
```

**2. (Optional) Configure persistent logging for systemd-journald:**  
If you are not using a logging agent (like the AWS CloudWatch Unified Agent or Vector) to ship logs to a centralized location, you may want to configure persistent logging to retain logs after system reboots.

```
# mkdir -p /etc/systemd/journald.conf.d
```

Create `/etc/systemd/journald.conf.d/99-logstorage.conf` with:

```
[Journal]
Storage=persistent
```

Persistent logging requires careful storage management. Configure appropriate retention and rotation settings in `journald.conf` to prevent logs from consuming excessive disk space. Review `man journald.conf` for available options such as SystemMaxUse, RuntimeMaxUse, and MaxRetentionSec.

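For example, you could append limits to the same drop-in file; the values below are illustrative and should be sized for your environment.

```
# cat >> /etc/systemd/journald.conf.d/99-logstorage.conf <<'EOF'
SystemMaxUse=1G
MaxRetentionSec=1month
EOF
```
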
To apply the changes, restart journald:

```
# systemctl restart systemd-journald
```

After enabling persistent storage, only new logs will be stored persistently. Existing logs from the current boot session will remain in volatile storage until the next reboot.

 **3. Verify services are running:** 

```
# systemctl status systemd-journald
# systemctl status rsyslog
```

## Disable NetworkManager Cloud Services


When using Red Hat Enterprise Linux 8.6 or later, the NetworkManager cloud setup services must be disabled to maintain cluster stability. These services can interfere with cluster operations by automatically removing the overlay IP address from network interfaces.

Run these commands on each cluster node:

```
# systemctl disable --now nm-cloud-setup.timer
# systemctl disable --now nm-cloud-setup
```

Verify the services are disabled and stopped:

```
# systemctl status nm-cloud-setup.timer
# systemctl status nm-cloud-setup
```

The status commands should show both services as "disabled" and "inactive (dead)".

## Disable kdump


The kernel crash dump facility (kdump) should be disabled with the following commands on each cluster node:

```
# systemctl stop kdump
# systemctl disable kdump
```

When kdump triggers an immediate system reboot during a kernel panic, it bypasses Pacemaker’s controlled failover process, potentially leaving cluster resources in an inconsistent state.

## Time Synchronization Services


Time synchronization is important for cluster operation. Ensure that the chrony package is installed, and configure appropriate time servers in the configuration file.

You can use the Amazon Time Sync Service, which is available on any instance running in a VPC and does not require internet access. To ensure consistency in the handling of leap seconds, don't mix the Amazon Time Sync Service with any other NTP time sources or pools.

Create or check the `/etc/chrony.d/ec2.conf` file to define the server:

```
# Amazon EC2 time source config
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
```

Start the chronyd.service, using the following command:

```
# systemctl enable --now chronyd.service
# systemctl status chronyd
```

Verify time synchronization is working:

```
# chronyc tracking
```

Ensure the output shows `Reference ID : A9FEA97B (169.254.169.123)` confirming synchronization with Amazon Time Sync Service.

For more information, see [Set the time for your Linux instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html).

## Install AWS CLI and Configure Profiles


The AWS cluster resource agents require AWS Command Line Interface (AWS CLI). Check if AWS CLI is already installed, and install it if necessary.

Check if AWS CLI is installed:

```
# aws --version
```

If the command is not found, install AWS CLI v2 using the following commands:

```
# cd /tmp
# curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
# dnf install -y unzip
# unzip awscliv2.zip
# sudo ./aws/install --update
```

Create symlinks to ensure AWS CLI is in the system PATH:

```
# sudo ln -sf /usr/local/bin/aws /usr/bin/aws
```

Verify the installation:

```
# aws --version
```

The installation creates a symbolic link at `/usr/local/bin/aws` which is typically in the system PATH by default.

For more information, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

After installing AWS CLI, you need to create an AWS CLI profile for the root account.

You can either edit the configuration files under `/root/.aws` manually or use the `aws configure` AWS CLI command.

Skip providing values for the access key and secret access key. The permissions are provided through the IAM roles attached to the Amazon EC2 instances.

```
# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

The profile name is `default` unless otherwise configured. To use a different name, specify it with `--profile`. The profile name chosen in this example is `cluster`; it is used in the AWS resource agent definitions for pacemaker. The AWS Region must be the default AWS Region of the instance.

```
# aws configure --profile cluster
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

On the hosts, you can verify the available profiles using the following command:

```
# aws configure list-profiles
```

And review that an assumed role is associated by querying the caller identity:

```
# aws sts get-caller-identity --profile=<profile_name>
```

## Pacemaker Proxy Settings (Optional)


If your Amazon EC2 instance has been configured to access the internet and/or AWS Cloud through proxy servers, then you need to replicate the settings in the pacemaker configuration. For more information, see [Using an HTTP Proxy](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-proxy.html).

Add the following lines to `/etc/sysconfig/pacemaker`:

```
http_proxy=http://<proxyhost>:<proxyport>
https_proxy=http://<proxyhost>:<proxyport>
no_proxy=127.0.0.1,localhost,169.254.169.254,fd00:ec2::254
```
+ Modify `<proxyhost>` and `<proxyport>` to match your settings.
+ Ensure that you exempt the address used to access the instance metadata.
+ Configure `no_proxy` to include the IP address of the instance metadata service – 169.254.169.254 (IPv4) and fd00:ec2::254 (IPv6). This address does not vary.

# SAP ASCS and Cluster Setup


This section covers the following topics.

**Topics**
+ [

# SAP Shared File Systems
](sap-shared-filesystems-nw-rhel.md)
+ [

# Check IP availability and resolution
](check-ip-availability-resolution-nw-rhel.md)
+ [

# Install SAP
](install-sap-nw-rhel.md)
+ [

# Configure SAP for Cluster Control
](sap-ascs-service-control-nw-rhel.md)
+ [

# Cluster Node Setup
](cluster-node-setup-nw-rhel.md)
+ [

# Cluster Configuration
](cluster-config-nw-rhel.md)

# SAP Shared File Systems


**Topics**
+ [

## Select Shared Storage
](#select-storage-type-nw-rhel)
+ [

## Create file systems
](#create-filesystems-nw-rhel)
+ [

## Create mount point directories
](#create-mount-dirs-nw-rhel)
+ [

## Update /etc/fstab
](#update-fstab-nw-rhel)
+ [

## Temporarily mount ASCS and ERS directories for installation (classic only)
](#temp-mount-dirs-nw-rhel)

## Select Shared Storage


SAP NetWeaver high availability deployments require shared file systems. On Linux, you can use either [Amazon Elastic File System](https://aws.amazon.com/efs/) or [Amazon FSx for NetApp ONTAP](https://aws.amazon.com/fsx/netapp-ontap/). Choose between these options based on your requirements for resilience, performance, and cost. For detailed setup information, see [Getting started with Amazon Elastic File System](https://docs.aws.amazon.com/efs/latest/ug/getting-started.html) or [Getting started with Amazon FSx for NetApp ONTAP](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/getting-started.html).

We recommend sharing a single Amazon EFS or FSx for ONTAP file system across multiple SIDs within an account.

The file system’s DNS name is the simplest mounting option. When connecting from an Amazon EC2 instance, the DNS automatically resolves to the mount target’s IP address in that instance’s Availability Zone. You can also create an alias (CNAME) to help identify the shared file system’s purpose. Throughout this document, we use `<nfs.fqdn>`.

Examples:
+  `file-system-id.efs.aws-region.amazonaws.com` 
+  `svm-id.fs-id.fsx.aws-region.amazonaws.com` 
+  `qas_sapmnt_share.example.com` 

**Note**  
Review the `enableDnsHostnames` and `enableDnsSupport` DNS attributes for your VPC. For more information, see [View and update DNS attributes for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).

## Create file systems


The following shared file systems are covered in this document:


| NFS Location Structure | NFS Location Example | File System Location Structure | File System Location Example | 
| --- | --- | --- | --- | 
|  <SID>\$1sapmnt  |   `RHX_sapmnt`   |  /sapmnt/<SID>  |   `/sapmnt/RHX`   | 
|  <SID>\$1ASCS<ascs\$1sys\$1nr>  |   `RHX_ASCS00`   |  /usr/sap/<SID>/ASCS<ascs\$1sys\$1nr>  |   `/usr/sap/RHX/ASCS00`   | 
|  <SID>\$1ERS<ers\$1sys\$1nr>  |   `RHX_ERS10`   |  /usr/sap/<SID>/ERS<ers\$1sys\$1nr>  |   `/usr/sap/RHX/ERS10`   | 

The following options can differ depending on how you architect and operate your systems:
+ ASCS and ERS mount points - In simple-mount architecture, you can share the entire `/usr/sap/<SID>` directory. This document uses separate mount points to simplify migration and follow SAP’s recommendation for local application server executables when co-hosting ASCS/ERS.
+ Transport directory - `/usr/sap/trans` is optional for ASCS installations. Add this shared directory if your change management processes require it.
+ Home directory - This document uses local home directories to ensure `<sid>adm` access during NFS issues. Consider a shared home directory if you need consistent user environments across nodes.
+ NFS location naming - The "NFS Location" names are arbitrary and can be chosen based on your naming conventions (e.g., `myEFSMount1`, `prod_sapmnt`, etc.). The "File system location" follows the standard SAP directory structure and should use the parameter references shown.

For more information, see [SAP System Directories on UNIX](https://help.sap.com/docs/SAP_NETWEAVER_750/ff18034f08af4d7bb33894c2047c3b71/2744f17a26a74a8abfd202c4f5dc9a0f.html).

Using the NFS ID created in the previous step, temporarily mount the root directory of the NFS. `/mnt` is available by default; it can also be substituted with another temporary location.

**Note**  
The following commands use the NFS location names from the table above. Replace `<SID>_sapmnt`, `<SID>_ASCS<ascs_sys_nr>`, and `<SID>_ERS<ers_sys_nr>` with your chosen NFS location names and parameter values.

```
# mount <nfs.fqdn>:/ /mnt
# mkdir -p /mnt/<SID>_sapmnt
# mkdir -p /mnt/<SID>_ASCS<ascs_sys_nr>
# mkdir -p /mnt/<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # mount fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/ /mnt
  # mkdir -p /mnt/RHX_sapmnt
  # mkdir -p /mnt/RHX_ASCS00
  # mkdir -p /mnt/RHX_ERS10
  ```

During SAP installation, the `<sid>adm` user and proper directory ownership will be created. Until then, we need to ensure the installation process has sufficient access. Set temporary permissions on the directories:

```
# chmod 777 /mnt/<SID>_sapmnt /mnt/<SID>_ASCS<ascs_sys_nr> /mnt/<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # chmod 777 /mnt/RHX_sapmnt /mnt/RHX_ASCS00 /mnt/RHX_ERS10
  ```

The SAP installation process will automatically set the correct ownership and permissions for operational use.

Unmount the temporary mount:

```
# umount /mnt
```

## Create mount point directories


This is applicable to both cluster nodes. Create the directories for the required mount points (permanent or cluster controlled):

```
# mkdir -p /sapmnt
# mkdir -p /usr/sap/<SID>/ASCS<ascs_sys_nr>
# mkdir -p /usr/sap/<SID>/ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # mkdir -p /sapmnt
  # mkdir -p /usr/sap/RHX/ASCS00
  # mkdir -p /usr/sap/RHX/ERS10
  ```

## Update /etc/fstab


This is applicable to both cluster nodes. `/etc/fstab` is a configuration table containing the details required for mounting and unmounting file systems to a host.

Add the file systems not managed by the cluster to `/etc/fstab`.

For both **simple-mount** and **classic** architectures, prepare and append an entry for the `sapmnt` file system to `/etc/fstab`:

```
<nfs.fqdn>:/<SID>_sapmnt    /sapmnt    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
```

 **Simple-mount only** – prepare and append entries for the ASCS and ERS file systems to `/etc/fstab`:

```
<nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr>   /usr/sap/<SID>/ASCS<ascs_sys_nr>  nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
<nfs.fqdn>:/<SID>_ERS<ers_sys_nr>     /usr/sap/<SID>/ERS<ers_sys_nr>    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_sapmnt    /sapmnt               nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
  fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_ASCS00    /usr/sap/RHX/ASCS00   nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
  fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_ERS10     /usr/sap/RHX/ERS10    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport    0    0
  ```

Verify that your mount options are:
+ Compatible with your operating system version
+ Supported by your chosen NFS file system type (EFS or FSx for ONTAP)
+ Aligned with current SAP recommendations

Consult SAP and AWS documentation for the latest mount option recommendations.

Use the following command to mount the file systems defined in `/etc/fstab`:

```
# mount -a
```

Use the following command to check that the required file systems are available:

```
# df -h
```

## Temporarily mount ASCS and ERS directories for installation (classic only)


This is only applicable to the classic architecture. Simple-mount architecture has these directories permanently available in `/etc/fstab`.

Mount ASCS and ERS directories for installation.

Use the following command on the instance where you plan to install ASCS:

```
# mount <nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr>  /usr/sap/<SID>/ASCS<ascs_sys_nr>
```

Use the following command on the instance where you plan to install ERS:

```
# mount <nfs.fqdn>:/<SID>_ERS<ers_sys_nr>  /usr/sap/<SID>/ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # mount fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_ASCS00  /usr/sap/RHX/ASCS00
  # mount fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_ERS10   /usr/sap/RHX/ERS10
  ```

# Check IP availability and resolution


## Add Overlay IP for SAP Installation


SAP Installation should be done using the virtual names assigned to the overlay IP. Before adding the overlay IPs to the instances, ensure that the VPC route table entries have been created as described in [Add VPC Route Table Entries for Overlay IPs](sap-nw-pacemaker-rhel-infra-setup.md#rt-rhel).

To facilitate SAP installation, manually add the Overlay IPs to the instances:

1. To the instance where you intend to install the **ASCS** 

   ```
   # ip addr add <ascs_overlayip>/32 dev eth0
   ```

1. To the instance where you intend to install the **ERS** 

   ```
   # ip addr add <ers_overlayip>/32 dev eth0
   ```

Note the following:
+ Route table entries for the overlay IPs must be created first (see [Add VPC Route Table Entries for Overlay IPs](sap-nw-pacemaker-rhel-infra-setup.md#rt-rhel))
+ This IP configuration is temporary and will be lost after instance reboot
+ The cluster will take over management of these IPs once configured

## Hostname Resolution


You must ensure that all instances can resolve all hostnames in use. Add the hostnames for cluster nodes to `/etc/hosts` file on all cluster nodes. This ensures that hostnames for cluster nodes can be resolved even in case of DNS issues. Configure the `/etc/hosts` file for a two-node cluster:

```
# cat /etc/hosts
<primary_ip_1> <hostname_1>.example.com <hostname_1>
<primary_ip_2> <hostname_2>.example.com <hostname_2>
<ascs_overlayip> <ascs_virt_hostname>.example.com <ascs_virt_hostname>
<ers_overlayip> <ers_virt_hostname>.example.com <ers_virt_hostname>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # cat /etc/hosts
  10.1.10.1 rhxhost01.example.com rhxhost01
  10.1.20.1 rhxhost02.example.com rhxhost02
  172.16.1.50 rhxascs.example.com rhxascs
  172.16.1.51 rhxers.example.com rhxers
  ```

In this configuration, the secondary IPs used for the second cluster ring are not mentioned. They are only used in the cluster configuration. You can allocate virtual hostnames for administration and identification purposes.

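To confirm that the virtual hostnames resolve on both nodes, you can, for example, run the following:

```
# getent hosts rhxascs rhxers
172.16.1.50     rhxascs.example.com rhxascs
172.16.1.51     rhxers.example.com rhxers
```
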
**Important**  
The Overlay IP is out of VPC range, and cannot be reached from locations not associated with the route table, including on-premises.

# Install SAP


The following topics provide information about installing SAP on AWS Cloud in a highly available cluster. Review SAP Documentation for more details.

**Topics**
+ [

## Final checks for software provisioning
](#final-checks-software-provisioning-nw-rhel)
+ [

## Install SAP ASCS and ERS instances
](#install-sap-instances-nw-rhel)
+ [

## Kernel upgrade and ENSA2 – optional
](#kernel-ensa2-nw-rhel)
+ [

## Check SAP host agent version
](#check-host-agent-nw-rhel)

## Final checks for software provisioning


Before running SAP Software Provisioning Manager (SWPM), ensure that the following prerequisites are consistent across both cluster nodes:
+ Collect any missing details and populate the [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) section to ensure clarity on the specific values used in installation commands.
+  **User and Group Configuration** - If operating system groups are pre-defined, ensure matching UID and GID values for `<sid>adm` and `sapsys` across both cluster nodes.
+  **Installation Software** - Download the latest version of Software Provisioning Manager (SWPM) and SAP installation media for your SAP release from [Software Provisioning Manager](https://support.sap.com/en/tools/software-logistics-tools/software-provisioning-manager.html).
+  **Network Configuration** - Verify both cluster nodes have identical configuration with all routes, overlay IPs, and virtual hostnames accessible. This ensures that either node can run ASCS or ERS roles.
+  **File Systems** - Verify all shared file systems are mounted and accessible from both nodes with consistent mount points and permissions.

## Install SAP ASCS and ERS instances


Install the SAP ASCS and ERS instances using their virtual hostnames to ensure installation against the overlay IP addresses. This approach is required for proper cluster integration.

Install the ASCS instance on `<instance_id_1>` using virtual hostname `<ascs_virt_hostname>` with the `SAPINST_USE_HOSTNAME` parameter. This ensures the installation uses the overlay IP rather than the physical hostname:

 *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

```
# <swpm location>/sapinst SAPINST_USE_HOSTNAME=<ascs_virt_hostname>
```

Install the ERS instance on `<instance_id_2>` using virtual hostname `<ers_virt_hostname>` with the `SAPINST_USE_HOSTNAME` parameter. This ensures the installation uses the overlay IP rather than the physical hostname:

```
# <swpm location>/sapinst SAPINST_USE_HOSTNAME=<ers_virt_hostname>
```

Once the ASCS and ERS installations are complete, you will need to install and configure the database and SAP Primary Application Server (PAS) - these components are not covered in this cluster setup documentation. Optionally, you can also install and configure Additional Application Server (AAS). For more details on installing these SAP NetWeaver components, refer to SAP Help Portal.

For additional information on unattended installation options, see [SAP Note 2230669 – System Provisioning Using an Input Parameter File](https://me.sap.com/notes/2230669) (requires SAP portal access).

## Kernel upgrade and ENSA2 – optional


As of AS ABAP Release 7.53 (ABAP Platform 1809), the new Standalone Enqueue Server 2 (ENSA2) is installed by default. ENSA2 replaces the previous version – ENSA1.

If you have an older version of SAP NetWeaver, consider following the SAP guidance to upgrade the kernel and update the Enqueue Server configuration. An upgrade will allow you to take advantage of the features available in the latest version. For more information, see the following SAP Notes (require SAP portal access):
+  [SAP Note 2630416 – Support for Standalone Enqueue Server 2](https://me.sap.com/notes/2630416) 
+  [SAP Note 2711036 – Usage of the Standalone Enqueue Server 2 in an HA Environment](https://me.sap.com/notes/2711036) 

## Check SAP host agent version


This is applicable to both cluster nodes. The SAP host agent is used for system instance control and monitoring. This agent is used by SAP cluster resource agents and hooks. It is recommended that you have the latest version installed on both instances. For more details, see [SAP Note 2219592 – Upgrade Strategy of SAP Host Agent](https://me.sap.com/notes/2219592).

Use the following command to check the version of the host agent:

```
# /usr/sap/hostctrl/exe/saphostexec -version
```

# Configure SAP for Cluster Control


Modify SAP service configurations, user permissions, and system integration settings to enable proper cluster control of ASCS and ERS instances.

**Topics**
+ [

## Add <sid>adm to haclient group
](#add-sidadm-haclient-nw-rhel)
+ [

## Modify SAP profiles for start operations and cluster hook
](#modify-sap-profiles-nw-rhel)
+ [

## Enable sapping and sappong Services (Simple-Mount Only)
](#sapping-sappong-services-nw-rhel)
+ [

## Ensure ASCS and ERS SAP Services can run on either node (systemd)
](#modify-sapservices-nw-rhel)
+ [

## Configure dependencies for Pacemaker and SAP services (systemd)
](#configure-systemd-deps-nw-rhel)
+ [

## (Alternative) Ensure ASCS and ERS SAP Services can run on either node (sysV)
](#modify-sapservices-sysv-nw-rhel)

## Add <sid>adm to haclient group


This is applicable to both cluster nodes. An `haclient` operating system group is created when the cluster connector package is installed. Adding the `<sid>adm` user to this group ensures that your cluster has necessary access. Run the following command as root:

```
# usermod -a -G haclient <sid>adm
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # usermod -a -G haclient rhxadm
  ```
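You can confirm the group assignment with the `id` command; the `haclient` group should appear in the user's group list:

```
# id rhxadm | grep haclient
```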

## Modify SAP profiles for start operations and cluster hook


This action ensures that there is compatibility between the SAP start framework and cluster actions. Modify SAP profiles to change the start behavior of the SAP instance and processes. Ensure that `sapcontrol` is aware that the system is being managed by a pacemaker cluster.
+ ASCS profile – `/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>` 
+ ERS profile – `/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>` 

The profile directory `/usr/sap/<SID>/SYS/profile/` is typically a symbolic link to `/sapmnt/<SID>/profile/` on the shared NFS file system. This means profile modifications made on one node are immediately visible on all cluster nodes. You can modify the profiles from either node.
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:
  + ASCS profile example – `/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs` 
  + ERS profile example – `/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers` 

Follow the procedure outlined below to make the necessary changes:

1.  **Program or process start behavior** – In case of failure, processes must be restarted. Where the process starts and in what order must be controlled by the cluster, not by the SAP start framework behavior defined in the profiles. Your locks can be lost if this parameter is not changed. In newer SAP installations, the profiles may already contain `Start_Program_XX` instead of `Restart_Program_XX`. If `Start_Program_XX` is already present, no changes are needed for this step.  
**Example**  

------
#### [ ENSA1 ]

    **ASCS** 

   ```
   #For ENSA1 (_EN)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_EN) pf=$(_PF)
   
   Start_Program_XX = local $(_EN) pf=$(_PF)
   ```

    **ERS** 

   ```
   #For ENSA1 (_ER)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_ER) pf=$(_PFL) NR=$(SCSID)
   
   Start_Program_XX = local $(_ER) pf=$(_PFL) NR=$(SCSID)
   ```

    *`XX` indicates the start-up order. This value may be different in your install; retain the unchanged value.* 

------
#### [ ENSA2 ]

    **ASCS** 

   ```
   #For ENSA2 (_ENQ)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_ENQ) pf=$(_PF)
   
   Start_Program_XX = local $(_ENQ) pf=$(_PF)
   ```

    **ERS** 

   ```
   #For ENSA2 (_ENQR)
   #Changing Restart to Start for Cluster compatibility
   #Old value: Restart_Program_XX = local $(_ENQR) pf=$(_PFL) NR=$(SCSID)
   
   Start_Program_XX = local $(_ENQR) pf=$(_PFL) NR=$(SCSID)
   ```

    *`XX` indicates the start order. This value may be different in your install; retain the unchanged value.* 

------

1.  **Disable instance auto start in both profiles** – When an instance restarts, the SAP start framework should not start ASCS and ERS automatically. Add the following parameter in both profiles to prevent an automatic start:

   ```
   # Disable instance auto start
   Autostart = 0
   ```

1.  **Add cluster connector details in both profiles** – The connector integrates the SAP start and control frameworks of SAP NetWeaver with the Red Hat cluster to assist with maintenance and awareness of state. Add the following parameters in both profiles:

   ```
   # Added for Cluster Connectivity
   service/halib = $(DIR_EXECUTABLE)/saphascriptco.so
   service/halib_cluster_connector = /usr/bin/sap_cluster_connector
   ```
**Important**  
The RPM package `sap-cluster-connector` has *dashes*. The executable `/usr/bin/sap_cluster_connector`, available after installation, has *underscores*. Ensure that the executable name with underscores, `/usr/bin/sap_cluster_connector`, is used in both profiles.

1.  **Restart services** – Restart SAP services for ASCS and ERS to ensure that the preceding settings take effect. Adjust the system number to match the service.

    **ASCS** 

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ascs_sys_nr> -function RestartService
   ```

    **ERS** 

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ers_sys_nr> -function RestartService
   ```
   +  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

      **ASCS** 

     ```
     # /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function RestartService
     ```

      **ERS** 

     ```
     # /usr/sap/hostctrl/exe/sapcontrol -nr 10 -function RestartService
     ```

1.  **Check integration using `sapcontrol`** – `sapcontrol` includes the functions `HACheckConfig` and `HACheckFailoverConfig`. These functions can be used to check the configuration, including awareness of the cluster connector. These checks have limited value before the cluster is configured, but you can run `HACheckFailoverConfig` to ensure the base configuration is in place (an additional `HACheckConfig` check follows the example output below).

    **ASCS** 

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ascs_sys_nr> -function HACheckFailoverConfig
   ```
   +  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

      **ASCS** 

     ```
     # /usr/sap/hostctrl/exe/sapcontrol -nr 00 -function HACheckFailoverConfig
     
     10.10.2025 01:23:55
     HACheckFailoverConfig
     OK
     state, category, description, comment
     SUCCESS, SAP CONFIGURATION, SAPInstance RA sufficient version, SAPInstance includes is-ers patch
     ```
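   You can run `HACheckConfig` in the same way to review the broader set of SAP high availability configuration checks. The full results become more meaningful once the cluster resources are configured:

   ```
   # /usr/sap/hostctrl/exe/sapcontrol -nr <ascs_sys_nr> -function HACheckConfig
   ```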

## Enable sapping and sappong Services (Simple-Mount Only)


For simple-mount architecture, enable the sapping and sappong systemd services on both cluster nodes. These services ensure proper SAP instance startup coordination between systemd and the cluster.

The sapping service runs before sapinit during boot and temporarily hides the `/usr/sap/sapservices` file to prevent automatic SAP instance startup. The sappong service runs after sapinit and restores the sapservices file, making it available for cluster management while maintaining compatibility with SAP management tools.

```
# systemctl enable sapping
# systemctl enable sappong
```

Verify the services are enabled:

```
# systemctl status sapping
# systemctl status sappong
```

**Note**  
Both services will show "inactive (dead)" status, which is normal for one-shot services that only run during system boot.

## Ensure ASCS and ERS SAP Services can run on either node (systemd)


This is applicable to both cluster nodes.

To ensure that the cluster can orchestrate availability by starting and stopping instances on either cluster node, the SAP services must be registered on both nodes, and auto-start must be disabled.

In recent operating system and SAP kernel versions, SAP offers systemd integration for `sapstartsrv`, which controls how SAP instances are stopped and started. This is the recommended configuration and a requirement for the simple-mount configuration.

For more details, see the following SAP Notes (require SAP portal access):
+  [SAP Note 3139184 – Linux: systemd integration for sapstartsrv and SAP Host Agent](https://me.sap.com/notes/3139184) 
+  [SAP Note 3115048 – sapstartsrv with native Linux systemd support](https://me.sap.com/notes/3115048) 

You can confirm whether systemd is in place by running the following command. Systemd is in place if SAP services (for example, SAPRHX_00.service and SAPRHX_10.service) are listed.

```
# systemctl list-unit-files SAP*
```

If you have installed an ASCS or ERS on this host but no SAP services are returned, the classic SysV init may be in use. In that case, you can skip to the section [(Alternative) Ensure ASCS and ERS SAP Services can run on either node (sysV)](#modify-sapservices-sysv-nw-rhel).

1.  **On the instance where the ASCS was installed** 

   Register the missing ERS service on the node where you have installed ASCS.

   1. Temporarily mount the ERS directory (classic only):

      ```
      # mount <nfs.fqdn>:/<SID>_ERS<ers_sys_nr>  /usr/sap/<SID>/ERS<ers_sys_nr>
      ```

   1. Register the ERS service:

      ```
      # export LD_LIBRARY_PATH=/usr/sap/<SID>/ERS<ers_sys_nr>/exe
      # /usr/sap/<SID>/ERS<ers_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname> -reg
      # systemctl start SAP<SID>_<ers_sys_nr>
      ```

   1. Check the existence and state of SAP services (example):

      ```
      # systemctl list-unit-files SAP*
      UNIT FILE                    STATE   VENDOR PRESET
      SAPRHX_00.service           disabled disabled
      SAPRHX_10.service           disabled disabled
      SAP.slice                   static  -
      3 unit files listed.
      ```

   1. If the state is not disabled, run the following commands to disable `sapservices` integration for `SAP<SID>_<ascs_sys_nr>` and `SAP<SID>_<ers_sys_nr>` on both nodes:
**Important**  
Stopping these services also stops the associated SAP instances.

      ```
      # systemctl stop SAP<SID>_<ascs_sys_nr>.service
      # systemctl disable SAP<SID>_<ascs_sys_nr>.service
      # systemctl stop SAP<SID>_<ers_sys_nr>.service
      # systemctl disable SAP<SID>_<ers_sys_nr>.service
      ```

   1. Unmount the ERS directory (classic only):

      ```
      # umount /usr/sap/<SID>/ERS<ers_sys_nr>
      ```
      +  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

        ```
        # mount <nfs.fqdn>:/RHX_ERS10  /usr/sap/RHX/ERS10
        # export LD_LIBRARY_PATH=/usr/sap/RHX/ERS10/exe
        # /usr/sap/RHX/ERS10/exe/sapstartsrv pf=/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers -reg
        # systemctl start SAPRHX_10
        # systemctl stop SAPRHX_00.service
        # systemctl disable SAPRHX_00.service
        # systemctl stop SAPRHX_10.service
        # systemctl disable SAPRHX_10.service
        # umount /usr/sap/RHX/ERS10
        ```

1.  **On the instance where the ERS was installed** 

   Register the missing ASCS service on the node where you have installed ERS.

   1. Temporarily mount the ASCS directory (classic only):

      ```
      # mount <nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr> /usr/sap/<SID>/ASCS<ascs_sys_nr>
      ```

   1. Register the ASCS service:

      ```
      # export LD_LIBRARY_PATH=/usr/sap/<SID>/ASCS<ascs_sys_nr>/exe
      # /usr/sap/<SID>/ASCS<ascs_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname> -reg
      # systemctl start SAP<SID>_<ascs_sys_nr>
      ```

   1. Check the existence and state of SAP services (example):

      ```
      # systemctl list-unit-files SAP*
      UNIT FILE                    STATE   VENDOR PRESET
      SAPRHX_00.service           disabled disabled
      SAPRHX_10.service           disabled disabled
      SAP.slice                   static   -
      3 unit files listed.
      ```

   1. If the state is not disabled, run the following commands to disable `sapservices` integration for `SAP<SID>_<ascs_sys_nr>` and `SAP<SID>_<ers_sys_nr>` on both nodes:
**Important**  
Stopping these services also stops the associated SAP instances.

      ```
      # systemctl stop SAP<SID>_<ascs_sys_nr>.service
      # systemctl disable SAP<SID>_<ascs_sys_nr>.service
      # systemctl stop SAP<SID>_<ers_sys_nr>.service
      # systemctl disable SAP<SID>_<ers_sys_nr>.service
      ```

   1. Unmount the ASCS directory (classic only):

      ```
      # umount /usr/sap/<SID>/ASCS<ascs_sys_nr>
      ```
      +  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

        ```
        # mount <nfs.fqdn>:/RHX_ASCS00 /usr/sap/RHX/ASCS00
        # export LD_LIBRARY_PATH=/usr/sap/RHX/ASCS00/exe
        # /usr/sap/RHX/ASCS00/exe/sapstartsrv pf=/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs -reg
        # systemctl start SAPRHX_00
        # systemctl stop SAPRHX_00.service
        # systemctl disable SAPRHX_00.service
        # systemctl stop SAPRHX_10.service
        # systemctl disable SAPRHX_10.service
        # umount /usr/sap/RHX/ASCS00
        ```

## Configure dependencies for Pacemaker and SAP services (systemd)


This step is required on both cluster nodes when using systemd integration.

When an EC2 instance shuts down unexpectedly, Pacemaker (the cluster resource manager) may trigger unnecessary fencing actions because it cannot distinguish between planned SAP service shutdowns and system failures. To prevent this, configure systemd dependencies that inform Pacemaker about the relationship between SAP services and cluster operations.

Create a systemd drop-in configuration for the `resource-agents-deps.target`, which is a systemd target that Pacemaker uses to understand external service dependencies:

```
# mkdir -p /etc/systemd/system/resource-agents-deps.target.d/
# cd /etc/systemd/system/resource-agents-deps.target.d/

# cat > sap_systemd_<sid>.conf <<_EOF
[Unit]
Requires=sapinit.service
After=sapinit.service
After=SAP<SID>_<ascs_sys_nr>.service
After=SAP<SID>_<ers_sys_nr>.service
_EOF

# systemctl daemon-reload
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # cat > sap_systemd_rhx.conf <<_EOF
  [Unit]
  Requires=sapinit.service
  After=sapinit.service
  After=SAPRHX_00.service
  After=SAPRHX_10.service
  _EOF
  
  # systemctl daemon-reload
  ```
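To confirm that the drop-in file has been picked up on both nodes, you can list the dependencies of the target; `sapinit.service` and the SAP instance services should appear:

```
# systemctl list-dependencies resource-agents-deps.target
```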

## (Alternative) Ensure ASCS and ERS SAP Services can run on either node (sysV)


This is only applicable if systemd integration is not in place.

To ensure that the SAP instances can be managed by the cluster, and also manually during planned maintenance activities, add the missing entries for the ASCS and ERS `sapstartsrv` services to the `/usr/sap/sapservices` file on both cluster nodes (ASCS and ERS host). Copy the missing entry from the other host. After the modifications, the `/usr/sap/sapservices` file looks as follows on both hosts:

```
#!/bin/sh
LD_LIBRARY_PATH=/usr/sap/<SID>/ASCS<ascs_sys_nr>/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/<SID>/ASCS<ascs_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname> -D -u <sid>adm
LD_LIBRARY_PATH=/usr/sap/<SID>/ERS<ers_sys_nr>/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/<SID>/ERS<ers_sys_nr>/exe/sapstartsrv pf=/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname> -D -u <sid>adm
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  #!/bin/sh
  LD_LIBRARY_PATH=/usr/sap/RHX/ASCS00/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/RHX/ASCS00/exe/sapstartsrv pf=/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs -D -u rhxadm
  LD_LIBRARY_PATH=/usr/sap/RHX/ERS10/exe:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; /usr/sap/RHX/ERS10/exe/sapstartsrv pf=/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers -D -u rhxadm
  ```

# Cluster Node Setup


Establish cluster communication between nodes using Corosync and configure required authentication.

**Topics**
+ [

## Change the hacluster Password
](#change-hacluster-password-nw-rhel)
+ [

## Setup Passwordless Authentication
](#setup-passwordless-auth-nw-rhel)
+ [

## Start and Enable the pcsd Service
](#start-pcsd-service-nw-rhel)
+ [

## Authorize the Cluster
](#configure-cluster-nodes-nw-rhel)
+ [

## Generate Corosync Configuration
](#generate-corosync-config-nw-rhel)
+ [

## Start and Verify the Cluster
](#start-cluster-nw-rhel)
+ [

## Configure Cluster Services
](#configure-cluster-services-nw-rhel)
+ [

## Verify Cluster Status
](#verify-cluster-status-nw-rhel)

## Change the hacluster Password


On all cluster nodes, change the password of the operating system user hacluster:

```
# passwd hacluster
```

## Setup Passwordless Authentication


Red Hat cluster tools provide comprehensive reporting and troubleshooting capabilities for cluster activity. Many of these tools require passwordless SSH access between nodes to collect cluster-wide information effectively. Red Hat recommends configuring passwordless SSH for the root user to enable seamless cluster diagnostics and reporting.

For more details, see Red Hat Documentation [How to setup SSH Key passwordless login in Red Hat Enterprise Linux](https://access.redhat.com/solutions/9194).

**Warning**  
Review the security implications for your organization, including root access controls and network segmentation, before implementing this configuration.

## Start and Enable the pcsd Service


On all cluster nodes, enable and start the pcsd service:

```
# systemctl enable pcsd --now
```

## Authorize the Cluster


Run the following command to authenticate the cluster nodes. You will be prompted for the hacluster password you set earlier:

```
# pcs host auth <hostname_1> <hostname_2> -u hacluster -p <password>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs host auth rhxhost01 rhxhost02 -u hacluster -p <password>
  ```

## Generate Corosync Configuration


Corosync provides membership and member-communication needs for high availability clusters. Initial setup can be performed using the following command with dual network rings for redundant communication:

```
# pcs cluster setup <cluster_name> \
<hostname_1> addr=<host_ip_1> addr=<host_additional_ip_1> \
<hostname_2> addr=<host_ip_2> addr=<host_additional_ip_2>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs cluster setup myCluster rhxhost01 addr=10.1.10.1 addr=10.1.10.2 rhxhost02 addr=10.1.20.1 addr=10.1.20.2
  Destroying cluster on hosts: 'rhxhost01', 'rhxhost02'...
  rhxhost01: Successfully destroyed cluster
  rhxhost02: Successfully destroyed cluster
  Requesting remove 'pcsd settings' from 'rhxhost01', 'rhxhost02'
  rhxhost01: successful removal of the file 'pcsd settings'
  rhxhost02: successful removal of the file 'pcsd settings'
  Sending 'corosync authkey', 'pacemaker authkey' to 'rhxhost01', 'rhxhost02'
  rhxhost01: successful distribution of the file 'corosync authkey'
  rhxhost01: successful distribution of the file 'pacemaker authkey'
  rhxhost02: successful distribution of the file 'corosync authkey'
  rhxhost02: successful distribution of the file 'pacemaker authkey'
  Sending 'corosync.conf' to 'rhxhost01', 'rhxhost02'
  rhxhost01: successful distribution of the file 'corosync.conf'
  rhxhost02: successful distribution of the file 'corosync.conf'
  Cluster has been successfully set up.
  ```

Next, tune the timing parameters for AWS cloud environments. Update the token timeout to provide reliable cluster operation while accommodating normal cloud network characteristics:

```
# pcs cluster config update totem token=15000
```
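You can verify that the new token value has been written to the corosync configuration on both nodes:

```
# grep token /etc/corosync/corosync.conf
```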

## Start and Verify the Cluster


Start the cluster on all nodes:

```
# pcs cluster start --all
```

**Note**  
When the pacemaker service is enabled, the node automatically rejoins the cluster after a reboot, which ensures that your system remains protected. Alternatively, you can leave the service disabled and start pacemaker manually after a reboot to first investigate the cause of any failure.

Run the following command to check the cluster status:

```
# pcs status
```

Example output:

```
Cluster name: myCluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: rhxhost01 (version 2.1.2-4.el9_0.5-ada5c3b36e2) - partition with quorum
  * Last updated: Fri Oct 24 06:35:46 2025
  * Last change:  Fri Oct 24 06:26:38 2025 by hacluster via crmd on rhxhost01
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ rhxhost01 rhxhost02 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

Both cluster nodes must show up as online. You can find the ring status and the associated IP addresses of the cluster with the corosync-cfgtool command:

```
# corosync-cfgtool -s
```

Example output:

```
Local node ID 1, transport knet
LINK ID 0 udp
        addr    = 10.1.10.114
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
LINK ID 1 udp
        addr    = 10.1.10.215
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
```

Both links should report a connected status for the remote node. If either link is missing or faulty, review the corosync configuration and check that changes to `/etc/corosync/corosync.conf` have been synced to the secondary node. You may need to do this manually. Restart the cluster if needed.

## Configure Cluster Services


Enable pacemaker to start automatically after reboot:

```
# pcs cluster enable --all
```

Enabling pacemaker also handles corosync through service dependencies. The cluster will start automatically after reboot. For troubleshooting scenarios, you can choose to manually start services after boot instead.

## Verify Cluster Status


 **1. Check pacemaker service status:** 

```
# systemctl status pacemaker
```

 **2. Verify cluster status:** 

```
# pcs status
```

 *Example output*:

```
Cluster name: myCluster
Cluster Summary:
  * Stack: corosync
  * Current DC: rhxhost01 (version 2.1.5+20221208.a3f44794f) - partition with quorum
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ rhxhost01 rhxhost02 ]

Full List of Resources:
  * No resources
```

# Cluster Configuration


The following sections provide details on the resources, groups and constraints necessary to ensure high availability of SAP Central Services.

**Topics**
+ [

## Prepare for Resource Creation
](#prepare-resource-nw-rhel)
+ [

## Cluster Bootstrap
](#cluster-bootstrap-nw-rhel)
+ [

## Create STONITH Fencing Resource
](#create-stonith-ec2-nw-rhel)
+ [

## SAP Resource Groups and Ordering
](#resource-groups-nw-rhel)
+ [

## Create Filesystem resources (classic only)
](#filesystem-resources-nw-rhel)
+ [

## Create overlay IP resources
](#overlay-ip-resources-nw-rhel)
+ [

## Create SAPStartSrv resources (simple-mount only)
](#sapstartsrv-resources-nw-rhel)
+ [

## Create SAPInstance resources (simple-mount only)
](#sap-resources-simple-nw-rhel)
+ [

## Create SAPInstance resources (classic only)
](#sap-resources-classic-nw-rhel)
+ [

## Review ASCS Resource group and modify stickiness.
](#resource-groups-review-nw-rhel)
+ [

## Create resource constraints
](#resource-constraints-nw-rhel)
+ [

## Reset Configuration – Optional
](#reset-config-nw-rhel)

## Prepare for Resource Creation


To ensure that the cluster does not perform any unexpected actions during setup of resources and configuration, set the maintenance mode to true.

Run the following command to put the cluster in maintenance mode:

```
# pcs property set maintenance-mode=true
```

To verify the current maintenance state:

```
$ pcs status
```

**Note**  
There are two types of maintenance mode:  
+ Cluster-wide maintenance (set with `pcs property set maintenance-mode=true`)
+ Node-specific maintenance (set with `pcs node maintenance <nodename>`)
Always use cluster-wide maintenance mode when making configuration changes. For node-specific operations like hardware maintenance, refer to the Operations section for proper procedures.  
To disable maintenance mode after configuration is complete:  

```
# pcs property set maintenance-mode=false
```

## Cluster Bootstrap


### Configure Cluster Properties


Configure cluster properties to establish fencing behavior and resource failover settings:

```
# pcs property set stonith-enabled="true"
# pcs property set stonith-timeout="600"
# pcs property set priority-fencing-delay="20"
```
+ The **priority-fencing-delay** is recommended for protecting the SAP ASCS nodes during network partitioning events. When a cluster partition occurs, this delay gives preference to nodes hosting higher priority resources, with the ASCS receiving additional priority weighting over the ERS. This helps ensure the ASCS node survives in split-brain scenarios. The recommended 20 second priority-fencing-delay works in conjunction with the `pcmk_delay_max` (10 seconds) configured in the stonith resource, providing a total potential delay of up to 30 seconds before fencing occurs.

To verify your cluster property settings:

```
# pcs property config
# pcs property config <property_name>
```

### Configure Resource Defaults


Configure resource default behaviors:

------
#### [ RHEL 8.4 and above ]

```
# pcs resource defaults update resource-stickiness="1"
# pcs resource defaults update migration-threshold="3"
# pcs resource defaults update failure-timeout="600s"
```

------
#### [ RHEL 7.x and RHEL 8.0 to 8.3 ]

```
# pcs resource defaults resource-stickiness="1"
# pcs resource defaults migration-threshold="3"
# pcs resource defaults failure-timeout="600s"
```
+ The **resource-stickiness** value of 1 encourages the ASCS resource to stay on its current node, avoiding unnecessary resource movement.
+ The **migration-threshold** causes a resource to move to a different node after 3 consecutive failures, ensuring timely failover when issues persist.
+ The **failure-timeout** automatically removes a failure count after 10 minutes, preventing individual historical failures from accumulating and affecting long-term resource behavior. If testing failover scenarios in quick succession, it may be necessary to manually query and clear accumulated failure counts between tests. Use `pcs resource failcount` and `pcs resource refresh` (see the example that follows the verification command below).

------

Individual resources may override these defaults with their own defined values.

To verify your resource default settings:

```
# pcs resource defaults
```
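As referenced in the failure-timeout description above, the following sketch shows how you might inspect and clear the failure count for a resource between tests, using the example ASCS instance resource name from the [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md):

```
# pcs resource failcount show rsc_sap_RHX_ASCS00
# pcs resource refresh rsc_sap_RHX_ASCS00
```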

### Configure Operation Defaults


```
# pcs resource op defaults update timeout="600"
```
+ The **op_defaults timeout** ensures all cluster operations have a reasonable default timeout of 600 seconds. Individual resources may override this with their own timeout values.

To verify your operation default settings:

```
# pcs resource op defaults
```

## Create STONITH Fencing Resource


An AWS STONITH resource is required for proper cluster fencing operations. The `fence_aws` resource is recommended for AWS deployments as it leverages the AWS API to safely fence failed or incommunicable nodes by stopping their EC2 instances.

Create the STONITH resource using resource agent ** `fence_aws` **:

```
# pcs stonith create <stonith_resource_name> fence_aws \
pcmk_host_map="<hostname_1>:<instance_id_1>;<hostname_2>:<instance_id_2>" \
region="<aws_region>" \
skip_os_shutdown="true" \
pcmk_delay_max="10" \
pcmk_reboot_timeout="300" \
pcmk_reboot_retries="2" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="180" timeout="60"
```

Details:
+  **pcmk_host_map** - Maps cluster node hostnames to their EC2 instance IDs. This mapping must be unique within the AWS account and follow the format hostname:instance-id, with multiple entries separated by semicolons.
+  **region** - AWS region where the EC2 instances are deployed
+  **pcmk_delay_max** - Random delay before fencing operations. Works in conjunction with cluster property `priority-fencing-delay` to prevent simultaneous fencing in 2-node clusters. Historically set to higher values, but with `priority-fencing-delay` now handling primary node protection, a lower value (10s) is sufficient. Omit in clusters with real quorum (3 or more nodes) to avoid unnecessary delay.
+  **pcmk_reboot_timeout** - Maximum time in seconds allowed for a reboot operation
+  **pcmk_reboot_retries** - Number of times to retry a failed reboot operation
+  **skip_os_shutdown** (NEW) - Leverages a new EC2 stop-instances API flag to forcefully stop an EC2 instance by skipping the shutdown of the operating system.
  +  [Red Hat Solution 4963741 - fence_aws fence action fails with "Timed out waiting to power OFF"](https://access.redhat.com/solutions/4963741) (requires Red Hat Customer Portal access)

------
#### [ ENSA1 ]

 *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

```
# pcs stonith create rsc_fence_aws fence_aws \
pcmk_host_map="rhxhost01:i-xxxxinstidforhost1;rhxhost02:i-xxxxinstidforhost2" \
region="us-east-1" \
skip_os_shutdown="true" \
pcmk_delay_max="30" \
pcmk_reboot_timeout="120" \
pcmk_reboot_retries="4" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="180" timeout="60"
```

------
#### [ ENSA2 ]

 *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

```
# pcs stonith create rsc_fence_aws fence_aws \
pcmk_host_map="rhxhost01:i-xxxxinstidforhost1;rhxhost02:i-xxxxinstidforhost2" \
region="us-east-1" \
skip_os_shutdown="true" \
pcmk_delay_max="10" \
pcmk_reboot_timeout="120" \
pcmk_reboot_retries="4" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="180" timeout="60"
```

------
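After creating the fencing resource, you can review its configuration and status. Depending on your pcs version, use the following commands (older releases use `pcs stonith show --full` instead of `pcs stonith config`):

```
# pcs stonith config rsc_fence_aws
# pcs stonith status
```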

## SAP Resource Groups and Ordering


When creating the resources for the SAP ASCS and ERS, it is necessary to specify a group.

A cluster resource group is a set of resources that must be located together, started sequentially, and stopped in reverse order.

Depending on the configuration pattern, the following groups are created for the ASCS and ERS:
+  **Classic**: Filesystem, IP, SAPInstance
+  **SimpleMount**: IP, SAPStartSrv, SAPInstance

Since RHEL 9.4, a new syntax for creating a resource in a group has been introduced in addition to the `--group` parameter. You now receive the following deprecation warning:

```
Deprecation Warning: Using '--group' is deprecated and will be replaced with 'group' in a future release. Specify --future to switch to the future behavior.
```

## Create Filesystem resources (classic only)


In the classic configuration, cluster resources mount and unmount the file systems so that they follow the location of the SAP services.

Create **ASCS** file system resources:

```
# pcs resource create rsc_fs_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:Filesystem \
device="<nfs.fqdn>:/<SID>_ASCS<ascs_sys_nr>" \
directory="/usr/sap/<SID>/ASCS<ascs_sys_nr>" \
fstype="nfs4" \
options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
force_unmount="safe" \
fast_stop="no" \
op start timeout="60" interval="0" \
op stop timeout="60" interval="0" \
op monitor interval="20" timeout="40" \
--group "grp_<SID>_ASCS<ascs_sys_nr>"
```

Create **ERS** file system resources:

```
# pcs resource create rsc_fs_<SID>_ERS<ers_sys_nr> ocf:heartbeat:Filesystem \
device="<nfs.fqdn>:/<SID>_ERS<ers_sys_nr>" \
directory="/usr/sap/<SID>/ERS<ers_sys_nr>" \
fstype="nfs4" \
force_unmount="safe" \
fast_stop="no" \
options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
op start timeout="60" interval="0" \
op stop timeout="60" interval="0" \
op monitor interval="20" timeout="40" \
--group "grp_<SID>_ERS<ers_sys_nr>"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_fs_RHX_ASCS00 ocf:heartbeat:Filesystem \
  device="fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_ASCS00" \
  directory="/usr/sap/RHX/ASCS00" \
  fstype="nfs4" \
  force_unmount="safe" \
  fast_stop="no" \
  options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
  op start timeout="60" interval="0" \
  op stop timeout="60" interval="0" \
  op monitor interval="20" timeout="40" \
  --group grp_RHX_ASCS00
  
  # pcs resource create rsc_fs_RHX_ERS10 ocf:heartbeat:Filesystem \
  device="fs-xxxxxxxxxxxxxefs1.efs.us-east-1.amazonaws.com:/RHX_ERS10" \
  directory="/usr/sap/RHX/ERS10" \
  fstype="nfs4" \
  force_unmount="safe" \
  fast_stop="no" \
  options="rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2" \
  op start timeout="60" interval="0" \
  op stop timeout="60" interval="0" \
  op monitor interval="20" timeout="40" \
  --group grp_RHX_ERS10
  ```

 **Notes** 
+ Review the mount options to ensure that they match with your operating system, NFS file system type, and the latest recommendations from SAP.
+ <nfs.fqdn> can either be an alias or the default file system resource name of the NFS or FSx for ONTAP resource. For example, `fs-xxxxxx.efs.xxxxxx.amazonaws.com`.
+  `force_unmount` and `fast_stop` are recommendations for ensuring the filesystem can be quickly unmounted. See Red Hat solutions:
  +  [Red Hat Solution 3357961 - During failover of a pacemaker resources, a Filesystem resource kills processes not using the filesystem](https://access.redhat.com/solutions/3357961) (requires Red Hat customer portal login)
  +  [Red Hat Solution 4801371 - What is the fast\$1stop option for a Filesystem resource in a Pacemaker cluster?](https://access.redhat.com/solutions/4801371) (requires Red Hat customer portal login)

## Create overlay IP resources


The IP resource provides the details necessary to update the route table entry for overlay IP.

Create **ASCS** IP Resource:

```
# pcs resource create rsc_ip_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
ip="<ascs_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="20" timeout="40" \
--group "grp_<SID>_ASCS<ascs_sys_nr>"
```

Create **ERS** IP Resource:

```
# pcs resource create rsc_ip_<SID>_ERS<ers_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
ip="<ers_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="20" timeout="40" \
--group "grp_<SID>_ERS<ers_sys_nr>"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_ip_RHX_ASCS00 ocf:heartbeat:aws-vpc-move-ip \
  ip="172.16.30.5" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="20" timeout="40" \
  --group grp_RHX_ASCS00
  
  # pcs resource create rsc_ip_RHX_ERS10 ocf:heartbeat:aws-vpc-move-ip \
  ip="172.16.30.6" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="20" timeout="40" \
  --group grp_RHX_ERS10
  ```

 **Notes** 
+ If more than one route table is required for connectivity or because of subnet associations, the `routing_table` parameter can have multiple values separated by a comma. For example, `routing_table=rtb-xxxxxroutetable1,rtb-xxxxxroutetable2`.
+ Additional parameters – `lookup_type` and `routing_table_role` are required for a shared VPC. For more information, see [Shared VPC – optional](https://docs.aws.amazon.com/sap/latest/sap-netweaver/rhel-netweaver-ha-settings.html#rhel-netweaver-ha-shared-vpc).
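To confirm that the overlay IP route is managed as expected, you can query the route table with the AWS CLI. This is a sketch using the example values; the route for the ASCS overlay IP should target the instance or elastic network interface of the node that currently runs the ASCS group:

```
# aws ec2 describe-route-tables --route-table-ids rtb-xxxxxroutetable1 \
--query "RouteTables[].Routes[?DestinationCidrBlock=='172.16.30.5/32']" \
--profile cluster
```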

## Create SAPStartSrv resources (simple-mount only)


In the simple-mount architecture, the `sapstartsrv` process, which controls the start/stop and monitoring of an SAP instance, is itself managed by a cluster resource. This new resource adds control that removes the requirement for file system resources to be restricted to a single node.

Modify and run the following commands to create the SAPStartSrv resources.

Create **ASCS** SAPStartSrv Resource

Use the following command to create an ASCS SAPStartSrv resource.

```
# pcs resource create rsc_sapstart_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPStartSrv \
InstanceName=<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname> \
op monitor interval=0 timeout=20 enabled=0 \
--group grp_<SID>_ASCS<ascs_sys_nr>
```

Create **ERS** SAPStartSrv Resource

Use the following command to create an ERS SAPStartSrv resource.

```
# pcs resource create rsc_sapstart_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPStartSrv \
InstanceName=<SID>_ERS<ers_sys_nr>_<ers_virt_hostname> \
op monitor interval=0 timeout=20 enabled=0 \
--group grp_<SID>_ERS<ers_sys_nr>
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_sapstart_RHX_ASCS00 ocf:heartbeat:SAPStartSrv \
  InstanceName=RHX_ASCS00_rhxascs \
  op monitor interval=0 timeout=20 enabled=0 \
  --group grp_RHX_ASCS00

  # pcs resource create rsc_sapstart_RHX_ERS10 ocf:heartbeat:SAPStartSrv \
  InstanceName=RHX_ERS10_rhxers \
  op monitor interval=0 timeout=20 enabled=0 \
  --group grp_RHX_ERS10
  ```

## Create SAPInstance resources (simple-mount only)


The minor difference in creating SAP instance resources between classic and simple-mount configurations is the addition of the `MINIMAL_PROBE=true` parameter.

The SAP instance is started and stopped using cluster resources.

**Example**  

------
#### [ ENSA1 ]

Create an **ASCS** SAP instance resource:  

```
# pcs resource create rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta resource-stickiness="5000" \
meta failure-timeout="60" \
meta migration-threshold="1" \
meta priority="10" \
--group "grp_<SID>_ASCS<ascs_sys_nr>"
```
Create an **ERS** SAP instance resource:  

```
# pcs resource create rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
IS_ERS="true" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta priority="1000" \
--group "grp_<SID>_ERS<ers_sys_nr>"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_sap_RHX_ASCS00 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ASCS00_rhxascs" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta resource-stickiness="5000" \
  meta failure-timeout="60" \
  meta migration-threshold="1" \
  meta priority="10" \
  --group grp_RHX_ASCS00
  
  # pcs resource create rsc_sap_RHX_ERS10 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ERS10_rhxers" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  IS_ERS="true" \
  op start interval="0" timeout="240" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta priority="1000" \
  --group grp_RHX_ERS10
  ```

------
#### [ ENSA2 ]

Create an **ASCS** SAP instance resource:  

```
# pcs resource create rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
MINIMAL_PROBE="true" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="20" timeout="60" on-fail="restart" \
meta resource-stickiness="5000" \
meta priority="1000" \
--group "grp_<SID>_ASCS<ascs_sys_nr>"
```
Create an **ERS** SAP instance resource:  

```
# pcs resource create rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
IS_ERS="true" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="20" timeout="60" on-fail="restart" \
--group "grp_<SID>_ERS<ers_sys_nr>"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_sap_RHX_ASCS00 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ASCS00_rhxascs" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs" \
  AUTOMATIC_RECOVER="false" \
  MINIMAL_PROBE="true" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="20" timeout="60" on-fail="restart" \
  meta resource-stickiness="5000" \
  meta priority="1000" \
  --group grp_RHX_ASCS00
  
  # pcs resource create rsc_sap_RHX_ERS10 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ERS10_rhxers" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers" \
  AUTOMATIC_RECOVER="false" \
  IS_ERS="true" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="20" timeout="60" on-fail="restart" \
  --group grp_RHX_ERS10
  ```

------

The difference between ENSA1 and ENSA2 is that ENSA2 allows the lock table to be consumed remotely, which means that for ENSA2, ASCS can restart in its current location (assuming the node is still available). This change impacts the stickiness, migration, and priority parameters. Ensure that you use the right command for your enqueue version.

## Create SAPInstance resources (classic only)


The SAP instance is started and stopped using cluster resources.

**Example**  

------
#### [ ENSA1 ]

Create an **ASCS** SAPInstance resource:  

```
# pcs resource create rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta resource-stickiness="5000" \
meta failure-timeout="60" \
meta migration-threshold="1" \
meta priority="10" \
--group "grp_<SID>_ASCS<ascs_sys_nr>"
```
Create an **ERS** SAPInstance resource:  

```
# pcs resource create rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
IS_ERS="true" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta priority="1000" \
--group "grp_<SID>_ERS<ers_sys_nr>"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_sap_RHX_ASCS00 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ASCS00_rhxascs" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs" \
  AUTOMATIC_RECOVER="false" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta resource-stickiness="5000" \
  meta failure-timeout="60" \
  meta migration-threshold="1" \
  meta priority="10" \
  --group grp_RHX_ASCS00
  
  # pcs resource create rsc_sap_RHX_ERS10 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ERS10_rhxers" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers" \
  AUTOMATIC_RECOVER="false" \
  IS_ERS="true" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta priority="1000" \
  --group grp_RHX_ERS10
  ```

------
#### [ ENSA2 ]

Create an **ASCS** SAPInstance resource:  

```
# pcs resource create rsc_sap_<SID>_ASCS<ascs_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ASCS<ascs_sys_nr>_<ascs_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
meta resource-stickiness="5000" \
meta priority="1000" \
--group "grp_<SID>_ASCS<ascs_sys_nr>"
```
Create an **ERS** SAP instance resource:  

```
# pcs resource create rsc_sap_<SID>_ERS<ers_sys_nr> ocf:heartbeat:SAPInstance \
InstanceName="<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
START_PROFILE="/usr/sap/<SID>/SYS/profile/<SID>_ERS<ers_sys_nr>_<ers_virt_hostname>" \
AUTOMATIC_RECOVER="false" \
IS_ERS="true" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="240" \
op monitor interval="11" timeout="60" on-fail="restart" \
--group "grp_<SID>_ERS<ers_sys_nr>"
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource create rsc_sap_RHX_ASCS00 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ASCS00_rhxascs" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ASCS00_rhxascs" \
  AUTOMATIC_RECOVER="false" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  meta resource-stickiness="5000" \
  meta priority="1000" \
  --group grp_RHX_ASCS00
  
  # pcs resource create rsc_sap_RHX_ERS10 ocf:heartbeat:SAPInstance \
  InstanceName="RHX_ERS10_rhxers" \
  START_PROFILE="/usr/sap/RHX/SYS/profile/RHX_ERS10_rhxers" \
  AUTOMATIC_RECOVER="false" \
  IS_ERS="true" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="240" \
  op monitor interval="11" timeout="60" on-fail="restart" \
  --group grp_RHX_ERS10
  ```

------

The change between ENSA1 and ENSA2 allows the lock table to be consumed remotely. If the node is still available, ASCS can restart in its current location for ENSA2. This impacts the stickiness, migration, and priority parameters. Make sure to use the right command, depending on your enqueue server version.

## Review ASCS Resource group and modify stickiness


A cluster resource group is a set of resources that must be located together, started sequentially, and stopped in reverse order. Increase the resource stickiness of the ASCS group so that the group prefers to remain on its current node:

```
# pcs resource meta grp_<SID>_ASCS<ascs_sys_nr> resource-stickiness=3000
```
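+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs resource meta grp_RHX_ASCS00 resource-stickiness=3000
  ```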

In simple-mount architecture, the overlay IP must be available first, then the SAP services are started before the SAP instance can start.

## Create resource constraints


Resource constraints determine where resources run, based on the conditions that you define. Constraints for SAP NetWeaver ensure that ASCS and ERS are started on separate nodes and that locks are preserved in case of failures. The following are the different types of constraints.

### Colocation constraint


The negative score ensures that ASCS and ERS are run on separate nodes, wherever possible.

```
# pcs constraint colocation add grp_<SID>_ERS<ers_sys_nr> with grp_<SID>_ASCS<ascs_sys_nr> score=-5000
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs constraint colocation add grp_RHX_ERS10 with grp_RHX_ASCS00 score=-5000
  ```

### Order constraint


This constraint ensures the ASCS instance is started prior to stopping the ERS instance. This is necessary to consume the lock table.

```
# pcs constraint order start rsc_sap_<SID>_ASCS<ascs_sys_nr> then stop rsc_sap_<SID>_ERS<ers_sys_nr> kind=Optional symmetrical=false
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs constraint order start rsc_sap_RHX_ASCS00 then stop rsc_sap_RHX_ERS10 kind=Optional symmetrical=false
  ```

### Location constraint (ENSA1 only)


This constraint is only required for ENSA1. With ENSA2, the lock table can be retrieved remotely, and as a result ASCS doesn't need to fail over to the node where ERS is running.

```
# pcs constraint location rsc_sap_<SID>_ASCS<ascs_sys_nr> rule score=2000 runs_ers_<SID> eq 1
```
+  *Example using values from [Parameter Reference](sap-nw-pacemaker-rhel-parameters.md) *:

  ```
  # pcs constraint location rsc_sap_RHX_ASCS00 rule score=2000 runs_ers_RHX eq 1
  ```
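After creating the constraints, you can review them to confirm that the colocation, order, and (for ENSA1) location rules are in place. Depending on your pcs version, use `pcs constraint config` or `pcs constraint list`:

```
# pcs constraint config
```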

## Reset Configuration – Optional


**Important**  
The following instructions help you reset the complete configuration. Run these commands only if you want to start setup from the beginning. You can make minor changes to individual resources with `pcs` commands, for example `pcs resource update`.

Run the following command to back up the current configuration for reference:

```
# pcs config > /tmp/pcsconfig_backup.txt
```

Run the following command to clear the current configuration:

```
# pcs cluster cib-push --config /dev/null
```

Once the preceding command is executed, it removes all of the cluster resources from the Cluster Information Base (CIB). Before starting the resource configuration, run `pcs cluster start --all` to ensure that the cluster is running properly. The restart removes maintenance mode. Reapply maintenance mode before commencing additional configuration and resource setup.

# Operations


This section covers the following topics.

**Topics**
+ [

# Viewing the cluster state
](cluster-state-nw-rhel.md)
+ [

# Performing planned maintenance
](planned-maintenance-nw-rhel.md)
+ [

# Post-failure analysis and reset
](analysis-reset-nw-rhel.md)
+ [

# Alerting and monitoring
](alerting-monitoring-nw-rhel.md)

# Viewing the cluster state


You can view the state of the cluster in two ways: based on your operating system, or with a web-based console provided by Red Hat.

**Topics**
+ [

## Operating system based
](#os-based-nw-rhel)
+ [

## Red Hat Cockpit
](#rhel-cockpit)

## Operating system based


There are multiple operating system commands that can be run as root or as a user with appropriate permissions. The commands enable you to get an overview of the status of the cluster and its services. See the following commands for more details.

```
# pcs status
```

Sample output:

```
rhxhost01:~ # pcs status
Cluster name: rhx-cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: rhxhost01 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Tue Nov  1 13:41:58 2022
  * Last change:  Fri Oct 28 08:55:43 2022 by root via crm_attribute on rhxhost02
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Online: [ rhxhost01 rhxhost02 ]

Full List of Resources:
  * Resource Group: grp_RHX_ASCS00:
    * rsc_ip_RHX_ASCS00 (ocf::heartbeat:aws-vpc-move-ip):        Started rhxhost01
    * rsc_sapstart_RHX_ASCS00   (ocf::heartbeat:SAPStartSrv):         Started rhxhost01
    * rsc_sap_RHX_ASCS00        (ocf::heartbeat:SAPInstance):    Started rhxhost01
  * res_AWS_STONITH     (stonith:fence_aws):  Started rhxhost02
  * Resource Group: grp_RHX_ERS10:
    * rsc_ip_RHX_ERS10  (ocf::heartbeat:aws-vpc-move-ip):        Started rhxhost02
    * rsc_sapstart_RHX_ERS10    (ocf::heartbeat:SAPStartSrv):         Started rhxhost02
    * rsc_sap_RHX_ERS10 (ocf::heartbeat:SAPInstance):    Started rhxhost02
```

The following table provides a list of useful commands.


| Command | Description | 
| --- | --- | 
|   `pcs status`   |  Display cluster status on the console  | 
|   `pcs status --full`   |  Display detailed cluster status including inactive resources  | 
|   `pcs status nodes`   |  Display node status and attributes  | 
|   `pcs status resources`   |  Display resource status and fail counts  | 
|   `pcs cluster status`   |  Display cluster daemon status  | 
|   `pcs help`   |  View more options  | 
|   `pcs status --help`   |  View more options  | 

## Red Hat Cockpit


Cockpit is a web-based graphical user interface for managing and monitoring Red Hat Enterprise Linux systems, including pacemaker high availability clusters. It must be enabled on every node in the cluster so that you can point your web browser at any node to access it. Use the following commands to enable Cockpit.

```
# systemctl enable --now cockpit.socket
# systemctl status cockpit.socket
```

Use the following URL to access Cockpit. Ensure that your security groups allow access on port 9090 from your administrative host.

```
https://your-server:9090/

e.g https://rhxhost01:9090
```

For more information, see [Configuring and Managing High Availability Clusters](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_high_availability_clusters) in the Red Hat Documentation.

# Performing planned maintenance


The cluster connector is designed to integrate the cluster with SAP start framework (`sapstartsrv`), including the rolling kernel switch (RKS) awareness. Stopping and starting the SAP system using `sapcontrol` should not result in any cluster remediation activities as these actions are not interpreted as failures. Validate this scenario when testing your cluster.

There are different options to perform planned maintenance on nodes, resources, and the cluster.

**Topics**
+ [

## Maintenance mode
](#maintenance-mode-nw-rhel)
+ [

## Placing a node in standby mode
](#node-standby-nw-rhel)
+ [

## Moving a resource
](#moving-resource-nw-rhel)

## Maintenance mode


Use maintenance mode if you want to make any changes to the configuration or take control of the resources and nodes in the cluster. In most cases, this is the safest option for administrative tasks.

**Example**  
Use one of the following commands to turn on maintenance mode.  

```
# pcs property set maintenance-mode=true
```

```
# pcs cluster maintenance --all
```
Use one of the following commands to turn off maintenance mode.  

```
# pcs property set maintenance-mode=false
```

```
# pcs cluster maintenance --all --wait=60
```

## Placing a node in standby mode


To perform maintenance on the cluster without system outage, the recommended method for moving active resources is to place the node you want to remove from the cluster in standby mode.

```
# pcs node standby <hostname>
```

The cluster will cleanly relocate resources, and you can perform activities, including reboots on the node in standby mode. When maintenance activities are complete, you can re-introduce the node with the following command.

```
# pcs node unstandby <hostname>
```

## Moving a resource


Moving individual resources is not recommended because of the migration or move constraints that are created to lock the resource in its new location. These can be cleared as described in the info messages, but this introduces an additional step.

```
rhxhost01:~ # pcs resource move grp_RHX_ASCS00 rhxhost02
Location constraint to move resource 'grp_RHX_ASCS00' has been created
Run 'pcs resource clear grp_RHX_ASCS00' to remove this constraint
```

Use the following command once the resources have relocated to their target location.

```
# pcs resource clear grp_RHX_ASCS00
```

# Post-failure analysis and reset


A review must be conducted after each failure to understand the source of the failure as well as the reaction of the cluster. In most scenarios, the cluster prevents an application outage. However, a manual action is often required to reset the cluster to a protective state for any subsequent failures.

**Topics**
+ [

## Checking the logs
](#checking-logs-nw-rhel)
+ [

## Cleanup pcs status
](#cleanup-crm-nw-rhel)
+ [

## Restart failed nodes or pacemaker
](#restart-nodes-nw-rhel)
+ [

## Further Analysis
](#_further_analysis)

## Checking the logs

+ For troubleshooting cluster issues, use journalctl to examine both pacemaker and corosync logs:

  ```
  # journalctl -u pacemaker -u corosync --since "1 hour ago"
  ```
  + Use `--since` to specify time periods (e.g., "2 hours ago", "today")
  + Add `-f` to follow logs in real-time
  + Combine with grep for specific searches
+ System messages and resource agent activity can be found in `/var/log/messages`.

Application-based failures can be investigated in the SAP work directory.
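
The following commands illustrate these checks. The SAP work directory path is only an example; it assumes SID RHX and instance number 00, matching the placeholders used elsewhere in this guide.

```
# Follow pacemaker and corosync activity live, filtering for failover-related events
journalctl -u pacemaker -u corosync -f | grep -iE "fence|fail|error"

# Review recent pacemaker entries in the system log
grep -i pacemaker /var/log/messages | tail -n 50

# Check the SAP work directory for application-level errors (example path)
ls -lrt /usr/sap/RHX/ASCS00/work/
```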

## Cleanup pcs status


If failed actions are reported using the `pcs status` command, and if they have already been investigated, then you can clear the reports with the following command.

```
# pcs resource cleanup <resource> <hostname>
```
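
Running the command without arguments cleans up the failure history of all resources on all nodes. Use this only after the failed actions have been reviewed.

```
# Clear failed actions for every resource in the cluster
pcs resource cleanup
```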

## Restart failed nodes or pacemaker


It is recommended that failed (or fenced) nodes are not restarted automatically. This gives operators a chance to investigate the failure, and ensures that the cluster doesn't make assumptions about the state of resources.

Depending on your approach, you need to manually restart the instance or the pacemaker service.
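
Once the investigation is complete, rejoin the node. The following is a typical sequence, assuming the Amazon EC2 instance has already been started from the console or CLI.

```
# Start the cluster services on the recovered node (run locally on that node)
pcs cluster start

# Alternatively, start them remotely from the surviving node
pcs cluster start <hostname>

# Confirm that the node rejoined and resources are managed again
pcs status
```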

## Further Analysis


For cluster-specific issues, use `sosreport` to collect the diagnostic data needed for deeper analysis of the cluster components:

```
# sosreport --batch --tmp-dir /tmp
```

For quick analysis of recent events, you can use:

```
# pcs status --full
# journalctl -u pacemaker --since "1 hour ago"
```
+ `sosreport` collects system configuration and diagnostic information.
+ For more information, see [What is sosreport and how to create and retrieve one](https://access.redhat.com/solutions/3592) in the Red Hat Knowledgebase.

# Alerting and monitoring


This section covers the following topics.

**Topics**
+ [

## Using Amazon CloudWatch Application Insights
](#application-insights-nw-rhel)
+ [

## Using the cluster alert agents
](#cluster-alert-nw-rhel)

## Using Amazon CloudWatch Application Insights


For monitoring and visibility of cluster state and actions, Amazon CloudWatch Application Insights includes metrics for the enqueue replication state, cluster metrics, and SAP high availability checks. Additional metrics, such as Amazon EFS and CPU monitoring, can also help with root cause analysis.

For more information, see [Get started with Amazon CloudWatch Application Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/appinsights-getting-started.html) and [SAP NetWeaver High Availability on Amazon EC2](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/component-configuration-examples-netweaver-ha.html).

## Using the cluster alert agents


Within the cluster configuration, you can call an external program (an alert agent) to handle alerts. This is a *push* notification. It passes information about the event via environment variables.

The agents can then be configured to send emails, log to a file, update a monitoring system, etc. For example, the following script can be used to access Amazon SNS.

```
#!/bin/sh

# alert_sns.sh
# modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample

##############################################################################
# SETUP
# * Create an SNS Topic and subscribe email or chatbot
# * Note down the ARN for the SNS topic
# * Give the IAM Role attached to both Instances permission to publish to the SNS Topic
# * Ensure the aws cli is installed
# * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes
# * Ensure the permissions allow for hacluster and root to execute the script
# * Run the following as root (modify file location if necessary and replace SNS ARN):
#
# SLES:
# crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to { arn:aws:sns:region:account-id:myPacemakerAlerts }
#
# RHEL:
# pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S"
# pcs alert recipient add aws_sns_alert value=arn:aws:sns:region:account-id:myPacemakerAlerts
##############################################################################

# Additional information to send with the alerts
node_name=`uname -n`
# Cluster name used in the alert subjects below; this assumes the cluster-name
# property is set in the CIB (it is set by default during cluster creation)
cluster_name=$(crm_attribute --type crm_config --name cluster-name --query --quiet 2>/dev/null)
sns_body=`env | grep CRM_alert_`

# Required for SNS
TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get metadata
REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')

sns_subscription_arn=${CRM_alert_recipient}

# Format depending on alert type
case ${CRM_alert_kind} in
   node)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
   ;;
   fencing)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}"
   ;;
   resource)
     if [ ${CRM_alert_interval} = "0" ]; then
         CRM_alert_interval=""
     else
         CRM_alert_interval=" (${CRM_alert_interval})"
     fi
     if [ ${CRM_alert_target_rc} = "0" ]; then
         CRM_alert_target_rc=""
     else
         CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})"
     fi
     case ${CRM_alert_desc} in
         Cancelled)
           ;;
         *)
           sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}"
           ;;
     esac
     ;;
   attribute)
     sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated in '${CRM_alert_attribute_value}'"
     ;;
   *)
     sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert"
     ;;
esac

# Use this information to send the email.
aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region ${REGION}
```
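
After registering the alert with the `pcs alert` commands shown in the script header, you can verify the configuration and, optionally, publish a test message to the topic. The topic ARN below is the placeholder used in the script; the exact `pcs alert` syntax can vary slightly between pcs versions.

```
# Show the configured alerts and recipients
pcs alert config

# Publish a test message to confirm IAM permissions and the subscription
aws sns publish --topic-arn arn:aws:sns:region:account-id:myPacemakerAlerts \
  --subject "Pacemaker alert test" --message "manual test" --region <region>
```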

# Testing


We recommend scheduling regular fault scenario recovery testing at least annually, and as part of the operating system or SAP kernel updates that may impact operations. For more details on best practices for regular testing, see SAP Lens – [Best Practice 4.3 – Regularly test business continuity plans and fault recovery](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-4-3.html).

The tests described here simulate failures. These can help you understand the behavior and operational requirements of your cluster.

In addition to checking the state of cluster resources, ensure that the service you are trying to protect is in the required state. Can you still connect to SAP? Are locks still available in SM12?

Define the recovery time, and ensure that it aligns with your business objectives. Record the required recovery actions in runbooks.
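
One way to check lock persistence, in addition to SM12, is to query the enqueue server statistics before and after each test. The following sketch assumes instance number 00 and execution as the `<sid>adm` user.

```
# Report enqueue server statistics, including the current number of locks held
sapcontrol -nr 00 -function EnqGetStatistic
```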

**Topics**
+ [

## Test 1: Stop ASCS on the primary node using `sapcontrol`
](#test1-nw-rhel)
+ [

## Test 2: Stop ERS on the secondary node using `sapcontrol`
](#test2-nw-rhel)
+ [

## Test 3: Kill the message server process on the primary node
](#test3-nw-rhel)
+ [

## Test 4: Kill the enqueue server process on the primary node
](#test4-nw-rhel)
+ [

## Test 5: Kill the ER process
](#test5-nw-rhel)
+ [

## Test 6: Simulate hardware failure of an individual node, and repeat for other node
](#test6-nw-rhel)
+ [

## Test 7: Simulate a network failure
](#test7-nw-rhel)
+ [

## Test 8: Simulate an NFS failure
](#test8-nw-rhel)
+ [

## Test 9: Accidental shutdown
](#test9-nw-rhel)

## Test 1: Stop ASCS on the primary node using `sapcontrol`


 **Notes** – Ensure that the connector has been installed and the parameters have been updated.

 **Simulate failure** – On `rhxhost01` as `rhxadm`:

```
sapcontrol -nr <00> -function Stop
```

 **Expected behavior** – ASCS should be stopped on `rhxhost01`, and the cluster should not perform any activity.

 **Recovery action** – Start ASCS manually.
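
For example, you can start it with `sapcontrol` on `rhxhost01` as `rhxadm`:

```
sapcontrol -nr <00> -function Start
```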

## Test 2: Stop ERS on the secondary node using `sapcontrol`


 **Notes** – Ensure that the connector has been installed, and the parameters are updated.

 **Simulate failure** – On `rhxhost02` as `rhxadm`:

```
sapcontrol -nr <10> -function Stop
```

 **Expected behavior** – ERS should be stopped on `rhxhost02`, and the cluster should not perform any activity.

 **Recovery action** – Start ERS manually.

## Test 3: Kill the message server process on the primary node


 **Simulate failure** – On `rhxhost01` as `rhxadm`:

```
kill -9 $(pgrep -f "ms.sap<RHX>_ASCS<00>")
```

 **Expected behavior** – The message server should immediately respawn based on the Restart parameter.

 **Recovery action** – No action required.

## Test 4: Kill the enqueue server process on the primary node


 **Notes** – Check that locks have persisted, and review the location constraints (these exist only for ENSA1).

 **Simulate failure** – On `rhxhost01` as `rhxadm`:

```
kill -9 $(pgrep -f "[en|enq].sap<RHX>_ASCS<00>")
```

 **Expected behavior** – ENSA2: Cluster will restart the ENQ process and retrieve the locks remotely. ENSA1: Cluster will failover the ASCS resource to the node where the ERS is running.

 **Recovery action** – No action required.

## Test 5: Kill the ER process


 **Simulate failure** – On `rhxhost02` as `rhxadm`:

```
kill -9 $(pgrep -f "[er|enqr].sap<RHX>_ERS<10>")
```

 **Expected behavior** – Cluster will restart the ERS on the same node.

 **Recovery action** – No action required.

## Test 6: Simulate hardware failure of an individual node, and repeat for other node


 **Notes** – To simulate a system crash, you must first ensure that `/proc/sys/kernel/sysrq` is set to 1.
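
For example, the following command enables the SysRq trigger for the current boot.

```
echo 1 > /proc/sys/kernel/sysrq
```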

 **Simulate failure** – On the node being tested, as root (repeat later for the other node):

```
echo 'b' > /proc/sysrq-trigger
```

 **Expected behavior** – The node which has been killed fails. The cluster will move the resources (ASCS/ERS) which were running on the failed node to the surviving node.

 **Recovery action** – Start the EC2 node and pacemaker service. The cluster will detect that the node is online and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

## Test 7: Simulate a network failure


 **Notes** – See the following list.
+ `iptables` must be installed.
+ Use a subnet CIDR (rather than a single host IP) in this command because of the secondary ring.
+ Check for any existing iptables rules, because `iptables -F` flushes all rules.
+ Review the fence delay (for example, `pcmk_delay_max`) and `priority` parameters if neither node survives the fence race.

 **Simulate failure** – On either node as root:

```
iptables -A INPUT -s <CIDR_of_other_subnet> -j DROP; iptables -A OUTPUT -d <CIDR_of_other_subnet> -j DROP
```

 **Expected behavior** – The cluster detects the network failure, and fences one of the nodes to avoid a split-brain situation.

 **Recovery action** – If the node where the command was run survives, execute `iptables -F` to clear the network failure. Start the EC2 node and pacemaker service. The cluster will detect that the node is online and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

## Test 8: Simulate an NFS failure


 **Notes** – See the following list.
+ `iptables` must be installed.
+ Check for any existing iptables rules, because `iptables -F` flushes all rules.
+ Although rare, this is an important scenario to test. Depending on the activity, it may take some time (10 minutes or more) before the cluster notices that I/O to EFS is not occurring and fails either the Filesystem or the SAP resources.

 **Simulate failure** – On either node as root:

```
iptables -A OUTPUT -p tcp --dport 2049 -m state --state NEW,ESTABLISHED,RELATED -j DROP; iptables -A INPUT -p tcp --sport 2049 -m state --state ESTABLISHED -j DROP
```

 **Expected behavior** – The cluster detects that NFS is not available, and the SAPInstance resource agent will fail and move to the FAILED state. Because of the `on-fail=restart` configuration, the cluster first attempts a local restart before eventually fencing the node and failing over.

 **Recovery action** – If the node where the command was run survives, execute `iptables -F` to clear the network failure. Start the EC2 node and pacemaker service. The cluster will detect that the node is online and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).

## Test 9: Accidental shutdown


 **Notes** – See the following list.
+ Avoid shutdowns without cluster awareness.
+ We recommend the use of systemd to ensure predictable behavior.
+ Ensure the resource dependencies are in place.

 **Simulate failure** – Login to AWS Management Console, and stop the instance or issue a shutdown command.

 **Expected behavior** – The node which has been shut down fails. The cluster moves the resources (ASCS/ERS) that were running on the failed node to the surviving node. If systemd integration and resource dependencies are not configured, the cluster may detect an unclean stop of cluster services while the EC2 instance is still shutting down gracefully, and fence the instance that is being shut down.

 **Recovery action** – Start the EC2 node and pacemaker service. The cluster will detect that the node is online, and move the ERS resource so that the ASCS and ERS are not running on the same node (colocation constraint).