

# SAP HANA High Availability on Red Hat Enterprise Linux (RHEL) using Pacemaker
<a name="sap-hana-on-aws-rhel-pacemaker"></a>

**Topics**
+ [Planning](sap-hana-pacemaker-rhel-planning.md)
+ [Prerequisites](sap-hana-pacemaker-rhel-prerequisites.md)
+ [SAP HANA and Cluster Setup](sap-hana-pacemaker-rhel-deployment-cluster.md)
+ [Operations](sap-hana-pacemaker-rhel-operations.md)
+ [Testing](sap-hana-pacemaker-rhel-testing.md)

# Planning
<a name="sap-hana-pacemaker-rhel-planning"></a>

Review the following prerequisites carefully before beginning your high availability cluster deployment, ensuring that all infrastructure, operating system, and access requirements are met. Familiarize yourself with the linked references, supported configurations, and the core concepts that are used in this solution.

**Topics**
+ [Setup Overview](sap-hana-pacemaker-rhel-setup-overview.md)
+ [Vendor Support](sap-hana-pacemaker-rhel-references.md)
+ [Concepts](sap-hana-pacemaker-rhel-concepts.md)
+ [Automated Deployment](sap-hana-pacemaker-rhel-automation.md)
+ [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)
+ [Architecture Diagrams](sap-hana-pacemaker-rhel-arch-diagrams.md)

# Setup Overview
<a name="sap-hana-pacemaker-rhel-setup-overview"></a>

## Deployed Cluster Infrastructure
<a name="_deployed_cluster_infrastructure"></a>

Ensure that your AWS networking and the Amazon EC2 instances where SAP workloads are installed are correctly configured for SAP.

The following SAP HANA cluster-specific requirements must be met:
+ Two cluster nodes created in private subnets in separate Availability Zones within the same Amazon VPC and AWS Region.
+ Access to the route table(s) that are associated with the chosen subnets. For more information, see [Overlay IP](sap-hana-pacemaker-rhel-concepts.md#overlay-ip-rhel).
+ Targeted Amazon EC2 instances must have connectivity to the Amazon EC2 endpoint via the internet or an Amazon VPC endpoint.

## Supported Operating System
<a name="_supported_operating_system"></a>

Protecting the SAP HANA database with a Pacemaker cluster requires packages from Red Hat, including cluster resource agents for SAP and AWS, that are not available in standard repositories.

For deploying SAP HANA on Red Hat, either "RHEL for SAP Solutions" (BYOS) or "RHEL for SAP with High Availability and Update Services" (PAYG) is required.

## Required Access for Setup
<a name="_required_access_for_setup"></a>

The following access is required for setting up the cluster:

An IAM user with the following privileges:
+ Modify Amazon VPC route tables
+ Modify Amazon EC2 instance properties
+ Create IAM policies and roles
+ Create Amazon EFS file systems

Additional required access:
+ Root access to the operating system of both cluster nodes
+ SAP HANA administrative user access – <sid>adm
+ SAP HANA SystemDB administrative access for changing configuration and managing backups

**Note**  
These access requirements are specific to the cluster setup process and can be restricted for ongoing cluster operations and maintenance.

## Reliability Requirements Defined
<a name="_reliability_requirements_defined"></a>

The SAP Lens of the Well-Architected framework, in particular the Reliability pillar, can be used to understand the reliability requirements for your SAP workload.

The SAP HANA application is a single point of failure in a highly available SAP architecture. The impact of an outage of this component must be evaluated against factors such as recovery point objective (RPO), recovery time objective (RTO), cost, and operational complexity. For more information, see [Reliability in SAP Lens - AWS Well-Architected Framework](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/reliability.html).

# Vendor Support
<a name="sap-hana-pacemaker-rhel-references"></a>

## SAP and Red Hat References
<a name="_sap_and_red_hat_references"></a>

In addition to this guide, see the following references for more details:
+ Red Hat Documentation: [Automating SAP HANA Scale-Up System Replication using the RHEL HA Add-On - Red Hat Enterprise Linux for SAP Solutions 9](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/9/html/automating_sap_hana_scale-up_system_replication_using_the_rhel_ha_add-on/index) 
+ Red Hat Documentation: [Deploying SAP HANA Scale-Up System Replication High Availability - Advanced Next Generation Interface](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/9/html/deploying_sap_hana_scale-up_system_replication_high_availability/index) 
+ Red Hat Documentation: [Automating SAP HANA Scale-Out System Replication using the RHEL HA Add-On - Red Hat Enterprise Linux for SAP Solutions 9](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_for_sap_solutions/9/html/automating_sap_hana_scale-out_system_replication_using_the_rhel_ha_add-on) 
+ SAP Note: [1656099 - SAP Applications on AWS: Supported DB/OS and Amazon EC2 products](https://me.sap.com/notes/1656099) 
+ SAP Note: [2777782 - SAP HANA DB: Recommended OS Settings for RHEL 8](https://me.sap.com/notes/2777782) 
+ SAP Note: [3108302 - SAP HANA DB: Recommended OS Settings for RHEL 9](https://me.sap.com/notes/3108302) 

**Note**  
SAP portal access is required to access SAP Notes.

## Deployment Guidance
<a name="deployments-rhel"></a>

 AWS works in collaboration with Red Hat to support SAP HANA deployments on AWS. AWS provides detailed guidance on configuring EC2 instances and AWS-specific resources to meet SAP HANA requirements. While we strive to consolidate documentation to simplify the user experience, the underlying software components and resources owned by Pacemaker remain under the purview of the software vendor for development and support.


| SAP HANA Deployment Type | Support Status | Notes |  AWS Configuration Patterns | 
| --- | --- | --- | --- | 
|  SAP HANA Scale-Up Standard  |   AWS Documented & Supported  |  Covered in AWS SAP HANA guides  |  SAPHANAScaleUp-Classic, SAPHANAScaleUp-ANGI  | 
|  SAP HANA Scale-Up Secondary Read-Enabled  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Up Multi-Tier Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Up Multi-Target Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Out Standard  |   AWS Documented & Supported  |  Covered in AWS SAP HANA guides  |  SAPHANAScaleOut-Classic, SAPHANAScaleOut-ANGI  | 
|  SAP HANA Scale-Out Secondary Read-Enabled  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Out Multi-Tier Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Out Multi-Target Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 

**Note**  
 AWS configuration patterns represent standardized deployment templates that have been validated for specific use cases. In this documentation, we highlight where instructions deviate according to the configuration pattern.

**What is Angi?**  
SAPHanaSR-angi (SAP HANA SR - Advanced Next Generation Interface) is the latest unified high availability solution for managing SAP HANA System Replication in Pacemaker clusters, supported on RHEL 9.6 and newer. The solution consolidates the management of both scale-up and scale-out deployments into a single package and introduces technical improvements such as faster takeover times during filesystem failures, unresponsive HANA instances, and node failures in scale-out configurations.

This document covers new implementations using SAPHanaSR-angi. For migrations from existing SAPHanaSR or SAPHanaSR-ScaleOut installations to SAPHanaSR-angi, refer to the Red Hat documentation.

# Concepts
<a name="sap-hana-pacemaker-rhel-concepts"></a>

## SAP – SAP HANA and HANA System Replication
<a name="_sap_sap_hana_and_hana_system_replication"></a>

SAP HANA is an in-memory, column-oriented, relational database management system developed by SAP. It uses HANA System Replication (HSR) to replicate data and changes from a primary system to one or more secondary systems. In scale-out deployments, this replication occurs between corresponding nodes across the primary and secondary systems, with each service having its counterpart in the secondary system. HSR ensures changes are continuously replicated to minimize the Recovery Point Objective (RPO). While takeovers can be manually triggered using HANA tooling, the addition of a Pacemaker cluster automates the failover process through monitoring, orchestration, and integration with resource agents for hardware connectivity and management.

## AWS – Availability Zones
<a name="shared_aws_availability_zones"></a>

An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. For more information, see Regions and Availability Zones.

For mission-critical deployments of SAP on AWS where the goal is to minimize the recovery time objective (RTO), we suggest distributing single points of failure across Availability Zones. Compared with single-instance or single Availability Zone deployments, this increases resilience and isolation against a broad range of failure scenarios and issues, including natural disasters.

Each Availability Zone is physically separated by a meaningful distance (many kilometers) from any other Availability Zone. All Availability Zones in an AWS Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber. This enables synchronous replication. All traffic between Availability Zones is encrypted.

## AWS – Overlay IP
<a name="overlay-ip-rhel"></a>

An Overlay IP enables a connection to the application, regardless of which Availability Zone (and subnet) contains the active primary node.

When you deploy an Amazon EC2 instance in AWS, IP addresses are allocated from the CIDR range of the assigned subnet. A subnet cannot span multiple Availability Zones, and therefore its IP addresses may become unavailable after faults, including network connectivity or hardware issues, that require a failover to the replication target in a different Availability Zone.

To address this, we suggest that you configure an overlay IP and use it in the connection parameters for the application. This IP address is a non-overlapping RFC1918 private IP address from outside of the VPC CIDR block and is configured as an entry in the route table or tables. The route directs the connection to the active node and is updated by the cluster software during a failover.

You can select any one of the following RFC1918 private IP addresses for your overlay IP address:
+ 10.0.0.0 – 10.255.255.255 (10/8 prefix)
+ 172.16.0.0 – 172.31.255.255 (172.16/12 prefix)
+ 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)

If, for example, you use the 10/8 prefix in your SAP VPC, selecting a 172 or a 192 IP address may help to differentiate the overlay IP. Consider the use of an IP Address Management (IPAM) tool such as Amazon VPC IP Address Manager to plan, track, and monitor IP addresses for your AWS workloads. For more information, see [What is IPAM?](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html).

The overlay IP agent in the cluster can also be configured to update multiple route tables which contain the Overlay IP entry if your subnet association or connectivity requires it.
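As a sanity check before committing to an address, you can confirm that a candidate overlay IP is an RFC1918 private address and sits outside the VPC CIDR block. A minimal sketch using the Python standard library (the addresses shown are illustrative, not prescriptive):

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def valid_overlay_ip(candidate: str, vpc_cidr: str) -> bool:
    """True if candidate is RFC1918 private and outside the VPC CIDR block."""
    ip = ipaddress.ip_address(candidate)
    vpc = ipaddress.ip_network(vpc_cidr)
    is_private = any(ip in block for block in RFC1918_BLOCKS)
    return is_private and ip not in vpc

# VPC uses a 10/8 range, so a 172.16/12 address qualifies as an overlay IP.
print(valid_overlay_ip("172.16.52.1", "10.1.0.0/16"))  # True
print(valid_overlay_ip("10.1.20.5", "10.1.0.0/16"))    # False: inside the VPC CIDR
```

The same check can be applied to the optional read-enabled overlay IP before adding route table entries.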

### Access to the Overlay IP
<a name="_access_to_the_overlay_ip"></a>

The overlay IP is outside of the range of the VPC, and therefore cannot be reached from locations that are not associated with the route table, including on-premises and other VPCs.

Use AWS Transit Gateway as a central hub to facilitate the network connection to an overlay IP address from multiple locations, including Amazon VPCs, other AWS Regions, and on-premises using AWS Direct Connect or AWS Client VPN.

If you do not have AWS Transit Gateway set up as a network transit hub or if it is not available in your preferred AWS Region, you can use a Network Load Balancer to enable network access to an overlay IP.

For more information, see [SAP on AWS High Availability Setup](sap-oip-sap-on-aws-high-availability-setup.md).

## AWS – Shared VPC
<a name="shared_aws_shared_vpc"></a>

An enterprise landing zone setup or security requirements may require the use of a separate cluster account to restrict the route table access required for the Overlay IP to an isolated account. For more information, see [Share your VPC with other accounts](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html).

Evaluate the operational impact against your security posture before setting up shared VPC.

## Pacemaker - STONITH Fencing Agent
<a name="fencing-rhel"></a>

In SAP HANA deployments, whether in a scale-up configuration (two-node) or a scale-out configuration (two or more nodes per site), it is crucial that data consistency is maintained by ensuring only the designated primary node or nodes can process write operations at any given time. When a node becomes unresponsive or incommunicable, maintaining data consistency may require that the faulty node is isolated by powering it down before the cluster commences other actions, such as promoting a new primary. This arbitration is the role of the fencing agent.

In a two-node scale-up scenario, fence racing is a critical concern. This occurs when a communication failure causes both nodes to simultaneously attempt to fence (power off) each other, believing the other node has failed. The fencing agent addresses this risk by providing an external witness. In scale-out deployments, while fence racing is less likely due to the presence of multiple nodes that can participate in quorum decisions, proper fencing remains critical for maintaining data consistency across the larger node set.
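The quorum arithmetic underlying these decisions is simple majority voting: a partition may continue operating only if it holds more than half of the total votes. A minimal illustration (a hypothetical helper, not the corosync implementation):

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A partition is quorate only with a strict majority of all votes."""
    return votes_present > total_votes // 2

# Two-node cluster: a 1-1 split leaves neither side quorate,
# which is why an external fencing mechanism must arbitrate.
print(has_quorum(1, 2))  # False

# Seven nodes (6 HANA nodes plus a majority maker): a 4-node partition survives.
print(has_quorum(4, 7))  # True
```

This also shows why scale-out clusters add a majority maker node: it makes the total vote count odd, so any network partition leaves exactly one side with a majority.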

Red Hat supports several fencing agents, including the one recommended for use with Amazon EC2 instances (`fence_aws`).

# Automated Deployment
<a name="sap-hana-pacemaker-rhel-automation"></a>

You can set up a cluster manually using the instructions provided here. You can also automate parts of this process to ensure consistent and repeatable deployments.

Use AWS Launch Wizard for SAP to automate deployments of SAP HANA Platform, SAP NetWeaver, SAP S/4HANA, SAP BW/4HANA, and SAP Solution Manager. Launch Wizard uses AWS CloudFormation templates and advanced scripts to quickly provision the required resources. The automation handles SAP HANA installation, HANA System Replication, and Pacemaker setup, requiring only post-deployment validation and testing. For more information, see [AWS Launch Wizard for SAP](https://docs.aws.amazon.com/launchwizard/latest/userguide/launch-wizard-sap.html).

**Important**  
For reliable cluster operations, thoroughly test your system regardless of setup method. Testing helps identify system anomalies, validate changing requirements, and build operational understanding. See [Testing](sap-hana-pacemaker-rhel-testing.md) for more details.

# Parameter Reference
<a name="sap-hana-pacemaker-rhel-parameters"></a>

The cluster setup uses parameters, including SID and System Number, that are unique to your setup. It is useful to predetermine these values using the following examples and guidance.

**Topics**
+ [Global AWS Parameters](#global_shared_aws_parameters)
+ [Amazon EC2 Instance Parameters](#_amazon_ec2_instance_parameters)
+ [SAP and Pacemaker Resource Parameters](#_sap_and_pacemaker_resource_parameters)
+ [Red Hat Cluster Parameters](#_red_hat_cluster_parameters)

## Global AWS Parameters
<a name="global_shared_aws_parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|   AWS account ID  |   `<account_id>`   |   `123456789100`   | 
|   AWS Region  |   `<region>`   |   `us-east-1`   | 
+  AWS account – For more details, see [Your AWS account ID and its alias](https://docs.aws.amazon.com/IAM/latest/UserGuide/console-account-alias.html).
+  AWS Region – For more details, see [Describe your Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones).

## Amazon EC2 Instance Parameters
<a name="_amazon_ec2_instance_parameters"></a>


| Name | Parameter | Primary example | Secondary example | 
| --- | --- | --- | --- | 
|  Amazon EC2 instance ID  |   `<instance_id_x>`   |   `i-xxxxinstidforhost1`   |   `i-xxxxinstidforhost2`   | 
|  Hostname  |   `<hostname_x>`   |   `hanahost01`   |   `hanahost02`   | 
|  Host IP  |   `<host_ip_x>`   |   `10.1.20.1`   |   `10.2.20.1`   | 
|  Host additional IP  |   `<host_additional_ip_x>`   |   `10.1.20.2`   |   `10.2.20.2`   | 
|  Configured subnet  |   `<subnet_id>`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   | 
+ Hostnames must comply with SAP requirements outlined in [SAP Note 611361 - Hostnames of SAP ABAP Platform servers](https://me.sap.com/notes/611361) (requires SAP portal access).
+ Run the following command on your instances to retrieve the hostname:

  ```
  $ hostname
  ```
+ Amazon EC2 instance ID – run the following command (IMDSv2 compatible) on your instances to retrieve instance metadata:

  ```
  $ /usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/meta-data/instance-id
  ```

  For more details, see [Retrieve instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and [Instance identity documents](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html).

 **For Scale-out Deployments** 


| Role | Primary Coordinator | Primary Worker | Primary Worker | Secondary Coordinator | Secondary Worker | Secondary Worker | Majority Maker | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
|  Hostname  |   `hanahost01`   |   `hanahostworker01a`   |   `hanahostworker01b`   |   `hanahost02`   |   `hanahostworker02a`   |   `hanahostworker02b`   |   `hanamm`   | 
|  Subnet  |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   |   `subnet-xxxxxxxxxxsubnet2`   |   `subnet-xxxxxxxxxxsubnet2`   |   `subnet-xxxxxxxxxxsubnet3`   | 
+ Example for a six-node cluster with a majority maker
+ The majority maker can use minimal resources because it only provides cluster quorum functionality

## SAP and Pacemaker Resource Parameters
<a name="_sap_and_pacemaker_resource_parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|  SAP HANA SID  |   `<SID>` or `<sid>`   |   `HDB`   | 
|  SAP HANA System Number  |   `<hana_sys_nr>`   |   `00`   | 
|  SAP HANA Virtual Hostname  |   `<hana_virt_hostname>`   |   `hanahdb`   | 
|  SAP HANA Overlay IP  |   `<hana_overlayip>`   |   `172.16.52.1`   | 
|  SAP HANA Read Enabled Overlay IP (optional)  |   `<readenabled_overlayip>`   |   `172.16.52.2`   | 
|  VPC Route Tables  |   `<routetable_id>`   |   `rtb-xxxxxroutetable1`   | 
+ SAP details – SAP parameters, including SID and instance number must follow the guidance and limitations of SAP and Software Provisioning Manager. Refer to [SAP Note 1979280 - Reserved SAP System Identifiers (SAPSID) with Software Provisioning Manager](https://me.sap.com/notes/1979280) for more details.
+ Post-installation, use the following command to find the details of the instances running on a host:

  ```
  $ sudo /usr/sap/hostctrl/exe/saphostctrl -function ListInstances
  ```
+ Overlay IP – This value is defined by you. For more information, see [Overlay IP](sap-hana-pacemaker-rhel-concepts.md#overlay-ip-rhel).

## Red Hat Cluster Parameters
<a name="_red_hat_cluster_parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|  Cluster user  |   `<cluster_user>`   |   `hacluster`   | 
|  Cluster password  |   `<cluster_password>`   |  | 
|  Cluster name  |   `<cluster_name>`   |   `myCluster`   | 
|   AWS CLI cluster profile  |   `<cli_cluster_profile>`   |   `cluster`   | 
+ Cluster user – Installing the cluster packages creates the user hacluster. Set a password for this account so that the cluster can perform the tasks that do not require root access.

# Architecture Diagrams
<a name="sap-hana-pacemaker-rhel-arch-diagrams"></a>

## Pacemaker - Scale-Up Architecture
<a name="_pacemaker_scale_up_architecture"></a>

![\[SAP Hana Pacemaker Red Hat Enterprise Linux Scale-Up\]](http://docs.aws.amazon.com/sap/latest/sap-hana/images/sap-hana-pacemaker-rhel-scaleup.png)


# Prerequisites
<a name="sap-hana-pacemaker-rhel-prerequisites"></a>

**Topics**
+ [AWS Infrastructure Setup](sap-hana-pacemaker-rhel-infra-setup.md)
+ [EC2 Instance Configuration](sap-hana-pacemaker-rhel-ec2-configuration.md)
+ [Operating System Requirements](sap-hana-pacemaker-rhel-os-settings.md)

# AWS Infrastructure Setup
<a name="sap-hana-pacemaker-rhel-infra-setup"></a>

This section covers the one-time setup tasks required to prepare your AWS environment for the cluster deployment:

**Topics**
+ [Create IAM Roles and Policies for Pacemaker](#iam_roles_rhel)
+ [Modify Security Groups for Cluster Communication](#sg-rhel)
+ [Add VPC Route Table Entries for Overlay IPs](#rt-rhel)

## Create IAM Roles and Policies for Pacemaker
<a name="iam_roles_rhel"></a>

In addition to the permissions required for standard SAP operations, two IAM policies are required for the cluster to control AWS resources. These policies must be assigned to your Amazon EC2 instance using an IAM role. This enables the Amazon EC2 instance, and therefore the cluster, to call AWS services.

**Note**  
Create policies with least-privilege permissions, granting access to only the specific resources that are required within the cluster. For multiple clusters, you may need to create multiple policies.

For more information, see [IAM roles for Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile).

### STONITH Policy
<a name="stonith_policy"></a>

The Red Hat STONITH resource agent (`fence_aws`) requires permission to start and stop both the nodes of the cluster. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0fedcba0987654321"
      ]
    }
  ]
}
```
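If you template these policies across environments, generating the instance ARNs programmatically helps keep each cluster's policy least-privilege. A sketch (Region, account ID, and instance IDs are placeholders, not values from your account):

```python
import json

def stonith_policy(region: str, account_id: str, instance_ids: list[str]) -> str:
    """Render a least-privilege STONITH policy scoped to the given cluster nodes."""
    arns = [f"arn:aws:ec2:{region}:{account_id}:instance/{i}" for i in instance_ids]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            # Describe calls cannot be resource-scoped, so they use "*".
            {"Effect": "Allow",
             "Action": ["ec2:DescribeInstances", "ec2:DescribeTags"],
             "Resource": "*"},
            # Start/stop is restricted to the two cluster nodes only.
            {"Effect": "Allow",
             "Action": ["ec2:StartInstances", "ec2:StopInstances"],
             "Resource": arns},
        ],
    }
    return json.dumps(policy, indent=2)

print(stonith_policy("us-east-1", "123456789012",
                     ["i-1234567890abcdef0", "i-0fedcba0987654321"]))
```

The rendered document can then be attached with your usual tooling; the same pattern applies to the Overlay IP policy with route table ARNs.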

### AWS Overlay IP Policy
<a name="overlay_policy"></a>

The Red Hat Overlay IP resource agent (`aws-vpc-move-ip`) requires permission to modify a routing entry in route tables. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": [
                "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
                "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0fedcba0987654321"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeRouteTables",
            "Resource": "*"
        }
    ]
}
```

### Shared VPC (optional)
<a name="_shared_vpc_optional"></a>

**Note**  
The following directions are only required for setups which include a Shared VPC.

Amazon VPC sharing enables you to share subnets with other AWS accounts within the same organization in AWS Organizations. Amazon EC2 instances can be deployed using the subnets of the shared Amazon VPC.

In the Pacemaker cluster, the `aws-vpc-move-ip` resource agent has been enhanced to support a shared VPC setup while maintaining backward compatibility with existing features.

The following checks and changes are required. We refer to the AWS account that owns the Amazon VPC as the sharing VPC account, and to the consumer account where the cluster nodes are going to be deployed as the cluster account.

**IAM Roles and Policies**  
Using the Overlay IP agent with a shared Amazon VPC requires a different set of IAM permissions to be granted on both AWS accounts (sharing VPC account and cluster account).

**Sharing VPC Account**  
In the sharing VPC account, create an IAM role to delegate permissions to the EC2 instances that will be part of the cluster. During IAM role creation, select "Another AWS account" as the type of trusted entity, and enter the AWS account ID where the EC2 instances will be deployed.

After the IAM role has been created, create the following IAM policy in the sharing VPC account and attach it to the IAM role. Add or remove route table entries as needed.

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0fedcba0987654321"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
```

Next, go to the "Trust relationships" tab of the IAM role, and ensure that the AWS account you entered while creating the role has been added correctly.

In the cluster account, create the following IAM policies and attach them to an IAM role. This is the IAM role that will be attached to the EC2 instances.

 **STS Policy** 

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/sharing-vpc-account-cluster-role"
    }
  ]
}
```

 **STONITH Policy** 

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0fedcba0987654321"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```

## Modify Security Groups for Cluster Communication
<a name="sg-rhel"></a>

A security group controls the traffic that is allowed to reach and leave the resources that it is associated with. For more information, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html).

In addition to the standard ports required to access SAP and administrative functions, the following rules must be applied to the security groups assigned to all Amazon EC2 instances in the cluster.


| Source | Protocol | Port range | Description | 
| --- | --- | --- | --- | 
|  The security group ID (its own resource ID)  |  UDP  |  5405  |  Allows UDP traffic between cluster resources for corosync communication  | 
+ Note the use of the `UDP` protocol.
+ If you are running a local firewall, such as iptables, ensure that communication on the preceding ports is allowed between the cluster's Amazon EC2 instances.

## Add VPC Route Table Entries for Overlay IPs
<a name="rt-rhel"></a>

You need to add initial route table entries for the Overlay IP. For more information on Overlay IP, see [Overlay IP Concept](sap-hana-pacemaker-rhel-concepts.md#overlay-ip-rhel).

Add entries to the VPC route table or tables associated with the subnets of your Amazon EC2 instances for the cluster. The entries for destination (Overlay IP CIDR) and target (Amazon EC2 instance or ENI) must be added manually for the SAP HANA primary database node. This ensures that the cluster resource has a route to modify. It also supports the installation of SAP using the virtual names associated with the Overlay IP before the configuration of the cluster.

Using either the Amazon VPC console or an AWS CLI command, add a route for the Overlay IP to the table or tables.

------
#### [  AWS Console ]

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

1. In the navigation pane, choose **Route Tables**, then select the route table associated with your cluster node subnets.

1. Choose **Actions** → **Edit routes**.

1. Choose **Add route** and configure the HANA route:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sap/latest/sap-hana/sap-hana-pacemaker-rhel-infra-setup.html)

1. (Optional) Add a route for read-enabled access to the secondary:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sap/latest/sap-hana/sap-hana-pacemaker-rhel-infra-setup.html)

1. Choose **Save changes**.

   Your route table now includes entries for required Overlay IPs, in addition to the standard routes.

------
#### [  AWS CLI ]

The preceding steps can also be performed programmatically. We suggest performing the steps with administrative privileges instead of instance-based privileges to preserve least privilege; the CreateRoute API isn't necessary for ongoing operations.

For example:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <hana_overlayip>/32 --instance-id <instance_id_1>
```

If required, add a route for read-enabled access:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <readenabled_overlayip>/32 --instance-id <instance_id_2>
```

------

# EC2 Instance Configuration
<a name="sap-hana-pacemaker-rhel-ec2-configuration"></a>

Amazon EC2 instance settings can be applied using Infrastructure as Code, or manually using the AWS Command Line Interface or the AWS Management Console. We recommend Infrastructure as Code automation to reduce manual steps and ensure consistency.

**Topics**
+ [Assign or Review Pacemaker IAM Role](#_assign_or_review_pacemaker_iam_role)
+ [Assign or Review Security Groups](#_assign_or_review_security_groups)
+ [Assign Secondary IP Addresses](#_assign_secondary_ip_addresses)
+ [Disable Source/Destination Check](#source_dest)
+ [Review Stop Protection](#stop_protection)
+ [Review Automatic Recovery](#auto_recovery)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Assign or Review Pacemaker IAM Role
<a name="_assign_or_review_pacemaker_iam_role"></a>

The two cluster resource IAM policies must be assigned to an IAM role associated with your Amazon EC2 instance. If an IAM role is not associated with your instance, create a new IAM role for cluster operations.

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. In the navigation pane, choose **Actions** → **Security** → **Modify IAM role**.

1. Choose the IAM role that contains the policies created in [Create IAM Roles and Policies for Pacemaker](sap-hana-pacemaker-rhel-infra-setup.md#iam_roles_rhel).

1. Choose **Update IAM role**.

1. Repeat these steps for all nodes in the cluster.
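Alternatively, the role association can be scripted with the AWS CLI. The following sketch only prints the `aws ec2 associate-iam-instance-profile` commands for review; the instance IDs and instance profile name are placeholders that you must replace with your own values.

```shell
#!/bin/bash
# Hypothetical instance IDs and instance profile name; replace with your own values.
instances="i-aaaa1111 i-bbbb2222"
profile="pacemaker-cluster-role"

# Print the association commands for review; remove the echo to execute them.
print_associate_commands() {
    for id in ${instances}; do
        echo "aws ec2 associate-iam-instance-profile --instance-id ${id} --iam-instance-profile Name=${profile}"
    done
}

print_associate_commands
```

If an instance already has an instance profile associated, use `aws ec2 replace-iam-instance-profile-association` instead.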

## Assign or Review Security Groups
<a name="_assign_or_review_security_groups"></a>

The security group rules created in the [Modify Security Groups for Cluster Communication](sap-hana-pacemaker-rhel-infra-setup.md#sg-rhel) section must be assigned to your Amazon EC2 instances. If a security group is not associated with your instance, or if the required rules are missing from the assigned security group, add the security group or update the rules.

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. In the **Security** tab, review the security groups, ports, and source of traffic.

1. If required, choose **Actions** → **Security** → **Change security groups**.

1. Under **Associated security groups**, search for and select the required groups.

1. Choose **Save**.

1. Repeat these steps for all nodes in the cluster.

You can verify the security group rules on your instances using the AWS CLI:

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute groupSet
```

## Assign Secondary IP Addresses
<a name="_assign_secondary_ip_addresses"></a>

Secondary IP addresses are used to create a redundant communication channel (secondary ring) in corosync for clusters. The cluster nodes can use the secondary ring to communicate in case of underlying network disruptions.

These IPs are only used in cluster configurations. The secondary IPs provide the same fault tolerance as a secondary Elastic Network Interface (ENI). For more information, see [Secondary IP addresses for your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-secondary-ip-addresses.html).
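If the secondary IPs have not yet been assigned, they can be added with the AWS CLI. The following sketch only prints the `aws ec2 assign-private-ip-addresses` commands for review; the ENI IDs and IP addresses are placeholders that you must replace with your own values.

```shell
#!/bin/bash
# Hypothetical ENI IDs and secondary ring IPs; replace with your own values.
# Use one additional IP per node, taken from the same subnet as the primary IP.
enis="eni-aaaa1111:10.2.10.2 eni-bbbb2222:10.2.20.2"

# Print the assignment commands for review; remove the echo to execute them.
print_assign_commands() {
    for pair in ${enis}; do
        eni="${pair%%:*}"
        ip="${pair##*:}"
        echo "aws ec2 assign-private-ip-addresses --network-interface-id ${eni} --private-ip-addresses ${ip}"
    done
}

print_assign_commands
```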

You can verify the secondary IP configuration on your instances using the AWS CLI:

```
$ aws ec2 describe-instances --instance-id <instance_id> \
    --query 'Reservations[*].Instances[*].NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
    --output text
```

Verify that:
+ Each instance returns two IP addresses from the same subnet
+ The primary network interface (eth0) has both IPs assigned
+ The primary and secondary IPs will be used later for the `ring0_addr` and `ring1_addr` entries in corosync.conf
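For illustration, the resulting `nodelist` section of corosync.conf might look like the following. The hostnames and IP addresses are examples only; the file itself is created later during cluster node setup.

```
nodelist {
    node {
        ring0_addr: 10.2.10.1
        ring1_addr: 10.2.10.2
        name: hanahost01
        nodeid: 1
    }

    node {
        ring0_addr: 10.2.20.1
        ring1_addr: 10.2.20.2
        name: hanahost02
        nodeid: 2
    }
}
```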

## Disable Source/Destination Check
<a name="source_dest"></a>

Amazon EC2 instances perform source/destination checks by default, which require that an instance is either the source or the destination of any traffic it sends or receives. In the pacemaker cluster, the source/destination check must be disabled on both instances so that they can receive traffic destined for the Overlay IP.

The following AWS Console or AWS CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. In the navigation pane, choose **Actions** → **Networking** → **Change source/destination check**.

1. For Source/Destination Checking, choose **Stop** to allow traffic when the source or destination is not the instance itself.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-source-dest-check
```

Repeat for all nodes in the cluster.

------

To confirm the value of the attribute for a particular instance, use the following command. The value `false` means that source/destination checking is disabled.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute sourceDestCheck
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "SourceDestCheck": {
        "Value": false
    }
}
```

## Review Stop Protection
<a name="stop_protection"></a>

To allow STONITH actions to be executed, stop protection must be disabled for Amazon EC2 instances that are part of a pacemaker cluster. If the default settings have been modified, use the following commands on both instances to disable stop protection.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change stop protection**.

1. Ensure **Stop protection** is not enabled.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-disable-api-stop
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of the attribute for a particular instance, use the following command. The value `false` means that stop protection is disabled and the instance can be stopped using the AWS CLI.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute disableApiStop
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "DisableApiStop": {
        "Value": false
    }
}
```

## Review Automatic Recovery
<a name="auto_recovery"></a>

After a failure, cluster-controlled operations must be resumed in a coordinated way. This helps ensure that the cause of the failure is known and addressed, and that the status of the cluster is as expected, for example, that there are no pending fencing actions. For this reason, disable the simplified automatic recovery feature on all cluster nodes.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change auto-recovery behavior**.

1. Select **Off** to disable auto-recovery for system status check failures.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify auto-recovery settings (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-maintenance-options --instance-id <instance_id> --auto-recovery disabled
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of the attribute for a particular instance, use the following command. The value `disabled` means that automatic recovery will not be attempted.

```
$ aws ec2 describe-instances --instance-ids <instance_id> --query 'Reservations[*].Instances[*].MaintenanceOptions.AutoRecovery'
```

The output:

```
[
    [
        "disabled"
    ]
]
```

# Operating System Requirements
<a name="sap-hana-pacemaker-rhel-os-settings"></a>

This section outlines the required operating system configurations for Red Hat Enterprise Linux for SAP (RHEL for SAP) cluster nodes. Note that this is not a comprehensive list of configuration requirements for running SAP HANA on AWS, but rather focuses specifically on cluster management prerequisites.

Consider using configuration management tools or automated deployment scripts to ensure accurate and repeatable setup across your cluster infrastructure.

**Topics**
+ [Root Access](#_root_access)
+ [Install Missing Operating System Packages](#packages)
+ [Update and Check Operating System Versions](#_update_and_check_operating_system_versions)
+ [System Logging](#_system_logging)
+ [Disable NetworkManager Cloud Services](#_disable_networkmanager_cloud_services)
+ [Time Synchronization Services](#_time_synchronization_services)
+ [AWS CLI Profile](#shared_aws_cli_profile)
+ [Pacemaker Proxy Settings (Optional)](#_pacemaker_proxy_settings_optional)
+ [Add Overlay IP for Initial Database Access](#_add_overlay_ip_for_initial_database_access)
+ [Hostname Resolution](#_hostname_resolution)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Root Access
<a name="_root_access"></a>

Verify root access on both cluster nodes. Most of the setup commands in this document are performed as the root user. Assume that commands should be run as root unless explicitly stated otherwise.

## Install Missing Operating System Packages
<a name="packages"></a>

Install any missing operating system packages on all cluster nodes.

The following packages and their dependencies are required for the pacemaker setup. Depending on your baseline image, for example, RHEL for SAP, these packages may already be installed.


| Package | Description | Category | Required | Configuration Pattern | 
| --- | --- | --- | --- | --- | 
|  chrony  |  Time Synchronization  |  System Support  |  Mandatory  |  All  | 
|  pacemaker  |  Cluster Resource Manager  |  Core Cluster  |  Mandatory  |  All  | 
|  corosync  |  Cluster Communication Engine  |  Core Cluster  |  Mandatory  |  All  | 
|  pcs  |  Cluster Management CLI  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents  |  Basic Resource Agents  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents-cloud  |  Cloud Resource agents including aws-vpc-move-ip  |  Core Cluster  |  Mandatory  |  All  | 
|  fence-agents-aws  |  Fencing Capabilities  |  Core Cluster  |  Mandatory  |  All  | 
|  sap-hana-ha  |  New Generation HANA System Replication Agent  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleUp-SAPANGI, SAPHANAScaleOut-SAPANGI  | 
|  resource-agents-sap-hana  |  SAP HANA Resource Agents  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleUp-Classic  | 
|  resource-agents-sap-hana-scaleout  |  SAP HANA Resource Agents  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleOut-Classic  | 
|  sos  |  System Information Gathering  |  Support Tools  |  Mandatory  |  All  | 
|  sysstat  |  Performance Monitoring Tools  |  Support Tools  |  Mandatory  |  All  | 
|  pcp-system-tools  |  Performance Co-Pilot Tools  |  Monitoring  |  Recommended  |  All  | 

**Note**  
Refer to [Vendor Support of Deployment Types](sap-hana-pacemaker-rhel-references.md#deployments-rhel) for more information on Configuration Patterns. `Mandatory*` indicates that this package is mandatory based on the Configuration Pattern.

```
#!/bin/bash

# Mandatory core packages for SAP HANA HA on AWS
mandatory_packages="pacemaker corosync pcs chrony resource-agents resource-agents-cloud fence-agents-aws"

# HANA SR packages - Previous Generation (still in common use)
hanaSR_scaleup="resource-agents-sap-hana"  # For scale-up deployments
hanaSR_scaleout="resource-agents-sap-hana-scaleout"  # For scale-out deployment

# HANA SR packages - New Generation
hanaSR_angi="sap-hana-ha"  # New generation package for both scale-up and scale-out

# Recommended monitoring and support packages
support_packages="pcp-system-tools sos sysstat"

# Note: Choose hanaSR_scaleup/hanaSR_scaleout or hanaSR_angi
# Uncomment the appropriate line based on your deployment:
packages="${mandatory_packages} ${hanaSR_scaleup} ${support_packages}"
#packages="${mandatory_packages} ${hanaSR_scaleout} ${support_packages}"
#packages="${mandatory_packages} ${hanaSR_angi} ${support_packages}"

missingpackages=""

for package in ${packages}; do
    echo "Checking if ${package} is installed..."
    if ! rpm -q ${package} &>/dev/null; then
        echo " ${package} is missing and needs to be installed"
        missingpackages="${missingpackages} ${package}"
    fi
done

if [ -z "$missingpackages" ]; then
    echo "All packages are installed."
else
    echo "Missing mandatory packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${mandatory_packages} | tr ' ' '|'))$")"
    echo "Missing HANA SR packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^(${hanaSR_scaleup}|${hanaSR_scaleout}|${hanaSR_angi})$")"
    echo "Missing support packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${support_packages} | tr ' ' '|'))$")"
    echo -n "Do you want to install the missing packages (y/n)? "
    read response
    if [ "$response" = "y" ]; then
        dnf install -y $missingpackages
    fi
fi
```

If you encounter issues installing high availability packages, verify repository access:

```
$ sudo dnf repolist
```

For BYOL (Bring Your Own License) systems, also verify subscription status using subscription-manager.

To install or update one or more packages with confirmation, use the following command:

```
$ sudo dnf install <package_name(s)>
```

## Update and Check Operating System Versions
<a name="_update_and_check_operating_system_versions"></a>

You must update and confirm operating system package versions across nodes. Apply all of the latest patches to your operating system. This ensures that known bugs are addressed and new features are available.

You can update the patches individually or update all system patches using the `dnf update` command. A clean reboot is recommended prior to setting up a cluster.

```
$ sudo dnf update
$ sudo reboot
```

Compare the operating system package versions on the two cluster nodes and ensure that the versions match on both nodes.
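One way to compare versions is to capture a sorted package list on each node and diff the two files. A minimal sketch, assuming the lists have been copied to one host (the file paths and helper name are illustrative):

```shell
# Capture on each node first, for example:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' > /tmp/pkgs.$(hostname)

# Print entries that appear in only one of the two package lists.
compare_package_lists() {
    comm -3 <(sort "$1") <(sort "$2")
}

# Example: compare_package_lists /tmp/pkgs.hanahost01 /tmp/pkgs.hanahost02
```

An empty result means the lists match; any output shows node-specific versions to reconcile.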

## System Logging
<a name="_system_logging"></a>

Both systemd-journald and rsyslog are suggested for comprehensive logging. Systemd-journald (enabled by default) provides structured, indexed logging with immediate access to events, while rsyslog is maintained for backward compatibility and traditional file-based logging. This dual approach ensures both modern logging capabilities and compatibility with existing log management tools and practices.

 **1. Enable and start rsyslog:** 

```
# systemctl enable --now rsyslog
```

**2. (Optional) Configure persistent logging for systemd-journald:**  
If you are not using a logging agent (like the AWS CloudWatch Unified Agent or Vector) to ship logs to a centralized location, you may want to configure persistent logging to retain logs after system reboots.

```
# mkdir -p /etc/systemd/journald.conf.d
```

Create `/etc/systemd/journald.conf.d/99-logstorage.conf` with:

```
[Journal]
Storage=persistent
```

Persistent logging requires careful storage management. Configure appropriate retention and rotation settings in `journald.conf` to prevent logs from consuming excessive disk space. Review `man journald.conf` for available options such as SystemMaxUse, RuntimeMaxUse, and MaxRetentionSec.
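For example, a drop-in that combines persistent storage with bounded disk usage might look like the following. The size and retention values are illustrative only; choose values appropriate for your log volume.

```
[Journal]
Storage=persistent
SystemMaxUse=1G
MaxRetentionSec=1month
```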

To apply the changes, restart journald:

```
# systemctl restart systemd-journald
```

After enabling persistent storage, only new logs will be stored persistently. Existing logs from the current boot session will remain in volatile storage until the next reboot.

 **3. Verify services are running:** 

```
# systemctl status systemd-journald
# systemctl status rsyslog
```

## Disable NetworkManager Cloud Services
<a name="_disable_networkmanager_cloud_services"></a>

When using Red Hat Enterprise Linux 8.6 or later, the NetworkManager cloud setup services must be disabled to maintain cluster stability. These services can interfere with cluster operations by automatically removing the overlay IP address from network interfaces.

Run these commands on each cluster node:

```
# systemctl disable --now nm-cloud-setup.timer
# systemctl disable --now nm-cloud-setup
```

Verify the services are disabled and stopped:

```
# systemctl status nm-cloud-setup.timer
# systemctl status nm-cloud-setup
```

The status commands should show both services as "disabled" and "inactive (dead)".

## Time Synchronization Services
<a name="_time_synchronization_services"></a>

Time synchronization is important for cluster operation. Ensure that the chrony package is installed, and configure appropriate time servers in its configuration file.

You can use the Amazon Time Sync Service, which is available on any instance running in a VPC and does not require internet access. To ensure consistent handling of leap seconds, don't mix the Amazon Time Sync Service with any other NTP time sync servers or pools.

Create or check the `/etc/chrony.d/ec2.conf` file to define the server:

```
# Amazon EC2 time source config
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
```

Start the chronyd.service, using the following command:

```
# systemctl enable --now chronyd.service
# systemctl status chronyd
```

For more information, see [Set the time for your Linux instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html).

## AWS CLI Profile
<a name="shared_aws_cli_profile"></a>

The AWS cluster resource agents use the AWS Command Line Interface (AWS CLI). You need to create an AWS CLI profile for the root account.

You can either edit the config file in `/root/.aws` manually or use the `aws configure` AWS CLI command.

Skip providing the access key and secret access key. The permissions are provided through the IAM role attached to the Amazon EC2 instances.

```
# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

The profile name is `default` unless otherwise configured. To use a different name, specify `--profile`. The name chosen in this example is `cluster`; it is used in the AWS resource agent definitions for pacemaker. The AWS Region must be the default AWS Region of the instance.

```
# aws configure --profile cluster
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

On the hosts, you can verify the available profiles using the following command:

```
# aws configure list-profiles
```

And review that an assumed role is associated by querying the caller identity:

```
# aws sts get-caller-identity --profile=<profile_name>
```

## Pacemaker Proxy Settings (Optional)
<a name="_pacemaker_proxy_settings_optional"></a>

If your Amazon EC2 instance has been configured to access the internet and/or AWS Cloud through proxy servers, you need to replicate the settings in the pacemaker configuration. For more information, see [Using an HTTP Proxy](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-proxy.html).

Add the following lines to `/etc/sysconfig/pacemaker`:

```
http_proxy=http://<proxyhost>:<proxyport>
https_proxy=http://<proxyhost>:<proxyport>
no_proxy=127.0.0.1,localhost,169.254.169.254,fd00:ec2::254
```
+ Modify `proxyhost` and `proxyport` to match your settings.
+ Ensure that you exempt the addresses used to access the instance metadata service.
+ Configure `no_proxy` to include the IP addresses of the instance metadata service: 169.254.169.254 (IPv4) and fd00:ec2::254 (IPv6). These addresses do not vary.

## Add Overlay IP for Initial Database Access
<a name="_add_overlay_ip_for_initial_database_access"></a>

This step is optional and only needed if you require client connectivity to the SAP HANA database before cluster setup. The Overlay IP will later be managed automatically by the cluster resources.

To enable initial database access, manually add the Overlay IP to the primary instance (where the SAP HANA database is currently running):

```
# ip addr add <hana_overlayip>/32 dev eth0
```
+ This configuration is temporary and will be lost after instance reboot
+ Only configure this on the current primary instance
+ The cluster will take over management of this IP once configured

## Hostname Resolution
<a name="_hostname_resolution"></a>

You must ensure that all instances can resolve all hostnames in use. Add the hostnames for cluster nodes to `/etc/hosts` file on all cluster nodes. This ensures that hostnames for cluster nodes can be resolved even in case of DNS issues. See the following example for a two-node cluster:

```
# cat /etc/hosts
10.2.10.1 hanahost01.example.com hanahost01
10.2.20.1 hanahost02.example.com hanahost02
172.16.52.1 hanahdb.example.com hanahdb
```

In this example, the secondary IPs used for the second cluster ring are not included. They are only used in the cluster configuration. You can allocate virtual hostnames for administration and identification purposes.
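To confirm that the required entries exist on each node, a small helper like the following can be used. This is a sketch; the function name is illustrative, and the hostnames must match your installation.

```shell
#!/bin/bash
# Report cluster hostnames that are missing from a hosts file.
check_hosts_entries() {
    local file="$1"; shift
    local rc=0
    for host in "$@"; do
        if ! grep -qw "${host}" "${file}"; then
            echo "missing entry: ${host}"
            rc=1
        fi
    done
    return ${rc}
}

# Example: check_hosts_entries /etc/hosts hanahost01 hanahost02 hanahdb
```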

**Important**  
The Overlay IP is outside of the VPC CIDR range and cannot be reached from locations that are not associated with the route table, including on-premises networks.

# SAP HANA and Cluster Setup
<a name="sap-hana-pacemaker-rhel-deployment-cluster"></a>

**Topics**
+ [SAP HANA Setup and HSR](sap-hana-pacemaker-rhel-hana-setup-hsr.md)
+ [SAP HANA Service Control](sap-hana-pacemaker-rhel-hana-control.md)
+ [Cluster Node Setup](sap-hana-pacemaker-rhel-cluster-node-setup.md)
+ [Cluster Configuration](sap-hana-pacemaker-rhel-cluster-config.md)
+ [Client Connectivity](sap-hana-pacemaker-rhel-client-connectivity.md)

# SAP HANA Setup and HSR
<a name="sap-hana-pacemaker-rhel-hana-setup-hsr"></a>

Prepare SAP HANA for System Replication (HSR) by configuring parameters and creating required backups.

**Topics**
+ [Review AWS and SAP Installation Guides](#review_guides)
+ [Check global.ini parameters](#global_ini)
+ [Create a SAP HANA Backup on the Primary System](#pre_setup_backup)
+ [Configure System Replication on Primary and Secondary Systems](#register_hsr)
+ [Check SAP Host Agent Version](#sap_host_agent)

**Important**  
This guide assumes that SAP HANA Platform has been installed either as a scale-up configuration with two EC2 instances in different availability zones, or as a scale-out configuration with multiple EC2 instances in two availability zones, following the guidance from AWS and SAP.

## Review AWS and SAP Installation Guides
<a name="review_guides"></a>
+  AWS Documentation - [SAP HANA Environment Setup on AWS](https://docs.aws.amazon.com/sap/latest/sap-hana/std-sap-hana-environment-setup.html) 
+ SAP Documentation - [SAP HANA Server Installation and Update Guide](https://help.sap.com/docs/SAP_HANA_PLATFORM/2c1988d620e04368aa4103bf26f17727/7eb0167eb35e4e2885415205b8383584.html) 

SAP provides documentation on how to configure SAP HANA System Replication using the SAP HANA Cockpit, SAP HANA Studio, or `hdbnsutil` on the command line. Review the documentation for your SAP HANA version to check for any changes to the guidance, or if you want to use a method other than the command line.
+ SAP Documentation: [Configuring SAP HANA System Replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/4e9b18c116aa42fc84c7dbfd02111aba/442bf027937746248f69701aa9b94112.html) 

## Check global.ini parameters
<a name="global_ini"></a>

Run the following as <sid>adm. These commands will prompt for the system password for the SYSTEMDB database.

**Check log\_mode is set to normal**  
Ensure that the configuration parameter `log_mode` is set to `normal` in the persistence section of the global.ini file:

```
hdbsql -jx -i <hana_sys_nr> -u system -d SYSTEMDB "SELECT VALUE FROM M_INIFILE_CONTENTS WHERE FILE_NAME = 'global.ini' AND SECTION = 'persistence' AND KEY = 'log_mode';"
```

For example:

```
hdbadm> hdbsql -jx -i 00 -u system -d SYSTEMDB "SELECT VALUE FROM M_INIFILE_CONTENTS WHERE FILE_NAME = 'global.ini' AND SECTION = 'persistence' AND KEY = 'log_mode';"
VALUE
"normal"
```

**Review global.ini file replication**  
SAP HANA System Replication requires consistent configuration between primary and secondary systems to ensure proper operation, especially during failover scenarios. The `inifile_checker/replicate` parameter in global.ini provides an automated solution to this requirement. When enabled on the primary system, any configuration changes made to ini files on the primary are automatically synchronized to the secondary site. This removes the need for manual configuration replication and helps prevent configuration mismatches that could impact system availability. The parameter only needs to be configured on the primary system, as the secondary system will receive these configuration changes through the normal System Replication process.

Add the following to `global.ini`:

```
[inifile_checker]
replicate = true
```

See SAP Note [2978895 - Changing parameters on Primary and Secondary site of SAP HANA system](https://me.sap.com/notes/2978895) 

## Create a SAP HANA Backup on the Primary System
<a name="pre_setup_backup"></a>

 **Get a list of all active databases:** 

```
hdbsql -jx -i <hana_sys_nr> -u system -d SYSTEMDB "SELECT DATABASE_NAME,ACTIVE_STATUS from M_DATABASES"
```

For example:

```
hdbadm> hdbsql -jx -i 00 -u system -d SYSTEMDB "SELECT DATABASE_NAME,ACTIVE_STATUS from M_DATABASES"
Password:
DATABASE_NAME,ACTIVE_STATUS
"SYSTEMDB","YES"
"HDB","YES"
```

**Create a backup of the SYSTEMDB and each tenant database:**  
The following commands are examples of file-based backups. Backups can be performed using your preferred tool and location. If using a filesystem (for example, /backup), ensure there is sufficient space for a full backup.

------
#### [ Backint ]

For the SystemDB

```
hdbsql -i 00 -u SYSTEM  -d SYSTEMDB "BACKUP DATA USING BACKINT ('initial_hsr_db_SYSTEMDB') COMMENT 'Initial backup for HSR'";
```

For each Tenant DB

```
hdbsql -i 00 -u SYSTEM  -d <TENANT_DB> "BACKUP DATA USING BACKINT ('initial_hsr_db_<TENANT_DB>') COMMENT 'Initial backup for HSR'";
```
+ Run as <sid>adm
+ Ensure that backint has been configured correctly
+ You will be prompted to provide a password or alternatively can use `-p password` 

------
#### [ File ]

For the SystemDB

```
hdbsql -i <hana_sys_nr> -u system -d SYSTEMDB "BACKUP DATA USING FILE ('/<backup location>/initial_hsr_db_SYSTEMDB') COMMENT 'Initial backup for HSR'";
```

For each Tenant DB

```
hdbsql -i <hana_sys_nr> -u system -d <TENANT_DB> "BACKUP DATA USING FILE ('/<backup location>/initial_hsr_db_<TENANT_DB>') COMMENT 'Initial backup for HSR'";
```
+ Run as <sid>adm
+ Ensure that a backup location exists with sufficient space and the correct file permissions
+ You will be prompted to provide a password or alternatively can use `-p password` 

------

### Stop the Secondary System and Copy System PKI Keys
<a name="copy_keys"></a>

**Stop the secondary system**  
Stop the SAP HANA application on the secondary system, as <sid>adm:

```
sapcontrol -nr <hana_sys_nr> -function StopSystem <SID>
```

**Copy the system PKI keys**  
Copy the following system PKI SSFS key and data files from the primary system to the same location on the secondary system using scp, a shared file system, or an S3 bucket:

```
/usr/sap/<SID>/SYS/global/security/rsecssfs/data/SSFS_<SID>.DAT
/usr/sap/<SID>/SYS/global/security/rsecssfs/key/SSFS_<SID>.KEY
```

For example using scp:

```
hdbadm>scp -p /usr/sap/HDB/SYS/global/security/rsecssfs/data/SSFS_HDB.DAT hdbadm@hanahost02:/usr/sap/HDB/SYS/global/security/rsecssfs/data/SSFS_HDB.DAT
hdbadm>scp -p /usr/sap/HDB/SYS/global/security/rsecssfs/key/SSFS_HDB.KEY hdbadm@hanahost02:/usr/sap/HDB/SYS/global/security/rsecssfs/key/SSFS_HDB.KEY
```

## Configure System Replication on Primary and Secondary Systems
<a name="register_hsr"></a>

**Enable System Replication on the Primary System**  
Ensure the primary SAP HANA system is **started**, then as <sid>adm, enable system replication using a unique site name:

```
hdbnsutil -sr_enable --name=<site_1>
```

For example:

```
hdbadm> hdbnsutil -sr_enable --name=siteA
```

**Register System Replication on the Secondary System**  
Ensure the secondary SAP HANA system is **stopped**, then as <sid>adm, register it for system replication using a unique site name, the connection details of the primary system, and your preferred replication options.

```
hdbnsutil -sr_register \
 --name=<site_2> \
 --remoteHost=<hostname_1> \
 --remoteInstance=<hana_sys_nr> \
 --replicationMode=[sync|syncmem] \
 --operationMode=[logreplay|logreplay_readenabled]
```

For example:

```
hdbadm> hdbnsutil -sr_register --name=siteB --remoteHost=hanahost01 --remoteInstance=00 --replicationMode=syncmem --operationMode=logreplay
```

Alternatively, if your setup requires active/active read-enabled access to the secondary:

```
hdbadm> hdbnsutil -sr_register --name=siteB --remoteHost=hanahost01 --remoteInstance=00 --replicationMode=syncmem --operationMode=logreplay_readenabled
```
+  `hostname_1` is the hostname used to install SAP HANA, which may be a virtual name.
+ The replication mode can be either `sync` or `syncmem`.
+ For replication to support a clustered system and a hot standby, the operation mode must be `logreplay` or `logreplay_readenabled`.
+ For more information review the SAP Documentation
  + SAP Documentation: [Replication Modes for SAP HANA System Replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/c039a1a5b8824ecfa754b55e0caffc01.html) 
  + SAP Documentation: [Operation Modes for SAP HANA System Replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/627bd11e86c84ec2b9fcdf585d24011c.html) 
  + SAP Documentation: [SAP HANA System Replication - Active/Active (Read Enabled)](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/fe5fc53706a34048bf4a3a93a5d7c866.html) 

## Check SAP Host Agent Version
<a name="sap_host_agent"></a>

The SAP host agent is used for SAP instance control and monitoring. This agent is used by SAP cluster resource agents and hooks. It is recommended that you have the latest version installed on all instances. For more details, see [SAP Note 2219592 – Upgrade Strategy of SAP Host Agent](https://me.sap.com/notes/2219592).

Use the following command to check the version of the host agent, repeat on all SAP HANA nodes:

```
# /usr/sap/hostctrl/exe/saphostexec -version
```

# SAP HANA Service Control
<a name="sap-hana-pacemaker-rhel-hana-control"></a>

Modify how SAP HANA services are managed to enable cluster takeover and operation.

**Topics**
+ [Add sidadm to haclient Group](#_add_sidadm_to_haclient_group)
+ [Modify SAP Profile for HANA](#_modify_sap_profile_for_hana)
+ [Configure SAPHanaSR Cluster Hook for Optimized Cluster Response](#hook_saphanasr)
+ [(Optional) Configure Fast Start Option](#_optional_configure_fast_start_option)
+ [Review systemd Integration](#_review_systemd_integration)

## Add sidadm to haclient Group
<a name="_add_sidadm_to_haclient_group"></a>

The pacemaker software creates a haclient operating system group. To ensure proper cluster access permissions, add the <sid>adm user to this group on all cluster nodes. Run the following command as root (this example uses the HDB system, with user hdbadm):

```
# usermod -a -G haclient hdbadm
```

## Modify SAP Profile for HANA
<a name="_modify_sap_profile_for_hana"></a>

To prevent automatic SAP HANA startup by the SAP start framework when an instance restarts, modify the SAP HANA instance profiles on all nodes. These profiles are located at `/usr/sap/<SID>/SYS/profile/`.

As <sid>adm, edit the SAP HANA profile `<SID>_HDB<hana_sys_nr>_<hostname>` and modify or add the Autostart parameter, ensuring it is set to 0:

```
Autostart = 0
```
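If you prefer to script this change across nodes, a helper like the following can enforce the setting. This is a sketch; the function name is illustrative, and you should back up the profile before editing it.

```shell
#!/bin/bash
# Enforce "Autostart = 0" in an SAP instance profile file.
set_autostart_off() {
    local profile="$1"
    if grep -q '^Autostart' "${profile}"; then
        # Replace any existing Autostart value with 0.
        sed -i 's/^Autostart.*/Autostart = 0/' "${profile}"
    else
        # Add the parameter if it is not present.
        echo 'Autostart = 0' >> "${profile}"
    fi
}

# Example: set_autostart_off /usr/sap/HDB/SYS/profile/HDB_HDB00_hanahost01
```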

## Configure SAPHanaSR Cluster Hook for Optimized Cluster Response
<a name="hook_saphanasr"></a>

The SAPHanaSR hook provides immediate notification to the cluster if system replication fails, complementing the standard cluster polling mechanism. This optimization can significantly improve failover response time.

Follow these steps to configure the SAPHanaSR hook:

1.  **Verify Cluster Package** 

   The hook configuration varies based on the resource agents in use (see [Deployment Guidance](sap-hana-pacemaker-rhel-references.md#deployments-rhel) for details).

------
#### [ SAPHanaSR ]

   Check the expected package is installed

   ```
   # rpm -qa resource-agents-sap-hana
   ```

------
#### [ SAPHanaSR-angi ]

   Check the expected package is installed

   ```
   # rpm -qa sap-hana-ha
   ```

------

1.  **Confirm Hook Location** 

   By default, the hook is installed in `/usr/share/sap-hana-ha/` or `/usr/share/SAPHanaSR/srHook`. We suggest using the default location, but you can optionally copy it to a custom directory, for example, `/hana/share/myHooks`. The hook must be available on all SAP HANA cluster nodes.

1.  **Configure global.ini** 

   Update the `global.ini` file located at `/hana/shared/<SID>/global/hdb/custom/config/` on each SAP HANA cluster node. Make a backup copy before proceeding.

------
#### [ SAPHanaSR ]

   ```
   [ha_dr_provider_SAPHanaSR]
   provider = SAPHanaSR
   path = /usr/share/SAPHanaSR/srHook
   execution_order = 1
   
   [trace]
   ha_dr_saphanasr = info
   ```

**Note**  
Update the path if you have modified the package location.

------
#### [ SAPHanaSR-angi ]

   ```
   [ha_dr_provider_hanasr]
   provider = HanaSR
   path = /usr/share/sap-hana-ha/
   execution_order = 1
   
   [trace]
   ha_dr_hanasr = info
   ```

**Note**  
Update the path if you have modified the package location.

------

1.  **Configure Sudo Privileges** 

   The SAPHanaSR Python hook requires sudo privileges for the <sid>adm user to access cluster attributes:

   1. Create a new sudoers file as the root user in `/etc/sudoers.d/`, for example `60-SAPHanaSR-hook` 

   1. Use visudo to safely edit the new file: `visudo -f /etc/sudoers.d/60-SAPHanaSR-hook` 

   1. Add the following configuration, replacing <sid> with lowercase system ID and <SID> with uppercase system ID:

      ```
      Cmnd_Alias SITE_SOK = /usr/sbin/crm_attribute -n hana_<sid>_site_srHook_[a-zA-Z0-9_]* -v SOK -t crm_config -s SAPHanaSR
      Cmnd_Alias SITE_SFAIL = /usr/sbin/crm_attribute -n hana_<sid>_site_srHook_[a-zA-Z0-9_]* -v SFAIL -t crm_config -s SAPHanaSR
      Cmnd_Alias HOOK_HELPER  = /usr/sbin/SAPHanaSR-hookHelper --sid=<SID> --case=checkTakeover
      <sid>adm ALL=(ALL) NOPASSWD: SITE_SOK, SITE_SFAIL, HOOK_HELPER
      ```

      For example:

      ```
      Cmnd_Alias SITE_SOK = /usr/sbin/crm_attribute -n hana_hdb_site_srHook_[a-zA-Z0-9_]* -v SOK -t crm_config -s SAPHanaSR
      Cmnd_Alias SITE_SFAIL = /usr/sbin/crm_attribute -n hana_hdb_site_srHook_[a-zA-Z0-9_]* -v SFAIL -t crm_config -s SAPHanaSR
      Cmnd_Alias HOOK_HELPER  = /usr/sbin/SAPHanaSR-hookHelper --sid=HDB --case=checkTakeover
      hdbadm ALL=(ALL) NOPASSWD: SITE_SOK, SITE_SFAIL, HOOK_HELPER
      ```
**Note**  
The syntax uses a glob expression, which adapts to different HANA system replication site names while avoiding an overly broad wildcard. This ensures both flexibility and security. A modification is still required if the SID changes: replace `<sid>` with the lowercase SID and `<SID>` with the uppercase SID that matches your installation.
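The case substitution can be scripted to avoid typos. A sketch assuming SID `HDB`; review the generated lines before installing them, and check the final file with `visudo -cf <file>`:

```shell
# Derive the lowercase sid from the uppercase SID, then print the sudoers entries
SID=HDB
sid=$(printf '%s' "$SID" | tr '[:upper:]' '[:lower:]')
cat <<EOF
Cmnd_Alias SITE_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_[a-zA-Z0-9_]* -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SITE_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_[a-zA-Z0-9_]* -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias HOOK_HELPER  = /usr/sbin/SAPHanaSR-hookHelper --sid=${SID} --case=checkTakeover
${sid}adm ALL=(ALL) NOPASSWD: SITE_SOK, SITE_SFAIL, HOOK_HELPER
EOF
```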

1.  **Reload Configuration** 

   As <sid>adm, reload the changes to `global.ini` by restarting SAP HANA or by running the command:

   ```
   hdbadm> hdbnsutil -reconfig
   ```

1.  **Verify Hook Configuration** 

   As <sid>adm, verify the hook is loaded:

   ```
   hdbadm> cdtrace
   hdbadm> grep "loading HA/DR Provider" nameserver*
   ```

1.  **Replicate Configuration to Secondary** 

   1. Confirm that the global.ini changes have been replicated to the secondary system.

   1. Create the corresponding sudoers.d file on the secondary system.

## (Optional) Configure Fast Restart Option
<a name="_optional_configure_fast_start_option"></a>

Although out of the scope of this document, the SAP HANA Fast Restart option uses tmpfs file systems to preserve and reuse MAIN data fragments, speeding up SAP HANA restarts. It is effective whenever the operating system itself is not restarted, including local restarts of the indexserver.

The Fast Restart option may be an alternative to the susChkSrv hook.

For more information, see SAP Documentation: [SAP HANA Fast Restart Option](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/ce158d28135147f099b761f8b1ee43fc.html) 

## Review systemd Integration
<a name="_review_systemd_integration"></a>

Check the SAP HANA and systemd versions to determine whether the prerequisites for systemd integration are met:

```
sidadm> systemctl --version
```

**OS versions**
+ Red Hat Enterprise Linux 8 (systemd version 239)

**SAP HANA Revisions**
+ SAP HANA SPS07 revision 70

When using an SAP HANA version with systemd integration (SPS07 and later), you must perform the following steps to prevent the nodes from being fenced when Amazon EC2 instances are intentionally stopped. See SAP Note [3189534 - Linux: systemd integration for sapstartsrv and SAP HANA](https://me.sap.com/notes/3189534) 

1. Verify if SAP HANA is integrated with systemd. If it is integrated, a systemd service name such as `SAP<SID>_<hana_sys_nr>.service` is present. For example, for SID HDB and instance number 00, `SAPHDB_00.service` is the service name.

   Use the following command as root to find SAP systemd services:

   ```
   # systemctl list-unit-files | grep -i sap
   ```

1. Create a pacemaker service drop-in file:

   ```
   # mkdir -p /etc/systemd/system/pacemaker.service.d/
   ```

1. Create the file `/etc/systemd/system/pacemaker.service.d/50-saphana.conf` with the following content:

   ```
   [Unit]
   Description=pacemaker needs SAP instance service
   Documentation=man:SAPHanaSR_basic_cluster(7)
   Wants=SAP<SID>_<hana_sys_nr>.service
   After=SAP<SID>_<hana_sys_nr>.service
   ```
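For SID HDB and instance number 00, the drop-in resolves to the content below. The sketch renders it into a temporary demo directory so you can compare before writing the real file under `/etc/systemd/system/pacemaker.service.d/`:

```shell
# Demo: render the pacemaker drop-in for SID HDB / instance 00 in a temp directory
dir=$(mktemp -d)
cat > "$dir/50-saphana.conf" <<'EOF'
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=SAPHDB_00.service
After=SAPHDB_00.service
EOF
# The instance service name should appear in both Wants= and After=
grep -c 'SAPHDB_00.service' "$dir/50-saphana.conf"   # -> 2
rm -rf "$dir"
```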

1. Enable the drop-in file by reloading systemd:

   ```
   # systemctl daemon-reload
   ```

1. Verify that the change is active:

   ```
   # systemctl show pacemaker.service | grep SAP<SID>_<hana_sys_nr>
   ```

   For example, for SID HDB and instance number 00, the following output is expected:

   ```
   # systemctl show pacemaker.service | grep SAPHDB_00
   Wants=SAPHDB_00.service resource-agents-deps.target dbus.service
   After=system.slice network.target corosync.service resource-agents-deps.target basic.target rsyslog.service SAPHDB_00.service systemd-journald.socket sysinit.target time-sync.target dbus.service sbd.service
   ```

# Cluster Node Setup
<a name="sap-hana-pacemaker-rhel-cluster-node-setup"></a>

Establish cluster communication between nodes using Corosync and configure required authentication.

**Topics**
+ [Deploy a Majority Maker Node (Scale-Out Clusters Only)](#_deploy_a_majority_maker_node_scale_out_clusters_only)
+ [Setup Passwordless Authentication](#_setup_passwordless_authentication)
+ [Start and Enable the pcsd service](#_start_and_enable_the_pcsd_service)
+ [Authorize the Cluster](#_authorize_the_cluster)
+ [Generate Corosync Configuration](#_generate_corosync_configuration)
+ [Verify Configuration](#_verify_configuration)

## Deploy a Majority Maker Node (Scale-Out Clusters Only)
<a name="_deploy_a_majority_maker_node_scale_out_clusters_only"></a>

**Note**  
Only required for clusters with more than two nodes.

When deploying an SAP HANA Scale-Out cluster in AWS, you must include a majority maker node in a third Availability Zone (AZ). The majority maker (tie-breaker) node ensures the cluster remains operational if one AZ fails by preserving the quorum. For the Scale-Out cluster to function, at least all nodes in one AZ plus the majority maker node must be running. If this minimum requirement is not met, the cluster loses its quorum state and any remaining SAP HANA nodes are fenced.

The majority maker requires a minimum EC2 instance configuration of 2 vCPUs, 2 GB RAM, and 50 GB disk space; this instance is exclusively used for quorum management and does not host an SAP HANA database or any other cluster resources.

## Change the hacluster Password
<a name="_change_the_hacluster_password"></a>

On all cluster nodes, change the password of the operating system user hacluster:

```
# passwd hacluster
```
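If you prepare nodes with automation, the password can also be set non-interactively. `chpasswd` reads `user:password` pairs from standard input; treat the secret carefully, for example by sourcing it from your configuration management tool rather than typing it on the command line:

```
# echo 'hacluster:<password>' | chpasswd
```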

## Setup Passwordless Authentication
<a name="_setup_passwordless_authentication"></a>

Red Hat cluster tools provide comprehensive reporting and troubleshooting capabilities for cluster activity. Many of these tools require passwordless SSH access between nodes to collect cluster-wide information effectively. Red Hat recommends configuring passwordless SSH for the root user to enable seamless cluster diagnostics and reporting.

See the Red Hat documentation: [How to setup SSH Key passwordless login in Red Hat Enterprise Linux](https://access.redhat.com/solutions/9194) 

See [Accessing the Red Hat Knowledge base portal](https://docs.aws.amazon.com/systems-manager/latest/userguide/fleet-manager-red-hat-knowledge-base-access.html) 

**Warning**  
Review the security implications for your organization, including root access controls and network segmentation, before implementing this configuration.
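A minimal sketch of the key exchange, assuming the example hostnames `hanahost01` and `hanahost02` (run as root on each node in turn; the key type shown is a choice, not a requirement):

```
# ssh-keygen -t ed25519 -N '' -f /root/.ssh/id_ed25519
# ssh-copy-id root@hanahost02
# ssh root@hanahost02 hostname
```

The final command should return the remote hostname without prompting for a password.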

## Start and Enable the pcsd service
<a name="_start_and_enable_the_pcsd_service"></a>

```
# systemctl enable pcsd --now
```

## Authorize the Cluster
<a name="_authorize_the_cluster"></a>

Run the following command to authenticate the hacluster user on both nodes so that pcs can manage the cluster:

```
# pcs host auth <hostname_1> <hostname_2> -u hacluster -p <password>
```
+ If you omit `-p <password>`, you are prompted for the hacluster password that you set earlier.

## Generate Corosync Configuration
<a name="_generate_corosync_configuration"></a>

Corosync provides membership and member-communication needs for high availability clusters.

Initial setup can be performed using the following command:

```
# pcs cluster setup <cluster_name> \
<hostname_1> addr=<host_ip_1> addr=<host_additional_ip_1> \
<hostname_2> addr=<host_ip_2> addr=<host_additional_ip_2>
```
+ Example

```
# pcs cluster setup hana_cluster hanahost01 addr=10.2.10.1 addr=10.2.10.2 hanahost02 addr=10.2.20.1 addr=10.2.20.2
```


| IP address type | Example | 
| --- | --- | 
|  <host_ip_1>  |  10.2.10.1  | 
|  <host_additional_ip_1>  |  10.2.10.2  | 
|  <host_ip_2>  |  10.2.20.1  | 
|  <host_additional_ip_2>  |  10.2.20.2  | 

The timing parameters are optimized for AWS cloud environments:
+ Increasing the value of totem token to 15s provides reliable cluster operation while accommodating normal cloud network characteristics. These settings prevent unnecessary failovers during brief network variations.
+ When scaling beyond two nodes, remove the two_node parameter from the quorum section. The timing parameters adjust automatically using the token_coefficient feature to maintain appropriate failure detection as nodes are added.

```
# pcs cluster config update totem token=15000
```

## Verify Configuration
<a name="_verify_configuration"></a>

```
# pcs cluster start --all
```

**Note**  
Enabling the pacemaker service makes the node join the cluster automatically after a reboot, which keeps your system protected. Alternatively, you can leave the service disabled and start pacemaker manually after a reboot, so that you can first investigate the cause of the failure.

Run the following command to check the status of the pacemaker service:

```
# systemctl status pacemaker
```

Example output:

```
● pacemaker.service - Pacemaker High Availability Cluster Manager
     Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
     Active: active (running) since Mon 2025-06-02 13:27:48 AEST; 39s ago
       Docs: man:pacemakerd
             https://clusterlabs.org/pacemaker/doc/
   Main PID: 38554 (pacemakerd)
      Tasks: 7
     Memory: 31.3M
        CPU: 136ms
     CGroup: /system.slice/pacemaker.service
             ├─38554 /usr/sbin/pacemakerd
             ├─38555 /usr/libexec/pacemaker/pacemaker-based
             ├─38556 /usr/libexec/pacemaker/pacemaker-fenced
             ├─38557 /usr/libexec/pacemaker/pacemaker-execd
             ├─38558 /usr/libexec/pacemaker/pacemaker-attrd
             ├─38559 /usr/libexec/pacemaker/pacemaker-schedulerd
             └─38560 /usr/libexec/pacemaker/pacemaker-controld
```

Once the cluster service pacemaker is started, check the cluster status with pcs command, as shown in the following example:

```
# pcs status
```

Example output:

```
# pcs status
Cluster name: hana_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: hanahost02 (version 2.0.5-9.el8_4.8-ba59be7122) - partition with quorum
  * Last updated: Mon May 12 12:59:35 2025
  * Last change:  Mon May 12 12:59:25 2025 by hacluster via crmd on hanahost02
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ hanahost01 hanahost02 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

The primary (hanahost01) and secondary (hanahost02) must show up as online. You can find the ring status and the associated IP address of the cluster with corosync-cfgtool command, as shown in the following example:

```
# corosync-cfgtool -s
```

Example output:

```
Local node ID 1, transport knet
LINK ID 0 udp
        addr    = 10.2.10.1
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
LINK ID 1 udp
        addr    = 10.2.10.2
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
```

# Cluster Configuration
<a name="sap-hana-pacemaker-rhel-cluster-config"></a>

Bootstrap the cluster and configure all required cluster resources and constraints.

**Topics**
+ [Prepare for Resource Creation](#_prepare_for_resource_creation)
+ [Cluster Bootstrap](#cluster-bootstrap)
+ [Create STONITH Fencing Resource](#resource-stonith)
+ [Create Overlay IP Resources](#resource-overlayip)
+ [Create SAPHanaTopology Resource](#resource-saphanatop)
+ [Create SAPHANA Resource (based on resource agent SAPHana or SAPHanaController)](#resource-saphana)
+ [Create Resource Constraints](#resource-constraints)
+ [Reset Configuration – Optional](#_reset_configuration_optional)

## Prepare for Resource Creation
<a name="_prepare_for_resource_creation"></a>

To ensure that the cluster does not perform any unexpected actions during setup of resources and configuration, set the maintenance mode to true.

Run the following command to put the cluster in maintenance mode:

```
# pcs property set maintenance-mode=true
```

To verify the current maintenance state:

```
$ pcs status
```

**Note**  
There are two types of maintenance mode:  
Cluster-wide maintenance (set with `pcs property set maintenance-mode=true`)
Node-specific maintenance (set with `pcs node maintenance nodename`)
Always use cluster-wide maintenance mode when making configuration changes. For node-specific operations like hardware maintenance, refer to the Operations section for proper procedures.  
To disable maintenance mode after configuration is complete:  

```
# pcs property set maintenance-mode=false
```

## Cluster Bootstrap
<a name="cluster-bootstrap"></a>

### Configure Cluster Properties
<a name="_configure_cluster_properties"></a>

Configure cluster properties to establish fencing behavior and resource failover settings:

```
# pcs property set stonith-enabled="true"
# pcs property set stonith-timeout="600"
# pcs property set priority-fencing-delay="20"
```
+ The **priority-fencing-delay** is recommended for protecting SAP HANA nodes during network partitioning events. When a cluster partition occurs, this delay gives preference to nodes hosting higher priority resources, with SAP HANA Primary (promoted) instances receiving additional priority weighting. This helps ensure the Primary HANA node survives in split-brain scenarios. The recommended 20 second priority-fencing-delay works in conjunction with the pcmk_delay_max (10 seconds) configured in the stonith resource, providing a total potential delay of up to 30 seconds before fencing occurs.

To verify your cluster property settings:

```
# pcs property list
# pcs property config <property_name>
```

### Configure Resource Defaults
<a name="_configure_resource_defaults"></a>

Configure resource default behaviors:

------
#### [ RHEL 8.4 and above ]

```
# pcs resource defaults update resource-stickiness="1000"
# pcs resource defaults update migration-threshold="5000"
```

------
#### [ RHEL 7.x and RHEL 8.0 to 8.3 ]

```
# pcs resource defaults resource-stickiness="1000"
# pcs resource defaults migration-threshold="5000"
```
+ The **resource-stickiness** value prevents unnecessary resource movement, effectively setting a "cost" for moving resources. A value of 1000 strongly encourages resources to remain on their current node, avoiding the downtime associated with movement.
+ The **migration-threshold** of 5000 ensures the cluster will attempt to recover a resource on the same node many times before declaring that node unsuitable for hosting the resource.

------

Individual resources may override these defaults with their own defined values.

To verify your resource default settings:
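For example:

```
# pcs resource defaults
```

On RHEL 8.4 and above, `pcs resource defaults config` can also be used for the same listing.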

### Configure Operation Defaults
<a name="_configure_operation_defaults"></a>

```
# pcs resource op defaults update timeout="600"
```

The op defaults timeout ensures all cluster operations have a reasonable default timeout of 600 seconds when resource-specific timeouts are not defined. These defaults do not apply to resources that override them with their own defined values.

## Create STONITH Fencing Resource
<a name="resource-stonith"></a>

An AWS STONITH resource is required for proper cluster fencing operations. The `fence_aws` resource is recommended for AWS deployments as it leverages the AWS API to safely fence failed or incommunicable nodes by stopping their EC2 instances.

Create the STONITH resource using the **`fence_aws`** resource agent:

```
# pcs stonith create <stonith_resource_name> fence_aws \
pcmk_host_map="<hostname_1>:<instance_id_1>;<hostname_2>:<instance_id_2>" \
region="<aws_region>" \
skip_os_shutdown="true" \
pcmk_delay_max="10" \
pcmk_reboot_timeout="600" \
pcmk_reboot_retries="4" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="180" \
op monitor interval="300" timeout="60"
```

Details:
+  **pcmk_host_map** - Maps cluster node hostnames to their EC2 instance IDs. This mapping must be unique within the AWS account and follow the format hostname:instance-id, with multiple entries separated by semicolons.
+  **region** - AWS Region where the EC2 instances are deployed
+  **pcmk_delay_max** - Random delay before fencing operations. Works in conjunction with cluster property `priority-fencing-delay` to prevent simultaneous fencing in 2-node clusters. Historically set to higher values, but with `priority-fencing-delay` now handling primary node protection, a lower value (10s) is sufficient. Omit in clusters with real quorum (3+ nodes) to avoid unnecessary delay.
+  **pcmk_reboot_timeout** - Maximum time in seconds allowed for a reboot operation
+  **pcmk_reboot_retries** - Number of times to retry a failed reboot operation
+  **skip_os_shutdown** (NEW) - Leverages a new EC2 stop-instances API option to forcefully stop an EC2 instance by skipping the shutdown of the operating system.
  +  [Red Hat Solution 4963741 - fence_aws fence action fails with "Timed out waiting to power OFF"](https://access.redhat.com/solutions/4963741) (requires Red Hat Customer Portal access)
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs stonith create rsc_fence_aws fence_aws \
  pcmk_host_map="hanahost01:i-xxxxinstidforhost1;hanahost02:i-xxxxinstidforhost2" \
  region="us-east-1" \
  skip_os_shutdown="true" \
  pcmk_delay_max="10" \
  pcmk_reboot_timeout="600" \
  pcmk_reboot_retries="4" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="180" \
  op monitor interval="300" timeout="60"
  ```

**Note**  
When configuring the STONITH resource, consider your instance's startup and shutdown times. The default pcmk_reboot_action is 'reboot', where the cluster waits for both stop and start actions to complete before considering the fencing action successful. This allows the cluster to return to a protected state. Setting `pcmk_reboot_action=off` allows the cluster to proceed immediately after shutdown. For High Memory Metal instances, only 'off' is recommended due to the extended time to initialize memory during startup.  

```
# pcs resource update <stonith_resource_name> pcmk_reboot_action="off"
# pcs resource update <stonith_resource_name> pcmk_off_timeout="600"
# pcs resource update <stonith_resource_name> pcmk_off_retries="4"
```

## Create Overlay IP Resources
<a name="resource-overlayip"></a>

This resource ensures client connections follow the SAP HANA primary instance during failover by updating AWS route table entries. It manages an overlay IP address that always points to the active SAP HANA database.

Create the IP resource:

```
# pcs resource create rsc_ip_<SID>_HDB<hana_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
ip="<hana_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="60" timeout="60"
```
+  **ip** - Overlay IP address that will be used to connect to the Primary SAP HANA database. See [Overlay IP Concept](sap-hana-pacemaker-rhel-concepts.md#overlay-ip-rhel) 
+  **routing_table** - AWS route table ID(s) that need to be updated. Multiple route tables can be specified using commas (For example, `routing_table=rtb-xxxxxroutetable1,rtb-xxxxxroutetable2`). Ensure initial entries have been created following [Add VPC Route Table Entries for Overlay IPs](sap-hana-pacemaker-rhel-infra-setup.md#rt-rhel) 
+  **interface** - Network interface for the IP address (typically eth0)
+  **profile** - (optional) AWS CLI profile name for API authentication. Verify profile exists with `aws configure list-profiles`. If a profile is not explicitly configured the default profile will be used.
+  **awscli** - (optional) Path to the AWS CLI executable. The default path is `/usr/bin/aws`. Only specify this parameter if the AWS CLI is installed in a different location. To confirm the path on your system, run `which aws`.
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_ip_HDB_HDB00 ocf:heartbeat:aws-vpc-move-ip \
  ip="172.16.52.1" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="60" timeout="60"
  ```

**Note**  
To update any resource parameter after creation, use `pcs resource update`. For example, if the AWS CLI is not installed at the default path (`/usr/bin/aws`), run:  

```
# pcs resource update rsc_ip_<SID>_HDB<hana_sys_nr> awscli=$(which aws)
```

**For Active/Active Read Enabled**  
If you use operation mode `logreplay_readenabled` and require that your secondary is accessible via an overlay IP, you can create an additional IP resource.

```
# pcs resource create rsc_ip_<SID>_HDB<hana_sys_nr>_readenabled ocf:heartbeat:aws-vpc-move-ip \
ip="<readenabled_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="60" timeout="60"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_ip_HDB_HDB00_readenabled ocf:heartbeat:aws-vpc-move-ip \
  ip="172.16.52.2" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="60" timeout="60"
  ```

**For Shared VPC**  
If your configuration uses a shared VPC, two additional parameters are required.

```
# pcs resource create rsc_ip_<SID>_HDB<hana_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
ip="<hana_overlayip>" routing_table=<routetable_id> interface=eth0 \
profile="<cli_cluster_profile>" lookup_type=NetworkInterfaceId \
routing_table_role="arn:aws:iam::<sharing_vpc_account_id>:role/<sharing_vpc_account_cluster_role>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="60" timeout="60"
```

Additional details:
+ lookup_type=NetworkInterfaceId
+ routing_table_role="arn:aws:iam::<sharing_vpc_account_id>:role/<sharing_vpc_account_cluster_role>"

## Create SAPHanaTopology Resource
<a name="resource-saphanatop"></a>

The SAPHanaTopology resource agent helps manage high availability for SAP HANA databases with system replication. It analyzes the HANA topology and reports findings via node status attributes. These attributes are used by either the SAPHana or SAPHanaController resource agents to control the HANA databases. SAPHanaTopology starts and monitors the local saphostagent, leveraging SAP interfaces like landscapeHostConfiguration.py, hdbnsutil, and saphostctrl to gather information about system status, roles, and configuration.

This resource is used for both scale-up and scale-out deployments.

For documentation on the resource agent, review the man page:

```
# man ocf_heartbeat_SAPHanaTopology
```

------
#### [ For scale-up (2-node) ]

For the primitive and clone:

```
# pcs resource create rsc_SAPHanaTopology_<SID>_HDB<hana_sys_nr> ocf:heartbeat:SAPHanaTopology \
SID="<SID>" InstanceNumber="<hana_sys_nr>" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300" \
op monitor interval="10" timeout="600" \
clone clone-node-max="1" interleave="true" clone-max="2"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_SAPHanaTopology_HDB_HDB00 ocf:heartbeat:SAPHanaTopology \
  SID="HDB" \
  InstanceNumber="00" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="300" \
  op monitor interval="10" timeout="600" \
  clone clone-node-max="1" interleave="true" clone-max="2"
  ```

------
#### [ For scale-out ]

For the primitive and clone:

```
# pcs resource create rsc_SAPHanaTopology_<SID>_HDB<hana_sys_nr> ocf:heartbeat:SAPHanaTopology \
SID="<SID>" InstanceNumber="<hana_sys_nr>" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300" \
op monitor interval="10" timeout="600" \
clone clone-node-max="1" interleave="true" clone-max="<number-of-nodes>"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_SAPHanaTopology_HDB_HDB00 ocf:heartbeat:SAPHanaTopology \
  SID="HDB" InstanceNumber="00" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="300" \
  op monitor interval="10" timeout="600" \
  clone clone-node-max="1" interleave="true" clone-max="6"
  ```

------

Details:
+  **SID** - SAP System ID for the HANA instance
+  **InstanceNumber** - Instance number of the SAP HANA instance
+  **clone-node-max** - Defines how many copies of the resource agent can be started on a single node (set to 1)
+  **interleave** - Enables parallel starting of dependent clone resources on the same node (set to true)
+  **clone-max** - Defines the total number of clone instances that can be started in the cluster (for example, 2 for scale-up, or 6 for scale-out with 3 nodes per site; do not include the majority maker node)

## Create SAPHANA Resource (based on resource agent SAPHana or SAPHanaController)
<a name="resource-saphana"></a>

The SAP HANA resource agents manage system replication and failover between SAP HANA databases. These agents control start, stop, and monitoring operations while checking synchronization status to maintain data consistency. They leverage SAP interfaces including sapcontrol, landscapeHostConfiguration, hdbnsutil, systemReplicationStatus, and saphostctrl. All configurations work in conjunction with the SAPHanaTopology agent, which gathers information about the system replication status across cluster nodes.

Choose the appropriate resource agent configuration based on your SAP HANA architecture:

### SAPHanaSR-angi Deployments (Available in RHEL 9.6 and 10 or later)
<a name="_saphanasr_angi_deployments_available_in_rhel_9_6_and_10"></a>

Available and recommended for new deployments on RHEL 9.6 and 10 or later. The SAPHanaController resource agent with the next generation system replication architecture (SAPHanaSR-angi) provides improved integration and management capabilities for both scale-up and scale-out deployments.

For documentation on the resource agent, review the man page:

```
# man ocf_heartbeat_SAPHanaController
```

------
#### [ For scale-up (2-node) ]

Create the primitive

```
# pcs resource create rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> ocf:heartbeat:SAPHanaController \
SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Unpromoted" timeout="700" \
promotable notify="true" clone-node-max="1" interleave="true" clone-max="2" \
meta priority="100"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_SAPHanaController_HDB_HDB00 ocf:heartbeat:SAPHanaController \
  SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Promoted" timeout="700" \
  op monitor interval="61" role="Unpromoted" timeout="700" \
  promotable notify="true" clone-node-max="1" interleave="true" clone-max="2" \
  meta priority="100"
  ```

------
#### [ For scale-out ]

Create the primitive using the SAPHanaController Resource Agent:

```
# pcs resource create rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> ocf:heartbeat:SAPHanaController \
SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Unpromoted" timeout="700" \
promotable notify="true" clone-node-max="1" interleave="true" clone-max="<number-of-nodes>"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_SAPHanaController_HDB_HDB00 ocf:heartbeat:SAPHanaController \
  SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Promoted" timeout="700" \
  op monitor interval="61" role="Unpromoted" timeout="700" \
  promotable notify="true" clone-node-max="1" interleave="true" clone-max="6"
  ```

------

Details:
+  **SID** - SAP System ID for the HANA instance
+  **InstanceNumber** - Instance number of the SAP HANA instance
+  **clone-node-max** - Defines how many copies of the resource agent can be started on a single node (set to 1)
+  **interleave** - Enables parallel starting of dependent clone resources on the same node (set to true)
+  **clone-max** - Defines the total number of clone instances that can be started in the cluster (for example, 2 for scale-up, or 6 for scale-out with 3 nodes per site; do not include the majority maker node)
+  **PREFER_SITE_TAKEOVER** - Defines whether a takeover to the secondary site is preferred. Review for non-standard deployments.
+  **AUTOMATED_REGISTER** - Defines whether the former primary should be registered as a secondary. Review for non-standard deployments.
+  **DUPLICATE_PRIMARY_TIMEOUT** - The wait time used to minimize the risk of an unintended dual primary.
+  **meta priority** - Setting this to 100 works in conjunction with priority-fencing-delay to ensure proper failover order and prevent simultaneous fencing operations
+ The start and stop timeout values (3600s) may need to be increased for larger databases. Adjust these values based on your database size and observed startup and shutdown times.
+ If you need to update your configuration, the following examples may help you find the right command:

  ```
  # pcs resource update rsc_SAPHanaController_HDB_HDB00 op monitor role="Promoted" timeout=900
  # pcs resource update rsc_SAPHanaController_HDB_HDB00 DUPLICATE_PRIMARY_TIMEOUT=3600
  # pcs resource meta rsc_SAPHanaController_HDB_HDB00-clone priority=100
  ```

### Classic Deployments
<a name="_classic_deployments"></a>

For classic scale-up deployments, the SAPHana resource agent manages takeover between two SAP HANA databases. For detailed information:

```
# man ocf_heartbeat_SAPHana
```
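
If the man page is not installed, the agent's parameters, defaults, and operation timeouts can also be queried directly from the cluster tooling (a sketch; output formatting varies by pcs version):

```
# pcs resource describe ocf:heartbeat:SAPHana
```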

------
#### [ For scale-up (2-node) ]

Create the primitive using the SAPHana Resource Agent:

```
# pcs resource create rsc_SAPHana_<SID>_HDB<hana_sys_nr> ocf:heartbeat:SAPHana \
SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Unpromoted" timeout="700" \
promotable notify="true" clone-node-max="1" interleave="true" clone-max="2" \
meta priority="100"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_SAPHana_HDB_HDB00 ocf:heartbeat:SAPHana \
  SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Promoted" timeout="700" \
  op monitor interval="61" role="Unpromoted" timeout="700" \
  promotable notify="true" clone-node-max="1" interleave="true" clone-max="2" \
  meta priority="100"
  ```

------
#### [ For scale-out ]

Create the primitive using the SAPHanaController Resource Agent:

```
# pcs resource create rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> ocf:heartbeat:SAPHanaController \
SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Unpromoted" timeout="700" \
promotable notify="true" clone-node-max="1" interleave="true" clone-max="<number-of-nodes>"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-rhel-parameters.md)*:

  ```
  # pcs resource create rsc_SAPHanaController_HDB_HDB00 ocf:heartbeat:SAPHanaController \
  SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Promoted" timeout="700" \
  op monitor interval="61" role="Unpromoted" timeout="700" \
  promotable notify="true" clone-node-max="1" interleave="true" clone-max="<number-of-nodes>"
  ```

------

Details:
+  **SID** - SAP System ID for the HANA instance
+  **InstanceNumber** - Instance number of the SAP HANA instance
+  **clone-node-max** - Defines how many copies of the resource agent can be started on a single node (set to 1)
+  **interleave** - Enables parallel starting of dependent clone resources on the same node (set to true)
+  **clone-max** - Defines the total number of clone instances that can be started in the cluster (for example, 2 for scale-up, or 6 for scale-out with 3 nodes per site; do not count the majority maker node)
+  **PREFER\_SITE\_TAKEOVER** - Defines whether a takeover to the secondary site is preferred. Review for non-standard deployments.
+  **AUTOMATED\_REGISTER** - Defines whether the former primary should automatically be registered as a secondary. Review for non-standard deployments.
+  **DUPLICATE\_PRIMARY\_TIMEOUT** - Wait time that minimizes the risk of an unintended dual primary.
+  **meta priority** - Setting this to 100 works in conjunction with priority-fencing-delay to ensure proper failover order and prevent simultaneous fencing operations
+ The start and stop timeout values (3600s) may need to be increased for larger databases. Adjust these values based on your database size and observed startup and shutdown times.
+ If you need to update your configuration, the following examples may help you find the right command:

  ```
  # pcs resource update rsc_SAPHana_HDB_HDB00 op monitor role="Promoted" timeout=900
  # pcs resource update rsc_SAPHana_HDB_HDB00 DUPLICATE_PRIMARY_TIMEOUT=3600
  # pcs resource meta rsc_SAPHana_HDB_HDB00-clone priority=100
  ```

## Create Resource Constraints
<a name="resource-constraints"></a>

The following constraints are required.

### Order Constraint
<a name="_order_constraint"></a>

This constraint defines the start order between the SAPHanaTopology and SAPHana resources:

```
# pcs constraint order start <SAPHanaTopology-clone> then <SAPHana/SAPHanaController-clone> symmetrical=false
```
+  *Example* :

  ```
  # pcs constraint order start rsc_SAPHanaTopology_HDB_HDB00-clone then rsc_SAPHana_HDB_HDB00-clone symmetrical=false
  ```

### Colocation Constraint
<a name="_colocation_constraint"></a>

#### IP with Primary
<a name="_ip_with_primary"></a>

This constraint ensures that the IP resource, which determines the target of the overlay IP, runs on the node that holds the primary SAP HANA role:

```
# pcs constraint colocation add <ip_resource> with promoted <SAPHana/SAPHanaController-clone> 2000
```
+  *Example* :

  ```
  # pcs constraint colocation add rsc_ip_HDB_HDB00 with promoted rsc_SAPHana_HDB_HDB00-clone 2000
  ```

#### ReadOnly IP with Secondary (Only for ReadOnly Patterns)
<a name="_readonly_ip_with_secondary_only_for_readonly_patterns"></a>

This constraint ensures that the read-enabled IP resource runs on the secondary (Unpromoted) node. When the secondary node is unavailable, the IP will move to the primary node, where read workloads will share capacity with primary workloads:

```
# pcs constraint colocation add <ip_resource> with unpromoted <SAPHana/SAPHanaController-clone> 2000
```
+  *Example* :

  ```
  # pcs constraint colocation add rsc_ip_HDB_HDB00_readenabled with unpromoted rsc_SAPHana_HDB_HDB00-clone 2000
  ```

### Location Constraint
<a name="_location_constraint"></a>

#### No SAP HANA Resources on the Majority Maker (Scale Out Only)
<a name="_no_sap_hana_resources_on_the_majority_maker_scale_out_only"></a>

This location constraint ensures that SAP HANA Resources avoid the Majority Maker, which is not suited to running them.

```
# pcs constraint location <SAPHanaTopology-clone> avoids <hostname_mm>
# pcs constraint location <SAPHana/SAPHanaController-clone> avoids <hostname_mm>
```
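
With all constraints created, you can review them together before activating the cluster (a sketch; on older pcs versions, `pcs constraint config` may be `pcs constraint show` instead):

```
# pcs constraint config --full
```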

### Activate Cluster
<a name="_activate_cluster"></a>

Use `pcs config show` to review that all the values have been entered correctly.

On confirmation of correct values, set the maintenance mode to false using the following command. This allows the cluster to take control of the resources:

```
# pcs property set maintenance-mode=false
```
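
Once maintenance mode is removed, the resources should come under cluster control within a few minutes. A typical verification sequence, sketched with the example values from this guide (`hdbadm` assumes SID HDB):

```
# crm_mon -r -1
# su - hdbadm -c "hdbnsutil -sr_state"
```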

## Reset Configuration – Optional
<a name="_reset_configuration_optional"></a>

**Important**  
The following instructions help you reset the complete configuration. Run these commands only if you want to start setup from the beginning.

Run the following commands to back up the current configuration for reference:

```
# pcs config backup /tmp/cluster_backup_$(date +%Y%m%d)
# pcs config show > /tmp/config_backup_$(date +%Y%m%d).txt
```

Run the following commands to stop and clear the current configuration:

```
# pcs cluster stop --all
hanahost02: Stopping Cluster (pacemaker)...
hanahost01: Stopping Cluster (pacemaker)...
hanahost02: Stopping Cluster (corosync)...
hanahost01: Stopping Cluster (corosync)...
# pcs cluster destroy
Shutting down pacemaker/corosync services...
Killing any remaining services...
Removing all cluster configuration files...
```

Running `pcs cluster destroy` removes all of the cluster resources from the Cluster Information Base (CIB) and deletes the corosync configuration, disconnecting the nodes from the cluster. Only perform these steps if you need to reset everything to defaults. For minor changes, use `pcs resource update` or `pcs property set` instead.

# Client Connectivity
<a name="sap-hana-pacemaker-rhel-client-connectivity"></a>

For proper SAP HANA database connectivity:
+ Ensure that the Overlay IP can be correctly resolved in all application servers
+ DNS configuration or local host entries must be valid
+ Network routing must be properly configured
+ SAP HANA client libraries must be installed and up to date

Ensure that the connectivity data for the SAP HANA Database references the hostname associated with the Overlay IP. For more information, see SAP Documentation: [Setting Connectivity Data for the SAP HANA Database](https://help.sap.com/docs/SLTOOLSET/39c32e9783f6439e871410848f61544c/b7ed2d55b0a7f857e10000000a441470.html?version=CURRENT_VERSION_SWPM20)
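
To confirm name resolution on an application server, resolve the hostname used in the connect string and check that it returns the overlay IP (a sketch; the hostname and address are hypothetical placeholders):

```
# getent hosts hana-oip.example.com
192.168.10.1    hana-oip.example.com
```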

Test database connectivity using the R3trans utility:

```
sidadm> R3trans -d
```

Review additional connections to SAP HANA that require High Availability. While application connectivity should use the overlay IP, administrative tools (SAP HANA Studio, hdbsql commands, monitoring tools) require direct connectivity to individual SAP HANA instances.

# Operations
<a name="sap-hana-pacemaker-rhel-operations"></a>

**Topics**
+ [Viewing the cluster state](sap-hana-pacemaker-rhel-ops-cluster-state.md)
+ [Performing planned maintenance](sap-hana-pacemaker-rhel-ops-planned-maint.md)
+ [Post-failure analysis and reset](sap-hana-pacemaker-rhel-ops-post-failure.md)
+ [Alerting and monitoring](sap-hana-pacemaker-rhel-ops-alert-monitor.md)

# Viewing the cluster state
<a name="sap-hana-pacemaker-rhel-ops-cluster-state"></a>

**Topics**
+ [Operating system based](#_operating_system_based)

## Operating system based
<a name="_operating_system_based"></a>

There are multiple operating system commands that can be run as root or as a user with appropriate permissions. The commands enable you to get an overview of the status of the cluster and its services.

```
# pcs status --full
```

Note: Omit the `--full` for a more concise output if you do not need to view the node attributes.

Sample output:

```
Cluster name: hacluster
Cluster Summary:
  * Stack: corosync
  * Current DC: hanahost02 (version 2.1.2-4.el9_0.5-ada5c3b36e2) - partition with quorum
  * Last updated: Tue Jun  3 15:47:15 2025
  * Last change:  Tue Jun  3 15:47:12 2025 by hacluster via crmd on hanahost02
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ hanahost01 hanahost02 ]

Full List of Resources:
  * rsc_fence_aws       (stonith:fence_aws):     Started hanahost01
  * rsc_ip_HDB_HDB00    (ocf:heartbeat:aws-vpc-move-ip):         Stopped
  * Clone Set: rsc_SAPHanaTopology_HDB_HDB00-clone [rsc_SAPHanaTopology_HDB_HDB00]:
    * Started: [ hanahost01 hanahost02 ]
  * Clone Set: rsc_SAPHana_HDB_HDB00-clone [rsc_SAPHana_HDB_HDB00] (promotable):
    * Promoted: [ hanahost01 ]
    * Unpromoted: [ hanahost02 ]

Node Attributes:
  * Node: hanahost01 (1):
    * hana_hdb_clone_state              : PROMOTED
    * hana_hdb_op_mode                  : logreplay
    * hana_hdb_remoteHost               : hanavirt02
    * hana_hdb_roles                    : 4:P:master1:master:worker:master
    * hana_hdb_site                     : siteA
    * hana_hdb_srah                     : -
    * hana_hdb_srmode                   : syncmem
    * hana_hdb_sync_state               : PRIM
    * hana_hdb_version                  : 2.00.073.00
    * hana_hdb_vhost                    : hanavirt01
    * lpa_hdb_lpt                       : 1755493611
    * master-rsc_SAPHana_HDB_HDB00      : 150
  * Node: hanahost02 (2):
    * hana_hdb_clone_state              : DEMOTED
    * hana_hdb_op_mode                  : logreplay
    * hana_hdb_remoteHost               : hanavirt01
    * hana_hdb_roles                    : 4:S:master1:master:worker:master
    * hana_hdb_site                     : siteB
    * hana_hdb_srah                     : -
    * hana_hdb_srmode                   : syncmem
    * hana_hdb_sync_state               : SOK
    * hana_hdb_version                  : 2.00.073.00
    * hana_hdb_vhost                    : hanavirt02
    * lpa_hdb_lpt                       : 30
    * master-rsc_SAPHana_HDB_HDB00      : 100

Migration Summary:

Tickets:

PCSD Status:
  hanahost01: Online
  hanahost02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
```

The following table provides a list of useful commands.


| Command | Description | 
| --- | --- | 
|   `crm_mon`   |  Display cluster status on the console with updates as they occur  | 
|   `crm_mon -1`   |  Display cluster status on the console just once, and exit  | 
|   `crm_mon -Arnf`   |  Display cluster status with node attributes (`-A`), resources grouped by node (`-n`), inactive resources (`-r`), and resource fail counts (`-f`)  | 
|   `pcs status --help`   |  View more options for cluster status via pcs  | 
|   `crm_mon --help-all`   |  View more options  | 

# Performing planned maintenance
<a name="sap-hana-pacemaker-rhel-ops-planned-maint"></a>

When performing maintenance on SAP HANA systems in a cluster environment, it’s important to understand how the cluster interacts with SAP HANA system replication. Planned maintenance activities should be conducted carefully to prevent unnecessary failovers or cluster interventions.

There are different options to perform planned maintenance on nodes, resources, and the cluster.

**Topics**
+ [Maintenance mode](#_maintenance_mode)
+ [Placing a node in standby mode](#_placing_a_node_in_standby_mode)
+ [Moving a resource](#_moving_a_resource)

## Maintenance mode
<a name="_maintenance_mode"></a>

Use maintenance mode if you want to make any changes to the configuration or take control of the resources and nodes in the cluster. In most cases, this is the safest option for administrative tasks.

**Example**  
Use the following commands to turn on maintenance mode.  

```
# pcs property set maintenance-mode=true
```
Use the following command to turn off maintenance mode.  

```
# pcs property set maintenance-mode=false
```

## Placing a node in standby mode
<a name="_placing_a_node_in_standby_mode"></a>

To perform maintenance on the cluster without a full system outage, the recommended method for moving active resources is to place the node you want to remove from the cluster in standby mode.

```
# pcs node standby <hostname>
```

The cluster will cleanly relocate resources, and you can perform activities, including reboots, on the node in standby mode. When maintenance activities are complete, you can re-introduce the node with the following command.

```
# pcs node unstandby <hostname>
```

## Moving a resource
<a name="_moving_a_resource"></a>

When moving individual resources, be sure you understand resource dependencies and constraints. The following commands demonstrate how to force a HANA takeover. Always review the cluster status and verify any temporary location constraints afterwards.

For example:

```
# pcs resource move rsc_SAPHana_HDB_HDB00-clone hanahost02
Location constraint to move resource 'rsc_SAPHana_HDB_HDB00-clone' has been created
Waiting for the cluster to apply configuration changes...
Location constraint created to move resource 'rsc_SAPHana_HDB_HDB00-clone' has been removed
Waiting for the cluster to apply configuration changes...
resource 'rsc_SAPHana_HDB_HDB00-clone' is promoted on node 'hanahost02'; unpromoted on node 'hanahost01'
```

Note: The exact resource name will vary depending on your SAP HANA system ID and instance number. Adjust the commands accordingly.
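
Recent pcs versions remove the temporary location constraint automatically, as shown in the output above. If your version leaves it behind, or you want to be certain, check for and clear any leftover `cli-` prefixed constraints (a sketch):

```
# pcs constraint location config
# pcs resource clear rsc_SAPHana_HDB_HDB00-clone
```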

# Post-failure analysis and reset
<a name="sap-hana-pacemaker-rhel-ops-post-failure"></a>

A review must be conducted after each failure to understand the source of the failure as well as the reaction of the cluster. In most scenarios, the cluster prevents an application outage. However, a manual action is often required to reset the cluster to a protective state for any subsequent failures.

**Topics**
+ [Checking the Logs](#_checking_the_logs)
+ [Cleanup pcs status](#_cleanup_pcs_status)
+ [Restart failed nodes or pacemaker](#_restart_failed_nodes_or_pacemaker)
+ [Further Analysis](#_further_analysis)

## Checking the Logs
<a name="_checking_the_logs"></a>
+ For troubleshooting cluster issues, use journalctl to examine both pacemaker and corosync logs:

  ```
  # journalctl -u pacemaker -u corosync --since "1 hour ago"
  ```
  + Use `--since` to specify time periods (e.g., "2 hours ago", "today")
  + Add `-f` to follow logs in real-time
  + Combine with grep for specific searches
+ System messages and resource agent activity can be found in `/var/log/messages`.
+ For HANA-specific issues, check the HANA trace directory. This can be reached using the `cdtrace` alias when logged in as <sid>adm. Also consult the DB\_<tenantdb> directory within the HANA trace directory.
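
For example, to pull only fencing and error events from the cluster stack around an incident (a sketch; the time window and patterns are illustrative):

```
# journalctl -u pacemaker -u corosync --since "2025-06-03 15:00" --until "2025-06-03 16:00" | grep -Ei "fence|error|failed"
```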

## Cleanup pcs status
<a name="_cleanup_pcs_status"></a>

If failed actions are reported using the `pcs status` command, and if they have already been investigated, then you can clear the reports with the following command.

```
# pcs resource cleanup <resource> <hostname>
```
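+  *Example* (resource and host names follow the values used in this guide; fencing events are cleared separately):

  ```
  # pcs resource cleanup rsc_SAPHana_HDB_HDB00-clone hanahost01
  # pcs stonith history cleanup
  ```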

## Restart failed nodes or pacemaker
<a name="_restart_failed_nodes_or_pacemaker"></a>

It is recommended that failed (or fenced) nodes are not automatically restarted. This gives operators a chance to investigate the failure, and ensures that the cluster doesn’t make assumptions about the state of resources.

You need to restart the instance or the pacemaker service based on your approach.
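
Depending on whether the entire instance was stopped or only the cluster stack, bring the node back with one of the following (a sketch; `pcs cluster start` can be run from any cluster member, while the systemd command runs on the recovered node itself):

```
# pcs cluster start <hostname>
# systemctl start pacemaker
```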

## Further Analysis
<a name="_further_analysis"></a>

For cluster-specific issues, use `pcs cluster report` to generate a targeted analysis of cluster components across all nodes:

```
# pcs cluster report --from="YYYY-MM-DD HH:MM:SS" --to="YYYY-MM-DD HH:MM:SS" /tmp/cluster-report
```

**Using pcs cluster report**
+ Specify a time range that encompasses the incident
+ The report includes logs and configuration from all nodes
+ Review the generated tarball for cluster events, resource operations, and configuration changes
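
*Example* (timestamps are illustrative; the archive name and compression may vary by pcs version):

```
# pcs cluster report --from="2025-06-03 15:00:00" --to="2025-06-03 16:00:00" /tmp/cluster-report
# tar tf /tmp/cluster-report.tar.bz2 | head
```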

# Alerting and monitoring
<a name="sap-hana-pacemaker-rhel-ops-alert-monitor"></a>

This section covers the following topics.

**Topics**
+ [Using Amazon CloudWatch Application Insights](#_using_amazon_cloudwatch_application_insights)
+ [Using the cluster alert agents](#_using_the_cluster_alert_agents)

## Using Amazon CloudWatch Application Insights
<a name="_using_amazon_cloudwatch_application_insights"></a>

For monitoring and visibility of cluster state and actions, Application Insights includes metrics for monitoring enqueue replication state, cluster metrics, and SAP and high availability checks. Additional metrics, such as Amazon EFS and CPU monitoring, can also help with root cause analysis.

For more information, see [Get started with Amazon CloudWatch Application Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/appinsights-getting-started.html) and [SAP HANA High Availability on Amazon EC2](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/component-configuration-examples-hana-ha.html).

## Using the cluster alert agents
<a name="_using_the_cluster_alert_agents"></a>

Within the cluster configuration, you can call an external program (an alert agent) to handle alerts. This is a *push* notification. It passes information about the event via environment variables.

The agents can then be configured to send emails, log to a file, update a monitoring system, etc. For example, the following script can be used to access Amazon SNS.

```
#!/bin/sh

# alert_sns.sh
# modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample

##############################################################################
# SETUP
# * Create an SNS Topic and subscribe email or chatbot
# * Note down the ARN for the SNS topic
# * Give the IAM Role attached to both Instances permission to publish to the SNS Topic
# * Ensure the aws cli is installed
# * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes
# * Ensure the permissions allow for hacluster and root to execute the script
# * Run the following as root (modify file location if necessary and replace SNS ARN):
#
# SLES:
# crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to <{ arn:aws:sns:region:account-id:myPacemakerAlerts  }>
#
# RHEL:
# pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S"
# pcs alert recipient add aws_sns_alert value=arn:aws:sns:region:account-id:myPacemakerAlerts
##############################################################################

# Additional information to send with the alerts
node_name=`uname -n`
sns_body=`env | grep CRM_alert_`

# Required for SNS
TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get metadata
REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')

sns_subscription_arn=${CRM_alert_recipient}

# Format depending on alert type
case ${CRM_alert_kind} in
   node)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
   ;;
   fencing)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}"
   ;;
   resource)
     if [ ${CRM_alert_interval} = "0" ]; then
         CRM_alert_interval=""
     else
         CRM_alert_interval=" (${CRM_alert_interval})"
     fi
     if [ ${CRM_alert_target_rc} = "0" ]; then
         CRM_alert_target_rc=""
     else
         CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})"
     fi
     case ${CRM_alert_desc} in
         Cancelled)
           ;;
         *)
           sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}"
           ;;
     esac
     ;;
   attribute)
     sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated to '${CRM_alert_attribute_value}'"
     ;;
   *)
     sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert"
     ;;
esac

# Use this information to send the email
aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region ${REGION}
```
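
Before wiring the agent up to SNS, you can exercise the subject-formatting logic in isolation by populating the variables pacemaker would pass (a self-contained sketch; the values are made up, and the unset cluster name is omitted):

```
#!/bin/sh
# Simulate the environment pacemaker passes for a node-level alert
CRM_alert_kind="node"
CRM_alert_timestamp="2025-06-03_15:47:15"
CRM_alert_node="hanahost01"
CRM_alert_desc="lost"

# Same formatting as the 'node' branch of the alert agent
case ${CRM_alert_kind} in
   node)
     sns_subject="${CRM_alert_timestamp}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
   ;;
esac

echo "${sns_subject}"
```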

# Testing
<a name="sap-hana-pacemaker-rhel-testing"></a>

We recommend scheduling regular fault scenario recovery testing at least annually, and as part of operating system or SAP HANA upgrades that may impact operations. For more details on best practices for regular testing, see SAP Lens – [Best Practice 4.3 – Regularly test business continuity plans and fault recovery](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-4-3.html).

The tests described here simulate failures. These can help you understand the behavior and operational requirements of your cluster.

In addition to checking the state of cluster resources, ensure that the service you are trying to protect is in the required state. Is client connectivity still possible? Define the recovery time to ensure that it aligns with your business objectives. Record recovery actions in runbooks.

**Topics**
+ [Test 1: Stop HANA on the primary node using `HDB kill-9`](#_test_1_stop_hana_on_the_primary_node_using_hdb_kill_9)
+ [Test 2: Simulate a hardware failure](#_test_2_simulate_a_hardware_failure)
+ [Test 3: Simulate a kernel panic](#_test_3_simulate_a_kernel_panic)
+ [Test 4: Simulate a network failure](#_test_4_simulate_a_network_failure)
+ [Test 5: Accidental shutdown](#_test_5_accidental_shutdown)
+ [Other Tests](#_other_tests)

## Test 1: Stop HANA on the primary node using `HDB kill-9`
<a name="_test_1_stop_hana_on_the_primary_node_using_hdb_kill_9"></a>

 **Why** – Tests cluster response to an immediate HANA process termination. This validates the cluster’s ability to detect and respond to critical database process failures and ensures proper failover mechanisms are working.

 **Simulate failure** – On `hanahost01` as `hdbadm`:

```
hdbadm> HDB kill-9
```

 **Expected behavior** – The cluster detects the HANA process failure and triggers immediate failover to the secondary node. The secondary node is promoted to primary, taking over the workload without attempting local recovery.

 **Recovery action** –

1. Monitor cluster status using `crm_mon -r` 

1. Verify HANA system replication status using `hdbnsutil -sr_state` 

1. If AUTOMATED\_REGISTER is "false", manually reregister the former primary:
   + See more details on how to register the secondary in [HSR Setup](sap-hana-pacemaker-rhel-hana-setup-hsr.md) :

     ```
     hdbnsutil -sr_register --name=<site_name> --remoteHost=<primary_host> --remoteInstance=<instance_number> --mode=sync --operationMode=logreplay
     ```

## Test 2: Simulate a hardware failure
<a name="_test_2_simulate_a_hardware_failure"></a>

 **Why** – Tests cluster response to complete node failure, validating proper fencing behavior and resource failover when a node becomes completely unresponsive.

 **Notes** – The double force option (`--force --force`) is used to simulate a hardware failure as closely as possible in a test environment. This command bypasses the system manager and forces an immediate shutdown without any cleanup, similar to a power loss or hardware failure. Note that this is still a simulation; some OS-level cleanup may still occur that wouldn’t happen in a real hardware failure or power loss scenario.

 **Simulate failure** – On `hanahost01` as `root`:

```
# poweroff --force --force
```

 **Expected behavior** – Corosync detects the loss of node communication and Pacemaker on the surviving node initiates fencing through the fencing agent, followed by promotion of the secondary HANA instance to primary. Application connections should automatically reconnect to the new primary.

 **Recovery action** –

1. Start the powered-off Amazon EC2 instance

1. Verify cluster status using `crm_mon -r` 

1. Clean up STONITH history using `pcs stonith history cleanup` 

1. Check HANA replication status using `hdbnsutil -sr_state` 

1. If AUTOMATED\_REGISTER is "false", manually register the former primary as secondary

1. Verify application connectivity to the new primary

## Test 3: Simulate a kernel panic
<a name="_test_3_simulate_a_kernel_panic"></a>

 **Why** – Tests cluster response to catastrophic kernel failure, ensuring proper recovery mechanisms work when a node experiences a complete system crash.

 **Notes** – To simulate a system crash, you must first ensure that `/proc/sys/kernel/sysrq` is set to 1.
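
Check the current value and enable it if necessary (a sketch; writing the value requires root, and it reverts at reboot unless persisted via sysctl configuration):

```
# cat /proc/sys/kernel/sysrq
# sysctl -w kernel.sysrq=1
```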

 **Simulate failure** – On `hanahost01` as `root`:

```
# echo 'c' > /proc/sysrq-trigger
```

 **Expected behavior** – The cluster detects node failure through lost heartbeat. The surviving node initiates fencing through the fencing agent, followed by promotion of the secondary HANA instance to primary.

 **Recovery action** –

1. Restart the node after kernel panic

1. Verify cluster status using `crm_mon -r` 

1. Clean up STONITH history using `pcs stonith history cleanup` 

1. Check HANA replication status using `hdbnsutil -sr_state` 

1. If AUTOMATED\_REGISTER is "false", manually register the former primary as secondary

1. Verify all cluster resources are clean

## Test 4: Simulate a network failure
<a name="_test_4_simulate_a_network_failure"></a>

 **Why** – Tests cluster behavior during network partition scenarios, ensuring split-brain prevention mechanisms work and proper fencing occurs when nodes can’t communicate.

 **Notes** –
+ Iptables must be installed
+ Use a subnet in this command because of the secondary ring
+ Check for any existing iptables rules, as `iptables -F` will flush all rules
+ Review the pcmk\_delay and priority parameters if neither node survives the fence race

 **Simulate failure** – On either node as root:

```
# iptables -A INPUT -s <CIDR_of_other_subnet> -j DROP; iptables -A OUTPUT -d <CIDR_of_other_subnet> -j DROP
```

 **Expected behavior** – The cluster detects the network failure and fences one of the nodes to avoid a split-brain situation. The surviving node assumes control of cluster resources.

 **Recovery action** –

1. If the failure is simulated on the surviving node, execute `iptables -F` to clear the network failure

1. Start the EC2 node and pacemaker service

1. Verify cluster status and resource placement

## Test 5: Accidental shutdown
<a name="_test_5_accidental_shutdown"></a>

 **Why** – Tests proper handling of shutdown scenarios, ensuring the cluster manages resources appropriately during both planned and unplanned shutdowns.

 **Notes** –
+ Avoid shutdowns without cluster awareness
+ We recommend the use of systemd to ensure predictable behavior
+ Ensure the resource dependencies are in place

 **Simulate failure** – Login to AWS Management Console, and stop the instance or issue a shutdown command.

 **Expected behavior** – The cluster detects that the node has been shut down and moves the resources which were running on it to the surviving node. If systemd and resource dependencies are not configured correctly, the cluster may detect an unclean stop of cluster services and fence the shutting-down instance.

 **Recovery action** –

1. Start the EC2 node and pacemaker service

1. Verify cluster status and resource placement

1. Ensure resources are properly distributed according to constraints

## Other Tests
<a name="_other_tests"></a>

Consider these additional tests based on your environment and project requirements:
+  **Secondary Node Testing** 
  + Execute previous tests on the secondary, to ensure that secondary disruptions do not impact service availability on the primary
  + Execute previous tests with the nodes in reversed roles to validate full operational capability in either configuration
+  **Scale-out Testing** (for scale-out deployments)
  + Test failures on coordinator and worker nodes
  + Test concurrent failure of multiple worker nodes to verify failover order
  + Test failures with blocked access to storage, including /hana/shared
+  **Component-Level Testing** 
  + Test index server failures and measure recovery times
  + Validate Fast Start Option behavior and hook script execution
+  **Cluster Configuration Testing** 
  + Direct fencing operations using `pcs stonith fence <node_name>` 
  + Resource movement and constraint verification

Remember to document all test results, recovery times, and any unexpected behaviors for future reference and runbook updates.