

# SAP HANA High Availability on SUSE Enterprise Linux using Pacemaker Software
<a name="sap-hana-on-aws-sles-pacemaker"></a>

**Topics**
+ [Planning](sap-hana-pacemaker-sles-planning.md)
+ [Prerequisites](sap-hana-pacemaker-sles-prerequisites.md)
+ [SAP HANA and Cluster Setup](sap-hana-pacemaker-sles-deployment-cluster.md)
+ [Operations](sap-hana-pacemaker-operations.md)
+ [Testing](sap-hana-pacemaker-sles-testing.md)

# Planning
<a name="sap-hana-pacemaker-sles-planning"></a>

Review the following prerequisites carefully before beginning your high availability cluster deployment, ensuring all infrastructure, operating system, and access requirements are met. Familiarize yourself with the linked references, supported configurations, and the core concepts used in this solution.

**Topics**
+ [Setup Overview](sap-hana-pacemaker-sles-setup-overview.md)
+ [Vendor Support](sap-hana-pacemaker-sles-references.md)
+ [Concepts](sap-hana-pacemaker-sles-concepts.md)
+ [Automated Deployment](sap-hana-pacemaker-sles-automation.md)
+ [Parameter Reference](sap-hana-pacemaker-sles-parameters.md)
+ [Architecture Diagrams](sap-hana-pacemaker-sles-arch-diagrams.md)

# Setup Overview
<a name="sap-hana-pacemaker-sles-setup-overview"></a>

## Deployed Cluster Infrastructure
<a name="_deployed_cluster_infrastructure"></a>

Ensure that your AWS networking and the Amazon EC2 instances where your SAP workloads are installed are correctly configured for SAP.

The following SAP HANA cluster specific requirements must be met:
+ Two cluster nodes created in private subnets in separate Availability Zones within the same Amazon VPC and AWS Region.
+ Access to the route table(s) that are associated with the chosen subnets. For more information, see [Overlay IP](sap-hana-pacemaker-sles-concepts.md#overlay-ip-sles).
+ Targeted Amazon EC2 instances must have connectivity to the Amazon EC2 endpoint via the internet or an Amazon VPC endpoint.

## Supported Operating System
<a name="_supported_operating_system"></a>

Protecting the SAP HANA database with a Pacemaker cluster requires packages from SUSE, including cluster resource agents for SAP and AWS that are not available in standard repositories.

For deploying SAP HANA on SUSE, SAP and SUSE recommend using SUSE Linux Enterprise Server for SAP applications (SLES for SAP). SLES for SAP provides additional benefits, including:
+ Extended Service Pack Overlap Support (ESPOS)
+ Configuration and tuning packages for SAP applications
+ High Availability Extensions (HAE)

To learn more, see [SUSE Linux Enterprise Server for SAP Applications](https://www.suse.com/products/sles-for-sap/).

SLES for SAP is available at AWS Marketplace with:
+ Hourly subscription
+ Annual subscription
+ Bring Your Own Subscription (BYOS) mode

## Required Access for Setup
<a name="_required_access_for_setup"></a>

The following access is required for setting up the cluster:

An IAM user with the following privileges:
+ Modify Amazon VPC route tables
+ Modify Amazon EC2 instance properties
+ Create IAM policies and roles
+ Create Amazon EFS file systems

Additional required access:
+ Root access to the operating system of both cluster nodes
+ SAP HANA administrative user access – <sid>adm
+ SAP HANA SystemDB Administrative access for changing configuration and backup administration.

**Note**  
These access requirements are specific to the cluster setup process and can be restricted for ongoing cluster operations and maintenance.

## Reliability Requirements Defined
<a name="_reliability_requirements_defined"></a>

The SAP Lens of the AWS Well-Architected Framework, in particular the Reliability pillar, can be used to understand the reliability requirements for your SAP workload.

The SAP HANA application is a single point of failure in a highly available SAP architecture. The impact of an outage of this component must be evaluated against factors such as recovery point objective (RPO), recovery time objective (RTO), cost, and operational complexity. For more information, see [Reliability in SAP Lens - AWS Well-Architected Framework](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/reliability.html).

# Vendor Support
<a name="sap-hana-pacemaker-sles-references"></a>

## SAP and SUSE References
<a name="_sap_and_suse_references"></a>

In addition to this guide, see the following references for more details:
+ SUSE Documentation: [SLES for SAP - SAP HANA High Availability Cluster for the AWS Cloud](https://documentation.suse.com/en-us/sbp/sap-15/html/SLES4SAP-hana-sr-guide-perfopt-15-aws/index.html) 
+ SUSE Documentation: [An overview of supported High Availability Solutions by SLES for SAP applications](https://documentation.suse.com/en-us/sles-sap/sap-ha-support/html/sap-ha-support/index.html) 
+ SAP Note: [1656099 - SAP Applications on AWS: Supported DB/OS and Amazon EC2 products](https://me.sap.com/notes/1656099) 
+ SAP Note: [1984787 - SUSE Linux Enterprise Server 12: Installation Notes](https://me.sap.com/notes/1984787) 
+ SAP Note: [2205917 - SAP HANA DB: Recommended OS settings for SLES 12 / SLES for SAP Applications 12](https://me.sap.com/notes/2205917) 
+ SAP Note: [2578899 - SUSE Linux Enterprise Server 15: Installation Notes](https://me.sap.com/notes/2578899) 
+ SAP Note: [2684254 - SAP HANA DB: Recommended OS settings for SLES 15 / SLES for SAP Applications 15](https://me.sap.com/notes/2684254) 
+ SAP Note: [1275776 - Linux: Preparing SLES for SAP environments](https://me.sap.com/notes/1275776) 

**Note**  
SAP portal access is required to access SAP Notes.

## Deployment Guidance
<a name="deployments-sles"></a>

AWS works in collaboration with SUSE to support SAP HANA deployments on AWS. AWS provides detailed guidance on configuring Amazon EC2 instances and AWS-specific resources to meet SAP HANA requirements. While we strive to consolidate documentation to simplify the user experience, the underlying Pacemaker software components and resources remain under the purview of the software vendor for development and support.


| SAP HANA Deployment Type | Support Status | Notes |  AWS Configuration Patterns | 
| --- | --- | --- | --- | 
|  SAP HANA Scale-Up Standard  |   AWS Documented & Supported  |  Covered in AWS SAP HANA guides  |  SAPHANAScaleUp-Classic, SAPHANAScaleUp-ANGI  | 
|  SAP HANA Scale-Up Secondary Read-Enabled  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Up Multi-Tier Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Up Multi-Target Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Out Standard  |   AWS Documented & Supported  |  Covered in AWS SAP HANA guides  |  SAPHANAScaleOut-Classic, SAPHANAScaleOut-ANGI  | 
|  SAP HANA Scale-Out Secondary Read-Enabled  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Out Multi-Tier Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 
|  SAP HANA Scale-Out Multi-Target Replication  |  Vendor Documented & Supported  |  Follows SAP documentation  |  | 

**Note**  
AWS configuration patterns represent standardized deployment templates that have been validated for specific use cases. In this documentation, we highlight where instructions deviate according to the configuration pattern.

**What is Angi?**  
SAPHanaSR-angi (SAP HANA SR - Advanced Next Generation Interface) is the latest unified high availability solution for managing SAP HANA System Replication in Pacemaker clusters, supported on SLES for SAP 15 SP4 and newer. The solution consolidates the management of both scale-up and scale-out deployments into a single package and introduces technical improvements, such as faster takeover in the event of filesystem failures, unresponsive HANA instances, and node failures in scale-out configurations.

This document covers new implementations using SAPHanaSR-angi. For migrations from existing SAPHanaSR or SAPHanaSR-ScaleOut installations to SAPHanaSR-angi, refer to the SUSE documentation for detailed upgrade procedures.
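Before planning an angi-based setup, you can check on a running node whether the unified package is available in your configured repositories. The package name below is as published by SUSE; installation itself is covered in the cluster setup instructions:

```
$ zypper info SAPHanaSR-angi
$ sudo zypper install SAPHanaSR-angi
```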

# Concepts
<a name="sap-hana-pacemaker-sles-concepts"></a>

## SAP – SAP HANA and HANA System Replication
<a name="_sap_sap_hana_and_hana_system_replication"></a>

SAP HANA is an in-memory, column-oriented, relational database management system developed by SAP. It uses HANA System Replication (HSR) to replicate data and changes from a primary system to one or more secondary systems. In scale-out deployments, this replication occurs between corresponding nodes across the primary and secondary systems, with each service having its counterpart in the secondary system. HSR ensures changes are continuously replicated to minimize the Recovery Point Objective (RPO). While takeovers can be manually triggered using HANA tooling, the addition of a Pacemaker cluster automates the failover process through monitoring, orchestration, and integration with resource agents for hardware connectivity and management.

## AWS – Availability Zones
<a name="shared_aws_availability_zones"></a>

An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. For more information, see Regions and Availability Zones.

For mission critical deployments of SAP on AWS where the goal is to minimize the recovery time objective (RTO), we suggest distributing single points of failure across Availability Zones. Compared with single instance or single Availability Zone deployments, this increases resilience and isolation against a broad range of failure scenarios and issues, including natural disasters.

Each Availability Zone is physically separated by a meaningful distance (many kilometers) from the others. All Availability Zones in an AWS Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber. This enables synchronous replication. All traffic between Availability Zones is encrypted.

## AWS – Overlay IP
<a name="overlay-ip-sles"></a>

An Overlay IP enables a connection to the application, regardless of which Availability Zone (and subnet) contains the active primary node.

When deploying an Amazon EC2 instance in AWS, IP addresses are allocated from the CIDR range of the allocated subnet. The subnet cannot span across multiple Availability Zones, and therefore the subnet IP addresses may be unavailable after faults, including network connectivity or hardware issues which require a failover to the replication target in a different Availability Zone.

To address this, we suggest that you configure an overlay IP, and use this in the connection parameters for the application. This IP address is a non-overlapping RFC1918 private IP address from outside of the VPC CIDR block and is configured as an entry in the route table or tables. The route directs the connection to the active node and is updated during a failover by the cluster software.

You can select any one of the following RFC1918 private IP addresses for your overlay IP address:
+ 10.0.0.0 – 10.255.255.255 (10/8 prefix)
+ 172.16.0.0 – 172.31.255.255 (172.16/12 prefix)
+ 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)

If, for example, you use the 10/8 prefix in your SAP VPC, selecting a 172 or a 192 IP address may help to differentiate the overlay IP. Consider the use of an IP Address Management (IPAM) tool such as Amazon VPC IP Address Manager to plan, track, and monitor IP addresses for your AWS workloads. For more information, see [What is IPAM?](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html) 
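As a rough sanity check, you can confirm that a candidate overlay IP does not share the VPC CIDR's dotted prefix. This is an illustrative sketch only: the simple string match works only for CIDR blocks aligned on an octet boundary (such as 10/8), and the values are the examples used in this guide; use an IPAM tool for precise checks.

```shell
# Illustration only: verify the candidate overlay IP is outside the VPC range.
VPC_PREFIX="10."            # VPC using the 10/8 range (example)
OVERLAY_IP="172.16.52.1"    # candidate overlay IP (example)

case "$OVERLAY_IP" in
  "$VPC_PREFIX"*) echo "conflict: overlay IP falls inside the VPC range" ;;
  *)              echo "ok: overlay IP is outside the VPC range" ;;
esac
```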

The overlay IP agent in the cluster can also be configured to update multiple route tables which contain the Overlay IP entry if your subnet association or connectivity requires it.
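During a failover, the route update performed by the cluster is equivalent to the following AWS CLI call. This is shown for orientation only; the cluster issues it automatically, and the placeholders match the parameter reference later in this guide:

```
$ aws ec2 replace-route --route-table-id <routetable_id> --destination-cidr-block <hana_overlayip>/32 --instance-id <instance_id_2>
```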

### Access to the Overlay IP
<a name="_access_to_the_overlay_ip"></a>

The overlay IP is outside of the range of the VPC, and therefore cannot be reached from locations that are not associated with the route table, including on-premises and other VPCs.

Use AWS Transit Gateway as a central hub to facilitate the network connection to an overlay IP address from multiple locations, including Amazon VPCs, other AWS Regions, and on-premises using AWS Direct Connect or AWS Client VPN.

If you do not have AWS Transit Gateway set up as a network transit hub or if it is not available in your preferred AWS Region, you can use a Network Load Balancer to enable network access to an overlay IP.

For more information, see [SAP on AWS High Availability Setup](sap-oip-sap-on-aws-high-availability-setup.md).

## AWS – Shared VPC
<a name="shared_aws_shared_vpc"></a>

An enterprise landing zone setup or security requirements may require the use of a separate cluster account to restrict the route table access required for the Overlay IP to an isolated account. For more information, see [Share your VPC with other accounts](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html).

Evaluate the operational impact against your security posture before setting up shared VPC.

## Pacemaker - STONITH Fencing Agent
<a name="fencing-sles"></a>

In SAP HANA deployments, whether in a scale-up configuration (two-node) or a scale-out configuration (two or more nodes per site), it is crucial that data consistency is maintained by ensuring only the designated primary node or nodes can process write operations at any given time. When a node becomes unresponsive or incommunicable, maintaining data consistency may require that the faulty node is isolated by powering it down before the cluster commences other actions, such as promoting a new primary. This arbitration is the role of the fencing agent.

In a two-node scale-up scenario, fence racing is a critical concern. This occurs when a communication failure causes both nodes to simultaneously attempt to fence (power off) each other, believing the other node has failed. The fencing agent addresses this risk by providing an external witness. In scale-out deployments, while fence racing is less likely due to the presence of multiple nodes that can participate in quorum decisions, proper fencing remains critical for maintaining data consistency across the larger node set.

SUSE supports several fencing agents, including the one recommended for use with Amazon EC2 instances (external/ec2).
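Configuration of this agent is part of the cluster setup instructions. For orientation, a STONITH primitive for external/ec2 typically looks like the following sketch; the resource name, timeouts, and the `tag` and `profile` values are illustrative (they correspond to the example cluster tag and AWS CLI profile in the parameter reference), so refer to the SUSE documentation for authoritative settings:

```
crm configure primitive res_AWS_STONITH stonith:external/ec2 \
  op start interval=0 timeout=180 \
  op stop interval=0 timeout=180 \
  op monitor interval=300 timeout=60 \
  params tag=pacemaker profile=cluster pcmk_delay_max=30
```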

# Automated Deployment
<a name="sap-hana-pacemaker-sles-automation"></a>

You can set up a cluster manually using the instructions provided here. You can also automate parts of this process to ensure consistent and repeatable deployments.

Use AWS Launch Wizard for SAP to automate deployments of SAP HANA, SAP NetWeaver, SAP S/4HANA, SAP BW/4HANA, and SAP Solution Manager. Launch Wizard uses AWS CloudFormation templates and advanced scripts to quickly provision the required resources. The automation handles SAP HANA installation, HANA System Replication, and Pacemaker setup, requiring only post-deployment validation and testing. For more information, see [AWS Launch Wizard for SAP](https://docs.aws.amazon.com/launchwizard/latest/userguide/launch-wizard-sap.html).

**Important**  
For reliable cluster operations, thoroughly test your system regardless of setup method. Testing helps identify system anomalies, validate changing requirements, and build operational understanding. See [Testing](sap-hana-pacemaker-sles-testing.md) for more details.

# Parameter Reference
<a name="sap-hana-pacemaker-sles-parameters"></a>

The cluster setup uses parameters, including SID and system number, that are unique to your setup. It is useful to determine these values in advance using the following examples and guidance.

**Topics**
+ [Global AWS Parameters](#global_shared_aws_parameters)
+ [Amazon EC2 Instance Parameters](#_amazon_ec2_instance_parameters)
+ [SAP and Pacemaker Resource Parameters](#_sap_and_pacemaker_resource_parameters)
+ [SLES Cluster Parameters](#_sles_cluster_parameters)

## Global AWS Parameters
<a name="global_shared_aws_parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|   AWS account ID  |   `<account_id>`   |   `123456789100`   | 
|   AWS Region  |   `<region>`   |   `us-east-1`   | 
+  AWS account – For more details, see [Your AWS account ID and its alias](https://docs.aws.amazon.com/IAM/latest/UserGuide/console-account-alias.html).
+  AWS Region – For more details, see [Describe your Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones).

## Amazon EC2 Instance Parameters
<a name="_amazon_ec2_instance_parameters"></a>


| Name | Parameter | Primary example | Secondary example | 
| --- | --- | --- | --- | 
|  Amazon EC2 instance ID  |   `<instance_id_x>`   |   `i-xxxxinstidforhost1`   |   `i-xxxxinstidforhost2`   | 
|  Hostname  |   `<hostname_x>`   |   `hanahost01`   |   `hanahost02`   | 
|  Host IP  |   `<host_ip_x>`   |   `10.1.20.1`   |   `10.2.20.1`   | 
|  Host additional IP  |   `<host_additional_ip_x>`   |   `10.1.20.2`   |   `10.2.20.2`   | 
|  Configured subnet  |   `<subnet_id>`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   | 
+ Hostnames must comply with SAP requirements outlined in [SAP Note 611361 - Hostnames of SAP ABAP Platform servers](https://me.sap.com/notes/611361) (requires SAP portal access).
+ Run the following command on your instances to retrieve the hostname:

  ```
  $ hostname
  ```
+ Amazon EC2 instance ID – run the following command (IMDSv2 compatible) on your instances to retrieve instance metadata:

  ```
  $ /usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/meta-data/instance-id
  ```

  For more details, see [Retrieve instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and [Instance identity documents](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html).

 **For Scale-out Deployments** 


| Role | Primary Coordinator | Primary Worker | Primary Worker | Secondary Coordinator | Secondary Worker | Secondary Worker | Majority Maker | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
|  Hostname  |   `hanahost01`   |   `hanahostworker01a`   |   `hanahostworker01b`   |   `hanahost02`   |   `hanahostworker02a`   |   `hanahostworker02b`   |   `hanamm`   | 
|  Subnet  |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   |   `subnet-xxxxxxxxxxsubnet2`   |   `subnet-xxxxxxxxxxsubnet2`   |   `subnet-xxxxxxxxxxsubnet3`   | 
+ This example shows a six-node cluster with a majority maker.
+ The majority maker can use minimal resources, as it only provides cluster quorum functionality.

## SAP and Pacemaker Resource Parameters
<a name="_sap_and_pacemaker_resource_parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|  SAP HANA SID  |   `<SID>` or `<sid>`   |   `HDB`   | 
|  SAP HANA System Number  |   `<hana_sys_nr>`   |   `00`   | 
|  SAP HANA Virtual Hostname  |   `<hana_virt_hostname>`   |   `hanahdb`   | 
|  SAP HANA Overlay IP  |   `<hana_overlayip>`   |   `172.16.52.1`   | 
|  SAP HANA Read Enabled Overlay IP (optional)  |   `<readenabled_overlayip>`   |   `172.16.52.2`   | 
|  VPC Route Tables  |   `<routetable_id>`   |   `rtb-xxxxxroutetable1`   | 
+ SAP details – SAP parameters, including SID and instance number must follow the guidance and limitations of SAP and Software Provisioning Manager. Refer to [SAP Note 1979280 - Reserved SAP System Identifiers (SAPSID) with Software Provisioning Manager](https://me.sap.com/notes/1979280) for more details.
+ Post-installation, use the following command to find the details of the instances running on a host:

  ```
  $ sudo /usr/sap/hostctrl/exe/saphostctrl -function ListInstances
  ```
+ Overlay IP – This value is defined by you. For more information, see [Overlay IP](sap-hana-pacemaker-sles-concepts.md#overlay-ip-sles).

## SLES Cluster Parameters
<a name="_sles_cluster_parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|  Cluster user  |   `<cluster_user>`   |   `hacluster`   | 
|  Cluster password  |   `<cluster_password>`   |  | 
|  Cluster name  |   `<cluster_name>`   |   `myCluster`   | 
|  Cluster tag  |   `<cluster_tag>`   |   `pacemaker`   | 
|   AWS CLI cluster profile  |   `<cli_cluster_profile>`   |   `cluster`   | 
+ Cluster user – Installing the cluster packages creates the user hacluster. Set a password for this account so that the cluster can perform tasks that do not require root access.
+ Cluster tag – This tag is used by the AWS STONITH agent to identify the correct Amazon EC2 instances to fence. The name of the tag is customizable and should be unique across your AWS account for this cluster pair.
+  AWS CLI cluster profile – You can define a named profile for cluster API calls, distinct from other uses of the CLI. Each profile can specify different credentials, AWS Regions, and output formats.
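A named profile can be created in the shared AWS CLI configuration file. The following minimal fragment assumes the example profile name `cluster` and the `us-east-1` Region; on instances using an IAM role, credentials are provided automatically, so typically only the Region and output format need to be set:

```
# ~/.aws/config
[profile cluster]
region = us-east-1
output = text
```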

# Architecture Diagrams
<a name="sap-hana-pacemaker-sles-arch-diagrams"></a>

## Pacemaker - Scale-Up Architecture
<a name="_pacemaker_scale_up_architecture"></a>

![\[SAP Hana Pacemaker SUSE Enterprise Linux Scale-Up\]](http://docs.aws.amazon.com/sap/latest/sap-hana/images/sap-hana-pacemaker-sles-scaleup.png)


# Prerequisites
<a name="sap-hana-pacemaker-sles-prerequisites"></a>

**Topics**
+ [AWS Infrastructure Setup](sap-hana-pacemaker-sles-infra-setup.md)
+ [EC2 Instance Configuration](sap-hana-pacemaker-sles-ec2-configuration.md)
+ [Operating System Requirements](sap-hana-pacemaker-sles-os-settings.md)

# AWS Infrastructure Setup
<a name="sap-hana-pacemaker-sles-infra-setup"></a>

This section covers the one-time setup tasks required to prepare your AWS environment for the cluster deployment:

**Topics**
+ [Create IAM Roles and Policies for Pacemaker](#iam_roles_sles)
+ [Modify Security Groups for Cluster Communication](#sg-sles)
+ [Add VPC Route Table Entries for Overlay IPs](#rt-sles)

## Create IAM Roles and Policies for Pacemaker
<a name="iam_roles_sles"></a>

In addition to the permissions required for standard SAP operations, two IAM policies are required for the cluster to control AWS resources. These policies must be assigned to your Amazon EC2 instances using an IAM role. This enables the Amazon EC2 instance, and therefore the cluster, to call AWS services.

**Note**  
Create policies with least-privilege permissions, granting access to only the specific resources that are required within the cluster. For multiple clusters, you may need to create multiple policies.

For more information, see [IAM roles for Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile).
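If you are scripting this setup, creating the role and attaching a policy can be sketched with the AWS CLI as follows. The role name, policy name, and file names are hypothetical; the policy documents are the JSON examples in the sections that follow:

```
$ aws iam create-role --role-name hana-cluster-role --assume-role-policy-document file://ec2-trust-policy.json
$ aws iam create-policy --policy-name hana-cluster-stonith --policy-document file://stonith-policy.json
$ aws iam attach-role-policy --role-name hana-cluster-role --policy-arn arn:aws:iam::<account_id>:policy/hana-cluster-stonith
```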

### STONITH Policy
<a name="stonith_policy"></a>

The SLES STONITH resource agent (external/ec2) requires permission to start and stop both nodes of the cluster. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-xxxxinstidforhost1",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-xxxxinstidforhost2"
      ]
    }
  ]
}
```

### AWS Overlay IP Policy
<a name="overlay_policy"></a>

The SLES Overlay IP resource agent (aws-vpc-move-ip) requires permission to modify a routing entry in route tables. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
```

### Shared VPC (optional)
<a name="_shared_vpc_optional"></a>

**Note**  
The following directions are only required for setups which include a Shared VPC.

Amazon VPC sharing enables you to share subnets with other AWS accounts within the same organization in AWS Organizations. Amazon EC2 instances can be deployed using the subnets of the shared Amazon VPC.

In the Pacemaker cluster, the aws-vpc-move-ip resource agent has been enhanced to support a shared VPC setup while maintaining backward compatibility with existing features.

The following checks and changes are required. We refer to the AWS account that owns the Amazon VPC as the sharing VPC account, and to the consumer account where the cluster nodes will be deployed as the cluster account.

**Minimum Version Requirements**  
The latest version of the aws-vpc-move-ip agent shipped with SLES 15 SP3 supports the shared VPC setup by default. The following are the minimum versions required to support a shared VPC setup:
+ SLES 12 SP5 - resource-agents-4.3.018.a7fb5035-3.79.1.x86_64
+ SLES 15 SP2 - resource-agents-4.4.0+git57.70549516-3.30.1.x86_64
+ SLES 15 SP3 - resource-agents-4.8.0+git30.d0077df0-8.5.1

**IAM Roles and Policies**  
Using the Overlay IP agent with a shared Amazon VPC requires a different set of IAM permissions to be granted on both AWS accounts (sharing VPC account and cluster account).

**Sharing VPC Account**  
In the sharing VPC account, create an IAM role to delegate permissions to the Amazon EC2 instances that will be part of the cluster. During IAM role creation, select "Another AWS account" as the type of trusted entity, and enter the AWS account ID where the Amazon EC2 instances will be deployed.

After the IAM role has been created, create the following IAM policy in the sharing VPC account, and attach it to the IAM role. Add or remove route table entries as needed.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
```

Next, move to the "Trust relationships" tab of the IAM role, and ensure that the AWS account you entered while creating the role has been added correctly.

In the cluster account, create the following IAM policy and attach it to an IAM role. This is the IAM role that will be attached to the Amazon EC2 instances.

 **STS Policy** 

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/sharing-vpc-account-cluster-role"
    }
  ]
}
```
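You can verify the cross-account trust from one of the cluster nodes before configuring the resource agent. A sketch, assuming the example role ARN above and a hypothetical session name; if the trust relationship is correct, the call returns temporary credentials, otherwise it fails with an access denied error:

```
$ aws sts assume-role --role-arn arn:aws:iam::123456789012:role/sharing-vpc-account-cluster-role --role-session-name overlay-ip-test
```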

 **STONITH Policy** 

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-xxxxinstidforhost1",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-xxxxinstidforhost2"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```

## Modify Security Groups for Cluster Communication
<a name="sg-sles"></a>

A security group controls the traffic that is allowed to reach and leave the resources that it is associated with. For more information, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html).

In addition to the standard ports required to access SAP and administrative functions, the following rules must be applied to the security groups assigned to all Amazon EC2 instances in the cluster.


| Source | Protocol | Port range | Description | 
| --- | --- | --- | --- | 
|  The security group ID (its own resource ID)  |  UDP  |  5405  |  Allows UDP traffic between cluster resources for corosync communication  | 
|  Bastion host security group or CIDR range for administration  |  TCP  |  7630  |  (optional) Used for SLES Hawk2 Interface for monitoring and administration using a Web Interface. For more details, see SUSE documentation [Configuring and Managing Cluster Resources with Hawk2](https://documentation.suse.com/sle-ha/15-SP6/html/SLE-HA-all/cha-ha-manage-resources.html#sec-conf-hawk2-manage-edit).  | 
+ Note the use of the `UDP` protocol.
+ If you are running a local firewall, such as iptables, ensure that communication on the preceding ports is allowed between the two Amazon EC2 instances.
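The corosync rule can also be added with the AWS CLI as a self-referencing rule. A sketch, using a hypothetical `<security_group_id>` placeholder for the security group's own ID:

```
$ aws ec2 authorize-security-group-ingress --group-id <security_group_id> --protocol udp --port 5405 --source-group <security_group_id>
```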

## Add VPC Route Table Entries for Overlay IPs
<a name="rt-sles"></a>

You need to add initial route table entries for the Overlay IP. For more information, see [Overlay IP Concept](sap-hana-pacemaker-sles-concepts.md#overlay-ip-sles).

Add entries to the VPC route table or tables associated with the subnets of your Amazon EC2 instances for the cluster. The entries for destination (Overlay IP CIDR) and target (Amazon EC2 instance or ENI) must be added manually for the SAP HANA primary database node. This ensures that the cluster resource has a route to modify. It also supports installing SAP using the virtual hostnames associated with the Overlay IP before the cluster is configured.

Using either the Amazon VPC console or an AWS CLI command, add a route to the table or tables for the Overlay IP.

------
#### [  AWS Console ]

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

1. In the navigation pane, choose **Route Tables**, then select the route table associated with your cluster node subnets.

1. Choose **Actions** → **Edit routes**.

1. Choose **Add route** and configure the HANA route:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sap/latest/sap-hana/sap-hana-pacemaker-sles-infra-setup.html)

1. (Optional) Add a route for read-enabled access to the secondary:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sap/latest/sap-hana/sap-hana-pacemaker-sles-infra-setup.html)

1. Choose **Save changes**.

   Your route table now includes entries for required Overlay IPs, in addition to the standard routes.

------
#### [  AWS CLI ]

The preceding steps can also be performed programmatically. We suggest performing them using administrative privileges rather than instance-based privileges, to preserve least privilege; the CreateRoute API call isn't necessary for ongoing operations.

For example:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <hana_overlayip>/32 --instance-id <instance_id_1>
```

If read-enabled access to the secondary is required:

```
$ aws ec2 create-route --route-table-id <routetable_id> --destination-cidr-block <readenabled_overlayip>/32 --instance-id <instance_id_2>
```

------

# EC2 Instance Configuration
<a name="sap-hana-pacemaker-sles-ec2-configuration"></a>

Amazon EC2 instance settings can be applied using Infrastructure as Code, or manually using the AWS Command Line Interface (AWS CLI) or the AWS Console. We recommend Infrastructure as Code automation to reduce manual steps and ensure consistency.

**Topics**
+ [Assign or Review Pacemaker IAM Role](#_assign_or_review_pacemaker_iam_role)
+ [Assign or Review Security Groups](#_assign_or_review_security_groups)
+ [Assign Secondary IP Addresses](#_assign_secondary_ip_addresses)
+ [Disable Source/Destination Check](#source_dest)
+ [Review Stop Protection](#stop_protection)
+ [Review Automatic Recovery](#auto_recovery)
+ [Create Amazon EC2 Resource Tags Used by Amazon EC2 STONITH Agent](#create-cluster-tags)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Assign or Review Pacemaker IAM Role
<a name="_assign_or_review_pacemaker_iam_role"></a>

The two cluster resource IAM policies must be assigned to an IAM role that is associated with your Amazon EC2 instance. If an IAM role is not associated with your instance, create a new IAM role for cluster operations.

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Security** → **Modify IAM role**.

1. Choose the IAM role that contains the policies created in [Create IAM Roles and Policies for Pacemaker](sap-hana-pacemaker-sles-infra-setup.md#iam_roles_sles).

1. Choose **Update IAM role**.

1. Repeat these steps for all nodes in the cluster.
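The same association can be made programmatically. The following sketch only prints the `associate-iam-instance-profile` commands for review rather than executing them; the profile name and instance IDs are placeholders for your own values.

```
# Print (rather than execute) the commands that attach the Pacemaker
# instance profile to each cluster node. PROFILE_NAME and the instance
# IDs below are placeholders - replace them with your own values, then
# run the printed commands once reviewed.
PROFILE_NAME="pacemaker-cluster-profile"
for instance_id in i-0exampleaaaaaaaaa i-0examplebbbbbbbbb; do
  cmd="aws ec2 associate-iam-instance-profile --instance-id ${instance_id} --iam-instance-profile Name=${PROFILE_NAME}"
  echo "${cmd}"
done
```

Note that `associate-iam-instance-profile` fails if an instance profile is already attached; in that case, use the console procedure above, which replaces the role in place.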

## Assign or Review Security Groups
<a name="_assign_or_review_security_groups"></a>

The security group rules created in the AWS [Modify Security Groups for Cluster Communication](sap-hana-pacemaker-sles-infra-setup.md#sg-sles) section must be assigned to your Amazon EC2 instances. If a security group is not associated with your instance, or if the required rules are not present in the assigned security group, add the security group or update the rules.

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. In the **Security** tab, review the security groups, ports, and source of traffic.

1. If required, choose **Actions** → **Security** → **Change security groups**.

1. Under **Associated security groups**, search for and select the required groups.

1. Choose **Save**.

1. Repeat these steps for all nodes in the cluster.

You can verify the security group rules on your instances using the AWS CLI:

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute groupSet
```
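If the required groups are missing, the assignment can also be updated from the command line. This sketch only prints the `modify-instance-attribute` command; the instance and security group IDs are placeholders. Note that `--groups` replaces the full list of groups on the primary network interface, so include every group the instance needs.

```
# Build and print (rather than execute) the command that sets the
# complete security group list for one node. IDs are placeholders.
INSTANCE_ID="i-0exampleaaaaaaaaa"
SG_LIST="sg-0exampledefault sg-0examplecluster"
cmd="aws ec2 modify-instance-attribute --instance-id ${INSTANCE_ID} --groups ${SG_LIST}"
echo "${cmd}"
```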

## Assign Secondary IP Addresses
<a name="_assign_secondary_ip_addresses"></a>

Secondary IP addresses are used to create a redundant corosync communication channel (a secondary ring) for the cluster. The cluster nodes can use the secondary ring to communicate in case of underlying network disruptions.

These IPs are only used in cluster configurations. The secondary IPs provide the same fault tolerance as a secondary Elastic Network Interface (ENI). For more information, see [Secondary IP addresses for your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-secondary-ip-addresses.html).
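If a node does not yet have a secondary IP address, one can be added with `assign-private-ip-addresses`. The sketch below prints the commands rather than executing them; the ENI IDs are placeholders for each node's primary network interface.

```
# Print commands that add one auto-assigned secondary private IP to each
# node's primary ENI (eth0). The ENI IDs below are placeholders.
for eni_id in eni-0exampleaaaaaaaaa eni-0examplebbbbbbbbb; do
  cmd="aws ec2 assign-private-ip-addresses --network-interface-id ${eni_id} --secondary-private-ip-address-count 1"
  echo "${cmd}"
done
```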

You can verify the secondary IP configuration on your instances using the AWS CLI:

```
$ aws ec2 describe-instances --instance-id <instance_id> \
    --query 'Reservations[*].Instances[*].NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' \
    --output text
```

Verify that:
+ Each instance returns two IP addresses from the same subnet
+ The primary network interface (eth0) has both IPs assigned
+ The secondary IPs will be used later for `ring0_addr` and `ring1_addr` in corosync.conf
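For reference, the primary and secondary IP addresses later map to corosync ring addresses similar to the following sketch. The IP addresses are illustrative, and the full corosync configuration is covered during cluster setup.

```
nodelist {
  node {
    ring0_addr: 10.2.10.1    # node 1 primary IP (illustrative)
    ring1_addr: 10.2.10.2    # node 1 secondary IP (illustrative)
    nodeid: 1
  }
  node {
    ring0_addr: 10.2.20.1    # node 2 primary IP (illustrative)
    ring1_addr: 10.2.20.2    # node 2 secondary IP (illustrative)
    nodeid: 2
  }
}
```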

## Disable Source/Destination Check
<a name="source_dest"></a>

Amazon EC2 instances perform source/destination checks by default, requiring that an instance is either the source or the destination of any traffic it sends or receives. In a pacemaker cluster, the source/destination check must be disabled on both instances so that they can receive traffic destined for the Overlay IP.

The following AWS Console or AWS CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Networking** → **Change source/destination check**.

1. For Source/Destination Checking, choose **Stop** to allow traffic when the source or destination is not the instance itself.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-source-dest-check
```

Repeat for all nodes in the cluster.

------

To confirm the value of the attribute for a particular instance, use the following command. The value `false` means that source/destination checking is disabled.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute sourceDestCheck
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "SourceDestCheck": {
        "Value": false
    }
}
```

## Review Stop Protection
<a name="stop_protection"></a>

To ensure that STONITH actions can be executed, stop protection must be disabled for Amazon EC2 instances that are part of a pacemaker cluster. If the default settings have been modified, disable stop protection on both instances.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change stop protection**.

1. Ensure **Stop protection** is not enabled.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify using the AWS CLI (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-attribute --instance-id <instance_id> --no-disable-api-stop
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of the attribute for a particular instance, use the following command. The value `false` means that it is possible to stop the instance using the AWS CLI.

```
$ aws ec2 describe-instance-attribute --instance-id <instance_id> --attribute disableApiStop
```

The output:

```
{
    "InstanceId": "i-xxxxinstidforhost1",
    "DisableApiStop": {
        "Value": false
    }
}
```

## Review Automatic Recovery
<a name="auto_recovery"></a>

After a failure, cluster-controlled operations must be resumed in a coordinated way. This helps ensure that the cause of failure is known and addressed, and that the status of the cluster is as expected (for example, that there are no pending fencing actions). For this reason, Amazon EC2 automatic recovery must be disabled on instances that are part of a pacemaker cluster.

The following AWS Console or CLI commands can be used to modify the attribute.

------
#### [  AWS Console ]

1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

1. Select one of your cluster nodes.

1. Choose **Actions** → **Instance settings** → **Change auto-recovery behavior**.

1. Select **Off** to disable auto-recovery for system status check failures.

1. Repeat these steps for all nodes in the cluster.

------
#### [  AWS CLI ]

To modify auto-recovery settings (requires appropriate configuration permissions):

```
$ aws ec2 modify-instance-maintenance-options --instance-id <instance_id> --auto-recovery disabled
```

Repeat this command for all nodes in the cluster.

------

To confirm the value of the attribute for a particular instance, use the following command. The value `disabled` means that auto-recovery will not be attempted.

```
$ aws ec2 describe-instances --instance-ids <instance_id> --query 'Reservations[*].Instances[*].MaintenanceOptions.AutoRecovery'
```

The output:

```
[
    [
        "disabled"
    ]
]
```

## Create Amazon EC2 Resource Tags Used by Amazon EC2 STONITH Agent
<a name="create-cluster-tags"></a>

The Amazon EC2 STONITH agent uses AWS resource tags to identify Amazon EC2 instances. Create a tag for the primary and secondary Amazon EC2 instances using the AWS Console or AWS CLI. For more information, see [Using Tags](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html).

Use the same tag key on all instances, with the value set to the local hostname returned by the `hostname` command. For example, a configuration with the values defined in Global AWS parameters would require the tags shown in the following table.


| Amazon EC2 | Key example | Value example | 
| --- | --- | --- | 
|  <instance_id>  |  <cluster_tag>  |  <hostname>  | 
|  Instance 1  |  pacemaker  |  hanahost1  | 
|  Instance 2  |  pacemaker  |  hanahost2  | 
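These tags can also be created with the AWS CLI. The following sketch prints a `create-tags` command per node; the instance IDs are placeholders, while the key and values follow the example table.

```
# Print (rather than execute) the create-tags commands for both nodes.
# The instance IDs are placeholders; key and values match the example.
cmd1="aws ec2 create-tags --resources i-0exampleaaaaaaaaa --tags Key=pacemaker,Value=hanahost1"
cmd2="aws ec2 create-tags --resources i-0examplebbbbbbbbb --tags Key=pacemaker,Value=hanahost2"
echo "${cmd1}"
echo "${cmd2}"
```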

You can run the following command locally to validate the tag values and the IAM permissions to describe tags. Run it on every instance in the cluster, checking the tags of every instance in the cluster.

```
$ aws ec2 describe-tags --filters "Name=resource-id,Values=<instance_id>" "Name=key,Values=<cluster_tag>" --region=<region> --output=text | cut -f5
```

# Operating System Requirements
<a name="sap-hana-pacemaker-sles-os-settings"></a>

This section outlines the required operating system configurations for SUSE Linux Enterprise Server for SAP (SLES for SAP) cluster nodes. Note that this is not a comprehensive list of configuration requirements for running SAP HANA on AWS, but rather focuses specifically on cluster management prerequisites.

Consider using configuration management tools or automated deployment scripts to ensure accurate and repeatable setup across your cluster infrastructure.

**Topics**
+ [Root Access](#_root_access)
+ [Install Missing Operating System Packages](#packages)
+ [Update and Check Operating System Versions](#_update_and_check_operating_system_versions)
+ [System Logging](#_system_logging)
+ [Time Synchronization Services](#_time_synchronization_services)
+ [AWS CLI Profile](#shared_aws_cli_profile)
+ [Pacemaker Proxy Settings (Optional)](#_pacemaker_proxy_settings_optional)
+ [Add Overlay IP for Initial Database Access](#_add_overlay_ip_for_initial_database_access)
+ [Hostname Resolution](#_hostname_resolution)

**Important**  
The following configurations must be performed on all cluster nodes. Ensure consistency across nodes to prevent cluster issues.

## Root Access
<a name="_root_access"></a>

Verify root access on both cluster nodes. The majority of the setup commands in this document are performed with the root user. Assume that commands should be run as root unless there is an explicit call out to choose otherwise.

## Install Missing Operating System Packages
<a name="packages"></a>

This is applicable to all cluster nodes. You must install any missing operating system packages.

The following packages and their dependencies are required for the pacemaker setup. Depending on your baseline image, for example, SLES for SAP, these packages may already be installed.


| Package | Description | Category | Required | Configuration Pattern | 
| --- | --- | --- | --- | --- | 
|  chrony  |  Time Synchronization  |  System Support  |  Mandatory  |  All  | 
|  rsyslog  |  System Logging  |  System Support  |  Mandatory  |  All  | 
|  pacemaker  |  Cluster Resource Manager  |  Core Cluster  |  Mandatory  |  All  | 
|  corosync  |  Cluster Communication Engine  |  Core Cluster  |  Mandatory  |  All  | 
|  cluster-glue  |  Cluster Infrastructure  |  Core Cluster  |  Mandatory  |  All  | 
|  crmsh  |  Cluster Management CLI  |  Core Cluster  |  Mandatory  |  All  | 
|  resource-agents  |  Basic Resource Agents  |  Core Cluster  |  Mandatory  |  All  | 
|  fence-agents  |  Fencing Capabilities  |  Core Cluster  |  Mandatory  |  All  | 
|  SAPHanaSR-angi  |  New Generation HANA System Replication Agent  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleUp-SAPANGI, SAPHANAScaleOut-SAPANGI  | 
|  SAPHanaSR  |  Previous Generation Scale-Up SR Agent  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleUp-Classic  | 
|  SAPHanaSR-doc  |  Documentation for Scale-Up Configuration  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleUp-Classic  | 
|  SAPHanaSR-ScaleOut  |  Previous Generation Scale-Out SR Agent  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleOut-Classic  | 
|  SAPHanaSR-ScaleOut-doc  |  Documentation for Scale-Out Configuration  |  SAP HANA HA  |  Mandatory\*  |  SAPHANAScaleOut-Classic  | 
|  supportutils  |  System Information Gathering  |  Support Tools  |  Mandatory  |  All  | 
|  sysstat  |  Performance Monitoring Tools  |  Support Tools  |  Mandatory  |  All  | 
|  zypper-lifecycle-plugin  |  Software Lifecycle Management  |  Support Tools  |  Recommended  |  All  | 
|  supportutils-plugin-ha-sap  |  HA/SAP Support Data Collection  |  Support Tools  |  Recommended  |  All  | 
|  supportutils-plugin-suse-public-cloud  |  Cloud Support Data Collection  |  Support Tools  |  Recommended  |  All  | 
|  dstat  |  System Resource Statistics  |  Monitoring  |  Recommended  |  All  | 
|  iotop  |  I/O Monitoring  |  Monitoring  |  Recommended  |  All  | 

**Note**  
Refer to [Vendor Support of Deployment Types](sap-hana-pacemaker-sles-references.md#deployments-sles) for more information on Configuration Patterns. `Mandatory*` indicates that this package is mandatory based on the Configuration Pattern.

```
#!/bin/bash
# Mandatory core packages for SAP HANA HA on AWS
mandatory_packages="corosync pacemaker cluster-glue crmsh rsyslog chrony resource-agents fence-agents"

# HANA SR packages - New Generation
hanaSR_angi="SAPHanaSR-angi"  # New generation package for both scale-up and scale-out

# HANA SR packages - Previous Generation (still in common use)
hanaSR_scaleup="SAPHanaSR SAPHanaSR-doc"  # For scale-up deployments
hanaSR_scaleout="SAPHanaSR-ScaleOut SAPHanaSR-ScaleOut-doc"  # For scale-out deployments

# Recommended monitoring and support packages
support_packages="supportutils supportutils-plugin-ha-sap supportutils-plugin-suse-public-cloud sysstat dstat iotop zypper-lifecycle-plugin"

# Note: Choose either hanaSR_angi OR one of hanaSR_scaleup/hanaSR_scaleout
# Uncomment the appropriate line based on your deployment:
packages="${mandatory_packages} ${hanaSR_angi} ${support_packages}"
#packages="${mandatory_packages} ${hanaSR_scaleup} ${support_packages}"
#packages="${mandatory_packages} ${hanaSR_scaleout} ${support_packages}"

missingpackages=""

for package in ${packages}; do
    echo "Checking if ${package} is installed..."
    if ! rpm -q ${package} --quiet; then
        echo " ${package} is missing and needs to be installed"
        missingpackages="${missingpackages} ${package}"
    fi
done

if [ -z "$missingpackages" ]; then
    echo "All packages are installed."
else
    echo "Missing mandatory packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${mandatory_packages} | tr ' ' '|'))$")"
    echo "Missing support packages: $(echo ${missingpackages} | tr ' ' '\n' | grep -E "^($(echo ${support_packages} | tr ' ' '|'))$")"
    echo -n "Do you want to install the missing packages (y/n)? "
    read response
    if [ "$response" = "y" ]; then
        zypper install -y $missingpackages
    fi
fi
```

If a package is not installed and you are unable to install it using zypper, it may be because the SUSE Linux Enterprise High Availability extension is not available as a repository in your chosen image. You can verify the availability of the extension using the following command:

```
$ sudo zypper repos
```

To install or update a package or packages with confirmation, use the following command:

```
$ sudo zypper install <package_name(s)>
```

## Update and Check Operating System Versions
<a name="_update_and_check_operating_system_versions"></a>

You must update and confirm operating system versions across nodes. Apply the latest patches to your operating system to ensure that bugs are addressed and new features are available.

You can update the patches individually or update all system patches using the `zypper update` command. A clean reboot is recommended prior to setting up a cluster.

```
$ sudo zypper update
$ sudo reboot
```

Compare the operating system package versions on the two cluster nodes and ensure that the versions match on both nodes.
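One way to compare is to collect a sorted package list on each node and diff the lists on a single host. A minimal sketch, assuming the lists have already been collected on each node with `rpm -qa | sort > /tmp/pkgs_$(hostname)` and copied to one host (the file names are placeholders matching the example hostnames in this guide):

```
# Compare package lists collected from both nodes. The file names are
# placeholders - adjust them to match your hostnames.
if diff -q /tmp/pkgs_hanahost01 /tmp/pkgs_hanahost02 >/dev/null 2>&1; then
  result="match"
else
  result="differ"
fi
echo "Package versions ${result} across nodes"
```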

## System Logging
<a name="_system_logging"></a>

Both systemd-journald and rsyslog are suggested for comprehensive logging. Systemd-journald (enabled by default) provides structured, indexed logging with immediate access to events, while rsyslog is maintained for backward compatibility and traditional file-based logging. This dual approach ensures both modern logging capabilities and compatibility with existing log management tools and practices.

 **1. Enable and start rsyslog:** 

```
# systemctl enable --now rsyslog
```

**2. (Optional) Configure persistent logging for systemd-journald:**  
If you are not using a logging agent (like the AWS CloudWatch Unified Agent or Vector) to ship logs to a centralized location, you may want to configure persistent logging to retain logs after system reboots.

```
# mkdir -p /etc/systemd/journald.conf.d
```

Create `/etc/systemd/journald.conf.d/99-logstorage.conf` with:

```
[Journal]
Storage=persistent
```

Persistent logging requires careful storage management. Configure appropriate retention and rotation settings in `journald.conf` to prevent logs from consuming excessive disk space. Review `man journald.conf` for available options such as SystemMaxUse, RuntimeMaxUse, and MaxRetentionSec.
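For example, retention limits could be added to the same drop-in file. The values below are illustrative, not recommendations:

```
[Journal]
Storage=persistent
SystemMaxUse=1G
MaxRetentionSec=1month
```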

To apply the changes, restart journald:

```
# systemctl restart systemd-journald
```

After enabling persistent storage, only new logs will be stored persistently. Existing logs from the current boot session will remain in volatile storage until the next reboot.

 **3. Verify services are running:** 

```
# systemctl status systemd-journald
# systemctl status rsyslog
```

## Time Synchronization Services
<a name="_time_synchronization_services"></a>

Time synchronization is important for cluster operation. Ensure that chrony rpm is installed, and configure appropriate time servers in the configuration file.

You can use the Amazon Time Sync Service, which is available on any instance running in a VPC. It does not require internet access. To ensure consistent handling of leap seconds, don't mix the Amazon Time Sync Service with any other NTP time sync servers or pools.

Create or check the `/etc/chrony.d/ec2.conf` file to define the server:

```
# Amazon EC2 time source config
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
```

Start the chronyd.service, using the following command:

```
# systemctl enable --now chronyd.service
# systemctl status chronyd
```

For more information, see [Set the time for your Linux instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html).

## AWS CLI Profile
<a name="shared_aws_cli_profile"></a>

The AWS cluster resource agents use the AWS Command Line Interface (AWS CLI). You need to create an AWS CLI profile for the root account.

You can either edit the config file at `/root/.aws/config` manually or use the `aws configure` AWS CLI command.

Skip the prompts for the access key and secret access key. The permissions are provided through IAM roles attached to the Amazon EC2 instances.

```
# aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```

The profile name is `default` unless otherwise configured. To use a different name, specify it with `--profile`. The name chosen in this example is `cluster`; it is used in the AWS resource agent definitions for pacemaker. The AWS Region must be the default AWS Region of the instance.

```
# aws configure --profile cluster
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region>
Default output format [None]:
```
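After running both commands, the resulting `/root/.aws/config` file would look similar to the following, with only the Region populated:

```
[default]
region = <region>

[profile cluster]
region = <region>
```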

On the hosts, you can verify the available profiles using the following command:

```
# aws configure list-profiles
```

And review that an assumed role is associated by querying the caller identity:

```
# aws sts get-caller-identity --profile=<profile_name>
```

## Pacemaker Proxy Settings (Optional)
<a name="_pacemaker_proxy_settings_optional"></a>

If your Amazon EC2 instance has been configured to access the internet or AWS services through proxy servers, you need to replicate these settings in the pacemaker configuration. For more information, see [Using an HTTP Proxy](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-proxy.html).

Add the following lines to `/etc/sysconfig/pacemaker`:

```
http_proxy=http://<proxyhost>:<proxyport>
https_proxy=http://<proxyhost>:<proxyport>
no_proxy=127.0.0.1,localhost,169.254.169.254,fd00:ec2::254
```
+ Modify `<proxyhost>` and `<proxyport>` to match your settings.
+ Ensure that you exempt the addresses used to access the instance metadata service.
+ Configure `no_proxy` to include the IP addresses of the instance metadata service: 169.254.169.254 (IPv4) and fd00:ec2::254 (IPv6). These addresses do not vary.

## Add Overlay IP for Initial Database Access
<a name="_add_overlay_ip_for_initial_database_access"></a>

This step is optional and only needed if you require client connectivity to the SAP HANA database before cluster setup. The Overlay IP will later be managed automatically by the cluster resources.

To enable initial database access, manually add the Overlay IP to the primary instance (where the SAP HANA database is currently running):

```
# ip addr add <hana_overlayip>/32 dev eth0
```
+ This configuration is temporary and will be lost after instance reboot
+ Only configure this on the current primary instance
+ The cluster will take over management of this IP once configured

## Hostname Resolution
<a name="_hostname_resolution"></a>

You must ensure that all instances can resolve all hostnames in use. Add the hostnames for cluster nodes to `/etc/hosts` file on all cluster nodes. This ensures that hostnames for cluster nodes can be resolved even in case of DNS issues. See the following example for a two-node cluster:

```
# cat /etc/hosts
10.2.10.1 hanahost01.example.com hanahost01
10.2.20.1 hanahost02.example.com hanahost02
172.16.52.1 hanahdb.example.com hanahdb
```

In this example, the secondary IP addresses used for the second cluster ring are not included; they are used only in the cluster configuration. You can allocate virtual hostnames for administration and identification purposes.

**Important**  
The Overlay IP is outside of the VPC CIDR range and cannot be reached from locations that are not associated with the route table, including on-premises networks.

# SAP HANA and Cluster Setup
<a name="sap-hana-pacemaker-sles-deployment-cluster"></a>

**Topics**
+ [SAP HANA Setup and HSR](sap-hana-pacemaker-sles-hana-setup-hsr.md)
+ [SAP HANA Service Control](sap-hana-pacemaker-sles-hana-control.md)
+ [Cluster Node Setup](sap-hana-pacemaker-sles-cluster-node-setup.md)
+ [Cluster Configuration](sap-hana-pacemaker-sles-cluster-config.md)
+ [Client Connectivity](sap-hana-pacemaker-sles-client-connectivity.md)

# SAP HANA Setup and HSR
<a name="sap-hana-pacemaker-sles-hana-setup-hsr"></a>

Prepare SAP HANA for System Replication (HSR) by configuring parameters and creating required backups.

**Topics**
+ [Review AWS and SAP Installation Guides](#review_guides)
+ [Check global.ini parameters](#global_ini)
+ [Create a SAP HANA Backup on the Primary System](#pre_setup_backup)
+ [Configure System Replication on Primary and Secondary Systems](#register_hsr)
+ [Check SAP Host Agent Version](#sap_host_agent)

**Important**  
This guide assumes that SAP HANA Platform has been installed either as a scale-up configuration with two EC2 instances in different Availability Zones, or as a scale-out configuration with multiple EC2 instances in two Availability Zones, following the guidance from AWS and SAP.

## Review AWS and SAP Installation Guides
<a name="review_guides"></a>
+  AWS Documentation - [SAP HANA Environment Setup on AWS](https://docs.aws.amazon.com/sap/latest/sap-hana/std-sap-hana-environment-setup.html) 
+ SAP Documentation - [SAP HANA Server Installation and Update Guide](https://help.sap.com/docs/SAP_HANA_PLATFORM/2c1988d620e04368aa4103bf26f17727/7eb0167eb35e4e2885415205b8383584.html) 

SAP provides documentation on how to configure SAP HANA System Replication using the SAP HANA Cockpit, SAP HANA Studio, or `hdbnsutil` on the command line. Review the documentation for your SAP HANA version to check for changes to the guidance, or if you want to use a method other than the command line.
+ SAP Documentation: [Configuring SAP HANA System Replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/4e9b18c116aa42fc84c7dbfd02111aba/442bf027937746248f69701aa9b94112.html) 

## Check global.ini parameters
<a name="global_ini"></a>

Run the following as <sid>adm. These commands will prompt for the system password for the SYSTEMDB database.

**Check `log_mode` is set to normal**  
Ensure that the configuration parameter `log_mode` is set to `normal` in the persistence section of the global.ini file:

```
hdbsql -jx -i <hana_sys_nr> -u system -d SYSTEMDB "SELECT VALUE FROM M_INIFILE_CONTENTS WHERE FILE_NAME = 'global.ini' AND SECTION = 'persistence' AND KEY = 'log_mode';"
```

For example:

```
hdbadm> hdbsql -jx -i 00 -u system -d SYSTEMDB "SELECT VALUE FROM M_INIFILE_CONTENTS WHERE FILE_NAME = 'global.ini' AND SECTION = 'persistence' AND KEY = 'log_mode';"
VALUE
"normal"
```

**Review global.ini file replication**  
SAP HANA System Replication requires consistent configuration between primary and secondary systems to ensure proper operation, especially during failover scenarios. The `inifile_checker/replicate` parameter in global.ini provides an automated solution to this requirement. When enabled on the primary system, any configuration changes made to ini files on the primary are automatically synchronized to the secondary site. This removes the need for manual configuration replication and helps prevent configuration mismatches that could impact system availability. The parameter only needs to be configured on the primary system, as the secondary system will receive these configuration changes through the normal System Replication process.

Add the following to `global.ini`:

```
[inifile_checker]
replicate = true
```

See SAP Note [2978895 - Changing parameters on Primary and Secondary site of SAP HANA system](https://me.sap.com/notes/2978895) 

## Create a SAP HANA Backup on the Primary System
<a name="pre_setup_backup"></a>

 **Get a list of all active databases:** 

```
hdbsql -jx -i <hana_sys_nr> -u system -d SYSTEMDB "SELECT DATABASE_NAME,ACTIVE_STATUS from M_DATABASES"
```

For example:

```
hdbadm> hdbsql -jx -i 00 -u system -d SYSTEMDB "SELECT DATABASE_NAME,ACTIVE_STATUS from M_DATABASES"
Password:
DATABASE_NAME,ACTIVE_STATUS
"SYSTEMDB","YES"
"HDB","YES"
```

**Create a backup of the SYSTEMDB and each tenant database:**  
The following commands are examples of file-based backups. Backups can be performed using your preferred tool and location. If you are using a file system (for example, `/backup`), ensure that there is sufficient space for a full backup.
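Before starting, you can confirm the available space at the backup location. A minimal sketch; the path and required size are assumptions to adjust for your database size:

```
# Check free space at the backup location before a full backup.
# BACKUP_DIR and REQUIRED_GB are assumptions - set them for your system.
BACKUP_DIR="${BACKUP_DIR:-.}"      # for example, /backup
REQUIRED_GB="${REQUIRED_GB:-64}"
avail_gb=$(df -BG --output=avail "${BACKUP_DIR}" | tail -n 1 | tr -dc '0-9')
if [ "${avail_gb:-0}" -ge "${REQUIRED_GB}" ]; then
  echo "OK: ${avail_gb}G available in ${BACKUP_DIR}"
else
  echo "WARNING: only ${avail_gb:-0}G available, ${REQUIRED_GB}G required"
fi
```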

------
#### [ Backint ]

For the SystemDB

```
hdbsql -i 00 -u SYSTEM  -d SYSTEMDB "BACKUP DATA USING BACKINT ('initial_hsr_db_SYSTEMDB') COMMENT 'Initial backup for HSR'";
```

For each Tenant DB

```
hdbsql -i 00 -u SYSTEM  -d <TENANT_DB> "BACKUP DATA USING BACKINT ('initial_hsr_db_<TENANT_DB>') COMMENT 'Initial backup for HSR'";
```
+ Run as <sid>adm
+ Ensure that backint has been configured correctly
+ You will be prompted to provide a password or alternatively can use `-p password` 

------
#### [ File ]

For the SystemDB

```
hdbsql -i <hana_sys_nr> -u system -d SYSTEMDB "BACKUP DATA USING FILE ('/<backup location>/initial_hsr_db_SYSTEMDB') COMMENT 'Initial backup for HSR'";
```

For each Tenant DB

```
hdbsql -i <hana_sys_nr> -u system -d <TENANT_DB> "BACKUP DATA USING FILE ('/<backup location>/initial_hsr_db_<TENANT_DB>') COMMENT 'Initial backup for HSR'";
```
+ Run as <sid>adm
+ Ensure that a backup location exists with sufficient space and the correct file permissions
+ You will be prompted to provide a password or alternatively can use `-p password` 

------

### Stop the Secondary System and Copy System PKI Keys
<a name="copy_keys"></a>

**Stop the secondary system**  
Stop SAP HANA on the secondary system, as <sid>adm:

```
sapcontrol -nr <hana_sys_nr> -function StopSystem <SID>
```

**Copy the system PKI keys**  
Copy the following system PKI SSFS key and data files from the primary system to the same location on the secondary system using scp, a shared file system, or an S3 bucket:

```
/usr/sap/<SID>/SYS/global/security/rsecssfs/data/SSFS_<SID>.DAT
/usr/sap/<SID>/SYS/global/security/rsecssfs/key/SSFS_<SID>.KEY
```

For example using scp:

```
hdbadm> scp -p /usr/sap/HDB/SYS/global/security/rsecssfs/data/SSFS_HDB.DAT hdbadm@hanahost02:/usr/sap/HDB/SYS/global/security/rsecssfs/data/SSFS_HDB.DAT
hdbadm> scp -p /usr/sap/HDB/SYS/global/security/rsecssfs/key/SSFS_HDB.KEY hdbadm@hanahost02:/usr/sap/HDB/SYS/global/security/rsecssfs/key/SSFS_HDB.KEY
```

## Configure System Replication on Primary and Secondary Systems
<a name="register_hsr"></a>

**Enable System Replication on the Primary System**  
Ensure the primary SAP HANA system is **started**, then as <sid>adm, enable system replication using a unique site name:

```
hdbnsutil -sr_enable --name=<site_1>
```

For example:

```
hdbadm> hdbnsutil -sr_enable --name=siteA
```

**Register System Replication on the Secondary System**  
Ensure the secondary SAP HANA system is **stopped**, then as <sid>adm, enable system replication using a unique site name, the connection details of the primary system, and your preferred replication options.

```
hdbnsutil -sr_register \
 --name=<site_2> \
 --remoteHost=<hostname_1> \
 --remoteInstance=<hana_sys_nr> \
 --replicationMode=[sync|syncmem] \
 --operationMode=[logreplay|logreplay_readenabled]
```

For example:

```
hdbadm> hdbnsutil -sr_register --name=siteB --remoteHost=hanahost01 --remoteInstance=00 --replicationMode=syncmem --operationMode=logreplay
```

Alternatively, if your setup requires active/active read-enabled access to the secondary:

```
hdbadm> hdbnsutil -sr_register --name=siteB --remoteHost=hanahost01 --remoteInstance=00 --replicationMode=syncmem --operationMode=logreplay_readenabled
```
+  `hostname_1` is the hostname used to install SAP HANA, which may be a virtual name.
+ The replication mode can be either `sync` or `syncmem`.
+ For replication to support a clustered system and a hot standby, the operation mode must be `logreplay` or `logreplay_readenabled`.
+ For more information, review the SAP documentation:
  + SAP Documentation: [Replication Modes for SAP HANA System Replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/c039a1a5b8824ecfa754b55e0caffc01.html) 
  + SAP Documentation: [Operation Modes for SAP HANA System Replication](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/627bd11e86c84ec2b9fcdf585d24011c.html) 
  + SAP Documentation: [SAP HANA System Replication - Active/Active (Read Enabled)](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/fe5fc53706a34048bf4a3a93a5d7c866.html) 

## Check SAP Host Agent Version
<a name="sap_host_agent"></a>

The SAP Host Agent is used for SAP instance control and monitoring, and is relied on by the SAP cluster resource agents and hooks. We recommend that you have the latest version installed on all instances. For more details, see [SAP Note 2219592 – Upgrade Strategy of SAP Host Agent](https://me.sap.com/notes/2219592).

Use the following command to check the version of the host agent; repeat on all SAP HANA nodes:

```
# /usr/sap/hostctrl/exe/saphostexec -version
```

# SAP HANA Service Control
<a name="sap-hana-pacemaker-sles-hana-control"></a>

Modify how SAP HANA services are managed to enable cluster takeover and operation.

**Topics**
+ [Add sidadm to haclient Group](#_add_sidadm_to_haclient_group)
+ [Modify SAP Profile for HANA](#_modify_sap_profile_for_hana)
+ [Configure SAPHanaSR Cluster Hook for Optimized Cluster Response](#hook_saphanasr)
+ [Configure susTkOver Cluster Hook to Ensure Cluster Awareness of Manual Takeover](#hook_sustkover)
+ [(Optional) Configure susChkSrv Cluster Hook (Fast Dying Index Server)](#hook_suschksrv)
+ [(Optional) Configure Fast Restart Option](#_optional_configure_fast_start_option)
+ [Review systemd Integration](#_review_systemd_integration)

## Add sidadm to haclient Group
<a name="_add_sidadm_to_haclient_group"></a>

The Pacemaker software creates a haclient operating system group. To ensure proper cluster access permissions, add the <sid>adm user to this group on all cluster nodes. Run the following command as root; for example, for SID HDB:

```
# usermod -a -G haclient hdbadm
```

## Modify SAP Profile for HANA
<a name="_modify_sap_profile_for_hana"></a>

To prevent automatic SAP HANA startup by the SAP start framework when an instance restarts, modify the SAP HANA instance profiles on all nodes. These profiles are located at `/usr/sap/<SID>/SYS/profile/`.

As <sid>adm, edit the SAP HANA profile `<SID>_HDB<hana_sys_nr>_<hostname>` and modify or add the Autostart parameter, ensuring it is set to 0:

```
Autostart = 0
```
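
The following sketch applies this change idempotently. It operates on a temporary file purely for illustration; on a real node, the target would be the instance profile under `/usr/sap/<SID>/SYS/profile/`.

```shell
# Stand-in profile for illustration; a real node would use
# /usr/sap/<SID>/SYS/profile/<SID>_HDB<hana_sys_nr>_<hostname>
profile="$(mktemp)"
printf 'SAPSYSTEMNAME = HDB\nAutostart = 1\n' > "$profile"

if grep -q '^Autostart' "$profile"; then
    # Replace an existing Autostart line, whatever its current value
    sed -i 's/^Autostart.*/Autostart = 0/' "$profile"
else
    # Append the parameter if it is missing entirely
    echo 'Autostart = 0' >> "$profile"
fi

grep '^Autostart' "$profile"
```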

## Configure SAPHanaSR Cluster Hook for Optimized Cluster Response
<a name="hook_saphanasr"></a>

The SAPHanaSR hook provides immediate notification to the cluster if system replication fails, complementing the standard cluster polling mechanism. This optimization can significantly improve failover response time.

Follow these steps to configure the SAPHanaSR hook:

1.  **Verify Cluster Package** 

   The hook configuration varies based on the resource agents in use (see [Vendor Support](sap-hana-pacemaker-sles-references.md) for details).

------
#### [ SAPHanaSR ]

   Check the expected package is installed

   ```
   # rpm -qa SAPHanaSR
   ```

   Review the man pages for more details.

   ```
   # man SAPHanaSR
   # man SAPHanaSR.py
   ```

------
#### [ SAPHanaSR-angi ]

   Check the expected package is installed

   ```
   # rpm -qa SAPHanaSR-angi
   ```

   Review the man pages for more details.

   ```
   # man SAPHanaSR-angi
   # man susHanaSR.py
   ```

------

1.  **Confirm Hook Location** 

   By default, the package is installed in `/usr/share/SAPHanaSR-angi` or `/usr/share/SAPHanaSR`. We suggest using the default location, but you can optionally copy it to a custom directory, for example, `/hana/shared/myHooks`. The hook must be available on all SAP HANA cluster nodes.

1.  **Configure global.ini** 

   Update the `global.ini` file located at `/hana/shared/<SID>/global/hdb/custom/config/` on each SAP HANA cluster node. Make a backup copy before proceeding.

------
#### [ SAPHanaSR ]

   ```
   [ha_dr_provider_SAPHanaSR]
   provider = SAPHanaSR
   path = /usr/share/SAPHanaSR
   execution_order = 1
   
   [trace]
   ha_dr_saphanasr = info
   ```

**Note**  
Update the path if you have modified the package location.

------
#### [ SAPHanaSR-angi ]

   ```
   [ha_dr_provider_sushanasr]
   provider = susHanaSR
   path = /usr/share/SAPHanaSR-angi
   execution_order = 1
   
   [trace]
   ha_dr_sushanasr = info
   ```

**Note**  
Update the path if you have modified the package location.

------

1.  **Configure Sudo Privileges** 

   The SAPHanaSR Python hook requires sudo privileges for the <sid>adm user to access cluster attributes:

   1. Create a new sudoers file as root user in `/etc/sudoers.d/`, for example `60-SAPHanaSR-hook` 

   1. Use visudo to safely edit the new file `visudo /etc/sudoers.d/60-SAPHanaSR-hook` 

   1. Add the following configuration, replacing <sid> with lowercase system ID and <SID> with uppercase system ID:

      ```
      Cmnd_Alias SITE_SOK = /usr/sbin/crm_attribute -n hana_<sid>_site_srHook_[a-zA-Z0-9_]* -v SOK -t crm_config -s SAPHanaSR
      Cmnd_Alias SITE_SFAIL = /usr/sbin/crm_attribute -n hana_<sid>_site_srHook_[a-zA-Z0-9_]* -v SFAIL -t crm_config -s SAPHanaSR
      Cmnd_Alias HOOK_HELPER  = /usr/sbin/SAPHanaSR-hookHelper --sid=<SID> --case=checkTakeover
      <sid>adm ALL=(ALL) NOPASSWD: SITE_SOK, SITE_SFAIL, HOOK_HELPER
      ```

      For example:

      ```
      Cmnd_Alias SITE_SOK = /usr/sbin/crm_attribute -n hana_hdb_site_srHook_[a-zA-Z0-9_]* -v SOK -t crm_config -s SAPHanaSR
      Cmnd_Alias SITE_SFAIL = /usr/sbin/crm_attribute -n hana_hdb_site_srHook_[a-zA-Z0-9_]* -v SFAIL -t crm_config -s SAPHanaSR
      Cmnd_Alias HOOK_HELPER  = /usr/sbin/SAPHanaSR-hookHelper --sid=HDB --case=checkTakeover
      hdbadm ALL=(ALL) NOPASSWD: SITE_SOK, SITE_SFAIL, HOOK_HELPER
      ```
**Note**  
The syntax uses a character-class pattern that adapts to different HSR site names while avoiding broad wildcards, balancing flexibility and security. A modification is still required if the SID changes. Replace `<sid>` with the lowercase SID and `<SID>` with the uppercase SID that matches your installation.
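
To avoid case mistakes, the entries can be generated from a single SID value rather than edited by hand. This is an illustrative sketch (the SID value `HDB` is an example); review the generated output and still install it through visudo.

```shell
SID="HDB"   # example SID; replace with your installation's SID
sid="$(echo "$SID" | tr '[:upper:]' '[:lower:]')"   # derive lowercase sid

# Generate the sudoers entries with the SID substituted in both cases
sudoers_entry="$(cat <<EOF
Cmnd_Alias SITE_SOK = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_[a-zA-Z0-9_]* -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SITE_SFAIL = /usr/sbin/crm_attribute -n hana_${sid}_site_srHook_[a-zA-Z0-9_]* -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias HOOK_HELPER  = /usr/sbin/SAPHanaSR-hookHelper --sid=${SID} --case=checkTakeover
${sid}adm ALL=(ALL) NOPASSWD: SITE_SOK, SITE_SFAIL, HOOK_HELPER
EOF
)"
printf '%s\n' "$sudoers_entry"
```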

1.  **Reload Configuration** 

   As <sid>adm, reload the changes to `global.ini` using either an SAP HANA restart or the command:

   ```
   hdbadm> hdbnsutil -reconfig
   ```

1.  **Verify Hook Configuration** 

   As <sid>adm, verify the hook is loaded:

   ```
   hdbadm> cdtrace
   hdbadm> grep "loading HA/DR Provider" nameserver*
   ```

1.  **Replicate Configuration to Secondary** 

   1. Confirm that global.ini changes have been replicated to the secondary system

   1. Create corresponding sudoers.d file on the secondary system

## Configure susTkOver Cluster Hook to Ensure Cluster Awareness of Manual Takeover
<a name="hook_sustkover"></a>

susTkOver.py prevents a manual takeover of the HANA primary if the SAP HANA multi-state resource (managed by SAPHana or SAPHanaController) is active, unless the cluster is set into maintenance mode or the Linux cluster is stopped.

For more details:

```
# man susTkOver.py
```

In addition to the steps for the previous hook, add the following entry to the global.ini on each node. A restart of SAP HANA is required afterwards:

```
[ha_dr_provider_susTkOver]
provider = susTkOver
path = /usr/share/SAPHanaSR
execution_order = 2
sustkover_timeout = 30

[trace]
ha_dr_sustkover = info
```

## (Optional) Configure susChkSrv Cluster Hook (Fast Dying Index Server)
<a name="hook_suschksrv"></a>

In the default configuration, a failure of the SAP HANA index server results in the process being restarted locally, even when protected by a cluster. The time taken to stop the process and reload the memory can impact both the Recovery Time Objective (RTO) and performance. The SAP HANA hook susChkSrv provides an option to trigger an action, such as fencing or an instance stop, based on the HA/DR provider hook method srServiceStateChanged(), which in turn triggers a failover.

**Important**  
This hook can be configured with several different options. We suggest consulting the man page or the SUSE documentation, and evaluating the best option for your setup.

```
# man susChkSrv.py
```

Test the scenario with a production-sized system to assess whether the time to resume operations aligns with your non-functional requirements.

For more information, see SUSE Blog: [Emergency Braking for SAP HANA Dying Index Server](https://www.suse.com/c/emergency-braking-for-sap-hana-dying-indexserver/) 

## (Optional) Configure Fast Restart Option
<a name="_optional_configure_fast_start_option"></a>

Although out of scope for this document, the SAP HANA Fast Restart option uses tmpfs file systems to preserve and reuse MAIN data fragments, speeding up SAP HANA restarts. It is effective whenever the operating system itself is not restarted, including local restarts of the index server.

The Fast Restart option may be an alternative to the susChkSrv hook.

For more information, see SAP Documentation: [SAP HANA Fast Restart Option](https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/ce158d28135147f099b761f8b1ee43fc.html) 

## Review systemd Integration
<a name="_review_systemd_integration"></a>

Check the SAP HANA version and the systemd version to determine whether the prerequisites for systemd integration are met:

```
sidadm> systemctl --version
```

**Minimum operating system version**
+ SUSE Linux Enterprise Server 15 (systemd version 234)

**Minimum SAP HANA revision**
+ SAP HANA 2.0 SPS07 (revision 70)

When using an SAP HANA version with systemd integration (SPS07 and later), you must perform the following steps to prevent the nodes from being fenced when Amazon EC2 instances are intentionally stopped. For more information, see SAP Note [3189534 – Linux: systemd integration for sapstartsrv and SAP HANA](https://me.sap.com/notes/3189534).

1. Verify if SAP HANA is integrated with systemd. If it is integrated, a systemd service name, such as `SAP<SID>_<hana_sys_nr>.service` is present. For example, for SID HDB and instance number 00, `SAPHDB_00.service` is the service name.

   Use the following command as root to find SAP systemd services:

   ```
   # systemctl list-unit-files | grep -i sap
   ```

1. Create a pacemaker service drop-in file:

   ```
   # mkdir -p /etc/systemd/system/pacemaker.service.d/
   ```

1. Create the file /etc/systemd/system/pacemaker.service.d/50-saphana.conf with the following content:

   ```
   [Unit]
   Description=pacemaker needs SAP instance service
   Documentation=man:SAPHanaSR_basic_cluster(7)
   Wants=SAP<SID>_<hana_sys_nr>.service
   After=SAP<SID>_<hana_sys_nr>.service
   ```

1. Enable the drop-in file by reloading systemd:

   ```
   # systemctl daemon-reload
   ```

1. Verify that the change is active:

   ```
   # systemctl show pacemaker.service | grep SAP<SID>_<hana_sys_nr>
   ```

   For example, for SID HDB and instance number 00, the following output is expected:

   ```
   # systemctl show pacemaker.service | grep SAPHDB_00
   Wants=SAPHDB_00.service resource-agents-deps.target dbus.service
   After=system.slice network.target corosync.service resource-agents-deps.target basic.target rsyslog.service SAPHDB_00.service systemd-journald.socket sysinit.target time-sync.target dbus.service sbd.service
   ```

# Cluster Node Setup
<a name="sap-hana-pacemaker-sles-cluster-node-setup"></a>

Establish cluster communication between nodes using Corosync and configure required authentication.

**Topics**
+ [Deploy a Majority Maker Node (Scale-Out Clusters Only)](#_deploy_a_majority_maker_node_scale_out_clusters_only)
+ [Change the hacluster Password](#_change_the_hacluster_password)
+ [Setup Passwordless Authentication](#_setup_passwordless_authentication)
+ [Configure the Cluster Nodes](#_configure_the_cluster_nodes)
+ [Modify Generated Corosync Configuration](#_modify_generated_corosync_configuration)
+ [Verify Corosync Configuration](#_verify_corosync_configuration)
+ [Configure Cluster Services](#_configure_cluster_services)
+ [Verify Cluster Status](#_verify_cluster_status)

## Deploy a Majority Maker Node (Scale-Out Clusters Only)
<a name="_deploy_a_majority_maker_node_scale_out_clusters_only"></a>

**Note**  
Only required for clusters with more than two nodes.

When deploying an SAP HANA Scale-Out cluster in AWS, you must include a majority maker node in a third Availability Zone (AZ). The majority maker (tie-breaker) node ensures the cluster remains operational if one AZ fails by preserving the quorum. For the Scale-Out cluster to function, at least all nodes in one AZ plus the majority maker node must be running. If this minimum requirement is not met, the cluster loses its quorum state and any remaining SAP HANA nodes are fenced.

The majority maker requires a minimum EC2 instance configuration of 2 vCPUs, 2 GB RAM, and 50 GB disk space; this instance is exclusively used for quorum management and does not host an SAP HANA database or any other cluster resources.
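
The quorum requirement can be illustrated with simple vote arithmetic, assuming the corosync default of one vote per node:

```shell
# Example: 3 SAP HANA nodes per AZ, plus one majority maker in a third AZ
nodes_per_az=3
total_votes=$(( nodes_per_az * 2 + 1 ))   # both AZs plus the majority maker
quorum=$(( total_votes / 2 + 1 ))         # strict majority of the votes
echo "total=${total_votes} quorum=${quorum}"
```

With 7 votes in total, quorum requires 4: one full AZ (3 votes) plus the majority maker meets it, while a single AZ alone does not.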

## Change the hacluster Password
<a name="_change_the_hacluster_password"></a>

On all cluster nodes, change the password of the operating system user hacluster:

```
# passwd hacluster
```

## Setup Passwordless Authentication
<a name="_setup_passwordless_authentication"></a>

For a more comprehensive and easily consumable view of cluster activity, SUSE provides additional reporting tools. Many of these tools require access to both nodes without entering a password. SUSE recommends performing this setup for the root user.

For more details, see the *Configuration to collect cluster report as root with root SSH access between cluster nodes* section in the SUSE documentation [Usage of hb_report for SLES HAE](https://www.suse.com/support/kb/doc/?id=000017501).

**Warning**  
Review the security implications for your organization, including root access controls and network segmentation, before implementing this configuration.

## Configure the Cluster Nodes
<a name="_configure_the_cluster_nodes"></a>

Initialize the cluster framework on the first node, including all known cluster nodes.

On the primary node as root, run:

```
# crm cluster init -u -n <cluster_name> -N <hostname_1> -N <hostname_2>
```

 *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md) *:

```
hanahost01:~ # crm cluster init -u -n myCluster -N hanahost01 -N hanahost02
INFO: Detected "amazon-web-services" platform
INFO: Loading "default" profile from /etc/crm/profiles.yml
INFO: Configure Corosync (unicast):
  This will configure the cluster messaging layer.  You will need
  to specify a network address over which to communicate (default
  is eth0's network, but you can use the network address of any
  active interface).

Address for ring0 [10.2.10.1]
Port for ring0 [5405]

Do you wish to use SBD (y/n)? n
WARNING: Not configuring SBD - STONITH will be disabled.

Do you wish to configure a virtual IP address (y/n)? n

Do you want to configure QDevice (y/n)? n
INFO: Done (log saved to /var/log/crmsh/crmsh.log)

INFO: Adding node hanahost02 to cluster
INFO: Running command on hanahost02: crm cluster join -y -c root@hanahost01
...
INFO: Done (log saved to /var/log/crmsh/crmsh.log)
```

This command:
+ Initializes a two-node cluster named `myCluster` 
+ Configures unicast communication (`-u`)
+ Sets up the basic corosync configuration
+ Automatically joins the second node to the cluster

Additional notes:
+ SBD is not configured, because the `external/ec2` STONITH agent will be used for fencing in AWS environments.
+ QDevice configuration is possible but not covered in this document. Refer to [SUSE Linux Enterprise High Availability Documentation - QDevice and QNetD](https://documentation.suse.com/en-us/sle-ha/15-SP7/html/SLE-HA-all/cha-ha-qdevice.html).
+ For clusters with more than two nodes, additional nodes can be added either during initialization with additional `-N <hostname_3>` parameters, or later by running the following command on each new node:

  ```
  # crm cluster join -c <hostname_1>
  ```

## Modify Generated Corosync Configuration
<a name="_modify_generated_corosync_configuration"></a>

After initializing the cluster, the generated corosync configuration requires some modification to be optimized for cloud environments.

 **1. Edit the corosync configuration:** 

```
# vi /etc/corosync/corosync.conf
```

The generated file typically looks like this:

```
# Please read the corosync.conf.5 manual page
totem {
        version: 2
        cluster_name: myCluster
        clear_node_high_bit: yes
        interface {
                ringnumber: 0
                mcastport: 5405
                ttl: 1
        }

        transport: udpu
        crypto_hash: sha1
        crypto_cipher: aes256
        token: 5000     # This needs to be changed
        join: 60
        max_messages: 20
        token_retransmits_before_loss_const: 10
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }

}

nodelist {
    node {
        ring0_addr: <node1_primary_ip>    # Only single ring configured
        nodeid: 1
    }
    node {
        ring0_addr: <node2_primary_ip>    # Only single ring configured
        nodeid: 2
    }
}

quorum {

        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 2
        two_node: 1
}

```

 **2. Modify the configuration to add the second ring and optimize settings:** 

```
totem {
    token: 15000           # Changed from 5000 to 15000
    rrp_mode: passive      # Added for dual ring support
}

nodelist {
    node {
        ring0_addr: <node1_primary_ip>     # Primary network
        ring1_addr: <node1_secondary_ip>   # Added secondary network
        nodeid: 1
    }
    node {
        ring0_addr: <node2_primary_ip>     # Primary network
        ring1_addr: <node2_secondary_ip>   # Added secondary network
        nodeid: 2
    }
}
```

 *Example IP configuration:* 


| Network Interface | Node 1 | Node 2 | 
| --- | --- | --- | 
|  ring0\_addr  |  10.2.10.1  |  10.2.20.1  | 
|  ring1\_addr  |  10.2.10.2  |  10.2.20.2  | 

 **3. Synchronize the modified configuration to all nodes:** 

```
# csync2 -f /etc/corosync/corosync.conf
```

 **4. Restart the cluster** 

```
# crm cluster restart --all
```

## Verify Corosync Configuration
<a name="_verify_corosync_configuration"></a>

Verify network rings are active:

```
# corosync-cfgtool -s
```

 *Example output*:

```
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.2.10.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.2.10.2
        status  = ring 1 active with no faults
```

Both network rings should report "active with no faults". If either ring is missing, review the corosync configuration and confirm that the `/etc/corosync/corosync.conf` changes have been synced to the secondary node; copy the file manually if necessary, then restart the cluster.
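
This check can also be scripted. The sketch below parses sample `corosync-cfgtool -s` output (embedded here for illustration; on a real node, capture it with `status="$(corosync-cfgtool -s)"`) and counts healthy rings.

```shell
# Sample output for illustration; replace with the real command output
status="$(cat <<'EOF'
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.2.10.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.2.10.2
        status  = ring 1 active with no faults
EOF
)"

# Count the rings reported and the rings that are healthy
rings="$(printf '%s\n' "$status" | grep -c 'RING ID')"
healthy="$(printf '%s\n' "$status" | grep -c 'active with no faults')"
echo "rings=${rings} healthy=${healthy}"
```

For a dual-ring setup, both counts should be 2; a mismatch indicates at least one faulty or missing ring.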

## Configure Cluster Services
<a name="_configure_cluster_services"></a>

Enable pacemaker to start automatically after reboot:

```
# systemctl enable pacemaker
```

Enabling pacemaker also handles corosync through service dependencies. The cluster will start automatically after reboot. For troubleshooting scenarios, you can choose to manually start services after boot instead.

## Verify Cluster Status
<a name="_verify_cluster_status"></a>

 **1. Check pacemaker service status:** 

```
# systemctl status pacemaker
```

 **2. Verify cluster status:** 

```
# crm_mon -1
```

 *Example output*:

```
Cluster Summary:
  * Stack: corosync
  * Current DC: hanahost01 (version 2.1.5+20221208.a3f44794f) - partition with quorum
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ hanahost01 hanahost02 ]

Active Resources:
  * No active resources
```

# Cluster Configuration
<a name="sap-hana-pacemaker-sles-cluster-config"></a>

Bootstrap the cluster and configure all required cluster resources and constraints.

**Topics**
+ [Prepare for Resource Creation](#_prepare_for_resource_creation)
+ [Cluster Bootstrap](#cluster-bootstrap)
+ [Create STONITH Fencing Resource](#resource-stonith)
+ [Create Overlay IP Resources](#resource-overlayip)
+ [Create SAPHanaTopology Resource](#resource-saphanatop)
+ [Create SAPHANA Resource (based on resource agent SAPHana or SAPHanaController)](#resource-saphana)
+ [Create Resource Constraints](#resource-constraints)
+ [Activate Cluster](#_activate_cluster)
+ [Reset Configuration – Optional](#_reset_configuration_optional)

## Prepare for Resource Creation
<a name="_prepare_for_resource_creation"></a>

To ensure that the cluster does not perform unexpected actions during setup of resources and configuration, set the maintenance mode to true.

Run the following command to put the cluster in maintenance mode:

```
# crm maintenance on
```

To verify the current maintenance state:

```
# crm status
```

**Note**  
There are two types of maintenance mode:  
Cluster-wide maintenance (set with `crm maintenance on`)
Node-specific maintenance (set with `crm node maintenance <nodename>`)
Always use cluster-wide maintenance mode when making configuration changes. For node-specific operations, such as hardware maintenance, refer to [Operations](sap-hana-pacemaker-operations.md) for proper procedures.  
To disable maintenance mode after configuration is complete:  

```
# crm maintenance off
```

## Cluster Bootstrap
<a name="cluster-bootstrap"></a>

### Configure Cluster Properties
<a name="_configure_cluster_properties"></a>

Configure cluster properties to establish fencing behavior and resource failover settings:

```
# crm configure property stonith-enabled="true"
# crm configure property stonith-timeout="600"
# crm configure property priority-fencing-delay="20"
# crm configure property stonith-action="off"
```
+ The **priority-fencing-delay** property is recommended for protecting SAP HANA nodes during network partitioning events. When a cluster partition occurs, this delay gives preference to nodes hosting higher priority resources, with SAP HANA Primary (promoted) instances receiving additional priority weighting. This helps ensure the Primary HANA node survives in split-brain scenarios. The recommended 20-second priority-fencing-delay works in conjunction with the `pcmk_delay_max` value (10 seconds) configured in the STONITH resource, providing a total potential delay of up to 30 seconds before fencing occurs.
+ Setting **stonith-action="off"** ensures fenced nodes remain down until manually investigated, preventing potentially compromised nodes from automatically rejoining the cluster. While "reboot" is available as an alternative if automated recovery is preferred, "off" is recommended for SAP HANA clusters to prevent potential data corruption and to enable root cause analysis.

To verify your cluster property settings:

```
# crm configure show property
```

### Configure Resource Defaults
<a name="_configure_resource_defaults"></a>

Configure resource default behaviors:

```
# crm configure rsc_defaults resource-stickiness="1000"
# crm configure rsc_defaults migration-threshold="5000"
```
+ The **resource-stickiness** value prevents unnecessary resource movement, effectively setting a "cost" for moving resources. A value of 1000 strongly encourages resources to remain on their current node, avoiding the downtime associated with movement.
+ The **migration-threshold** of 5000 ensures the cluster will attempt to recover a resource on the same node many times before declaring that node unsuitable for hosting the resource.

Individual resources may override these defaults with their own defined values.

To verify your resource default settings:

```
# crm configure show rsc_defaults
```

### Configure Operation Defaults
<a name="_configure_operation_defaults"></a>

Configure operation timeout defaults:

```
# crm configure op_defaults timeout="600"
```
+ The **op\_defaults timeout** ensures all cluster operations have a reasonable default timeout of 600 seconds. Individual resources may override this with their own timeout values.

To verify your operation default settings:

```
# crm configure show op_defaults
```

## Create STONITH Fencing Resource
<a name="resource-stonith"></a>

An AWS STONITH resource agent is recommended for AWS deployments on SUSE, as it leverages the AWS API to safely fence failed or unreachable nodes by stopping their EC2 instances. See [Pacemaker - STONITH Fencing Agent](sap-hana-pacemaker-sles-concepts.md#fencing-sles).

Create the STONITH resource using the resource agent `external/ec2`:

```
# crm configure primitive <stonith_resource_name> stonith:external/ec2 \
params tag="<cluster_tag>" profile="<cli_cluster_profile>" pcmk_delay_max="10" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="300" timeout="60"
```

Details:
+  **tag** - EC2 instance tag key name that associates instances with this cluster configuration. This tag key must be unique within the AWS account and have a value which matches the instance hostname. See [Create Amazon EC2 Resource Tags Used by Amazon EC2 STONITH Agent](sap-hana-pacemaker-sles-ec2-configuration.md#create-cluster-tags) for EC2 instance tagging configuration.
+  **profile** - (optional) AWS CLI profile name for API authentication. Verify profile exists with `aws configure list-profiles`. If a profile is not explicitly configured the default profile will be used.
+  **pcmk\_delay\_max** - Random delay before fencing operations. Works in conjunction with the cluster property `priority-fencing-delay` to prevent simultaneous fencing. Historically set to higher values (45s), but with `priority-fencing-delay` now handling primary node protection, a lower value (10s) is sufficient.
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md) * :

```
# crm configure primitive res_stonith_ec2 stonith:external/ec2 \
params tag="pacemaker" profile="cluster" \
pcmk_delay_max="10" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="300" timeout="60"
```

## Create Overlay IP Resources
<a name="resource-overlayip"></a>

This resource ensures client connections follow the SAP HANA primary instance during failover by updating AWS route table entries. It manages an overlay IP address that always points to the active SAP HANA database.

Create the IP resource:

```
# crm configure primitive rsc_ip_<SID>_HDB<hana_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
params ip="<hana_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="60" timeout="60"
```

Details:
+  **ip** - Overlay IP address that will be used to connect to the Primary SAP HANA database. See [Overlay IP Concept](sap-hana-pacemaker-sles-concepts.md#overlay-ip-sles) 
+  **routing\_table** - AWS route table ID(s) that need to be updated. Multiple route tables can be specified using commas (for example, `routing_table=rtb-xxxxxroutetable1,rtb-xxxxxroutetable2`). Ensure initial entries have been created following [Add VPC Route Table Entries for Overlay IPs](sap-hana-pacemaker-sles-infra-setup.md#rt-sles) 
+  **interface** - Network interface for the IP address (typically eth0)
+  **profile** - (optional) AWS CLI profile name for API authentication. Verify profile exists with `aws configure list-profiles`. If a profile is not explicitly configured the default profile will be used.
+  **awscli** - (optional) Path to the AWS CLI executable. The default path is `/usr/bin/aws`. Only specify this parameter if the AWS CLI is installed in a different location. To confirm the path on your system, run `which aws`.
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md) * :  
**Example**  

  ```
  # crm configure primitive rsc_ip_HDB_HDB00 ocf:heartbeat:aws-vpc-move-ip \
  params ip="172.16.52.1" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="60" timeout="60"
  ```

**For Active/Active Read Enabled**  
If you are using `logreplay_readenabled` and require that your secondary is accessible via an overlay IP, you can create an additional IP resource.

```
# crm configure primitive rsc_ip_<SID>_HDB<hana_sys_nr>_readenabled ocf:heartbeat:aws-vpc-move-ip \
params ip="<readenabled_overlayip>" \
routing_table="<routetable_id>" \
interface="eth0" \
profile="<cli_cluster_profile>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="60" timeout="60"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md) * :  
**Example**  

  ```
  # crm configure primitive rsc_ip_HDB_HDB00_readenabled ocf:heartbeat:aws-vpc-move-ip \
  params ip="172.16.52.2" \
  routing_table="rtb-xxxxxroutetable1" \
  interface="eth0" \
  profile="cluster" \
  op start interval="0" timeout="180" \
  op stop interval="0" timeout="180" \
  op monitor interval="60" timeout="60"
  ```

**For Shared VPC**  
If your configuration requires a shared VPC, two additional parameters are required.

```
# crm configure primitive rsc_ip_<SID>_HDB<hana_sys_nr> ocf:heartbeat:aws-vpc-move-ip \
params ip="<hana_overlayip>" routing_table=<routetable_id> interface=eth0 \
profile="<cli_cluster_profile>" lookup_type=NetworkInterfaceId \
routing_table_role="arn:aws:iam::<sharing_vpc_account_id>:role/<sharing_vpc_account_cluster_role>" \
op start interval="0" timeout="180" \
op stop interval="0" timeout="180" \
op monitor interval="60" timeout="60"
```

Additional details:
+  **lookup\_type** = NetworkInterfaceId
+  **routing\_table\_role** = "arn:aws:iam::<sharing\_vpc\_account\_id>:role/<sharing\_vpc\_account\_cluster\_role>"

## Create SAPHanaTopology Resource
<a name="resource-saphanatop"></a>

The SAPHanaTopology resource agent helps manage high availability for SAP HANA databases with system replication. It analyzes the SAP HANA topology and reports findings via node status attributes. These attributes are used by either the SAPHana or SAPHanaController resource agents to control the SAP HANA databases. SAPHanaTopology starts and monitors the local saphostagent, leveraging SAP interfaces like landscapeHostConfiguration.py, hdbnsutil, and saphostctrl to gather information about system status, roles, and configuration.

### SAPHanaSR-angi and Classic Deployments
<a name="_saphanasr_angi_and_classic_deployments"></a>

The following configuration applies to both scale-up and scale-out deployments.

For documentation on the resource agent, review the man page:

```
# man ocf_suse_SAPHanaTopology
```

------
#### [ For scale-up (2-node) ]

For the primitive:

```
# crm configure primitive rsc_SAPHanaTopology_<SID>_HDB<hana_sys_nr> ocf:suse:SAPHanaTopology \
params SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300" \
op monitor interval="10" timeout="600"
```

For the clone:

```
# crm configure clone cln_SAPHanaTopology_<SID>_HDB<hana_sys_nr> rsc_SAPHanaTopology_<SID>_HDB<hana_sys_nr> \
meta clone-node-max="1" interleave="true" clone-max="2"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md) * :  
**Example**  

  ```
  # crm configure primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
  params SID="HDB" \
  InstanceNumber="00" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="300" \
  op monitor interval="10" timeout="600"
  
  # crm configure clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
  meta clone-node-max="1" interleave="true" clone-max="2"
  ```

------
#### [ For scale-out ]

For the primitive:

```
# crm configure primitive rsc_SAPHanaTopology_<SID>_HDB<hana_sys_nr> ocf:suse:SAPHanaTopology \
params SID="<SID>" InstanceNumber="<hana_sys_nr>" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300" \
op monitor interval="10" timeout="600"
```

For the clone:

```
# crm configure clone cln_SAPHanaTopology_<SID>_HDB<hana_sys_nr> rsc_SAPHanaTopology_<SID>_HDB<hana_sys_nr> \
meta clone-node-max="1" interleave="true" clone-max="<number-of-nodes>"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md)*:

  ```
  # crm configure primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
  params SID="HDB" InstanceNumber="00" \
  op start interval="0" timeout="600" \
  op stop interval="0" timeout="300" \
  op monitor interval="10" timeout="600"
  
  # crm configure clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
  meta clone-node-max="1" interleave="true" clone-max="6"
  ```

------

Details:
+  **SID** - SAP System ID for the HANA instance
+  **InstanceNumber** - Instance number of the SAP HANA instance
+  **clone-node-max** - Defines how many copies of the resource agent can be started on a single node (set to 1)
+  **interleave** - Enables parallel starting of dependent clone resources on the same node (set to true)
+  **clone-max** - Defines the total number of clone instances that can be started in the cluster (for example, use 2 for scale-up, or 6 for scale-out with 3 nodes per site; do not include the majority maker node)
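
Because the only values that change between systems are the SID and instance number, it can help to template the commands before pasting them into `crm`. A minimal sketch, assuming the `SID` and `INSTANCE_NR` values shown (substitute your own from the Parameter Reference):

```shell
#!/bin/sh
# Build the SAPHanaTopology primitive and clone commands from two variables,
# so the same snippet works for any SID / instance number combination.
SID="HDB"          # assumed SAP System ID
INSTANCE_NR="00"   # assumed instance number

RES="rsc_SAPHanaTopology_${SID}_HDB${INSTANCE_NR}"
CLONE="cln_SAPHanaTopology_${SID}_HDB${INSTANCE_NR}"

# Print the commands for review; run them via crm only after checking them.
cat <<EOF
crm configure primitive ${RES} ocf:suse:SAPHanaTopology \\
  params SID="${SID}" InstanceNumber="${INSTANCE_NR}" \\
  op start interval="0" timeout="600" \\
  op stop interval="0" timeout="300" \\
  op monitor interval="10" timeout="600"
crm configure clone ${CLONE} ${RES} \\
  meta clone-node-max="1" interleave="true" clone-max="2"
EOF
```

Review the generated commands before applying them to the cluster.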

## Create SAP HANA Resource (based on the SAPHana or SAPHanaController resource agent)
<a name="resource-saphana"></a>

The SAP HANA resource agents manage system replication and failover between SAP HANA databases. These agents control start, stop, and monitoring operations while checking synchronization status to maintain data consistency. They leverage SAP interfaces including sapcontrol, landscapeHostConfiguration, hdbnsutil, systemReplicationStatus, and saphostctrl. All configurations work in conjunction with the SAPHanaTopology agent, which gathers information about the system replication status across cluster nodes.

Choose the appropriate resource agent configuration based on your SAP HANA architecture:

### SAPHanaSR-angi Deployments (Available in SLES 15 SP4 and later)
<a name="_saphanasr_angi_deployments_available_in_sles_15_sp4"></a>

Available and recommended for new deployments on SLES 15 SP4 and above. The SAPHanaController resource agent with the next generation system replication architecture (SAPHanaSR-angi) provides improved integration and management capabilities for both scale-up and scale-out deployments.

For documentation on the resource agent, review the man page.

```
# man ocf_suse_SAPHanaController
```

------
#### [ For scale-up (2-node) ]

Create the primitive

```
# crm configure primitive rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> ocf:suse:SAPHanaController \
params SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Unpromoted" timeout="700" \
meta priority="100"
```

Create the clone

```
# crm configure clone msl_SAPHanaController_<SID>_HDB<hana_sys_nr> rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> \
meta clone-node-max="1" promotable="true" interleave="true" clone-max="2"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md)*:

  ```
  # crm configure primitive rsc_SAPHanaController_HDB_HDB00 ocf:suse:SAPHanaController \
  params SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Promoted" timeout="700" \
  op monitor interval="61" role="Unpromoted" timeout="700" \
  meta priority="100"
  # crm configure clone msl_SAPHanaController_HDB_HDB00 rsc_SAPHanaController_HDB_HDB00 \
  meta clone-node-max="1" promotable="true" interleave="true" clone-max="2"
  ```

------
#### [ For scale-out ]

Create the primitive

```
# crm configure primitive rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> ocf:suse:SAPHanaController \
params SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Promoted" timeout="700" \
op monitor interval="61" role="Unpromoted" timeout="700"
```

Create the clone

```
# crm configure clone msl_SAPHanaController_<SID>_HDB<hana_sys_nr> rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> \
meta clone-node-max="1" promotable="true" interleave="true" clone-max="<number-of-nodes>"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md)*:

  ```
  # crm configure primitive rsc_SAPHanaController_HDB_HDB00 ocf:suse:SAPHanaController \
  params SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Promoted" timeout="700" \
  op monitor interval="61" role="Unpromoted" timeout="700"
  
  # crm configure clone msl_SAPHanaController_HDB_HDB00 rsc_SAPHanaController_HDB_HDB00 \
  meta clone-node-max="1" promotable="true" interleave="true" clone-max="6"
  ```

------

Details:
+  **SID** - SAP System ID for the HANA instance
+  **InstanceNumber** - Instance number of the SAP HANA instance
+  **clone-node-max** - Defines how many copies of the resource agent can be started on a single node (set to 1)
+  **interleave** - Enables parallel starting of dependent clone resources on the same node (set to true)
+  **clone-max** - Defines the total number of clone instances that can be started in the cluster (for example, use 2 for scale-up, or 6 for scale-out with 3 nodes per site; do not include the majority maker node)
+  **PREFER_SITE_TAKEOVER** - Defines whether a takeover to the secondary site is preferred. Review for non-standard deployments.
+  **AUTOMATED_REGISTER** - Defines whether the former primary should be automatically registered as a secondary. Review for non-standard deployments.
+  **DUPLICATE_PRIMARY_TIMEOUT** - Wait time used to minimize the risk of an unintended dual primary.
+  **meta priority** - Setting this to 100 works in conjunction with priority-fencing-delay to ensure proper failover order and prevent simultaneous fencing operations
+ The start and stop timeout values (3600s) may need to be increased for larger databases. Adjust these values based on your database size and observed startup/shutdown times
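
After the cluster has started the resources, you can confirm which node holds the primary role by looking for the `Promoted` line in `crm_mon -1` output. A sketch that parses a captured sample; on a live cluster, pipe `crm_mon -1` into the same `awk` filter:

```shell
#!/bin/sh
# Parse a captured crm_mon sample; on a live node the equivalent is:
#   crm_mon -1 | awk '$2 == "Promoted:" { print $4 }'
sample=$(cat <<'EOF'
  * Clone Set: msl_SAPHanaController_HDB_HDB00 [rsc_SAPHanaController_HDB_HDB00] (promotable):
    * Promoted: [ hanahost01 ]
    * Unpromoted: [ hanahost02 ]
EOF
)
primary=$(printf '%s\n' "$sample" | awk '$2 == "Promoted:" { print $4 }')
echo "SAP HANA primary is on: ${primary}"
```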

### Classic Deployments
<a name="_classic_deployments"></a>

For classic scale-up deployments, the SAPHana resource agent manages takeover between two SAP HANA databases. For detailed information, review the man page:

```
# man ocf_suse_SAPHana
```

------
#### [ For scale-up (2-node) ]

Create the primitive using the SAPHana Resource Agent

```
# crm configure primitive rsc_SAPHana_<SID>_HDB<hana_sys_nr> ocf:suse:SAPHana \
params SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Master" timeout="700" \
op monitor interval="61" role="Slave" timeout="700" \
meta priority="100"
```

Create the clone

```
# crm configure ms msl_SAPHana_<SID>_HDB<hana_sys_nr> rsc_SAPHana_<SID>_HDB<hana_sys_nr> \
meta clone-node-max="1" interleave="true" clone-max="2"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md)*:

  ```
  # crm configure primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
  params SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Master" timeout="700" \
  op monitor interval="61" role="Slave" timeout="700" \
  meta priority="100"
  
  # crm configure ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
  meta clone-node-max="1" interleave="true" clone-max="2"
  ```

------
#### [ For scale-out ]

Create the primitive using the SAPHanaController Resource Agent:

```
# crm configure primitive rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> ocf:suse:SAPHanaController \
params SID="<SID>" \
InstanceNumber="<hana_sys_nr>" \
PREFER_SITE_TAKEOVER="true" \
DUPLICATE_PRIMARY_TIMEOUT="7200" \
AUTOMATED_REGISTER="true" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Master" timeout="700" \
op monitor interval="61" role="Slave" timeout="700"
```

Create the clone

```
# crm configure clone msl_SAPHanaController_<SID>_HDB<hana_sys_nr> rsc_SAPHanaController_<SID>_HDB<hana_sys_nr> \
meta clone-node-max="1" interleave="true" clone-max="<number-of-nodes>"
```
+  *Example using values from [Parameter Reference](sap-hana-pacemaker-sles-parameters.md)*:

  ```
  # crm configure primitive rsc_SAPHanaController_HDB_HDB00 ocf:suse:SAPHanaController \
  params SID="HDB" \
  InstanceNumber="00" \
  PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" \
  AUTOMATED_REGISTER="true" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Master" timeout="700" \
  op monitor interval="61" role="Slave" timeout="700"
  
  # crm configure clone msl_SAPHanaController_HDB_HDB00 rsc_SAPHanaController_HDB_HDB00 \
  meta clone-node-max="1" interleave="true" clone-max="6"
  ```

------

Details:
+  **SID** - SAP System ID for the HANA instance
+  **InstanceNumber** - Instance number of the SAP HANA instance
+  **clone-node-max** - Defines how many copies of the resource agent can be started on a single node (set to 1)
+  **interleave** - Enables parallel starting of dependent clone resources on the same node (set to true)
+  **clone-max** - Defines the total number of clone instances that can be started in the cluster (for example, use 2 for scale-up, or 6 for scale-out with 3 nodes per site; do not include the majority maker node)
+  **PREFER_SITE_TAKEOVER** - Defines whether a takeover to the secondary site is preferred. Review for non-standard deployments.
+  **AUTOMATED_REGISTER** - Defines whether the former primary should be automatically registered as a secondary. Review for non-standard deployments.
+  **DUPLICATE_PRIMARY_TIMEOUT** - Wait time used to minimize the risk of an unintended dual primary.
+  **meta priority** - Setting this to 100 works in conjunction with priority-fencing-delay to ensure proper failover order and prevent simultaneous fencing operations
+ The start and stop timeout values (3600s) may need to be increased for larger databases. Adjust these values based on your database size and observed startup/shutdown times
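
Note that the classic agents report the roles as `Masters` and `Slaves` rather than `Promoted` and `Unpromoted`. A sketch that extracts the primary node from a captured `crm status` sample; on a live cluster, pipe `crm status` into the same filter:

```shell
#!/bin/sh
# Classic role names: the primary appears under "Masters" in crm status.
sample=$(cat <<'EOF'
  * Clone Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00] (promotable):
    * Masters: [ hanahost01 ]
    * Slaves: [ hanahost02 ]
EOF
)
primary=$(printf '%s\n' "$sample" | awk '$2 == "Masters:" { print $4 }')
echo "SAP HANA primary is on: ${primary}"
```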

## Create Resource Constraints
<a name="resource-constraints"></a>

The following constraints are required.

### Order Constraint
<a name="_order_constraint"></a>

This constraint defines the start order between the SAPHanaTopology and SAPHana resources:

```
# crm configure order <order_rule_name> Optional: <SAPHanaTopology_clone> <SAPHana/SAPHanaController_Clone>
```
+  *Example* :

  ```
  # crm configure order ord_SAPHana Optional: cln_SAPHanaTopology_HDB_HDB00 msl_SAPHana_HDB_HDB00
  ```

### Colocation Constraint
<a name="_colocation_constraint"></a>

#### IP with Primary
<a name="_ip_with_primary"></a>

This constraint ensures that the IP resource which determines the target of the overlay IP runs on the node which has the primary SAP HANA role:

```
# crm configure colocation <colocation_rule_name> 2000: <ip_resource_name> <saphana/saphanacontroller name>:Master
```
+  *Example* :

  ```
  # crm configure colocation col_ip_SAPHana_Primary 2000: rsc_ip_HDB_HDB00 msl_SAPHana_HDB_HDB00:Master
  ```

#### ReadOnly IP with Secondary (Only for ReadOnly Patterns)
<a name="_readonly_ip_with_secondary_only_for_readonly_patterns"></a>

This constraint ensures that the read-enabled IP resource runs on the secondary (Unpromoted) node. When the secondary node is unavailable, the IP will move to the primary node, where read workloads will share capacity with primary workloads:

```
# crm configure colocation <colocation_rule_name> 2000: rsc_ip_<SID>_HDB<hana_sys_nr>_readenabled msl_SAPHana/SAPHanaController_<SID>_HDB<hana_sys_nr>:Unpromoted
```
+  *Example* :

  ```
  # crm configure colocation col_ip_readenabled_SAPHana_Secondary 2000: rsc_ip_HDB_HDB00_readenabled msl_SAPHana_HDB_HDB00:Unpromoted
  ```

### Location Constraint
<a name="_location_constraint"></a>

#### No SAP HANA Resources on the Majority Maker (Scale Out Only)
<a name="_no_sap_hana_resources_on_the_majority_maker_scale_out_only"></a>

This location constraint ensures that SAP HANA Resources avoid the Majority Maker, which is not suited to running them.

```
# crm configure location loc_SAPHanaTopology_avoid_majority_maker cln_SAPHanaTopology_<SID>_HDB<hana_sys_nr> -inf:<hostname_mm>

# crm configure location loc_SAPHana/SAPHanaController_avoid_majority_maker msl_SAPHana/SAPHanaController_<SID>_HDB<hana_sys_nr> -inf:<hostname_mm>
```
+  *Example* :

  ```
  # crm configure location loc_SAPHanaTopology_avoid_majority_maker cln_SAPHanaTopology_HDB_HDB00 -inf:hanamm
  # crm configure location loc_SAPHana_avoid_majority_maker msl_SAPHana_HDB_HDB00 -inf:hanamm
  ```

## Activate Cluster
<a name="_activate_cluster"></a>

Use the `crm config show` and `crm config edit` commands to verify that all values have been entered correctly.

On confirmation of correct values, set the maintenance mode to false using the following command. This enables the cluster to take control of the resources:

```
# crm maintenance off
```
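
Resource startup can take several minutes for large databases. A hypothetical polling helper (the function name and timings are illustrative) that retries a check until it succeeds; on the cluster, the check would be something like `crm_mon -1 | grep -q 'Promoted:'`:

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt limit is reached.
wait_until() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  return 1
}

# Example check; replace `true` with your crm_mon pipeline on a live node.
if wait_until 60 true; then
  echo "cluster check passed"
fi
```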

## Reset Configuration – Optional
<a name="_reset_configuration_optional"></a>

**Important**  
The following instructions help you reset the complete configuration. Run these commands only if you want to start setup from the beginning. You can make minor changes with the `crm config edit` command.

Run the following command to back up the current configuration for reference:

```
# crm config show > /tmp/crmconfig_backup.txt
```

Run the following command to clear the current configuration:

```
# crm configure erase
```

Running the preceding erase command removes all cluster resources from the Cluster Information Base (CIB) and disconnects the cluster from corosync. Before starting the resource configuration again, run `crm cluster restart` so that the cluster reestablishes communication with corosync and retrieves the configuration. The restart also clears maintenance mode; reapply it before commencing additional configuration and resource setup.

# Client Connectivity
<a name="sap-hana-pacemaker-sles-client-connectivity"></a>

For proper SAP HANA database connectivity:
+ Ensure that the Overlay IP can be correctly resolved in all application servers
+ DNS configuration or local host entries must be valid
+ Network routing must be properly configured
+ SAP HANA client libraries must be installed and up to date

Ensure that the connectivity data for the SAP HANA Database references the hostname associated with the Overlay IP. For more information see SAP Documentation: [Setting Connectivity Data for the SAP HANA Database](https://help.sap.com/docs/SLTOOLSET/39c32e9783f6439e871410848f61544c/b7ed2d55b0a7f857e10000000a441470.html?version=CURRENT_VERSION_SWPM20) 

Test database connectivity using the `R3trans` utility:

```
sidadm> R3trans -d
```
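
You can also test SQL connectivity with `hdbsql` against the overlay IP hostname. The SQL port is derived from the instance number; a sketch under the assumption of default port numbering (3`<nr>`13 for SYSTEMDB, 3`<nr>`15 for the first tenant):

```shell
#!/bin/sh
# Derive the default SAP HANA SQL ports from the instance number.
INSTANCE_NR="00"   # assumed instance number
SYSTEMDB_PORT="3${INSTANCE_NR}13"
TENANT_PORT="3${INSTANCE_NR}15"

echo "SYSTEMDB SQL port: ${SYSTEMDB_PORT}"
echo "Tenant SQL port:   ${TENANT_PORT}"
# On an application server, test against the overlay IP hostname, e.g.:
#   hdbsql -n <overlay-hostname>:${TENANT_PORT} -u <user> "SELECT * FROM DUMMY"
```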

Review additional connections to SAP HANA that require High Availability. While application connectivity should use the overlay IP, administrative tools (SAP HANA Studio, hdbsql commands, monitoring tools) require direct connectivity to individual SAP HANA instances.

# Operations
<a name="sap-hana-pacemaker-operations"></a>

**Topics**
+ [Viewing the cluster state](sap-hana-pacemaker-sles-ops-cluster-state.md)
+ [Performing planned maintenance](sap-hana-pacemaker-sles-ops-planned-maint.md)
+ [Post-failure analysis and reset](sap-hana-pacemaker-sles-ops-post-failure.md)
+ [Alerting and monitoring](sap-hana-pacemaker-sles-ops-alert-monitor.md)

# Viewing the cluster state
<a name="sap-hana-pacemaker-sles-ops-cluster-state"></a>

You can view the state of the cluster in two ways: with operating system commands, or with a web-based console provided by SUSE.

**Topics**
+ [Operating system based](#_operating_system_based)
+ [SUSE Hawk2](#_suse_hawk2)

## Operating system based
<a name="_operating_system_based"></a>

There are multiple operating system commands that can be run as root, or as a user with appropriate permissions, to get an overview of the status of the cluster and its services.

```
# crm status
```

Sample output:

```
Cluster Summary:
  * Stack: corosync
  * Current DC: hanahost02 (version 2.0.5+20201202.ba59be712-150300.4.45.2-2.0.5+20201202.ba59be712) - partition with quorum
  * Last updated: Wed Aug 20 14:05:19 2025
  * Last change:  Wed Aug 20 14:04:54 2025 by root via crm_attribute on hanahost01
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ hanahost01 hanahost02  ]

Full List of Resources:
  * rsc_AWS_STONITH     (stonith:external/ec2):  Started hanahost02
  * rsc_ip_HDB_HDB00    (ocf::heartbeat:aws-vpc-move-ip):        Started hanahost01
  * Clone Set: cln_SAPHanaTopology_HDB_HDB00 [rsc_SAPHanaTopology_HDB_HDB00]:
    * Started: [ hanahost01 hanahost02 ]
  * Clone Set: msl_SAPHana_HDB_HDB00 [rsc_SAPHana_HDB_HDB00] (promotable):
    * Masters: [ hanahost01 ]
    * Slaves: [ hanahost02 ]
```

The following table provides a list of useful commands.


| Command | Description | 
| --- | --- | 
|   `crm_mon`   |  Display cluster status on the console with updates as they occur  | 
|   `crm_mon -1`   |  Display cluster status on the console just once, and exit  | 
|   `crm_mon -Arnf`   |  `-A` display node attributes; `-n` group resources by node; `-r` display inactive resources; `-f` display resource fail counts  | 
|   `crm help`   |  View more options  | 
|   `crm_mon --help-all`   |  View more options  | 

## SUSE Hawk2
<a name="_suse_hawk2"></a>

Hawk2 is a web-based graphical user interface for managing and monitoring Pacemaker high availability clusters. Enable it on every node in the cluster so that you can point your web browser at any node to access it. Use the following commands to enable Hawk2.

```
# systemctl enable --now hawk
# systemctl status hawk
```

Ensure that your security groups allow access on port 7630 from your administrative host, then use the following URL.

```
https://your-server:7630/

e.g. https://hanahost01:7630
```

For more information, see [Configuring and Managing Cluster Resources with Hawk2](https://documentation.suse.com/sle-ha/12-SP5/html/SLE-HA-all/cha-conf-hawk2.html) in the SUSE Documentation.

# Performing planned maintenance
<a name="sap-hana-pacemaker-sles-ops-planned-maint"></a>

When performing maintenance on SAP HANA systems in a cluster environment, it’s important to understand how the cluster interacts with SAP HANA system replication. Planned maintenance activities should be conducted carefully to prevent unnecessary failovers or cluster interventions.

There are different options to perform planned maintenance on nodes, resources, and the cluster.

**Topics**
+ [Maintenance mode](#_maintenance_mode)
+ [Placing a node in standby mode](#_placing_a_node_in_standby_mode)
+ [Moving a resource](#_moving_a_resource)

## Maintenance mode
<a name="_maintenance_mode"></a>

Use maintenance mode if you want to make any changes to the configuration or take control of the resources and nodes in the cluster. In most cases, this is the safest option for administrative tasks.

**Example**  
Use one of the following commands to turn on maintenance mode.  

```
# crm maintenance on
```

```
# crm configure property maintenance-mode="true"
```
Use one of the following commands to turn off maintenance mode.  

```
# crm maintenance off
```

```
# crm configure property maintenance-mode="false"
```

## Placing a node in standby mode
<a name="_placing_a_node_in_standby_mode"></a>

To perform maintenance on the cluster without a full system outage, the recommended method for moving active resources is to place the node you want to remove from the cluster in standby mode.

```
# crm node standby <hostname>
```

The cluster will cleanly relocate resources, and you can perform activities, including reboots on the node in standby mode. When maintenance activities are complete, you can re-introduce the node with the following command.

```
# crm node online <hostname>
```

## Moving a resource
<a name="_moving_a_resource"></a>

Moving individual resources is not recommended because of the migration or move constraints that are created to lock the resource in its new location. These can be cleared as described in the INFO messages, but this introduces an additional manual step.

```
# crm resource move msl_SAPHanaController_HDB_HDB00 hanahost02
INFO: Move constraint created for msl_SAPHanaController_HDB_HDB00 to hanahost02
INFO: Use `crm resource clear msl_SAPHanaController_HDB_HDB00` to remove this constraint
```

Note: The exact resource name will vary depending on your SAP HANA system ID and instance number. Adjust the commands accordingly.

Use the following command once the resources have relocated to their target location.

```
# crm resource clear msl_SAPHanaController_HDB_HDB00
```

# Post-failure analysis and reset
<a name="sap-hana-pacemaker-sles-ops-post-failure"></a>

A review must be conducted after each failure to understand the source of the failure as well as the reaction of the cluster. In most scenarios, the cluster prevents an application outage. However, a manual action is often required to reset the cluster to a protective state for any subsequent failures.

**Topics**
+ [Checking the Logs](#_checking_the_logs)
+ [Cleanup crm status](#_cleanup_crm_status)
+ [Restart failed nodes or pacemaker](#_restart_failed_nodes_or_pacemaker)
+ [Further Analysis](#_further_analysis)

## Checking the Logs
<a name="_checking_the_logs"></a>
+ For troubleshooting cluster issues, use journalctl to examine both pacemaker and corosync logs:

  ```
  # journalctl -u pacemaker -u corosync --since "1 hour ago"
  ```
  + Use `--since` to specify time periods (e.g., "2 hours ago", "today")
  + Add `-f` to follow logs in real-time
  + Combine with grep for specific searches
+ System messages and resource agent activity can be found in `/var/log/messages`.
+ For HANA-specific issues, check the HANA trace directory. This can be reached using `cdtrace` when logged in as `<sid>adm`. Also consult the `DB_<tenantdb>` directory within the HANA trace directory.

## Cleanup crm status
<a name="_cleanup_crm_status"></a>

If failed actions are reported using the `crm status` command, and if they have already been investigated, then you can clear the reports with the following command.

```
# crm resource cleanup <resource> <hostname>
```

## Restart failed nodes or pacemaker
<a name="_restart_failed_nodes_or_pacemaker"></a>

It is recommended that failed (or fenced) nodes are not restarted automatically. This gives operators a chance to investigate the failure and ensures that the cluster doesn't make assumptions about the state of resources.

You need to restart the instance or the pacemaker service based on your approach.

## Further Analysis
<a name="_further_analysis"></a>

For cluster-specific issues, use `hb_report` to generate a targeted analysis of cluster components across all nodes:

```
# hb_report -f "YYYY-MM-DD HH:MM:SS" -t "YYYY-MM-DD HH:MM:SS" /tmp/hb_report
```

For quick analysis of recent events, you can use:

```
# crm history events
# crm history log
```
+ Both `hb_report` and `crm history` commands require passwordless SSH between nodes
+ For more information, see SUSE Documentation - [Usage of hb_report for SLES HAE](https://www.suse.com/support/kb/doc/?id=000017501) 

# Alerting and monitoring
<a name="sap-hana-pacemaker-sles-ops-alert-monitor"></a>

This section covers the following topics.

**Topics**
+ [Using Amazon CloudWatch Application Insights](#_using_amazon_cloudwatch_application_insights)
+ [Using the cluster alert agents](#_using_the_cluster_alert_agents)

## Using Amazon CloudWatch Application Insights
<a name="_using_amazon_cloudwatch_application_insights"></a>

For monitoring and visibility of cluster state and actions, Application Insights includes metrics for monitoring SAP HANA system replication state, cluster metrics, and SAP and high availability checks. Additional metrics, such as EFS and CPU monitoring, can also help with root cause analysis.

For more information, see [Get started with Amazon CloudWatch Application Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/appinsights-getting-started.html) and [SAP HANA High Availability on Amazon EC2](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/component-configuration-examples-hana-ha.html).

## Using the cluster alert agents
<a name="_using_the_cluster_alert_agents"></a>

Within the cluster configuration, you can call an external program (an alert agent) to handle alerts. This is a *push* notification. It passes information about the event via environment variables.

The agents can then be configured to send emails, log to a file, update a monitoring system, etc. For example, the following script can be used to access Amazon SNS.

```
#!/bin/sh

# alert_sns.sh
# modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample

##############################################################################
# SETUP
# * Create an SNS Topic and subscribe email or chatbot
# * Note down the ARN for the SNS topic
# * Give the IAM Role attached to both Instances permission to publish to the SNS Topic
# * Ensure the aws cli is installed
# * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes
# * Ensure the permissions allow for hacluster and root to execute the script
# * Run the following as root (modify file location if necessary and replace SNS ARN):
#
# SLES:
# crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to arn:aws:sns:region:account-id:myPacemakerAlerts
#
# RHEL:
# pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S"
# pcs alert recipient add aws_sns_alert value=arn:aws:sns:region:account-id:myPacemakerAlerts
##############################################################################

# Additional information to send with the alerts
node_name=`uname -n`
cluster_name=$(crm_attribute --query --name cluster-name --quiet 2>/dev/null)
sns_body=`env | grep CRM_alert_`

# Required for SNS
TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get metadata
REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')

sns_subscription_arn=${CRM_alert_recipient}

# Format depending on alert type
case ${CRM_alert_kind} in
   node)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
   ;;
   fencing)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}"
   ;;
   resource)
     if [ ${CRM_alert_interval} = "0" ]; then
         CRM_alert_interval=""
     else
         CRM_alert_interval=" (${CRM_alert_interval})"
     fi
     if [ ${CRM_alert_target_rc} = "0" ]; then
         CRM_alert_target_rc=""
     else
         CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})"
     fi
     case ${CRM_alert_desc} in
         Cancelled)
           ;;
         *)
           sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}"
           ;;
     esac
     ;;
   attribute)
     sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated to '${CRM_alert_attribute_value}'"
     ;;
   *)
     sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert"
     ;;
esac

# Use this information to send the email
aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region ${REGION}
```

# Testing
<a name="sap-hana-pacemaker-sles-testing"></a>

We recommend scheduling regular fault scenario recovery testing at least annually, and as part of operating system or HANA upgrades that may impact operations. For more details on best practices for regular testing, see SAP Lens – [Best Practice 4.3 – Regularly test business continuity plans and fault recovery](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-4-3.html).

The tests described here simulate failures. These can help you understand the behavior and operational requirements of your cluster.

In addition to checking the state of cluster resources, ensure that the service you are trying to protect is in the required state. Is client connectivity still possible? Define the recovery time to ensure that it aligns with your business objectives. Record recovery actions in runbooks.

**Topics**
+ [Test 1: Stop HANA on the primary node using `HDB kill-9`](#_test_1_stop_hana_on_the_primary_node_using_hdb_kill_9)
+ [Test 2: Simulate a hardware failure](#_test_2_simulate_a_hardware_failure)
+ [Test 3: Simulate a kernel panic](#_test_3_simulate_a_kernel_panic)
+ [Test 4: Simulate a network failure](#_test_4_simulate_a_network_failure)
+ [Test 5: Accidental shutdown](#_test_5_accidental_shutdown)
+ [Other Tests](#_other_tests)

## Test 1: Stop HANA on the primary node using `HDB kill-9`
<a name="_test_1_stop_hana_on_the_primary_node_using_hdb_kill_9"></a>

 **Why** – Tests cluster response to an immediate HANA process termination. This validates the cluster’s ability to detect and respond to critical database process failures and ensures proper failover mechanisms are working.

 **Simulate failure** – On `hanahost01` as `hdbadm`:

```
hdbadm> HDB kill-9
```

 **Expected behavior** – The cluster detects the HANA process failure and triggers immediate failover to the secondary node. The secondary node is promoted to primary, taking over the workload without attempting local recovery.

 **Recovery action** –

1. Monitor cluster status using `crm_mon -r` 

1. Verify HANA system replication status using `hdbnsutil -sr_state` 

1. If AUTOMATED_REGISTER is "false", manually reregister the former primary:
   + For more details on how to register the secondary, see [HSR Setup](sap-hana-pacemaker-sles-hana-setup-hsr.md):

     ```
     hdbnsutil -sr_register --name=<site_name> --remoteHost=<primary_host> --remoteInstance=<instance_number> --mode=sync --operationMode=logreplay
     ```
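
One way to confirm the takeover succeeded is to check the `mode:` line of `hdbnsutil -sr_state` on the new primary. A sketch that parses a captured sample (the site names and IDs shown are hypothetical); on the host, run the command as `<sid>adm`:

```shell
#!/bin/sh
# Parse a captured hdbnsutil -sr_state sample; the live equivalent is:
#   su - hdbadm -c "hdbnsutil -sr_state" | awk -F': ' '/^mode:/ { print $2 }'
sample=$(cat <<'EOF'
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true
mode: primary
site id: 2
site name: SiteB
EOF
)
mode=$(printf '%s\n' "$sample" | awk -F': ' '/^mode:/ { print $2 }')
if [ "$mode" = "primary" ]; then
  echo "takeover confirmed: this node is now primary"
fi
```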

## Test 2: Simulate a hardware failure
<a name="_test_2_simulate_a_hardware_failure"></a>

 **Why** – Tests cluster response to complete node failure, validating proper fencing behavior and resource failover when a node becomes completely unresponsive.

 **Notes** – The double force option (`--force --force`) is used to simulate a hardware failure as closely as possible in a test environment. This command bypasses the system manager and forces an immediate shutdown without any cleanup, similar to a power loss or hardware failure. However, it’s important to note that this is still a simulation - some OS-level cleanup may still occur that wouldn’t happen in a real hardware failure or power loss scenario.

 **Simulate failure** – On `hanahost01` as `root`:

```
# poweroff --force --force
```

 **Expected behavior** – Corosync detects the loss of node communication and Pacemaker on the surviving node initiates fencing through the fencing agent, followed by promotion of the secondary HANA instance to primary. Application connections should automatically reconnect to the new primary.

 **Recovery action** –

1. Start the powered-off Amazon EC2 instance

1. Verify cluster status using `crm_mon -r` 

1. Clean up STONITH history using `crm resource refresh` 

1. Check HANA replication status using `hdbnsutil -sr_state` 

1. If `AUTOMATED_REGISTER` is "false", manually register the former primary as secondary

1. Verify application connectivity to the new primary
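
Step 1 can also be performed with the AWS CLI; the instance ID below is a placeholder, and this sketch assumes Pacemaker is not enabled at boot on the node:

```
# Start the stopped instance and wait until it is running
aws ec2 start-instances --instance-ids i-xxxxxxxxxxxxxxxxx
aws ec2 wait instance-running --instance-ids i-xxxxxxxxxxxxxxxxx

# On the rebooted node, start the cluster stack
systemctl start pacemaker
```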

## Test 3: Simulate a kernel panic
<a name="_test_3_simulate_a_kernel_panic"></a>

 **Why** – Tests cluster response to catastrophic kernel failure, ensuring proper recovery mechanisms work when a node experiences a complete system crash.

 **Notes** – To simulate a system crash, you must first ensure that `/proc/sys/kernel/sysrq` is set to 1.
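
The SysRq interface can be enabled for the current boot as follows; the setting does not persist across reboots unless it is also added to the sysctl configuration:

```
# Enable the magic SysRq key for the current boot
echo 1 > /proc/sys/kernel/sysrq

# Equivalent, and easier to persist via a file in /etc/sysctl.d/
sysctl -w kernel.sysrq=1
```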

 **Simulate failure** – On `hanahost01` as `root`:

```
# echo 'c' > /proc/sysrq-trigger
```

 **Expected behavior** – The cluster detects node failure through lost heartbeat. The surviving node initiates fencing through the fencing agent, followed by promotion of the secondary HANA instance to primary.

 **Recovery action** –

1. Restart the node after kernel panic

1. Verify cluster status using `crm_mon -r` 

1. Clean up STONITH history using `crm resource refresh` 

1. Check HANA replication status using `hdbnsutil -sr_state` 

1. If `AUTOMATED_REGISTER` is "false", manually register the former primary as secondary

1. Verify all cluster resources are clean

## Test 4: Simulate a network failure
<a name="_test_4_simulate_a_network_failure"></a>

 **Why** – Tests cluster behavior during network partition scenarios, ensuring split-brain prevention mechanisms work and proper fencing occurs when nodes can’t communicate.

 **Notes** –
+ `iptables` must be installed
+ Block the whole subnet (CIDR range) in this command so that the secondary corosync ring is also affected
+ Check for any existing iptables rules, as `iptables -F` flushes all rules
+ Review the `pcmk_delay_max` and `priority` parameters if neither node survives the fence race
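
Before running the test, consider saving the current rule set so the recovery step can restore it instead of flushing everything. A minimal sketch:

```
# Save the existing rules before the test
iptables-save > /tmp/iptables.backup

# ... run the network failure test ...

# Restore the original rules instead of a bare flush
iptables-restore < /tmp/iptables.backup
```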

 **Simulate failure** – On either node as root:

```
# iptables -A INPUT -s <CIDR_of_other_subnet> -j DROP; iptables -A OUTPUT -d <CIDR_of_other_subnet> -j DROP
```

 **Expected behavior** – The cluster detects the network failure and fences one of the nodes to avoid a split-brain situation. The surviving node assumes control of cluster resources.

 **Recovery action** –

1. If the failure is simulated on the surviving node, execute `iptables -F` to clear the network failure

1. Start the EC2 node and pacemaker service

1. Verify cluster status and resource placement

## Test 5: Accidental shutdown
<a name="_test_5_accidental_shutdown"></a>

 **Why** – Tests proper handling of shutdown scenarios, ensuring the cluster manages resources appropriately during both planned and unplanned shutdowns.

 **Notes** –
+ Avoid shutdowns without cluster awareness
+ We recommend the use of systemd to ensure predictable behavior
+ Ensure the resource dependencies are in place

 **Simulate failure** – Log in to the AWS Management Console and stop the instance, or issue a shutdown command from the operating system.
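
If you prefer the command line over the console, the shutdown can be triggered from outside via the AWS CLI (the instance ID is a placeholder) or from within the OS:

```
# From a management host: stop the instance via the AWS CLI
aws ec2 stop-instances --instance-ids i-xxxxxxxxxxxxxxxxx

# Or from within the OS as root
systemctl poweroff
```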

 **Expected behavior** – The cluster detects that the node has been shut down and moves the resources that were running on it to the surviving node. If systemd and resource dependencies are not configured correctly, the cluster may detect an unclean stop of cluster services and fence the instance that is shutting down.

 **Recovery action** –

1. Start the EC2 node and pacemaker service

1. Verify cluster status and resource placement

1. Ensure resources are properly distributed according to constraints

## Other Tests
<a name="_other_tests"></a>

Consider these additional tests based on your environment and project requirements:
+  **Secondary Node Testing** 
  + Execute the previous tests on the secondary node to ensure that secondary disruptions do not impact service availability on the primary
  + Execute previous tests with the nodes in reversed roles to validate full operational capability in either configuration
+  **Scale-out Testing** (for scale-out deployments)
  + Test failures on coordinator and worker nodes
  + Test concurrent failure of multiple worker nodes to verify failover order
  + Test failures with blocked access to storage, including `/hana/shared`
+  **Component-Level Testing** 
  + Test index server failures and measure recovery times
  + Validate Fast Restart option behavior and hook script execution
+  **Cluster Configuration Testing** 
  + Direct fencing operations using `stonith_admin -F <node_name>` 
  + Resource movement and constraint verification
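
A manual fencing and resource-movement check might look like the following; the resource and node names are examples, so check `crm configure show` for the actual names in your cluster:

```
# Fence a node directly to verify the fencing path
stonith_admin -F hanahost02

# Move the promotable HANA resource to the other node
crm resource move msl_SAPHana_HDB_HDB00 hanahost02

# Remove the temporary location constraint created by the move
crm resource clear msl_SAPHana_HDB_HDB00
```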

Remember to document all test results, recovery times, and any unexpected behaviors for future reference and runbook updates.