

# SAP ASE for SAP NetWeaver on AWS: high availability configuration for SUSE Linux Enterprise Server (SLES) for SAP applications
<a name="ase-sles-ha"></a>

This topic applies to the SUSE Linux Enterprise Server (SLES) operating system for SAP NetWeaver running the SAP Adaptive Server Enterprise (ASE) database on the AWS Cloud. It provides instructions for configuring a pacemaker cluster for the SAP ASE database deployed on Amazon EC2 instances across two Availability Zones within an AWS Region, with Amazon FSx for NetApp ONTAP as the storage layer.

This topic covers the implementation of high availability using the cold standby method. For more information, see [SAP Note 1650511 – SYB: High Availability Offerings with SAP Adaptive Server Enterprise](https://me.sap.com/notes/1650511/E) (requires SAP portal access).

**Topics**
+ [Planning](ase-sles-ha-planning.md)
+ [Architecture diagram](ase-sles-ha-diagrams.md)
+ [Deployment](ase-sles-ha-deployment.md)
+ [Operations](ase-sles-ha-operations.md)

# Planning
<a name="ase-sles-ha-planning"></a>

This section covers the following topics.

**Topics**
+ [Prerequisites](#prerequisites)
+ [Reliability](#reliability)
+ [SAP and SUSE references](#references)
+ [Concepts](#concepts)

## Prerequisites
<a name="prerequisites"></a>

You must meet the following prerequisites before commencing setup.

**Topics**
+ [Deployed cluster infrastructure](#cluster)
+ [Supported operating system](#supported-os)
+ [Required access for setup](#access)

### Deployed cluster infrastructure
<a name="cluster"></a>

Ensure that your AWS networking and the Amazon EC2 instances where SAP workloads are installed are correctly configured for SAP. For more information, see [SAP NetWeaver Environment Setup for Linux on AWS](https://docs.aws.amazon.com/sap/latest/sap-netweaver/std-sap-netweaver-environment-setup.html).

See the following requirements specific to the SAP ASE pacemaker cluster.
+ Two cluster nodes created in private subnets in separate Availability Zones within the same Amazon VPC and AWS Region
+ Access to the route table(s) that are associated with the chosen subnets

  For more information, see [Overlay IP](#overlay-ip).
+ Targeted Amazon EC2 instances must have connectivity to the Amazon EC2 endpoint through the internet or an Amazon VPC endpoint.
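To check the endpoint connectivity requirement before starting setup, you can run a simple AWS CLI call from each node. This is a hedged check that assumes the AWS CLI is installed and an IAM role with at least `ec2:DescribeRegions` permission is attached to the instance.

```
# Confirm the instance can reach the Amazon EC2 API endpoint
# (directly over the internet or through a VPC endpoint).
# A successful call returns the Region list; a timeout indicates
# missing connectivity.
aws ec2 describe-regions --region us-east-1 --output table
```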

### Supported operating system
<a name="supported-os"></a>

Protecting SAP ASE database with a pacemaker cluster requires packages from SUSE, including targeted cluster resource agents for SAP and AWS that may not be available in standard repositories.

For deploying SAP applications on SUSE, SAP and SUSE recommend using SUSE Linux Enterprise Server for SAP applications (SLES for SAP). SLES for SAP provides additional benefits, including Extended Service Pack Overlap Support (ESPOS), configuration and tuning packages for SAP applications, and High Availability Extensions (HAE). For more details, see [SUSE Linux Enterprise Server for SAP Applications](https://www.suse.com/products/sles-for-sap/) on the SUSE website.

SLES for SAP is available at [AWS Marketplace](https://aws.amazon.com/marketplace) with an hourly or annual subscription. You can also use the bring your own subscription (BYOS) model.

### Required access for setup
<a name="access"></a>

The following access is required for setting up the cluster.
+ An IAM user with the following privileges.
  + modify Amazon VPC route tables
  + modify Amazon EC2 instance properties
  + create IAM policies and roles
  + create Amazon EFS file systems
+ Root access to the operating system of both cluster nodes
+ SAP administrative user access – `<syb>adm` 

  In case of a new install, this user is created by the install process.

## Reliability
<a name="reliability"></a>

The SAP Lens of the AWS Well-Architected Framework, in particular the Reliability pillar, can be used to understand the reliability requirements for your SAP workload.

SAP ASE is a single point of failure in a highly available SAP architecture. The impact of an outage of this component must be evaluated against factors such as recovery point objective (RPO), recovery time objective (RTO), cost, and operational complexity. For more information, see [Reliability](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/reliability.html) in the SAP Lens - AWS Well-Architected Framework.

## SAP and SUSE references
<a name="references"></a>

In addition to this guide, see the following references for more details.
+  [SAP Note: 1650511 - SYB: High Availability Offerings with SAP Adaptive Server Enterprise](https://me.sap.com/notes/1650511/E) 
+  [SAP Note: 1656099 - SAP Applications on AWS: Supported DB/OS and Amazon EC2 products](https://me.sap.com/notes/1656099) 
+  [SAP Note: 1984787 - SUSE Linux Enterprise Server 12: Installation Notes](https://me.sap.com/notes/1984787) 
+  [SAP Note: 2578899 - SUSE Linux Enterprise Server 15: Installation Notes](https://me.sap.com/notes/2578899) 
+  [SAP Note: 1275776 - Linux: Preparing SLES for SAP environments](https://me.sap.com/notes/1275776) 

You must have SAP portal access for reading all SAP Notes.

## Concepts
<a name="concepts"></a>

This section covers AWS concepts.

**Topics**
+ [Availability Zones](#availability-zones)
+ [Overlay IP](#overlay-ip)
+ [Shared VPC](#shared-vpc)
+ [Amazon FSx for NetApp ONTAP](#fsx-ontap)
+ [Pacemaker - STONITH fencing agent](#stonith)

### Availability Zones
<a name="availability-zones"></a>

An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. For more information, see [Regions and Availability Zones](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/).

For mission critical deployments of SAP on AWS where the goal is to minimise the recovery time objective (RTO), we suggest distributing single points of failure across Availability Zones. Compared with single instance or single Availability Zone deployments, this increases resilience and isolation against a broad range of failure scenarios and issues, including natural disasters.

Each Availability Zone is physically separated from the others by a meaningful distance (many kilometers). All Availability Zones in an AWS Region are interconnected with high-bandwidth, low-latency networking over fully redundant, dedicated metro fiber. This enables synchronous replication. All traffic between Availability Zones is encrypted.

### Overlay IP
<a name="overlay-ip"></a>

Overlay IP enables a connection to the application, regardless of which Availability Zone (and subnet) contains the active primary node.

When you deploy an Amazon EC2 instance in AWS, IP addresses are allocated from the CIDR range of the assigned subnet. A subnet cannot span multiple Availability Zones. The subnet IP addresses may therefore be unavailable after faults, including network connectivity or hardware issues, that require a failover to the replication target in a different Availability Zone.

To address this, we suggest that you configure an overlay IP, and use this in the connection parameters for the application. This IP address is a non-overlapping RFC1918 private IP address from outside of the VPC CIDR block, and is configured as an entry in the route table or tables. The route directs the connection to the active node and is updated during a failover by the cluster software.

You can select any one of the following RFC1918 private IP addresses for your overlay IP address.
+ 10.0.0.0 – 10.255.255.255 (10/8 prefix)
+ 172.16.0.0 – 172.31.255.255 (172.16/12 prefix)
+ 192.168.0.0 – 192.168.255.255 (192.168/16 prefix)

If, for example, you use the 10/8 prefix in your SAP VPC, selecting a 172 or a 192 IP address may help to differentiate the overlay IP. Consider the use of an IP Address Management (IPAM) tool such as Amazon VPC IP Address Manager to plan, track, and monitor IP addresses for your AWS workloads. For more information, see [What is IPAM?](https://docs.aws.amazon.com/vpc/latest/ipam/what-it-is-ipam.html) 

The overlay IP agent in the cluster can also be configured to update multiple route tables which contain the Overlay IP entry if your subnet association or connectivity requires it.
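As an illustration, the initial overlay IP route entry can be created with the AWS CLI. This is a sketch: the route table ID, overlay IP, and instance ID below are placeholder example values from the parameter tables in this guide, and the cluster resource agent maintains the route afterwards.

```
# Create the initial overlay IP route as a /32 host route,
# pointing at the current primary node. The cluster agent
# updates this route (ReplaceRoute) during a failover.
aws ec2 create-route \
  --route-table-id rtb-xxxxxroutetable1 \
  --destination-cidr-block 172.16.0.29/32 \
  --instance-id i-xxxxinstidforhost1
```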

 **Access to overlay IP** 

The overlay IP is outside of the range of the VPC, and therefore cannot be reached from locations that are not associated with the route table, including on-premises and other VPCs.

Use [AWS Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html) as a central hub to facilitate the network connection to an overlay IP address from multiple locations, including Amazon VPCs, other AWS Regions, and on-premises using [AWS Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html) or [AWS Client VPN](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/what-is.html).

If you do not have AWS Transit Gateway set up as a network transit hub or if it is not available in your preferred AWS Region, you can use a [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) to enable network access to an overlay IP.

For more information, see [SAP on AWS High Availability with Overlay IP Address Routing](https://docs.aws.amazon.com/sap/latest/sap-hana/sap-ha-overlay-ip.html).

### Shared VPC
<a name="shared-vpc"></a>

An enterprise landing zone setup or security requirements may require the use of a separate cluster account to restrict the route table access required for the Overlay IP to an isolated account. For more information, see [Share your VPC with other accounts](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html).

Evaluate the operational impact against your security posture before setting up shared VPC. To set up, see [Shared VPC – optional](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-netweaver-ha-settings.html#sles-netweaver-ha-shared-vpc).

### Amazon FSx for NetApp ONTAP
<a name="fsx-ontap"></a>

Amazon FSx for NetApp ONTAP is a fully managed service that provides highly reliable, scalable, high-performing, and feature-rich file storage built on NetApp’s popular ONTAP file system. FSx for ONTAP combines the familiar features, performance, capabilities, and API operations of NetApp file systems with the agility, scalability, and simplicity of a fully managed AWS service.

FSx for ONTAP also provides highly available and durable storage with fully managed backups and support for cross-Region disaster recovery. To make it easier to protect and secure your data, FSx for ONTAP supports popular data security and anti-virus applications. For more information, see [What is Amazon FSx for NetApp ONTAP?](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/what-is-fsx-ontap.html) 

### Pacemaker - STONITH fencing agent
<a name="stonith"></a>

In a two-node cluster setup for a primary resource and its replication pair, it is important that there is only one node in the primary role with the ability to modify your data. In the event of a failure scenario where a node is unresponsive or incommunicable, ensuring data consistency requires that the faulty node is isolated by powering it down before the cluster commences other actions, such as promoting a new primary. This arbitration is the role of the fencing agent.

A two-node cluster introduces the possibility of a fence race, in which a communication failure results in both nodes simultaneously claiming, "I can’t see you, so I am going to power you off." The fencing agent is designed to minimise this risk by providing an external witness.

SLES supports several fencing agents, including the one recommended for use with Amazon EC2 instances (`external/ec2`). This resource uses API commands to check its own instance status ("Is my instance state anything other than running?") before proceeding to power off its pair. If it is already in a stopping or stopped state, it admits defeat and leaves the surviving node untouched.
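For orientation, a typical `external/ec2` STONITH primitive in `crmsh` looks like the following sketch. The resource name, timeouts, tag, and profile values are illustrative examples aligned with the parameter tables in this guide; check the exact syntax for your SLES version against the SUSE documentation before use.

```
# Sketch of an external/ec2 STONITH resource (crmsh syntax).
# "tag" must match the Amazon EC2 resource tag key created for the
# cluster nodes, and "profile" the AWS CLI profile configured for root.
crm configure primitive res_AWS_STONITH stonith:external/ec2 \
  op start interval=0 timeout=180 \
  op stop interval=0 timeout=180 \
  op monitor interval=300 timeout=60 \
  params tag=pacemaker profile=cluster
```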

# Architecture diagram
<a name="ase-sles-ha-diagrams"></a>

The following diagram shows the cold standby SAP ASE cluster setup with FSx for ONTAP.

![\[Pacemaker Cluster for SAP ASE\]](http://docs.aws.amazon.com/sap/latest/sap-AnyDB/images/ase_sles.png)


# Deployment
<a name="ase-sles-ha-deployment"></a>

This section covers the following topics.

**Topics**
+ [Settings and prerequisites](ase-sles-ha-settings.md)
+ [SAP ASE and cluster setup](ase-sles-ha-setup.md)
+ [Cluster configuration](ase-sles-ha-cluster-configuration.md)

# Settings and prerequisites
<a name="ase-sles-ha-settings"></a>

The cluster setup uses parameters, including `DBSID`, that are unique to your setup. It is useful to predetermine the values with the following examples and guidance.

**Topics**
+ [Define reference parameters for setup](#define-parameters)
+ [Amazon EC2 instance settings](#instance-settings)
+ [Operating system prerequisites](#os-prerequisites)
+ [IP and hostname resolution prerequisites](#ip-prerequisites)
+ [FSx for ONTAP prerequisites](#filesystem-prerequisites)
+ [Shared VPC – *optional*](#ase-sles-ha-shared-vpc)

## Define reference parameters for setup
<a name="define-parameters"></a>

The cluster setup relies on the following parameters.

**Topics**
+ [Global AWS parameters](#global-aws-parameters)
+ [Amazon EC2 instance parameters](#ec2-parameters)
+ [SAP and Pacemaker resource parameters](#sap-pacemaker-resource-parameters)
+ [SLES cluster parameters](#sles-cluster-parameters)

### Global AWS parameters
<a name="global-aws-parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|   AWS account ID  |   `<account_id>`   |   `123456789100`   | 
|   AWS Region  |   `<region_id>`   |   `us-east-1`   | 
+  AWS account – For more details, see [Your AWS account ID and its alias](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html).
+  AWS Region – For more details, see [Describe your Regions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#using-regions-availability-zones-describe).

### Amazon EC2 instance parameters
<a name="ec2-parameters"></a>


| Name | Parameter | Primary example | Secondary example | 
| --- | --- | --- | --- | 
|  Amazon EC2 instance ID  |   `<instance_id>`   |   `i-xxxxinstidforhost1`   |   `i-xxxxinstidforhost2`   | 
|  Hostname  |   `<hostname>`   |   `slxdbhost01`   |   `slxdbhost02`   | 
|  Host IP  |   `<host_ip>`   |   `10.1.10.1`   |   `10.1.20.1`   | 
|  Host additional IP  |   `<host_additional_ip>`   |   `10.1.10.2`   |   `10.1.20.2`   | 
|  Configured subnet  |   `<subnet_id>`   |   `subnet-xxxxxxxxxxsubnet1`   |   `subnet-xxxxxxxxxxsubnet2`   | 
+ Hostname – Hostnames must comply with SAP requirements outlined in [SAP Note 611361 - Hostnames of SAP ABAP Platform servers](https://me.sap.com/notes/611361) (requires SAP portal access).

  Run the following command on your instances to retrieve the hostname.

  ```
  hostname
  ```
+ Amazon EC2 instance ID – run the following command (IMDSv2 compatible) on your instances to retrieve instance metadata.

  ```
  /usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $(curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" http://169.254.169.254/latest/meta-data/instance-id
  ```

  For more details, see [Retrieve instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html) and [Instance identity documents](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html).

### SAP and Pacemaker resource parameters
<a name="sap-pacemaker-resource-parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|  DBSID  |   `<DBSID>` or `<dbsid>`   |   `ASD`   | 
|  Virtual hostname  |   `<db_virt_hostname>`   |   `slxvdb`   | 
|  Database Overlay IP  |   `<ase_db_oip>`   |   `172.16.0.29`   | 
|  VPC Route Tables  |   `<rtb_id>`   |   `rtb-xxxxxroutetable1`   | 
|  FSx for ONTAP mount points  |   `<ase_db_fs>`   |   `svm-xxx.fs-xxx.fsx.us-east-1.amazonaws.com`   | 
+ SAP details – SAP parameters must follow the guidance and limitations of SAP and Software Provisioning Manager. Refer to [SAP Note 1979280 - Reserved SAP System Identifiers (SAPSID) with Software Provisioning Manager](https://me.sap.com/notes/1979280) for more details.

  Post-installation, use the following command to find the details of the instances running on a host.

  ```
  sudo /usr/sap/hostctrl/exe/saphostctrl -function ListDatabases
  ```
+ Overlay IP – This value is defined by you. For more information, see [Overlay IP](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-ha.html#overlay-ip).
+ FSx for ONTAP mount points – This value is defined by you. Consider the required mount points specified in [SAP ASE on AWS with Amazon FSx for NetApp ONTAP](https://docs.aws.amazon.com/sap/latest/sap-AnyDB/sap-ase-amazon-fsx.html).

### SLES cluster parameters
<a name="sles-cluster-parameters"></a>


| Name | Parameter | Example | 
| --- | --- | --- | 
|  Cluster user  |   `cluster_user`   |   `hacluster`   | 
|  Cluster password  |   `cluster_password`   |  | 
|  Cluster tag  |   `cluster_tag`   |   `pacemaker`   | 
|   AWS CLI cluster profile  |   `aws_cli_cluster_profile`   |   `cluster`   | 

## Amazon EC2 instance settings
<a name="instance-settings"></a>

Amazon EC2 instance settings can be applied using Infrastructure as Code or manually using AWS Command Line Interface or AWS Management Console. We recommend Infrastructure as Code automation to reduce manual steps, and ensure consistency.

**Topics**
+ [Create IAM roles and policies](#iam)
+ [AWS Overlay IP policy](#overlay-ip-policy)
+ [Assign IAM role](#role)
+ [Modify security groups for cluster communication](#security-groups)
+ [Disable source/destination check](#disable-check)
+ [Review automatic recovery and stop protection](#auto-recovery)
+ [Create Amazon EC2 resource tags used by Amazon EC2 STONITH agent](#stonith-tags)

### Create IAM roles and policies
<a name="iam"></a>

In addition to the permissions required for standard SAP operations, two IAM policies are required for the cluster to control AWS resources. These policies must be assigned to your Amazon EC2 instances using an IAM role. This enables the Amazon EC2 instances, and therefore the cluster, to call AWS services.

Create these policies with least-privilege permissions, granting access to only the specific resources that are required within the cluster. For multiple clusters, you need to create multiple policies.

For more information, see [IAM roles for Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#ec2-instance-profile).

#### STONITH policy
<a name="stonith"></a>

The SLES STONITH resource agent (`external/ec2`) requires permission to start and stop both the nodes of the cluster. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
      ]
    }
  ]
}
```
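If you manage IAM from the command line, the policy can be created and attached to the instance role as follows. This is a sketch: the policy file name, policy name, and role name are illustrative placeholders, not names defined by this guide.

```
# Create the STONITH policy from a local JSON file (file name is illustrative)
aws iam create-policy \
  --policy-name pacemaker-stonith-policy \
  --policy-document file://stonith-policy.json

# Attach it to the IAM role used by both cluster instances
aws iam attach-role-policy \
  --role-name <cluster_instance_role> \
  --policy-arn arn:aws:iam::123456789012:policy/pacemaker-stonith-policy
```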

### AWS Overlay IP policy
<a name="overlay-ip-policy"></a>

The SLES Overlay IP resource agent (`aws-vpc-move-ip`) requires permission to modify a routing entry in route tables. Create a policy as shown in the following example. Attach this policy to the IAM role assigned to both Amazon EC2 instances in the cluster.

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:ReplaceRoute",
            "Resource": [
                 "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
                 "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0"
                        ]
        },
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeRouteTables",
            "Resource": "*"
        }
    ]
}
```

**Note**  
If you are using a Shared VPC, see [Shared VPC cluster resources](#shared-vpc-clsuter-resources).

### Assign IAM role
<a name="role"></a>

The two cluster resource IAM policies must be assigned to an IAM role associated with your Amazon EC2 instance. If an IAM role is not associated to your instance, create a new IAM role for cluster operations. To assign the role, go to https://console.aws.amazon.com/ec2/, select your instance(s), and then choose **Actions** > **Security** > **Modify IAM role**.
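Alternatively, the role can be associated from the AWS CLI. This is a sketch; the instance profile name is an assumed placeholder, and the instance profile must already wrap the IAM role that carries the two cluster policies.

```
# Associate an instance profile (which wraps the IAM role) with a node;
# repeat for the second instance in the cluster
aws ec2 associate-iam-instance-profile \
  --instance-id <i-xxxxinstidforhost1> \
  --iam-instance-profile Name=<cluster_instance_profile>
```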

### Modify security groups for cluster communication
<a name="security-groups"></a>

A security group controls the traffic that is allowed to reach and leave the resources that it is associated with. For more information, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html).

In addition to the standard ports required to access SAP and administrative functions, the following rules must be applied to the security groups assigned to both Amazon EC2 instances in the cluster.


**Inbound**  

| Source | Protocol | Port range | Description | 
| --- | --- | --- | --- | 
|  The security group ID (its own resource ID)  |   **UDP**   |  5405  |  Allows UDP traffic between cluster resources for corosync communication  | 
|  Bastion host security group or CIDR range for administration  |   **TCP**   |  7630  |   *Optional*. Used for the SLES Hawk2 interface for monitoring and administration through a web interface. For more information, see [Configuring and Managing Cluster Resources with Hawk2](https://documentation.suse.com/sle-ha/12-SP5/html/SLE-HA-all/cha-conf-hawk2.html) in the SUSE documentation.  | 

**Note**  
Note the use of the UDP protocol.

If you are running a local firewall, such as `iptables`, ensure that communication on the preceding ports is allowed between two Amazon EC2 instances.
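The corosync rule can be added with the AWS CLI by referencing the security group as its own traffic source. `sg-xxxxxxxx` is a placeholder for the security group attached to both nodes.

```
# Allow corosync (UDP 5405) between cluster nodes by referencing
# the security group itself as the source
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol udp \
  --port 5405 \
  --source-group sg-xxxxxxxx
```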

### Disable source/destination check
<a name="disable-check"></a>

Amazon EC2 instances perform source/destination checks by default, requiring that an instance is either the source or the destination of any traffic it sends or receives.

In the pacemaker cluster, the source/destination check must be disabled on both instances receiving traffic from the overlay IP. You can disable the check using the AWS CLI or the AWS Management Console.

 **AWS CLI** 
+ Use the [modify-instance-attribute](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ec2/modify-instance-attribute.html) command to disable source/destination check.

  Run the following commands for both instances in the cluster.

  ```
  aws ec2 modify-instance-attribute --instance-id <i-xxxxinstidforhost1> --no-source-dest-check
  ```

  ```
  aws ec2 modify-instance-attribute --instance-id <i-xxxxinstidforhost2> --no-source-dest-check
  ```

 **AWS Management Console** 
+ In the [Amazon EC2 console](https://console.aws.amazon.com/ec2/), select your instance, and then choose **Actions**, **Networking**, **Change source/destination check**. Ensure that the **Stop** option is selected.  
![\[Change Source / destination check.\]](http://docs.aws.amazon.com/sap/latest/sap-AnyDB/images/sles_stop_option.png)
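You can verify the attribute afterwards on each instance; a value of `false` means the check is disabled.

```
# Verify that the source/destination check is disabled;
# the output should show "Value": false
aws ec2 describe-instance-attribute \
  --instance-id <i-xxxxinstidforhost1> \
  --attribute sourceDestCheck
```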

### Review automatic recovery and stop protection
<a name="auto-recovery"></a>

After a failure, cluster-controlled operations must be resumed in a coordinated way. This helps ensure that the cause of failure is known and addressed, and the status of the cluster is as expected. For example, verifying that there are no pending fencing actions.

This can be achieved by not enabling pacemaker to run as a service at the operating system level, and by avoiding automatic restarts for hardware failure.

If you want to control the restarts resulting from hardware failure, disable simplified automatic recovery, and do not configure Amazon CloudWatch action-based recovery for Amazon EC2 instances that are part of a pacemaker cluster. To disable simplified automatic recovery using the AWS CLI, run the following commands on both Amazon EC2 instances in the cluster.

**Note**  
Modifying instance maintenance options will require admin privileges not covered by the IAM instance roles defined for operations of the cluster.

```
aws ec2 modify-instance-maintenance-options --instance-id <i-xxxxinstidforhost1> --auto-recovery disabled
```

```
aws ec2 modify-instance-maintenance-options --instance-id <i-xxxxinstidforhost2> --auto-recovery disabled
```

To ensure that STONITH actions can be executed, you must ensure that stop protection is disabled for Amazon EC2 instances that are part of a pacemaker cluster. If the default settings have been modified, use the following commands for both instances to disable stop protection via AWS CLI.

**Note**  
Modifying instance attributes will require admin privileges not covered by the IAM instance roles defined for operations of the cluster.

```
aws ec2 modify-instance-attribute --instance-id <i-xxxxinstidforhost1> --no-disable-api-stop
```

```
aws ec2 modify-instance-attribute --instance-id <i-xxxxinstidforhost2> --no-disable-api-stop
```
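To confirm both settings afterwards, you can query the stop-protection attribute and the maintenance options; repeat for the second instance.

```
# Stop protection: "Value": false means disabled
# (so STONITH actions can stop the node)
aws ec2 describe-instance-attribute \
  --instance-id <i-xxxxinstidforhost1> \
  --attribute disableApiStop

# Automatic recovery: "AutoRecovery": "disabled" is expected
aws ec2 describe-instances \
  --instance-ids <i-xxxxinstidforhost1> \
  --query "Reservations[].Instances[].MaintenanceOptions"
```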

### Create Amazon EC2 resource tags used by Amazon EC2 STONITH agent
<a name="stonith-tags"></a>

The Amazon EC2 STONITH agent uses AWS resource tags to identify Amazon EC2 instances. Create a tag for the primary and secondary Amazon EC2 instances using the AWS Management Console or AWS CLI. For more information, see [Tagging your AWS resources](https://docs.aws.amazon.com/tag-editor/latest/userguide/tagging.html).

Use the same tag key and the local hostname returned by the `hostname` command across instances. For example, a configuration with the values defined in [Amazon EC2 instance parameters](#ec2-parameters) would require the tags shown in the following table.


| Amazon EC2 | Key example | Value example | 
| --- | --- | --- | 
|  Instance 1  |  pacemaker  |  slxdbhost01  | 
|  Instance 2  |  pacemaker  |  slxdbhost02  | 
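The tags can be created from the AWS CLI as in the following sketch, using the example instance IDs and hostnames from the parameter tables in this guide.

```
# Tag each node with the shared key ("pacemaker") and its own hostname
aws ec2 create-tags --resources <i-xxxxinstidforhost1> \
  --tags Key=pacemaker,Value=slxdbhost01
aws ec2 create-tags --resources <i-xxxxinstidforhost2> \
  --tags Key=pacemaker,Value=slxdbhost02
```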

You can run the following command locally to validate the tag values and IAM permissions to describe the tags.

```
aws ec2 describe-tags --filters "Name=resource-id,Values=<instance_id>" "Name=key,Values=<pacemaker_tag>" --region=<region> --output=text | cut -f5
```

## Operating system prerequisites
<a name="os-prerequisites"></a>

This section covers the following topics.

**Topics**
+ [Root access](#root-access)
+ [Install missing operating system packages](#os-packages)
+ [Update and check operating system versions](#confirm-versions)
+ [System logging](#system-logging)
+ [Time synchronization services](#time-sync)
+ [AWS CLI profile](#cli-profile)
+ [Pacemaker proxy settings](#proxy-settings)

### Root access
<a name="root-access"></a>

Verify root access on both cluster nodes. The majority of the setup commands in this document are performed as the root user. Assume that commands are run as root unless explicitly stated otherwise.

### Install missing operating system packages
<a name="os-packages"></a>

This is applicable to both cluster nodes. You must install any missing operating system packages.

The following packages and their dependencies are required for the pacemaker setup. Depending on your baseline image, for example, SLES for SAP, these packages may already be installed.

```
aws-cli
chrony
cluster-glue
corosync
crmsh
dstat
fence-agents
ha-cluster-bootstrap
iotop
pacemaker
patterns-ha-ha_sles
resource-agents
rsyslog
sap-suse-cluster-connector
sapstartsrv-resource-agents
```

We highly recommend installing the following additional packages for troubleshooting.

```
zypper-lifecycle-plugin
supportutils
yast2-support
supportutils-plugin-suse-public-cloud
supportutils-plugin-ha-sap
```

**Important**  
Ensure that you have installed the newer version `sap-suse-cluster-connector` (**dashes**), and not the older version `sap_suse_cluster_connector` that uses underscores.

Use the following command to check packages and versions.

```
for package in aws-cli chrony cluster-glue corosync crmsh dstat fence-agents ha-cluster-bootstrap iotop pacemaker patterns-ha-ha_sles resource-agents rsyslog sap-suse-cluster-connector sapstartsrv-resource-agents zypper-lifecycle-plugin supportutils yast2-support supportutils-plugin-suse-public-cloud supportutils-plugin-ha-sap; do
echo "Checking if ${package} is installed..."
RPM_RC=$(rpm -q ${package} --quiet; echo $?)
if [ ${RPM_RC} -ne 0 ];then
echo "   ${package} is missing and needs to be installed"
fi
done
```

If a package is not installed, and you are unable to install it using `zypper`, it may be because SUSE Linux Enterprise High Availability extension is not available as a repository in your chosen image. You can verify the availability of the extension using the following command.

```
zypper repos
```

To install or update a package or packages with confirmation, use the following command.

```
zypper install <package_name(s)>
```

### Update and check operating system versions
<a name="confirm-versions"></a>

You must update and confirm operating system versions across nodes. Apply all the latest patches to your operating system. This ensures that bugs are addressed and new features are available.

You can update the patches individually, or use `zypper update`. A clean reboot is recommended prior to setting up a cluster.

```
zypper update
reboot
```

Compare the operating system package versions on the two cluster nodes and ensure that the versions match on both nodes.
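One way to compare, assuming SSH access between the nodes, is to diff sorted package lists. The hostnames and temporary file paths below are the examples used in this guide.

```
# Run on node 1: collect sorted package lists from both nodes
# and show any version differences
rpm -qa | sort > /tmp/packages_node1.txt
ssh slxdbhost02 "rpm -qa | sort" > /tmp/packages_node2.txt
diff /tmp/packages_node1.txt /tmp/packages_node2.txt
```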

### System logging
<a name="system-logging"></a>

This is applicable to both cluster nodes. We recommend using the `rsyslogd` daemon for logging. It is the default configuration in the cluster. Verify that the `rsyslog` package is installed on both cluster nodes.

 `logd` is a subsystem that logs additional information coming from the STONITH agent. Enable and start it with the following commands.

```
systemctl enable --now logd
systemctl status logd
```

### Time synchronization services
<a name="time-sync"></a>

This is applicable to both cluster nodes. Time synchronization is important for cluster operation. Ensure that `chrony` rpm is installed, and configure appropriate time servers in the configuration file.

You can use Amazon Time Sync Service that is available on any instance running in a VPC. It does not require internet access. To ensure consistency in the handling of leap seconds, don’t mix Amazon Time Sync Service with any other `ntp` time sync servers or pools.

Create or check the `/etc/chrony.d/ec2.conf` file to define the server.

```
#Amazon EC2 time source config
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
```

Start the `chronyd.service`, using the following command.

```
systemctl enable --now chronyd.service
systemctl status chronyd
```
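You can confirm that the instance is synchronizing against the Amazon Time Sync Service with the `chronyc` utility.

```
# The 169.254.169.123 source should be listed and selected
# (marked with '*' in the sources output)
chronyc sources -v
chronyc tracking
```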

For more information, see [Set the time for your Linux instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html).

### AWS CLI profile
<a name="cli-profile"></a>

This is applicable to both cluster nodes. The cluster resource agents use AWS Command Line Interface (AWS CLI). You need to create an AWS CLI profile for the root account on both instances.

You can either edit the config file at `/root/.aws/config` manually or use the [aws configure](https://docs.aws.amazon.com/cli/latest/reference/configure/index.html) AWS CLI command.

You can skip providing the information for the access and secret access keys. The permissions are provided through IAM roles attached to Amazon EC2 instances.

```
aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region_id>
Default output format [None]:
```

The profile name is configurable. The name chosen in this example is **cluster** – it is used in [Create Amazon EC2 resource tags used by Amazon EC2 STONITH agent](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-netweaver-ha-settings.html#stonith-tags). The AWS Region must be the default AWS Region of the instance.

```
aws configure --profile <cluster>
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: <region_id>
Default output format [None]:
```
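Before configuring any cluster resources, you can verify that both profiles resolve credentials through the attached IAM role. The profile name `cluster` matches the example above.

```shell
# Confirm the default profile can obtain credentials from the instance role.
aws sts get-caller-identity

# Confirm the named profile used by the cluster agents works as well.
aws sts get-caller-identity --profile cluster

# Review the resolved settings (region, credential source) for the profile.
aws configure list --profile cluster
```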

### Pacemaker proxy settings
<a name="proxy-settings"></a>

This is applicable to both cluster nodes. If your Amazon EC2 instance has been configured to access the internet and/or AWS Cloud through proxy servers, then you need to replicate the settings in the pacemaker configuration. For more information, see [Use an HTTP proxy](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-proxy.html).

Add the following lines to `/etc/sysconfig/pacemaker`.

```
http_proxy=http://<proxyhost>:<proxyport>
https_proxy=http://<proxyhost>:<proxyport>
no_proxy=127.0.0.1,localhost,169.254.169.254,fd00:ec2::254
```

Modify `proxyhost` and `proxyport` to match your settings. Ensure that you exempt the address used to access the instance metadata. Configure `no_proxy` to include the IP address of the instance metadata service – `169.254.169.254` (IPv4) and `fd00:ec2::254` (IPv6). This address does not vary.

## IP and hostname resolution prerequisites
<a name="ip-prerequisites"></a>

This section covers the following topics.

**Topics**
+ [Primary and secondary IP addresses](#ip-addresses)
+ [Add initial VPC route table entries for overlay IPs](#route-entries)
+ [Add overlay IPs to host IP configuration](#overlay-host)
+ [Hostname resolution](#hostname-resolution)

### Primary and secondary IP addresses
<a name="ip-addresses"></a>

This is applicable to both cluster nodes. We recommend defining a redundant communication channel (a second ring) in `corosync` for SUSE clusters. The cluster nodes can use the second ring to communicate in case of underlying network disruptions.

Create the redundant communication channel by adding a secondary IP address on both nodes. These IPs are only used in the cluster configuration. They provide the same fault tolerance as a secondary Elastic Network Interface (ENI). For more information, see [Assign a secondary private IPv4 address](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/MultipleIP.html#ManageMultipleIP).
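If you prefer the AWS CLI over the console, a secondary private IP address can be added to the primary network interface as follows. The ENI ID is a placeholder that you must replace with your own.

```shell
# Assign one additional private IPv4 address to the instance's primary ENI.
# Replace eni-xxxxxxxxxxxxxxxxx with the ENI ID of the cluster node.
aws ec2 assign-private-ip-addresses \
    --network-interface-id eni-xxxxxxxxxxxxxxxxx \
    --secondary-private-ip-address-count 1
```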

When configured correctly, the following command returns two IPs from the same subnet on both the primary and secondary nodes.

```
ip -o -f inet addr show eth0 | awk -F " |/" '{print $7}'
```

These IP addresses are required for `ring0_addr` and `ring1_addr` in `corosync.conf`.

### Add initial VPC route table entries for overlay IPs
<a name="route-entries"></a>

You need to add initial route table entries for overlay IPs. For more information on overlay IP, see [Overlay IP](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-ha.html#overlay-ip).

Add entries to the VPC route table or tables associated with the subnets of your Amazon EC2 instances for the cluster. The entries for destination (overlay IP CIDR) and target (Amazon EC2 instance or ENI) must be added manually for the SAP ASE database. This ensures that the cluster resource has a route to modify. It also supports the installation of SAP using the virtual names associated with the overlay IP before the cluster is configured.

 **Modify or add a route to a route table using AWS Management Console** 

1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

1. In the navigation pane, choose **Route Tables**, and select the route table associated with the subnets where your instances have been deployed.

1. Choose **Actions**, **Edit routes**.

1. To add a route, choose **Add route**.

   1. Add your chosen overlay IP address CIDR and the instance ID of your primary instance for SAP ASE database. See the following table for an **example**.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sap/latest/sap-AnyDB/ase-sles-ha-settings.html)

1. Choose **Save changes**.

The preceding steps can also be performed programmatically. We suggest performing them with administrative privileges, instead of instance-based privileges, to preserve least privilege. The `CreateRoute` API permission isn’t necessary for ongoing operations.

Run the following command as a dry run on both nodes to confirm that the instances have the necessary permissions.

```
aws ec2 replace-route --route-table-id <rtb-xxxxxroutetable1> --destination-cidr-block <172.16.0.29/32> --instance-id <i-xxxxinstidforhost1> --dry-run --profile <aws_cli_cluster_profile>
```

### Add overlay IPs to host IP configuration
<a name="overlay-host"></a>

You must configure the overlay IP as an additional IP address on the standard interface to enable SAP install. This action is managed by the cluster IP resource. However, to install SAP using the correct IP addresses prior to having the cluster configuration in place, you need to add these entries manually.

If you need to reboot the instance during setup, the assignment is lost, and must be re-added.

See the following **example**. You must update the command with your chosen IP addresses.

On EC2 instance 1, where you are installing SAP ASE database, add the overlay IP.

```
ip addr add <172.16.0.29/32> dev eth0
```

### Hostname resolution
<a name="hostname-resolution"></a>

This is applicable to both cluster nodes. You must ensure that both instances can resolve all hostnames in use. Add the hostnames for cluster nodes to `/etc/hosts` file on both cluster nodes. This ensures that hostnames for cluster nodes can be resolved even in case of DNS issues. See the following example.

```
cat /etc/hosts
<10.1.10.1 slxdbhost01.example.com slxdbhost01>
<10.1.20.1 slxdbhost02.example.com slxdbhost02>
<172.16.0.29 slxvdb.example.com slxvdb>
```

In this example, the secondary IPs used for the second cluster ring are not mentioned. They are only used in the cluster configuration. You can allocate virtual hostnames for administration and identification purposes.
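A quick loop over the names in the example confirms that resolution works on each node; the hostnames below are the sample values used in this guide.

```shell
# Each name should print an IP address; an error message indicates a
# resolution problem in /etc/hosts or DNS.
for h in slxdbhost01 slxdbhost02 slxvdb; do
  getent hosts "$h" || echo "$h: NOT RESOLVED" >&2
done
```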

**Important**  
The overlay IP is outside of the VPC CIDR range, and cannot be reached from locations that are not associated with the route table, including on-premises networks.

## FSx for ONTAP prerequisites
<a name="filesystem-prerequisites"></a>

This section covers the following topics.

**Topics**
+ [Shared file systems](#shared-filesystems)
+ [Create volumes and file systems](#create-filesystems)

### Shared file systems
<a name="shared-filesystems"></a>

Amazon FSx for NetApp ONTAP is supported for SAP ASE database file systems.

FSx for ONTAP provides fully managed shared storage in AWS Cloud with data access and management capabilities of ONTAP. For more information, see [Create an Amazon FSx for NetApp ONTAP file system](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/getting-started-step1.html).

Select a file system based on your business requirements, evaluating the resilience, performance, and cost of your choice.

The SVM’s DNS name is your simplest mounting option. It automatically resolves to the mount target’s IP address in the Availability Zone of the connecting Amazon EC2 instance.

 `svm-id.fs-id.fsx.<aws-region>.amazonaws.com` 

**Note**  
Review the `enableDnsHostnames` and `enableDnsSupport` DNS attributes for your VPC. For more information, see [View and update DNS attributes for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).

### Create volumes and file systems
<a name="create-filesystems"></a>

You can review the following resources to understand the FSx for ONTAP mount points for SAP ASE database.
+  [Host setup for SAP ASE](https://docs.aws.amazon.com/sap/latest/sap-AnyDB/host-setup-fsx-sap-ase.html) 
+ SAP – [Setup of Database Layout](https://help.sap.com/docs/SLTOOLSET/e345db692e3c43928199d701df58c0d8/f231f7924dd34e9e85291bfb9af709f1.html?version=CURRENT_VERSION) (ABAP)
+ SAP – [Setup of Database Layout](https://help.sap.com/docs/SLTOOLSET/01f04921ac57452983980fe83a3ce10d/f231f7924dd34e9e85291bfb9af709f1.html?version=CURRENT_VERSION) (JAVA)

The following are the FSx for ONTAP mount points covered in this topic.


| Unique NFS Location (example) | File system location | 
| --- | --- | 
|  SVM-xxx:/sybase  |  /sybase  | 
|  SVM-xxx:/asedata  |  /sybase/<DBSID>/sapdata_1  | 
|  SVM-xxx:/aselog  |  /sybase/<DBSID>/saplog_1  | 
|  SVM-xxx:/sapdiag  |  /sybase/<DBSID>/sapdiag  | 
|  SVM-xxx:/saptmp  |  /sybase/<DBSID>/saptmp  | 
|  SVM-xxx:/backup  |  /sybasebackup  | 
|  SVM-xxx:/usrsap  |  /usr/sap  | 

Ensure that you have properly mounted the file systems and that the necessary host setup adjustments have been performed. See [Host setup for SAP ASE](https://docs.aws.amazon.com/sap/latest/sap-AnyDB/host-setup-fsx-sap-ase.html). You can temporarily add the entries to `/etc/fstab` so that they persist across a reboot. These entries must be removed before you configure the cluster, because the cluster resources manage the mounting of the NFS file systems.

You need to perform this step only on the primary Amazon EC2 instance for the initial installation.

Review the mount options to ensure that they match with your operating system, NFS file system type, and SAP’s latest recommendations.
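The temporary `/etc/fstab` entries might look like the following sketch. The SVM DNS name and the abbreviated mount options are examples that you must adapt to your environment, and these lines must be removed again before the cluster is configured.

```
# Temporary entries for the initial SAP installation only - remove before cluster setup.
svm-xxxxx.fs-xxxxx.<region>.amazonaws.com:/sybase   /sybase                    nfs4  rw,noatime,vers=4.1,hard,timeo=600,retrans=2  0 0
svm-xxxxx.fs-xxxxx.<region>.amazonaws.com:/asedata  /sybase/<DBSID>/sapdata_1  nfs4  rw,noatime,vers=4.1,hard,timeo=600,retrans=2  0 0
svm-xxxxx.fs-xxxxx.<region>.amazonaws.com:/aselog   /sybase/<DBSID>/saplog_1   nfs4  rw,noatime,vers=4.1,hard,timeo=600,retrans=2  0 0
```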

Use the following command to check that the required file systems are available.

```
df -h
```

## Shared VPC – *optional*
<a name="ase-sles-ha-shared-vpc"></a>

Amazon VPC sharing enables you to share subnets with other AWS accounts within the same AWS Organizations. Amazon EC2 instances can be deployed using the subnets of the shared Amazon VPC.

In the pacemaker cluster, the `aws-vpc-move-ip` resource agent has been enhanced to support a shared VPC setup while maintaining backward compatibility with existing features.

The following checks and changes are required. We refer to the AWS account that owns Amazon VPC as the sharing VPC account, and to the consumer account where the cluster nodes are going to be deployed as the cluster account.

This section covers the following topics.

**Topics**
+ [Minimum version requirements](#minimum-version-requirements)
+ [IAM roles and policies](#iam-roles-policies)
+ [Shared VPC cluster resources](#shared-vpc-clsuter-resources)

### Minimum version requirements
<a name="minimum-version-requirements"></a>

The latest version of the `aws-vpc-move-ip` agent shipped with SLES 15 SP3 supports the shared VPC setup by default. The following are the minimum versions required to support a shared VPC setup:
+ SLES 12 SP5 - resource-agents-4.3.018.a7fb5035-3.79.1.x86_64
+ SLES 15 SP2 - resource-agents-4.4.0+git57.70549516-3.30.1.x86_64
+ SLES 15 SP3 - resource-agents-4.8.0+git30.d0077df0-8.5.1

### IAM roles and policies
<a name="iam-roles-policies"></a>

Using the overlay IP agent with a shared Amazon VPC requires a different set of IAM permissions to be granted on both AWS accounts (sharing VPC account and cluster account).

#### Sharing VPC account
<a name="sharing-vpc-account"></a>

In the sharing VPC account, create an IAM role to delegate permissions to the Amazon EC2 instances that will be part of the cluster. During role creation, select "Another AWS account" as the type of trusted entity, and enter the AWS account ID where the Amazon EC2 instances will be deployed.

After the IAM role has been created, create the following IAM policy in the sharing VPC account, and attach it to the IAM role. Add or remove route table entries as needed.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:route-table/rtb-0123456789abcdef1"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}
```

Next, go to the "Trust relationships" tab of the IAM role, and ensure that the AWS account you entered while creating the role has been correctly added.

#### Cluster account
<a name="cluster-account"></a>

In the cluster account, create the following IAM policies, and attach them to an IAM role. This is the IAM role that is going to be attached to the Amazon EC2 instances.

 **AWS STS policy** 

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/sharing-vpc-account-cluster-role"
    }
  ]
}
```

 **STONITH policy** 

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0fedcba9876543210"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
```

### Shared VPC cluster resources
<a name="shared-vpc-clsuter-resources"></a>

The cluster resource agent `aws-vpc-move-ip` also uses a different configuration syntax. When configuring the `aws-vpc-move-ip` resource agent, the following new parameters must be used:
+ `lookup_type=NetworkInterfaceId`
+ `routing_table_role="arn:aws:iam::<account_id>:role/<VPC-Account-Cluster-Role>"`

The following IP resource for the SAP ASE database needs to be created.

```
crm configure primitive rsc_ip_<DBSID>_ASEDB ocf:heartbeat:aws-vpc-move-ip params ip=172.16.0.29
routing_table=rtb-xxxxxroutetable1 interface=eth0 profile=cluster lookup_type=NetworkInterfaceId
routing_table_role="arn:aws:iam::<sharing_vpc_account_id>:role/<sharing_vpc_account_cluster_role>"
op start interval=0 timeout=180s op stop interval=0 timeout=180s op monitor interval=20s
timeout=40s
```

# SAP ASE and cluster setup
<a name="ase-sles-ha-setup"></a>

This section covers the following topics.

**Topics**
+ [Install SAP ASE database](#install-sap-ase)
+ [Cluster prerequisites](#cluster-prerequisites)
+ [Create cluster and node associations](#associations)

## Install SAP ASE database
<a name="install-sap-ase"></a>

The following topics provide information about installing SAP ASE database on AWS Cloud in a highly available cluster. Review SAP Documentation for more details.

**Topics**
+ [Use SWPM](#swpm)
+ [Install SAP database instance](#sap-instances)
+ [Check SAP host agent version](#host-agent-version)

### Use SWPM
<a name="swpm"></a>

Before running SAP Software Provisioning Manager (SWPM), ensure that the following prerequisites are met.
+ If the operating system groups for SAP are pre-defined, ensure that the user identifier (UID) and group identifier (GID) values for `sapadm`, `<syb>adm`, and `sapsys` are consistent across both instances.
+ You have downloaded the most recent version of Software Provisioning Manager for your SAP version. For more information, see SAP Documentation [Software Provisioning Manager](https://support.sap.com/en/tools/software-logistics-tools/software-provisioning-manager.html?anchorId=section).
+ Ensure that routes, overlay IPs, and virtual host names are mapped to the instance where the installation is run. This is to ensure that the virtual hostname for SAP ASE database is available on the primary instance. For more information, see [IP and hostname resolution prerequisites](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-setup.html#ip-prerequisites).
+ Ensure that FSx for ONTAP mount points are available, either in `/etc/fstab` or using the mount command. For more information, see [File system prerequisites](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-setup.html#filesystem-prerequisites). If you are adding the entries in `/etc/fstab`, ensure that they are removed before configuring the cluster.

### Install SAP database instance
<a name="sap-instances"></a>

The commands in this section use the example values provided in [Define reference parameters for setup](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-setup.html#define-parameters).

Install SAP ASE database on `slxdbhost01` with virtual hostname `slxvdb`, using the high availability option of Software Provisioning Manager (SWPM) tool. You can use the `SAPINST_USE_HOSTNAME` parameter to install SAP using a virtual hostname.

```
<swpm location>/sapinst SAPINST_USE_HOSTNAME=<slxvdb>
```

**Note**  
Before installing SAP ASE database, ASCS and ERS must be installed, and the `/sapmnt` directory must be available on the database server.

### Check SAP host agent version
<a name="host-agent-version"></a>

The SAP host agent is used for ASE database instance control and monitoring. This agent is used by SAP cluster resource agents and hooks. It is recommended that you have the latest version installed on both instances. For more details, see [SAP Note 2219592 – Upgrade Strategy of SAP Host Agent](https://me.sap.com/notes/2219592).

Use the following command to check the version of the host agent.

```
/usr/sap/hostctrl/exe/saphostexec -version
```

## Cluster prerequisites
<a name="cluster-prerequisites"></a>

This section covers the following topics.

**Topics**
+ [Update the `hacluster` password](#update-hacluster)
+ [Setup passwordless authentication between nodes](#setup-authentication)
+ [Create an authentication key for `corosync`](#corosync-authetication)

### Update the `hacluster` password
<a name="update-hacluster"></a>

This is applicable to both cluster nodes. Change the password of the operating system user `hacluster` using the following command.

```
passwd hacluster
```

### Setup passwordless authentication between nodes
<a name="setup-authentication"></a>

For a more comprehensive and easily consumable view of cluster activity, SUSE provides additional reporting tools. Many of these tools require access to both nodes without entering a password. SUSE recommends performing this setup for the root user. For more details, see the *Configuration to collect cluster report as root with root SSH access between cluster nodes* section in the SUSE documentation [Usage of hb_report for SLES HAE](https://www.suse.com/support/kb/doc/?id=000017501).

### Create an authentication key for `corosync`
<a name="corosync-authetication"></a>

If you want to configure `corosync` to use cryptographic techniques for ensuring authenticity and privacy of the messages, you need to generate a private key. The executable `corosync-keygen` creates this key and writes it to `/etc/corosync/authkey`.

Use the following command on Node 1 as root.

```
corosync-keygen
```

Use `scp` or a temporary shared NFS location to copy the identical file to the same location on the second node. For example, run the following from `slxdbhost01`.

```
scp -p /etc/corosync/authkey root@<slxdbhost02>:/etc/corosync
```
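After copying, you can verify that the key is byte-identical on both nodes by comparing checksums; the hostname is the example from this guide.

```shell
# The two checksums must be identical before starting the cluster.
sha256sum /etc/corosync/authkey
ssh root@slxdbhost02 sha256sum /etc/corosync/authkey
```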

## Create cluster and node associations
<a name="associations"></a>

This section covers the following topics.

**Topics**
+ [Stop services for initial configuration](#stop-services)
+ [File modifications and key values](#file-modifications)
+ [Sample `corosync.conf` file](#sample-file)

### Stop services for initial configuration
<a name="stop-services"></a>

This is applicable to both cluster nodes. The cluster service `pacemaker` must be in a stopped state when performing cluster configuration.

Run the following command to check if `pacemaker` is running.

```
systemctl status pacemaker
```

Run the following command to stop `pacemaker`.

```
systemctl stop pacemaker
```

### File modifications and key values
<a name="file-modifications"></a>

 `corosync.conf` is the configuration file for the `corosync` executable. Copy the contents of the [Sample corosync.conf file](#sample-file) to `/etc/corosync/corosync.conf` on both nodes.

Ensure the following when copying the file.
+ Ensure that the node list IP addresses match the primary and secondary IPs on each host (not the overlay IP)
+ Ensure that the file is the same on both nodes, with the exception of `bindnetaddr`, which should match the relevant local primary IP address on each node.
+ Ensure that the token value is set to 30000. This timeout specifies the time taken in milliseconds until a token loss is declared after not receiving a token. This is important for the stability of the cluster.

### Sample `corosync.conf` file
<a name="sample-file"></a>

The following is a sample `corosync.conf` file.

Ensure that the file is the same on both nodes, with the exception of `bindnetaddr`, which should match the relevant local primary IP address on each node.

```
#Read the corosync.conf.5 manual page
totem {
  version: 2
  rrp_mode: passive
  token: 30000
  consensus: 36000
  token_retransmits_before_loss_const: 10
  max_messages: 20
  crypto_cipher: aes256
  crypto_hash: sha1
  clear_node_high_bit: yes
  interface {
    ringnumber: 0
    bindnetaddr: <local_ip>
    mcastport: 5405
    ttl: 1
 }
  transport: udpu
}
 logging {
      fileline: off
      to_logfile: yes
      to_syslog: yes
      logfile: /var/log/cluster/corosync.log
      debug: off
      timestamp: on
      logger_subsys {
         subsys: QUORUM
         debug: off
     }
}
nodelist {
  node {
  ring0_addr: <primary_host_ip>
  ring1_addr: <primary_host_additional_ip>
  nodeid: 1
  }
  node {
  ring0_addr: <secondary_host_ip>
  ring1_addr: <secondary_host_additional_ip>
  nodeid: 2
  }
}

quorum {
  #Enable and configure quorum subsystem (default: off)
  #see also corosync.conf.5 and votequorum.5
  provider: corosync_votequorum
  expected_votes: 2
  two_node: 1
}
```

The following table displays example substitutions for IP addresses using the sample IP addresses provided in this document. The <local_ip> configuration differs between hosts.


| IP address type | Primary host | Secondary host | 
| --- | --- | --- | 
|  <local_ip>  |   **10.1.10.1**   |   **10.1.20.1**   | 
|  <primary_host_ip>  |  10.1.10.1  |  10.1.10.1  | 
|  <primary_host_additional_ip>  |  10.1.10.2  |  10.1.10.2  | 
|  <secondary_host_ip>  |  10.1.20.1  |  10.1.20.1  | 
|  <secondary_host_additional_ip>  |  10.1.20.2  |  10.1.20.2  | 

# Cluster configuration
<a name="ase-sles-ha-cluster-configuration"></a>

This section covers the following topics.

**Topics**
+ [Cluster resources](ase-sles-ha-cluster-resources.md)
+ [Sample configuration (crm config)](sample-configuration.md)

# Cluster resources
<a name="ase-sles-ha-cluster-resources"></a>

This section covers the following topics.

**Topics**
+ [Enable and start the cluster](#start-cluster)
+ [Check cluster status](#cluster-status)
+ [Prepare for resource creation](#resource-creation)
+ [Reset configuration – *optional*](#reset-configuration)
+ [Cluster bootstrap](#cluster-bootstrap)
+ [Create Amazon EC2 STONITH resource](#create-stonith)
+ [Create file system resources](#filesystem-resources)
+ [Create overlay IP resources](#overlay-ip-resources)
+ [Create SAP ASE database resource](#ase-database-resource)
+ [Activate cluster](#activate-cluster)

## Enable and start the cluster
<a name="start-cluster"></a>

This is applicable to both cluster nodes. Run the following command to enable and start the `pacemaker` cluster service on both nodes.

```
systemctl enable --now pacemaker
```

or

```
systemctl start pacemaker
```

By enabling the `pacemaker` service, the node automatically rejoins the cluster after a reboot. This ensures that your system is protected. Alternatively, you can leave the service disabled and start `pacemaker` manually after a reboot, which gives you the opportunity to investigate the cause of the failure first. However, this is generally not required for an SAP ASE database cluster.

Run the following command to check the status of the `pacemaker` service.

```
systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
     Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; <enabled>; vendor preset: disabled)
     Active: <active (running)> since Tue XXXX-XX-XX XX:XX:XX XXX; XXh ago
       Docs: man:pacemakerd
             https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html
   Main PID: 1899 (pacemakerd)
```

## Check cluster status
<a name="cluster-status"></a>

Once the cluster service `pacemaker` is started, check the cluster status with the `crm_mon` command, as shown in the following example.

```
crm_mon -1
Cluster Summary:
  * Stack: corosync
  * Current DC: <slxdbhost01> (version 2.0.xxxxxxxxxxx) - partition with quorum
  * Last updated:
  * Last change:  by hacluster via crmd on slxdbhost01
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ <slxdbhost01> <slxdbhost02> ]

Active Resources:
  * No active resources
```

The primary (`slxdbhost01`) and secondary (`slxdbhost02`) must show up as online.

You can find the ring status and the associated IP address of the cluster with `corosync-cfgtool` command, as shown in the following example.

```
corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = <10.1.10.1>
        status  = ring 0 active with no faults
RING ID 1
        id      = <10.1.10.2>
        status  = ring 1 active with no faults
```

## Prepare for resource creation
<a name="resource-creation"></a>

To ensure that the cluster does not perform any unexpected actions during setup of resources and configuration, set the maintenance mode to true.

Run the following command to put the cluster in maintenance mode.

```
crm maintenance on
```

## Reset configuration – *optional*
<a name="reset-configuration"></a>

**Note**  
The following instructions help you reset the complete configuration. Run these commands only if you want to start setup from the beginning. You can make minor changes with the `crm edit` command.

Run the following command to back up the current configuration for reference.

```
crm configure show > /tmp/crmconfig_backup.txt
```

Run the following command to clear the current configuration.

```
crm configure erase
```

**Important**  
Once the preceding erase command is executed, it removes all of the cluster resources from the Cluster Information Base (CIB), and disconnects the communication from `corosync` to the cluster. Before starting the resource configuration, run `crm cluster restart` so that the cluster reestablishes communication with `corosync` and retrieves the configuration. The restart of the cluster removes *maintenance mode*; reapply it before commencing additional configuration and resource setup.

## Cluster bootstrap
<a name="cluster-bootstrap"></a>

Configure the cluster bootstrap parameters by running the following commands.

```
crm configure rsc_defaults resource-stickiness=1
crm configure rsc_defaults migration-threshold=3
crm configure property stonith-enabled="true"
crm configure property stonith-action="off"
crm configure property stonith-timeout="300s"
crm configure op_defaults timeout="300s"
crm configure op_defaults record-pending="true"
```

## Create Amazon EC2 STONITH resource
<a name="create-stonith"></a>

Modify the following command to match your configuration values.

```
crm configure primitive res_AWS_STONITH stonith:external/ec2 op start interval=0 timeout=180s op stop interval=0 timeout=180s op monitor interval=180s timeout=60s params tag=pacemaker profile=cluster pcmk_delay_max=30
```

 **profile** – this refers to the AWS CLI profile created during setup. In the preceding command, *cluster* is the profile name.
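Before taking the cluster out of maintenance mode, you can sanity-check the STONITH prerequisites: the tag referenced by `tag=pacemaker` must exist on both instances, and the `cluster` profile must be able to describe it. The tag key and profile name match the examples in this section.

```shell
# List the instances carrying the tag that the EC2 STONITH agent looks up.
aws ec2 describe-tags \
    --filters "Name=key,Values=pacemaker" "Name=resource-type,Values=instance" \
    --profile cluster
```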

## Create file system resources
<a name="filesystem-resources"></a>

Mounting and unmounting file system resources to align with the location of SAP ASE database is done using cluster resources.

Modify and run the following commands to create these file system resources.

 **/sybase** 

```
crm configure primitive rsc_fs_<DBSID>_sybase ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/sybase" directory="/sybase" fstype="nfs4" options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **/sybase/<DBSID>/sapdata\$11** 

```
crm configure primitive rsc_fs_<DBSID>_data ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/asedata" directory="/sybase/<DBSID>/sapdata_1" fstype="nfs4"
options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=8,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **/sybase/<DBSID>/saplog\$11** 

```
crm configure primitive rsc_fs_<DBSID>_log ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/aselog" directory="/sybase/<DBSID>/saplog_1" fstype="nfs4"
options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **/sybase/<DBSID>/sapdiag** 

```
crm configure primitive rsc_fs_<DBSID>_diag ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/sapdiag" directory="/sybase/<DBSID>/sapdiag" fstype="nfs4"
options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **/sybase/<DBSID>/saptmp** 

```
crm configure primitive rsc_fs_<DBSID>_tmp ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/saptmp" directory="/sybase/<DBSID>/saptmp" fstype="nfs4"
options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **/sybasebackup** 

```
crm configure primitive rsc_fs_<DBSID>_bkp ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/backup" directory="/sybasebackup" fstype="nfs4"
options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **/usr/sap** 

```
crm configure primitive rsc_fs_<DBSID>_sap ocf:heartbeat:Filesystem params device="<nfs.fqdn>:/usrsap" directory="/usr/sap" fstype="nfs4"
options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s
```

 **Notes** 
+ Review the mount options to ensure that they match with your operating system, NFS file system type, and the latest recommendations from SAP and AWS.
+ <nfs.fqdn> must be the DNS name of the FSx for ONTAP storage virtual machine (SVM). For example, `svm-xxxxx.fs-xxxxx.fsx.<region>.amazonaws.com`.
+ Your file system structure can vary – it can have multiple data file systems. The preceding examples must be adapted to your environment.
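If your environment has several data file systems, generating the `crm configure primitive` commands from one helper keeps the resources consistent. The following is a hypothetical sketch, not part of the official procedure; `DBSID`, `NFS_FQDN`, the resource suffixes, and the mount options are placeholders you must adapt.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print a crm Filesystem primitive
# for one NFS export. Adjust DBSID, NFS_FQDN, and OPTS for your environment.
DBSID=ASD                                                  # assumed database SID
NFS_FQDN="svm-xxxxx.fs-xxxxx.fsx.us-east-1.amazonaws.com"  # placeholder SVM DNS name
OPTS="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2"

# $1 = resource suffix, $2 = export name, $3 = mount point
gen_fs_primitive() {
  printf 'crm configure primitive rsc_fs_%s_%s ocf:heartbeat:Filesystem params device="%s:/%s" directory="%s" fstype="nfs4" options="%s" op start timeout=60s interval=0 op stop timeout=60s interval=0 op monitor interval=20s timeout=40s\n' \
    "$DBSID" "$1" "$NFS_FQDN" "$2" "$3" "$OPTS"
}

# Review each generated command before running it against the cluster.
gen_fs_primitive data asedata "/sybase/${DBSID}/sapdata_1"
gen_fs_primitive log  aselog  "/sybase/${DBSID}/saplog_1"
```

The script only prints commands; nothing is applied until you run the output yourself.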

## Create overlay IP resources
<a name="overlay-ip-resources"></a>

The IP resource updates the route table entry for the overlay IP so that it points to the active node.

Modify and run the following command to create IP resources.

```
crm configure primitive rsc_ip_<DBSID>_ASEDB ocf:heartbeat:aws-vpc-move-ip params ip=172.16.0.29 routing_table=rtb-xxxxxroutetable1 interface=eth0 profile=cluster op start interval=0 timeout=180s op stop interval=0 timeout=180s op monitor interval=20s timeout=40s
```

 **Notes** 
+ If more than one route table is required for connectivity or because of subnet associations, the `routing_table` parameter can have multiple values separated by a comma. For example, `routing_table=rtb-xxxxxroutetable1,rtb-xxxxxroutetable2`.
+ Additional parameters – `lookup_type` and `routing_table_role` – are required for a shared VPC. For more information, see [Shared VPC – optional](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sles-netweaver-ha-settings.html#sles-netweaver-ha-shared-vpc).
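As a sanity check, the overlay IP is expected to fall outside the CIDR ranges of your VPC so that traffic to it always follows the managed route table entry. The following standalone sketch (an illustrative helper, not an AWS tool) tests IPv4 membership in a CIDR:

```shell
#!/bin/sh
# Sketch: check whether an IPv4 address falls inside a CIDR block.
ip_to_int() {
  # Convert dotted-quad to an integer.
  set -- $(echo "$1" | tr '.' ' ')
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# $1 = IP address, $2 = CIDR (e.g. 10.0.0.0/16); prints yes or no.
in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  if [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]; then
    echo yes
  else
    echo no
  fi
}

in_cidr 172.16.0.29 10.0.0.0/16   # prints "no": the overlay IP is outside the VPC CIDR
in_cidr 10.0.1.10   10.0.0.0/16   # prints "yes": a normal instance IP inside the VPC
```

Run the check against each CIDR block associated with your VPC; the overlay IP should report `no` for all of them.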

## Create SAP ASE database resource
<a name="ase-database-resource"></a>

SAP ASE database is started and stopped using cluster resources.

Modify and run the following command to create the `SAPDatabase` resource.

```
crm configure primitive rsc_ase_<DBSID>_ASEDB ocf:heartbeat:SAPDatabase params SID=<DBSID> DBTYPE=SYB STRICT_MONITORING=TRUE op start timeout=300 op stop timeout=300
```

Create the cluster resource group, listing the resources in the order in which the services must be started. They are stopped in the reverse order.

```
crm configure group grp_<DBSID>_ASEDB rsc_fs_<DBSID>_sybase rsc_fs_<DBSID>_data rsc_fs_<DBSID>_log rsc_fs_<DBSID>_diag rsc_fs_<DBSID>_tmp rsc_fs_<DBSID>_bkp rsc_fs_<DBSID>_sap rsc_ip_<DBSID>_ASEDB rsc_ase_<DBSID>_ASEDB
```

## Activate cluster
<a name="activate-cluster"></a>

Use the `crm configure show` and `crm configure edit` commands to verify that all the values have been entered correctly.

On confirmation of correct values, set the maintenance mode to false using the following command. This enables the cluster to take control of the resources.

 `crm maintenance off` 

See the [Sample configuration](https://docs.aws.amazon.com/sap/latest/sap-netweaver/sample-configuration.html).

# Sample configuration (crm config)
<a name="sample-configuration"></a>

```
node 1: slxdbhost01
node 2: slxdbhost02
primitive rsc_ase_ASD_ASEDB SAPDatabase \
        params SID=ASD DBTYPE=SYB STRICT_MONITORING=TRUE \
        op start timeout=300 interval=0s \
        op stop timeout=300 interval=0s \
        op monitor timeout=60s interval=120s \
        meta target-role=Started
primitive rsc_aws_stonith_ASD stonith:external/ec2 \
        params tag=pacemaker profile=cluster pcmk_delay_max=30 \
        op start interval=0 timeout=180s \
        op stop interval=0 timeout=180s \
        op monitor interval=180s timeout=60s
primitive rsc_fs_ASD_bkp Filesystem \
        params device="svm-091efa9986c8e93c7.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/backup" directory="/sybasebackup" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_fs_ASD_data Filesystem \
        params device="svm-0e6e2738a9ca391ce.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/asedata" directory="/sybase/ASD/sapdata_1" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=8,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_fs_ASD_diag Filesystem \
        params device="svm-091efa9986c8e93c7.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/sapdiag" directory="/sybase/ASD/sapdiag" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_fs_ASD_log Filesystem \
        params device="svm-0895fe73884c12f83.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/aselog" directory="/sybase/ASD/saplog_1" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_fs_ASD_sap Filesystem \
        params device="svm-091efa9986c8e93c7.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/usrsap" directory="/usr/sap" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_fs_ASD_sybase Filesystem \
        params device="svm-091efa9986c8e93c7.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/sybase" directory="/sybase" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_fs_ASD_tmp Filesystem \
        params device="svm-091efa9986c8e93c7.fs-0c3a4a5162a325aea.fsx.us-east-1.amazonaws.com:/saptmp" directory="/sybase/ASD/saptmp" fstype=nfs4 options="rw,noatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,nconnect=2,timeo=600,retrans=2" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive rsc_ip_ASD_ASEDB aws-vpc-move-ip \
        params ip=172.16.0.29 routing_table=rtb-0b3f1d6196f45300d interface=eth0 profile=cluster \
        op start interval=0 timeout=180s \
        op stop interval=0 timeout=180s \
        op monitor interval=20s timeout=40s
group grp_ASD_ASEDB rsc_fs_ASD_sybase rsc_fs_ASD_data rsc_fs_ASD_log rsc_fs_ASD_diag rsc_fs_ASD_tmp rsc_fs_ASD_bkp rsc_fs_ASD_sap rsc_ip_ASD_ASEDB rsc_ase_ASD_ASEDB
property cib-bootstrap-options: \
        maintenance-mode=false \
        stonith-enabled=true \
        stonith-action=off \
        stonith-timeout=300s \
        last-lrm-refresh=1686941627 \
        have-watchdog=false \
        dc-version="2.1.2+20211124.ada5c3b36-150400.4.9.2-2.1.2+20211124.ada5c3b36" \
        cluster-infrastructure=corosync
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=1
op_defaults op-options: \
        timeout=300s \
        record-pending=true
```

# Operations
<a name="ase-sles-ha-operations"></a>

This section covers the following topics.

**Topics**
+ [Analysis and maintenance](ase-sles-ha-operations-topics.md)
+ [Testing](testing.md)

# Analysis and maintenance
<a name="ase-sles-ha-operations-topics"></a>

This section covers the following topics.

**Topics**
+ [Viewing the cluster state](#clsuter-state)
+ [Performing planned maintenance](#planned-maintenance)
+ [Post-failure analysis and reset](#analysis-reset)
+ [Alerting and monitoring](#alerting-monitoring)

## Viewing the cluster state
<a name="clsuter-state"></a>

You can view the state of the cluster in two ways: with operating system commands, or with a web-based console provided by SUSE.

**Topics**
+ [Operating system based](#os-based)
+ [SUSE Hawk2](#suse-hawk)

### Operating system based
<a name="os-based"></a>

There are multiple operating system commands that can be run as root or as a user with appropriate permissions. The commands enable you to get an overview of the status of the cluster and its services. See the following commands for more details.

```
crm status
```

Sample output:

```
slxdbhost01:~  crm status
Cluster Summary:
* Stack: corosync
* Current DC: slxdbhost02 (version 2.1.2+20211124.ada5c3b36-150400.4.9.2-
2.1.2+20211124.ada5c3b36) - partition with quorum
* Last updated: Sat Jun 17 01:16:10 2023
* Last change: Sat Jun 17 01:15:31 2023 by root via crm_resource on slxdbhost01
* 2 nodes configured
* 10 resource instances configured
Node List:
* Online: [ slxdbhost01 slxdbhost02 ]
Full List of Resources:
* rsc_aws_stonith_ASD (stonith:external/ec2): Started slxdbhost02
* Resource Group: grp_ASD_ASEDB:
* rsc_fs_ASD_sybase (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_fs_ASD_data (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_fs_ASD_log (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_fs_ASD_diag (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_fs_ASD_tmp (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_fs_ASD_bkp (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_fs_ASD_sap (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_ip_ASD_ASEDB (ocf::heartbeat:aws-vpc-move-ip): Started slxdbhost01
* rsc_ase_ASD_ASEDB (ocf::heartbeat:SAPDatabase): Started slxdbhost01
```

The following table provides a list of useful commands.


| Command | Description | 
| --- | --- | 
|   `crm_mon`   |  Display cluster status on the console with updates as they occur  | 
|   `crm_mon -1`   |  Display cluster status on the console just once, and exit  | 
|   `crm_mon -Arnf`   |  `-A` displays node attributes, `-n` groups resources by node, `-r` displays inactive resources, `-f` displays resource fail counts  | 
|   `crm help`   |  View more options  | 
|   `crm_mon --help-all`   |  View more options  | 
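For scripted checks, the one-shot output of `crm_mon -1` can be filtered for resources that are not running. The following is a hypothetical sketch for a monitoring hook, shown here against a captured sample rather than a live cluster:

```shell
#!/bin/sh
# Hypothetical monitoring helper: count resource lines from `crm_mon -1`
# output that are not in the Started state. Zero means all resources are up.
count_not_started() {
  grep -E '\((ocf::|stonith:)' | grep -cv 'Started' || true
}

# Example against captured sample output; in practice run:
#   crm_mon -1 | count_not_started
sample='* rsc_fs_ASD_sybase (ocf::heartbeat:Filesystem): Started slxdbhost01
* rsc_ase_ASD_ASEDB (ocf::heartbeat:SAPDatabase): Stopped'
echo "$sample" | count_not_started   # prints 1 (the Stopped resource)
```

The exact output format of `crm_mon` varies between versions, so validate the pattern against your cluster before relying on it for alerting.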

### SUSE Hawk2
<a name="suse-hawk"></a>

Hawk2 is a web-based graphical user interface for managing and monitoring Pacemaker high availability clusters. Enable it on every node in the cluster so that you can point your web browser at any node to access it. Use the following commands to enable Hawk2.

```
systemctl enable --now hawk
systemctl status hawk
```

Ensure that your security groups allow access on port 7630 from your administrative host, then use the following URL to access Hawk2.

```
https://your-server:7630/

For example: https://slxdbhost01:7630
```

For more information, see [Configuring and Managing Cluster Resources with Hawk2](https://documentation.suse.com/sle-ha/12-SP5/html/SLE-HA-all/cha-conf-hawk2.html) in the SUSE Documentation.

## Performing planned maintenance
<a name="planned-maintenance"></a>

The cluster connector is designed to integrate the cluster with SAP start framework (`sapstartsrv`), including the rolling kernel switch (RKS) awareness. Stopping and starting the SAP system using `sapcontrol` should not result in any cluster remediation activities as these actions are not interpreted as failures. Validate this scenario when testing your cluster.

There are different options to perform planned maintenance on nodes, resources, and the cluster.

**Topics**
+ [Maintenance mode](#maintenance-mode)
+ [Placing a node in standby mode](#node-standby)
+ [Moving a resource (not recommended)](#moving-resource)

### Maintenance mode
<a name="maintenance-mode"></a>

Use maintenance mode if you want to make any changes to the configuration or take control of the resources and nodes in the cluster. In most cases, this is the safest option for administrative tasks.

On  
+ Use one of the following commands to turn on maintenance mode.

  ```
  crm maintenance on
  ```

  ```
  crm configure property maintenance-mode="true"
  ```

Off  
+ Use one of the following commands to turn off maintenance mode.

  ```
  crm maintenance off
  ```

  ```
  crm configure property maintenance-mode="false"
  ```

### Placing a node in standby mode
<a name="node-standby"></a>

To perform maintenance on the cluster without system outage, the recommended method for moving active resources is to place the node you want to remove from the cluster in standby mode.

```
crm node standby <slxdbhost01>
```

The cluster will cleanly relocate resources, and you can perform activities, including reboots on the node in standby mode. When maintenance activities are complete, you can re-introduce the node with the following command.

```
crm node online <slxdbhost01>
```
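The standby workflow for patching a node can be summarized as the following sequence. This is a sketch of the order of operations, not a complete runbook; hostnames are from the sample configuration, so adapt them to your environment.

```shell
# Patch one node at a time (run as root).
crm node standby slxdbhost01    # resources relocate cleanly to the other node
crm_mon -1                      # confirm the resource group started on slxdbhost02
# ...apply patches and reboot slxdbhost01...
crm node online slxdbhost01     # re-introduce the node to the cluster
crm_mon -1                      # confirm both nodes are online again
```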

### Moving a resource (not recommended)
<a name="moving-resource"></a>

Moving individual resources is not recommended because of the migration or move constraints that are created to lock the resource in its new location. These constraints can be cleared as described in the info messages, but this introduces an additional administrative step.

```
slxdbhost01:~  crm resource move grp_ASD_ASEDB force
INFO: Move constraint created for grp_ASD_ASEDB
INFO: Use `crm resource clear grp_ASD_ASEDB` to remove this constraint
```

Use the following command once the resources have relocated to their target location.

```
slxdbhost01:~  crm resource clear grp_ASD_ASEDB
```

## Post-failure analysis and reset
<a name="analysis-reset"></a>

A review must be conducted after each failure to understand the source of the failure as well as the reaction of the cluster. In most scenarios, the cluster prevents an application outage. However, a manual action is often required to reset the cluster to a protective state for any subsequent failures.

**Topics**
+ [Checking the logs](#checking-logs)
+ [Cleanup `crm status`](#cleanup-crm)
+ [Restart failed nodes or `pacemaker`](#restart-nodes)
+ [Further analysis](#further-analysis)

### Checking the logs
<a name="checking-logs"></a>

Start your troubleshooting by checking the operating system log `/var/log/messages`. You can find additional information in the cluster and pacemaker logs.
+  **Cluster logs** – the logging configuration is defined in the `corosync.conf` file located at `/etc/corosync/corosync.conf`.
+  **Pacemaker logs** – updated in the `pacemaker.log` file located at `/var/log/pacemaker`.
+  **Resource agents** – `/var/log/messages` 

Application based failures can be investigated in the SAP work directory.
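When scanning the logs, filtering for error and warning entries narrows the search quickly. A minimal sketch follows, using an illustrative log excerpt; the exact line format depends on your pacemaker version, so adjust the pattern accordingly.

```shell
#!/bin/sh
# Sketch: pull error- and warning-level lines from a pacemaker log excerpt.
filter_errors() {
  grep -E '(error|warning):' || true
}

# Illustrative excerpt; in practice run, for example:
#   filter_errors < /var/log/pacemaker/pacemaker.log
sample='Jun 17 01:15:31 slxdbhost01 pacemaker-controld: notice: State transition
Jun 17 01:15:32 slxdbhost01 pacemaker-schedulerd: error: Resource rsc_ase_ASD_ASEDB failed'
echo "$sample" | filter_errors
```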

### Cleanup `crm status`
<a name="cleanup-crm"></a>

If failed actions are reported using the `crm status` command, and if they have already been investigated, then you can clear the reports with the following command.

```
crm resource cleanup <resource> <hostname>
```

### Restart failed nodes or `pacemaker`
<a name="restart-nodes"></a>

It is recommended that failed (or fenced) nodes are not automatically restarted. This gives operators a chance to investigate the failure, and ensures that the cluster doesn't make assumptions about the state of resources.

You need to restart the instance or the pacemaker service based on your approach.

### Further analysis
<a name="further-analysis"></a>

The following commands consolidate information from both nodes, highlighting key events and identifying the node where they originated, to make analysis clearer.

```
crm history events

crm history log
```

If further analysis from SUSE is required, an `hb_report` may be requested. For more information, see SUSE Documentation – [Usage of hb_report for SLES HAE](https://www.suse.com/support/kb/doc/?id=000017501).

**Note**  
 `crm history events` and `hb_report` rely on passwordless ssh being set up between the nodes.

## Alerting and monitoring
<a name="alerting-monitoring"></a>

 **Using the cluster alert agents** 

Within the cluster configuration, you can call an external program (an alert agent) to handle alerts. This is a *push* notification. It passes information about the event via environment variables.

The agents can then be configured to send emails, log to a file, update a monitoring system, etc. For example, the following script can be used to access Amazon SNS.

```
#!/bin/sh

#alert_sns.sh
#modified from /usr/share/pacemaker/alerts/alert_smtp.sh.sample

##############################################################################
#SETUP
# * Create an SNS Topic and subscribe email or chatbot
# * Note down the ARN for the SNS topic
# * Give the IAM Role attached to both Instances permission to publish to the SNS Topic
# * Ensure the aws cli is installed
# * Copy this file to /usr/share/pacemaker/alerts/alert_sns.sh or other location on BOTH nodes
# * Ensure the permissions allow for hacluster and root to execute the script
# * Run the following as root (modify file location if necessary and replace SNS ARN):
#
# SLES:
#   crm configure alert aws_sns_alert /usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S" to { arn:aws:sns:region:account-id:myPacemakerAlerts }
# RHEL:
#   pcs alert create id=aws_sns_alert path=/usr/share/pacemaker/alerts/alert_sns.sh meta timeout=30s timestamp-format="%Y-%m-%d_%H:%M:%S"
#   pcs alert recipient add aws_sns_alert value=<arn:aws:sns:region:account-id:myPacemakerAlerts>
##############################################################################

#Additional information to send with the alerts.

node_name=`uname -n`
sns_body=`env | grep CRM_alert_`
#Required for SNS
TOKEN=$(/usr/bin/curl --noproxy '*' -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

#Get metadata
REGION=$(/usr/bin/curl --noproxy '*' -w "\n" -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}')

sns_subscription_arn=${CRM_alert_recipient}

#Format depending on alert type
case ${CRM_alert_kind} in
   node)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
   ;;
   fencing)
     sns_subject="${CRM_alert_timestamp} ${cluster_name}: Fencing ${CRM_alert_desc}"
   ;;
   resource)
     if [ ${CRM_alert_interval} = "0" ]; then
         CRM_alert_interval=""
     else
         CRM_alert_interval=" (${CRM_alert_interval})"
     fi
     if [ ${CRM_alert_target_rc} = "0" ]; then
         CRM_alert_target_rc=""
     else
         CRM_alert_target_rc=" (target: ${CRM_alert_target_rc})"
     fi
     case ${CRM_alert_desc} in
         Cancelled)
           ;;
         *)
           sns_subject="${CRM_alert_timestamp}: Resource operation '${CRM_alert_task}${CRM_alert_interval}' for '${CRM_alert_rsc}' on '${CRM_alert_node}': ${CRM_alert_desc}${CRM_alert_target_rc}"
           ;;
     esac
     ;;
   attribute)
     sns_subject="${CRM_alert_timestamp}: The '${CRM_alert_attribute_name}' attribute of the '${CRM_alert_node}' node was updated to '${CRM_alert_attribute_value}'"
     ;;
   *)
     sns_subject="${CRM_alert_timestamp}: Unhandled $CRM_alert_kind alert"
     ;;
esac

#Use this information to send the email.
aws sns publish --topic-arn "${sns_subscription_arn}" --subject "${sns_subject}" --message "${sns_body}" --region ${REGION}
```
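Before wiring the script into the cluster, you can sanity-check the subject formatting locally by setting the `CRM_alert_*` variables yourself instead of waiting for a real event. The following sketch reproduces only the `node` branch of the case statement; `cluster_name` is an assumed value here, since the sample script expects it from the environment.

```shell
#!/bin/sh
# Local dry run (sketch): reproduce the alert subject for a 'node' event
# without calling Amazon SNS. Values mirror the CRM_alert_* environment
# that pacemaker exports to alert agents.
CRM_alert_kind=node
CRM_alert_timestamp="2023-06-17_01:16:10"
CRM_alert_node=slxdbhost01
CRM_alert_desc=lost
cluster_name=ase_cluster   # assumed; provided by the environment in the real script

case ${CRM_alert_kind} in
  node)
    sns_subject="${CRM_alert_timestamp} ${cluster_name}: Node '${CRM_alert_node}' is now '${CRM_alert_desc}'"
    ;;
  *)
    sns_subject="${CRM_alert_timestamp}: Unhandled ${CRM_alert_kind} alert"
    ;;
esac
echo "$sns_subject"
```

If the subject looks right, repeat the exercise for the `fencing` and `resource` branches before enabling the alert in the cluster.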

# Testing
<a name="testing"></a>

We recommend scheduling regular fault scenario recovery testing at least annually, and as part of the operating system or SAP kernel updates that may impact operations. For more details on best practices for regular testing, see SAP Lens – [Best Practice 4.3 – Regularly test business continuity plans and fault recovery](https://docs.aws.amazon.com/wellarchitected/latest/sap-lens/best-practice-4-3.html).

The tests described here simulate failures. These can help you understand the behavior and operational requirements of your cluster.

In addition to checking the state of cluster resources, ensure that the service you are trying to protect is in the required state. Can you still connect to SAP? Are locks still available in SM12?

Define the recovery time to ensure that it aligns with your business objectives. Record recovery actions in runbooks.

**Topics**
+ [Test 1: Stop SAP ASE database using `sapcontrol`](#test1)
+ [Test 2: Unmount FSx for ONTAP file system on primary host](#test2)
+ [Test 3: Kill the database processes on the primary host](#test3)
+ [Test 4: Simulate hardware failure of an individual node, and repeat for other node](#test4)
+ [Test 5: Simulate a network failure](#test5)
+ [Test 6: Simulate an NFS failure](#test6)
+ [Test 7: Accidental shutdown](#test7)

## Test 1: Stop SAP ASE database using `sapcontrol`
<a name="test1"></a>

 **Simulate failure** – On primary host as root:

```
/usr/sap/hostctrl/exe/saphostctrl -function StopDatabase -dbname ASD -dbtype syb -force
```

 **Expected behavior** – SAP ASE database is stopped, and the `SAPDatabase` resource agent enters a failed state. The cluster will failover the database to the secondary instance.

 **Recovery action** – No action required.

## Test 2: Unmount FSx for ONTAP file system on primary host
<a name="test2"></a>

 **Simulate failure** – On primary host as root:

```
umount -l /sybase/ASD/sapdata_1
```

 **Expected behavior** – The `rsc_fs` resource enters a failed state. The cluster stops the SAP ASE database, and will failover to the secondary instance.

 **Recovery action** – No action required.

## Test 3: Kill the database processes on the primary host
<a name="test3"></a>

 **Simulate failure** – On primary host as `syb<sid>`:

```
ps -ef | grep -i sybasd
kill -9 <PID>
```

 **Expected behavior** – SAP ASE database fails, and the `SAPDatabase ` resource enters a failed state. The cluster will failover the database to the secondary instance.

 **Recovery action** – No action required.

## Test 4: Simulate hardware failure of an individual node, and repeat for other node
<a name="test4"></a>

 **Notes** – To simulate a system crash, you must first ensure that `/proc/sys/kernel/sysrq` is set to 1.
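You can check and set this as root with the following commands (a sketch; persist the setting in `/etc/sysctl.conf` or a drop-in under `/etc/sysctl.d` if it must survive reboots):

```shell
# Display the current value; 1 enables all sysrq functions.
cat /proc/sys/kernel/sysrq

# Enable for the running system (root required).
sysctl -w kernel.sysrq=1
```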

 **Simulate failure** – On the primary host as root:

```
echo 'c' > /proc/sysrq-trigger
```

 **Expected behavior** – The node which has been killed fails. The cluster moves the resource (SAP ASE database) that was running on the failed node to the surviving node.

 **Recovery action** – Start the EC2 node.

## Test 5: Simulate a network failure
<a name="test5"></a>

 **Notes** – See the following list.
+ `iptables` must be installed.
+ Use a subnet in this command because of the secondary ring.
+ Check for any existing iptables rules, as `iptables -F` will flush all rules.
+ Review the `pcmk_delay_max` and `priority` parameters if you see that neither node survives the fence race.

 **Simulate failure** – On the primary host as root:

```
iptables -A INPUT -s <CIDR_of_other_subnet> -j DROP; iptables -A OUTPUT -d <CIDR_of_other_subnet> -j DROP
```

 **Expected behavior** – The cluster detects the network failure, and fences one of the nodes to avoid a split-brain situation.

 **Recovery action** – If the node where the command was run survives, run `iptables -F` to clear the network failure. Start the EC2 node.

## Test 6: Simulate an NFS failure
<a name="test6"></a>

 **Notes** – See the following list.
+ Iptables must be installed.
+ Check for any existing iptables rules as iptables -F will flush all rules.
+ Although rare, this is an important scenario to test. Depending on the activity it may take some time (10 min \$1) to notice that I/O to EFS is not occurring and fail either the Filesystem or SAP resources.

 **Simulate failure** – On the primary host as root:

```
iptables -A OUTPUT -p tcp --dport 2049 -m state --state NEW,ESTABLISHED,RELATED -j DROP; iptables -A INPUT -p tcp --sport 2049 -m state --state ESTABLISHED -j DROP
```

 **Expected behavior** – The cluster detects that NFS is not available, and the `SAPDatabase` resource agent fails, and moves to the FAILED state.

 **Recovery action** – If the node where the command was run survives, run `iptables -F` to clear the network failure. Start the EC2 node.

## Test 7: Accidental shutdown
<a name="test7"></a>

 **Notes** – See the following list.
+ Avoid shutdowns without cluster awareness.
+ We recommend the use of systemd to ensure predictable behavior.
+ Ensure the resource dependencies are in place.

 **Simulate failure** – Login to AWS Management Console, and stop the instance or issue a shutdown command.

 **Expected behavior** – The node which has been shut down fails. The cluster moves the resource (SAP ASE database) that was running on the failed node to the surviving node. If systemd and resource dependencies are not configured, you may notice that while the EC2 instance is shutting down gracefully, the cluster will detect an unclean stop of cluster services on the node and will fence the EC2 instance being shut down.

 **Recovery action** – Start the EC2 node and pacemaker service.