Pacemaker and Corosync

Axigen Documentation

Updated: September 21, 2021

In this short guide, we’ll show you how to build an Axigen active-passive cluster based on the Pacemaker and Corosync cluster stack documented on the Cluster Labs website.

The steps below must be performed on both nodes unless specified otherwise.

How to Set Up an Active-Passive Cluster Based on Pacemaker and Corosync

Installation

Following the instructions from Cluster Labs: RHEL7 Quickstart and Cluster Labs: Clusters from Scratch, we start by installing all needed packages:
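
A typical installation on a CentOS/RHEL 7 system might look like the following (the exact package list can vary with your distribution and enabled repositories):

    yum install -y pacemaker pcs psmisc policycoreutils-python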

If you are using the default SELinux configuration, you must turn on the daemons_enable_cluster_mode boolean, which is disabled by default:
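
For example, using setsebool (the -P flag makes the change persistent across reboots):

    setsebool -P daemons_enable_cluster_mode on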

The pcs daemon works with the pcs command-line interface to keep the Corosync configuration synchronized across all nodes in the cluster. Before the cluster can be configured, the pcs daemon must be started and enabled to start at boot time on each node, using the following commands:
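
On a systemd-based distribution this typically means:

    systemctl start pcsd.service
    systemctl enable pcsd.service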

We have chosen not to enable the Corosync and Pacemaker services to start at boot. If a cluster node fails or is rebooted, you will need to run the following command to start the cluster on it:
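
With the pcs tooling assumed throughout this guide, that would be:

    pcs cluster start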

If you would prefer the cluster services to come up automatically when a node starts, set the Pacemaker and Corosync services to start at boot:
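
One way to do this on a systemd-based system (run on both nodes):

    systemctl enable corosync.service
    systemctl enable pacemaker.service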

Please note that requiring a manual start of cluster services gives you the opportunity to do a post-mortem investigation of a node failure before returning it to the cluster.

Configuration

Before starting the cluster configuration, make sure that both nodes are also reachable by their short names. If DNS does not resolve the short names accordingly (based on the default search domains), add the following lines to /etc/hosts on both nodes:
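
The entries below are placeholders only; replace the IP addresses and host names with the ones actually used by your nodes:

    192.168.100.1   node1
    192.168.100.2   node2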

Configure the Cluster

First, we configure the password for the user running the cluster processes on both nodes:
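
The cluster user created by the pcs packages is normally hacluster, so on both nodes this would be something like:

    passwd hacluster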

Next, we have to authenticate pcs to pcsd on both nodes:
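
With pcs 0.9 (RHEL/CentOS 7) and the example node names node1 and node2, the authentication step would look like this (you will be prompted for the hacluster password):

    pcs cluster auth node1 node2 -u hacluster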

Then, on one node only, we create the cluster and populate it with the nodes:
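
A sketch of the cluster creation, assuming pcs 0.9 syntax, an example cluster name of axigen_cluster, and the node names used above:

    pcs cluster setup --name axigen_cluster node1 node2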

Start the Cluster

We start the cluster, running the start command from one node only:
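
For example:

    pcs cluster start --all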

From now on, unless specifically mentioned otherwise, all commands should be run on one node only.

Prepare the Cluster

For data safety, the cluster's default configuration has STONITH enabled. We will disable it for now and configure it at a later point, by setting the stonith-enabled cluster option to false:
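
For example:

    pcs property set stonith-enabled=false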

In order to reduce the possibility of data corruption, Pacemaker's default behavior is to stop all resources if the cluster does not have quorum. A cluster is said to have quorum when more than half of the known or expected nodes are online, so a two-node cluster only has quorum when both nodes are running, which would defeat the purpose of a two-node failover cluster. Pacemaker lets you control how it behaves when quorum is lost; in particular, we can tell the cluster to simply ignore quorum altogether:
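
This corresponds to the no-quorum-policy cluster property, for example:

    pcs property set no-quorum-policy=ignore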

Then, on both nodes, verify the cluster status report with the pcs status command:

If on a particular node pcs status is reporting:

then you should repeat the authentication command on that particular node:
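
That is, re-run something like the following, adjusting the node names to your setup:

    pcs cluster auth node1 node2 -u hacluster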

Because resources are started by the cluster immediately after their creation, we will put both nodes in standby and bring them back online after finishing the resource configuration. To put both nodes in standby, issue the following two commands:
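
With pcs 0.9 and the example node names, the two commands would be:

    pcs cluster standby node1
    pcs cluster standby node2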

The pcs cluster status command will list the nodes, confirming their standby status.

DRBD Resource

Create a configuration file to be used to commit all configuration changes atomically:
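
For example, saving the current CIB into a working file named drbd_cfg (the file name is arbitrary):

    pcs cluster cib drbd_cfg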

The first resource you need to add to your cluster is the DRBD file system you have previously created. This functionality is provided by the ocf:linbit:drbd resource agent, as follows:
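
A sketch of this resource, assuming the DRBD resource created earlier is named axigen_ha; adjust drbd_resource to match your DRBD configuration:

    pcs -f drbd_cfg resource create drbd_axigen_ha ocf:linbit:drbd \
        drbd_resource=axigen_ha op monitor interval=60s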

The above resource, named drbd_axigen_ha, specifies only the DRBD resource as a parameter and a monitoring interval of 60 seconds.

Next, we need to create a master / slave resource, which will tell the cluster manager to run the drbd_axigen_ha resource on both nodes and to promote it to master only on the node where DRBD is primary.
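
A sketch, using the hypothetical clone name drbd_axigen_ha_clone (pcs 0.9 syntax):

    pcs -f drbd_cfg resource master drbd_axigen_ha_clone drbd_axigen_ha \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true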

The third resource related to DRBD is the file system mount itself, provided by the ocf:heartbeat:Filesystem resource agent, configured with parameters specifying the device to mount, the mount point and the file system type.
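
A sketch, with an example resource name of fs_axigen_ha and placeholder values for the device, mount point, and file system type; replace them with the ones matching your DRBD setup:

    pcs -f drbd_cfg resource create fs_axigen_ha ocf:heartbeat:Filesystem \
        device=/dev/drbd1 directory=/var/opt/axigen fstype=ext4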

Finally, we have to specify that the file system resource must run on the node where the master / slave resource has been promoted to Master, and that the mount action can only take place after that promotion:
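
Using the example resource names above, the colocation and ordering constraints could look like:

    pcs -f drbd_cfg constraint colocation add fs_axigen_ha with drbd_axigen_ha_clone \
        INFINITY with-rsc-role=Master
    pcs -f drbd_cfg constraint order promote drbd_axigen_ha_clone then start fs_axigen_ha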

Review the configured resources:
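
For example (pcs 0.9):

    pcs -f drbd_cfg resource show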

After you are satisfied with all the changes, you can commit them all at once by pushing the drbd_cfg file into the live CIB.
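
For example:

    pcs cluster cib-push drbd_cfg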

IP Resource

A floating IP address must be assigned to the active node in the cluster, to ensure transparency for the Axigen services. This can be achieved by defining a resource based on the ocf:heartbeat:IPaddr2 agent, as follows:
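
A sketch, using an example resource name of axigen_ip and a placeholder floating IP address and netmask:

    pcs resource create axigen_ip ocf:heartbeat:IPaddr2 \
        ip=192.168.100.10 cidr_netmask=24 op monitor interval=30s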

Axigen Service Resource

The last resource is the Axigen init script configured above, which should be added as in the example below:
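
Assuming the Axigen init script is registered as an LSB service named axigen, and using an example resource name of axigen_service, the resource could be created as follows:

    pcs resource create axigen_service lsb:axigen op monitor interval=30s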

Resource Ordering

Because the successful startup of the defined resources depends on their order, you have to add some ordering constraints, which will ensure the following order: File system → IP Address → Axigen init script.
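
With the example resource names used above, the ordering constraints could be:

    pcs constraint order fs_axigen_ha then axigen_ip
    pcs constraint order axigen_ip then axigen_service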

Location Preference

Besides being started in the preferred order, the resources also need to run on the same machine. To achieve this, the IP address and file system resources are constrained to run on the same node as the Axigen init script resource:
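
Again with the example resource names, the colocation constraints could look like:

    pcs constraint colocation add axigen_ip with axigen_service INFINITY
    pcs constraint colocation add fs_axigen_ha with axigen_service INFINITY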

You can also set up a preferred node for running the cluster resources, by specifying a location constraint. For example, you can set node1 as the preferred node for running the Axigen init script resource (and its dependencies):
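
For example, giving node1 a positive preference score (the value of 50 is arbitrary):

    pcs constraint location axigen_service prefers node1=50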

Sometimes, after a node fails, it eventually comes back alive. To avoid the resources being transferred back to it (which would generate additional downtime), you can set up a general cluster resource stickiness with a higher score than the node preference defined above, as follows:
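
For example, a default stickiness score higher than the location preference above:

    pcs resource defaults resource-stickiness=100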

Fencing

STONITH is an acronym for “Shoot The Other Node In The Head” and it protects your data from being corrupted by rogue nodes or concurrent access. In Pacemaker, STONITH is provided by the node fencing daemon, which must also be configured to achieve full data safety.

Using the example configuration from this document (APC), we have defined a STONITH fencing device as follows:
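
A sketch of an APC-based fence device, with placeholder device name, management IP, credentials, and outlet mapping; the exact fence_apc parameters depend on your PDU model and firmware:

    pcs stonith create fence_apc_pdu fence_apc \
        ipaddr=192.168.100.50 login=apc passwd=apc \
        pcmk_host_map="node1:1;node2:2"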

Re-enable the stonith-enabled cluster property that you switched off at the beginning of the cluster setup:
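
For example:

    pcs property set stonith-enabled=true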

Notes for other fencing devices

  • Using ESXi / VMware (VMware over SOAP Fencing)

Bring Both Nodes Online

After checking that the cluster is configured correctly and that none of the components returned any errors, we can bring both nodes online:
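
With pcs 0.9 and the example node names:

    pcs cluster unstandby node1
    pcs cluster unstandby node2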