V3, Pacemaker Configuration Style

To configure the Pacemaker-based cluster, you can consult a set of official online documents that contain very detailed instructions. You can find these documents at:

  1. Clusters from Scratch, ClusterLabs

  2. ClusterLabs Documentation Wiki page

  3. The Linux-HA User's Guide, publican edition

  4. The Linux-HA User's Guide

Below is a step-by-step set of instructions on how to configure a two-node active/passive cluster, based on the example configuration we have described in this document.

The ha.cf File

Right after the installation, the first step is to configure the /etc/ha.d/ha.cf configuration file for the Heartbeat cluster messaging layer. The following example is a small and simple ha.cf file:
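The sketch below assumes eth0 as the broadcast interface and the node names n1.cl.axilab.local (referenced later in this document) and n2.cl.axilab.local; the timer values are illustrative and should be adjusted to your environment:

  autojoin none
  bcast eth0
  warntime 5
  deadtime 15
  initdead 60
  keepalive 2
  node n1.cl.axilab.local
  node n2.cl.axilab.local
  pacemaker respawn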

The autojoin none setting disables cluster node auto-discovery and requires that cluster nodes be listed explicitly, using the node directives defined at the bottom of the file. This setting speeds up cluster start-up in clusters with a fixed, small number of nodes.

The bcast eth0 setting configures eth0 as the interface on which Heartbeat sends UDP broadcast traffic.

The next options configure node failure detection. They set the time after which Heartbeat issues a warning that a no longer available peer node may be dead (warntime), the time after which Heartbeat considers a node confirmed dead (deadtime), and the maximum time it waits for other nodes to check in at cluster startup (initdead). The keepalive directive sets the interval at which Heartbeat keep-alive packets are sent. All these options are given in seconds.

The node directive identifies cluster members. The option values listed here must match the exact host names of cluster nodes as given by uname -n.

The pacemaker directive, set to the respawn value, enables the Pacemaker cluster manager.

Prior to Heartbeat release 3.0.4, the pacemaker keyword was named crm. Newer versions still retain the old name as a compatibility alias, but the pacemaker directive is preferred by upstream developers.

The authkeys File

The /etc/ha.d/authkeys file contains pre-shared secrets used for mutual cluster node authentication. It consists of pairs of lines: the first specifies a key identifier, and the second specifies the key's hashing algorithm and a secret.

An example used in our setup is:
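The key identifier and secret below are placeholders; replace the secret with a long random string of your own:

  auth 1
  1 sha1 ReplaceThisWithALongRandomSecret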

Configuration Propagation

You must copy the /etc/ha.d/ha.cf and /etc/ha.d/authkeys files to the second node, so that both nodes have exactly the same content.
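Assuming the second node is reachable as n2.cl.axilab.local, a straightforward way to do this is with scp:

  scp /etc/ha.d/ha.cf /etc/ha.d/authkeys root@n2.cl.axilab.local:/etc/ha.d/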

You can also use the ha_propagate tool, which uses scp to copy the files to the remote node(s) in the cluster. This tool can be found in either the /usr/share/heartbeat/ or the /usr/lib/heartbeat/ directory, depending on the distribution package you have installed.

Service Startup

Make sure you set the heartbeat init script to start at boot time, using the command:
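On Red Hat style distributions this is typically done with chkconfig (Debian-based systems use update-rc.d instead):

  chkconfig heartbeat on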

Starting the heartbeat services is as simple as:
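For example, by invoking the init script directly:

  /etc/init.d/heartbeat start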

Please issue the above commands on the second node in the cluster as well.

Verify that the service started successfully with the crm_mon command, for example:
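A one-shot status snapshot can be obtained with:

  crm_mon -1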

Cluster Preparations

For data safety, the cluster has STONITH enabled by default. We will disable it now and configure it at a later point, by setting the stonith-enabled cluster option to false:
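Using the crm shell:

  crm configure property stonith-enabled=false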

After this, the live cluster configuration verification command, crm_verify -L, will return no error.

Because resources are started by the cluster immediately after their creation, we will put both nodes in standby and bring them back online only after the resource configuration is finished, so that no resource is started on a partially configured cluster. To put both nodes in standby, just issue the following two commands:
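Assuming the node names from our example (n2.cl.axilab.local being the hypothetical second node):

  crm node standby n1.cl.axilab.local
  crm node standby n2.cl.axilab.local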

The crm configure show command will list nodes with their standby attribute set to on.

In order to reduce the possibility of data corruption, Pacemaker's default behavior is to stop all resources if the cluster does not have quorum. Because a cluster is said to have quorum when more than half the known or expected nodes are online, a two-node cluster only has quorum when both nodes are running, which is no longer the case for our cluster. It is possible to control how Pacemaker behaves when quorum is lost. In particular, we can tell the cluster to simply ignore quorum altogether:
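Using the crm shell:

  crm configure property no-quorum-policy=ignore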

DRBD Resource

The first resource you need to add to your cluster is the DRBD file system you have previously created. This functionality is provided by the ocf:linbit:drbd resource agent, as follows:
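The definition below is a sketch; the drbd_resource parameter must match the name of the DRBD resource you created earlier (axigen_ha is assumed here):

  crm configure primitive drbd_axigen_ha ocf:linbit:drbd \
      params drbd_resource="axigen_ha" \
      op monitor interval="60s"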

The above resource, named drbd_axigen_ha, specifies only the DRBD resource as parameter and a monitoring interval of 60 seconds.

Next, we need to create a master/slave resource, which tells the cluster manager to run the drbd_axigen_ha resource on both nodes, but to promote it to the Master (DRBD Primary) role on only one of them.
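A sketch of such a definition, with ms_drbd_axigen_ha as an assumed name for the master/slave resource:

  crm configure ms ms_drbd_axigen_ha drbd_axigen_ha \
      meta master-max="1" master-node-max="1" \
      clone-max="2" clone-node-max="1" notify="true"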

The third resource related to DRBD is the file system mount itself, provided by the ocf:heartbeat:Filesystem resource agent, configured with parameters specifying the device to mount, the mount point and the file system type.
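The device, mount point and file system type below are placeholders; adjust them to match the DRBD device and Axigen data directory in your setup:

  crm configure primitive fs_axigen_ha ocf:heartbeat:Filesystem \
      params device="/dev/drbd0" directory="/var/opt/axigen" fstype="ext3"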

Finally, we have to specify that the file system resource must run on the Master node, and that the mount action must take place only after the master/slave resource has been promoted on that machine:
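A sketch of the corresponding colocation and ordering constraints, using the resource names assumed above:

  crm configure colocation fs_on_drbd_master inf: fs_axigen_ha ms_drbd_axigen_ha:Master
  crm configure order fs_after_drbd_promote inf: ms_drbd_axigen_ha:promote fs_axigen_ha:start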

IP Resource

A floating IP address must be assigned to the active node in the cluster, to ensure transparency for the Axigen services. This can be achieved by defining a resource based on the ocf:heartbeat:IPaddr2 agent, as follows:
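The address below is a hypothetical example; use the floating IP address reserved for your cluster:

  crm configure primitive ip_axigen_ha ocf:heartbeat:IPaddr2 \
      params ip="192.168.100.10" cidr_netmask="24" nic="eth0" \
      op monitor interval="30s"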

Axigen Service Resource

The last resource is the Axigen init script configured above, which should be added as in the example below:
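Assuming the init script is installed as /etc/init.d/axigen, it can be added as an LSB resource:

  crm configure primitive axigen lsb:axigen \
      op monitor interval="30s"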

Resource Ordering

Because the successful startup of the defined resources depends on their order, you have to add ordering constraints which ensure the following order: File system → IP Address → Axigen init script.
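A sketch of such constraints, using the resource names assumed above:

  crm configure order ip_after_fs inf: fs_axigen_ha ip_axigen_ha
  crm configure order axigen_after_ip inf: ip_axigen_ha axigen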

Location Preference

As in the case of the resource order, besides being started in a preferred order, the resources also need to run on the same machine. To achieve this, the IP address and file system resources are constrained to run on the same node as the Axigen init script resource:
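For example, with the resource names assumed above:

  crm configure colocation axigen_with_fs inf: axigen fs_axigen_ha
  crm configure colocation axigen_with_ip inf: axigen ip_axigen_ha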

You can also set up a preferred node for running the cluster resources, by specifying a location constraint. For example, you can set n1.cl.axilab.local as the preferred node for running the Axigen init script resource (and its dependencies):
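A sketch of such a constraint, with an assumed preference score of 50:

  crm configure location prefer_n1 axigen 50: n1.cl.axilab.local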

Sometimes, after a node fails, it eventually comes back online. To avoid the resources being transferred back to it (generating additional downtime), you can set up a general cluster resource stickiness with a higher score than the node preference defined above, as follows:
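For example, with an assumed stickiness score of 100 (higher than the node preference above):

  crm configure rsc_defaults resource-stickiness=100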

Fencing

STONITH is an acronym for Shoot The Other Node In The Head, and it protects your data from being corrupted by rogue nodes or concurrent access. With Pacemaker, STONITH is also the name of the node fencing daemon, which must be configured as well to achieve full data safety.

Using the example configuration from this document, we have defined a STONITH fencing device as follows:
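The actual device definition depends on the fencing hardware available; the sketch below assumes IPMI-capable nodes and the external/ipmi STONITH plugin, with placeholder credentials and addresses:

  crm configure primitive stonith_n1 stonith:external/ipmi \
      params hostname="n1.cl.axilab.local" ipaddr="192.168.100.201" \
      userid="admin" passwd="secret" interface="lan" \
      op monitor interval="60s"

In practice one such device is defined per node, together with location constraints preventing each device from running on the node it is meant to fence.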

Set the stonith-enabled cluster property, which you switched off at the beginning of the cluster setup, back to true:
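Using the crm shell:

  crm configure property stonith-enabled=true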

Moving Resources

You can move a resource to a specific node by using the following command example:
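For example, to move the Axigen resource (and its dependencies) to the second node, assumed here to be n2.cl.axilab.local:

  crm resource move axigen n2.cl.axilab.local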

Letting the cluster decide again where to run the resources can be done like this:
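For example:

  crm resource unmove axigen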

Removing Resources

If you need to remove a specific resource from the cluster configuration, you can use the crm command line tool, performing the steps in the following order:
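A sketch of the steps, using the axigen resource as an example: stop the resource first, then delete it from the configuration:

  crm resource stop axigen
  crm configure delete axigen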