RHCS: Install a two-node basic cluster

# Tested on CentOS 7

# Note: For the commands hereafter, [ALL] indicates that the command has to be run on both
#       nodes and [ONE] indicates that it needs to be run on only one of the hosts.

# The cluster installed here uses Pacemaker and Corosync to provide resource management
# and messaging.
# Pacemaker is a resource manager which, among other capabilities, is able to detect and
# recover from the failure of various nodes, resources and services under its control by
# using the messaging and membership capabilities provided by the chosen cluster
# infrastructure (either Corosync or Heartbeat).
# Pacemaker main features:
# - Detection and recovery of node and service-level failures
# - Storage agnostic, no requirement for shared storage
# - Resource agnostic, anything that can be scripted can be clustered
# - Supports fencing for ensuring data integrity
# - Supports large and small clusters
# - Supports both quorate and resource-driven clusters
# - Supports practically any redundancy configuration
# - Automatically replicated configuration that can be updated from any node
# - Ability to specify cluster-wide service ordering, colocation and anti-colocation
# - Support for advanced service types
#       - Clones: for services which need to be active on multiple nodes
#       - Multi-state: for services with multiple modes
#            (e.g. master/slave, primary/secondary) 
# - Unified, scriptable cluster management tools 
# Pacemaker components:
# - Cluster Information Base (CIB)
# - Cluster Resource Management daemon (CRMd)
# - Local Resource Management daemon (LRMd)
# - Policy Engine (PEngine or PE)
# - Fencing daemon (STONITHd - "Shoot-The-Other-Node-In-The-Head")

# ------------------------------------------------------------------------------------------

# If a cluster splits into two (or more) groups of nodes that can no longer communicate
# with each other, quorum is used to prevent resources from starting on more nodes than
# desired, which would risk data corruption.
# A cluster has quorum when more than half of all known nodes are online in the same
# partition (group of nodes).
# For example, if a 5-node cluster split into 3- and 2-node partitions, the 3-node
# partition would have quorum and could continue serving resources. If a 6-node cluster
# split into two 3-node partitions, neither partition would have quorum; pacemaker’s
# default behavior in such cases is to stop all resources, in order to prevent data
# corruption.
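# The quorum threshold described above (more than half of all known nodes) can be
# sketched as a tiny shell helper; this is illustrative only, corosync computes it
# internally:

```shell
# quorum_needed: minimum number of nodes that must be in the same
# partition for that partition to have quorum (more than half of all
# known nodes); $1 = total number of nodes in the cluster
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

quorum_needed 5   # -> 3: the 3-node partition of a 5-node cluster keeps quorum
quorum_needed 6   # -> 4: neither half of a 6-node cluster split 3/3 has quorum
quorum_needed 2   # -> 2: both nodes of a two-node cluster would be required
```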
# Two-node clusters are a special case. By the above definition, a two-node cluster would
# only have quorum when both nodes are running. This would make the creation of a two-node
# cluster pointless, but corosync has the ability to treat two-node clusters as if only
# one node is required for quorum.
# The pcs cluster setup command will automatically configure two_node: 1 in corosync.conf,
# so a two-node cluster will "just work".
# Depending on the corosync version, you may also have to ignore quorum at the
# pacemaker level, using pcs property set no-quorum-policy=ignore.
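# For reference, the quorum stanza that pcs cluster setup writes to
# /etc/corosync/corosync.conf on a two-node cluster looks like this (exact contents
# may vary with the corosync version):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```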

# ------------------------------------------------------------------------------------------

# First of all, make sure that the two nodes are reachable on their IP addresses and that
# they are known by their names:

root@nodeA:/root#> cat /etc/hosts | egrep "nodeA|nodeB"
        nodeA
        nodeB

root@nodeA:/root#> ssh nodeB
root@nodeB's password:
Last login: Wed Jan 24 14:13:38 2018 from

root@nodeB:/root#> ssh nodeA
root@nodeA's password:
Last login: Wed Jan 24 14:13:38 2018 from
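# A small helper (hypothetical, not part of the pcs tooling) can verify that both
# node names appear in a hosts file before going any further:

```shell
# hosts_ok: check that every node name given after the file name appears
# as a whole word in that hosts file; prints the first missing name and
# returns non-zero on failure
hosts_ok() {
    local file=$1; shift
    local n
    for n in "$@"; do
        grep -qw "$n" "$file" || { echo "missing: $n"; return 1; }
    done
    echo "all names present"
}
```

Usage on a node: hosts_ok /etc/hosts nodeA nodeB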

# In order to facilitate communications, deactivate SELinux and the firewall service

# This may create significant security issues and should not be performed on machines
# that may be exposed to the outside world, but may be appropriate during development and
# testing on a protected host. 

[ALL] sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
[ALL] setenforce 0

[ALL] systemctl stop firewalld
[ALL] systemctl disable firewalld
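# On hosts where the firewall must stay up, an alternative is to open only the
# high-availability service ports instead of disabling firewalld entirely:

```
[ALL] firewall-cmd --permanent --add-service=high-availability
[ALL] firewall-cmd --reload
```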

# Install the needed packages

[ALL] yum install pacemaker pcs resource-agents

# Start (and enable) pcs daemon on both nodes

[ALL] systemctl start pcsd.service
[ALL] systemctl enable pcsd.service

# Configure pcs authentication

[ALL] echo "mypassword" | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.

[ONE] pcs cluster auth nodeA nodeB -u hacluster -p mypassword --force
nodeA: Authorized
nodeB: Authorized

# Create the cluster and populate it with the nodes

[ONE] pcs cluster setup --force --name lar_cluster nodeA nodeB
Destroying cluster on nodes: nodeA, nodeB...
nodeA: Stopping Cluster (pacemaker)...
nodeB: Stopping Cluster (pacemaker)...
nodeA: Successfully destroyed cluster
nodeB: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'nodeA', 'nodeB'
nodeA: successful distribution of the file 'pacemaker_remote authkey'
nodeB: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
nodeA: Succeeded
nodeB: Succeeded

Synchronizing pcsd certificates on nodes nodeA, nodeB...
nodeA: Success
nodeB: Success
Restarting pcsd on the nodes in order to reload the certificates...
nodeA: Success
nodeB: Success

# Start the cluster

[ONE] pcs cluster start --all
nodeA: Starting Cluster...
nodeB: Starting Cluster...

# Enable necessary daemons so the cluster starts automatically on boot-up

[ALL] systemctl enable corosync.service
[ALL] systemctl enable pacemaker.service

# Verify corosync installation

[ONE] corosync-cfgtool -s
Printing ring status.
Local node ID 1
        id      =
        status  = ring 0 active with no faults

[ONE] corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
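# The member list above can also be checked from a script; a hypothetical helper that
# counts how many members report the joined status:

```shell
# count_joined: count members whose status is "joined" in
# `corosync-cmapctl` output read from standard input
count_joined() {
    grep -c 'members\.[0-9]*\.status (str) = joined'
}
```

Usage on a node: corosync-cmapctl | count_joined  (expect 2 on a healthy two-node cluster)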

# To check the status of the cluster, run either of the two following commands

[ONE] pcs status
Cluster name: lar_cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: nodeB (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu Feb  8 17:39:33 2018
Last change: Thu Feb  8 17:37:46 2018 by hacluster via crmd on nodeB

2 nodes configured
0 resources configured

Online: [ nodeA nodeB ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[ONE] crm_mon -1
Stack: corosync
Current DC: nodeB (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Thu Feb  8 17:39:46 2018
Last change: Thu Feb  8 17:37:46 2018 by hacluster via crmd on nodeB

2 nodes configured
0 resources configured

Online: [ nodeA nodeB ]

No active resources
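# The "Online:" line of crm_mon -1 lends itself to a simple health check; a sketch
# (the helper name is ours, it is not part of pacemaker):

```shell
# nodes_online: read `crm_mon -1` output on standard input and verify
# that every node named in the arguments appears on the "Online:" line
nodes_online() {
    local out n
    out=$(grep '^Online:')
    for n in "$@"; do
        case " $out " in
            *" $n "*) ;;
            *) echo "$n is NOT online"; return 1 ;;
        esac
    done
    echo "all nodes online"
}
```

Usage on a node: crm_mon -1 | nodes_online nodeA nodeB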

# Voilà! We have installed our basic cluster

# Raw cluster configuration can be shown by using the following command:

[ONE] pcs cluster cib
<cib crm_feature_set="3.0.12" validate-with="pacemaker-2.8" epoch="5" num_updates="7" admin_epoch="0" cib-last-written="Thu Feb  8 17:37:46 2018" update-origin="nodeB" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.16-12.el7-94ff4df"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="lar_cluster"/>
      <node id="1" uname="nodeA"/>
      <node id="2" uname="nodeB"/>
    <node_state id="2" uname="nodeB" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="2">
      <transient_attributes id="2">
        <instance_attributes id="status-2">
          <nvpair id="status-2-shutdown" name="shutdown" value="0"/>
    <node_state id="1" uname="nodeA" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="1">
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-shutdown" name="shutdown" value="0"/>

# If we ever make changes to the configuration manually, we can check the correctness
# of the XML file by running this command:

[ONE] crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

# These errors will go away once fencing has been disabled (or, better, properly
# configured).
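# Fencing can be disabled with pcs; this is acceptable only for a basic test cluster
# like this one, since clusters with shared data need working STONITH:

```
[ONE] pcs property set stonith-enabled=false
[ONE] crm_verify -L -V
```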
# Cluster logs can be found in /var/log/pacemaker.log and /var/log/cluster/corosync.log
root@nodeA:/root#> cat /var/log/pacemaker.log
Set r/w permissions for uid=189, gid=189 on /var/log/pacemaker.log
Feb 08 17:37:24 [5487] nodeA pacemakerd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
Feb 08 17:37:24 [5487] nodeA pacemakerd:     info: get_cluster_type:  Detected an active 'corosync' cluster
Feb 08 17:37:24 [5487] nodeA pacemakerd:     info: mcp_read_config:   Reading configure for stack: corosync
Feb 08 17:37:24 [5487] nodeA pacemakerd:   notice: crm_add_logfile:   Switching to /var/log/cluster/corosync.log
root@nodeA:/root#> tail -20 /var/log/cluster/corosync.log
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    Diff: +++ 0.5.6 (null)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    +  /cib:  @num_updates=6
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_process_request:       Completed cib_modify operation for section status: OK (rc=0, origin=nodeB/attrd/3, version=0.5.6)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_process_request:       Forwarding cib_delete operation for section //node_state[@uname='nodeA']/transient_attributes to all (origin=local/crmd/13)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_process_request:       Completed cib_delete operation for section //node_state[@uname='nodeA']/transient_attributes: OK (rc=0, origin=nodeA/crmd/13, version=0.5.6)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    Diff: --- 0.5.6 2
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    Diff: +++ 0.5.7 (null)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    +  /cib:  @num_updates=7
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    ++ /cib/status/node_state[@id='1']:  <transient_attributes id="1"/>
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    ++                                     <instance_attributes id="status-1">
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    ++                                       <nvpair id="status-1-shutdown" name="shutdown" value="0"/>
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    ++                                     </instance_attributes>
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_perform_op:    ++                                   </transient_attributes>
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_process_request:       Completed cib_modify operation for section status: OK (rc=0, origin=nodeB/attrd/4, version=0.5.7)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_file_backup:   Archived previous version as /var/lib/pacemaker/cib/cib-3.raw
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_file_write_with_digest:        Wrote version 0.5.0 of the CIB to disk (digest: ef4905fd38cc2a926d5b6c686d3ab21e)
Feb 08 17:37:46 [5488] nodeA          cib:     info: cib_file_write_with_digest:        Reading cluster configuration file /var/lib/pacemaker/cib/cib.clWuxF (digest: /var/lib/pacemaker/cib/cib.gJSY1F)
Feb 08 17:37:51 [5488] nodeA          cib:     info: cib_process_ping:  Reporting our current digest to nodeB: 2cdda87849aa421905eb3901c98cb8c1 for 0.5.7 (0x563aecfc0970 0)
Feb 08 17:37:55 [5493] nodeA          crmd:     info: crm_procfs_pid_of: Found cib active as process 5488
Feb 08 17:37:55 [5493] nodeA          crmd:     info: throttle_send_command:     New throttle mode: 0000 (was ffffffff)