Linux: Active/Passive Cluster

Linux cluster configuration and management

Heartbeat is a deprecated cluster messaging layer that was historically used with Pacemaker. Although it is available in portage, today Corosync is the preferred messaging layer and Heartbeat is not recommended for new deployments.

Prerequisites

Unfortunately, Pacemaker is based on Python 2. Therefore, /etc/portage/make.conf needs to be modified accordingly:

USE_PYTHON="2.7 3.2"

For Apache, it is necessary to select Python 3; for Corosync, Python 2. To list the available interpreters, use:

eselect python list
Available Python interpreters:
  [1]   python2.7
  [2]   python3.2 *

and to select, use:

eselect python set 1

Set up the required USE flags in /etc/portage/make.conf. Do not use the heartbeat flag!

USE="-X -gtk -gnome -qt4 -kde -dvd -alsa -cdr -heartbeat
     bindist snmp pkcs11 gnutls snmp smtp"

And modify /etc/portage/package.keywords accordingly:

sys-cluster/corosync ~ARCH
sys-cluster/pacemaker ~ARCH
sys-cluster/libqb ~ARCH
sys-cluster/crmsh ~ARCH
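
Here ~ARCH is a placeholder for the machine's architecture keyword. On an amd64 host, for example, the entries would read:

sys-cluster/corosync ~amd64
sys-cluster/pacemaker ~amd64
sys-cluster/libqb ~amd64
sys-cluster/crmsh ~amd64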

Software Installation

Now it is time to install Pacemaker and Corosync. First do a pretend run (-p) to review what will be pulled in, then run the actual install without it:

emerge -vp sys-cluster/pacemaker sys-cluster/corosync
emerge -v sys-cluster/pacemaker sys-cluster/corosync

After the installation, add the root user to the haclient group as requested by the ebuild:

usermod -a -G haclient root

Software Configuration

Generate the authentication key (it is written to /etc/corosync/authkey; the generator reads from /dev/random, so it may take a while on a machine with little entropy):

corosync-keygen

Copy the initial configuration of Corosync:

cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

Edit the /etc/corosync/corosync.conf file and adjust it to your environment, especially bindnetaddr (the NETWORK address, not a host IP) and mcastaddr (pick an address that does not disrupt other multicast services):

totem {
        version: 2
        token: 5000
        token_retransmits_before_loss_const: 20
        join: 1000
        consensus: 7500
        vsftype: none
        max_messages: 20
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastaddr: 239.255.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}
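
With Corosync 1.x, Pacemaker runs as a plugin and has to be declared in a service file. The /etc/corosync/service.d/pcmk file that is replicated below is not shown elsewhere in this walkthrough; a minimal sketch, assuming Pacemaker is started by its own init script (hence ver: 1), looks like this:

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 1
}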

Replicate the configuration from the main node to all other cluster members (replace MEMBER-IP with each member's address):

scp /etc/corosync/authkey MEMBER-IP:/etc/corosync/authkey
scp /etc/corosync/service.d/pcmk MEMBER-IP:/etc/corosync/service.d/pcmk
scp /etc/corosync/corosync.conf MEMBER-IP:/etc/corosync/corosync.conf

If DNS is not used, it is necessary to add the respective host names to the /etc/hosts file:

192.168.0.1	node1.cluster
192.168.0.2	node2.cluster
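
To confirm that the names resolve correctly on every node, a quick check such as the following can be used:

getent hosts node1.cluster node2.cluster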

Software Start

Now it is time to start Corosync and Pacemaker. Start them on the main node first and, after a few seconds (~30 s), on the remaining nodes:

/etc/init.d/corosync start
/etc/init.d/pacemaker start

Verify that all nodes are running correctly and have joined the cluster successfully:

crm_mon

It is important that a Current DC has been elected and that all nodes are Online.

Last updated: Wed Apr  3 10:03:57 2013
Last change: Wed Apr  3 09:17:54 2013 by hacluster via crmd on Node1
Stack: classic openais (with plugin)
Current DC: Node1 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
0 Resources configured.


Online: [ Node1 Node2 ]

The ring status can also be checked briefly with the following command:

corosync-cfgtool -s

Result:

Printing ring status.
Local node ID -921392960
RING ID 0
	id	= 192.168.0.1
	status	= ring 0 active with no faults

Since the members are not visible in the previous output, the member list can be queried as follows:

corosync-objctl | grep member 

If runtime.totem.pg.mrp.srp.members.CLUSTERNODEID.status=joined for all cluster nodes, all of them have successfully joined the cluster.

runtime.totem.pg.mrp.srp.members.-921392960.ip=r(0) ip(192.168.0.1) 
runtime.totem.pg.mrp.srp.members.-921392960.join_count=1
runtime.totem.pg.mrp.srp.members.-921392960.status=joined
runtime.totem.pg.mrp.srp.members.-904615744.ip=r(0) ip(192.168.0.2) 
runtime.totem.pg.mrp.srp.members.-904615744.join_count=1
runtime.totem.pg.mrp.srp.members.-904615744.status=joined

High-Availability Communication Setup

The following commands can be saved in a script or executed directly on the command line. Either way, since all nodes have successfully joined the cluster, the configuration only needs to be entered on the main node; Pacemaker automatically replicates it to the other cluster members.

#!/bin/bash

# Setting up Active/Passive Cluster

HAIP=192.168.0.9
HAMASK=255.255.255.0
HAIF=eth0
HAIFINT=20s

# Delete everything
crm configure erase

# Disable Stonith
crm configure property stonith-enabled="false"

# With 2 nodes we cannot attain a quorum
crm configure property no-quorum-policy="ignore"

# Configure Virtual IP resource for nodes in one cluster:
crm configure primitive P_VIP ocf:heartbeat:IPaddr2 \
	params ip="$HAIP" cidr_netmask="$HAMASK" nic="$HAIF" \
	op monitor interval="$HAIFINT"

# Configure Apache server:
APACHECONF=/etc/apache2/httpd.conf
crm configure primitive P_APACHE ocf:heartbeat:apache \
	params configfile="$APACHECONF" \
	op start interval="0s" timeout="60s" \
	op monitor interval="5s" timeout="20s" \
	op stop interval="0s" timeout="60s"

# Rule for OpenVPN server
#crm configure primitive P_OPENVPN ocf:heartbeat:anything \
#	params binfile="/usr/sbin/openvpn" \
#	cmdline_options="--writepid /var/run/openvpn.pid --config /etc/openvpn/openvpn.conf --cd /etc/openvpn --daemon" \
#	pidfile="/var/run/openvpn.pid" \
#	op start timeout="20" op stop timeout="30" op monitor interval="20"

# All services running on the main server
crm configure colocation C_ALL_IN_ONE_PLACE inf: P_VIP P_APACHE

# The order of application startup (Apache only after the virtual IP is up)
crm configure order O_ORDER inf: P_VIP P_APACHE

# Only needed when working inside the interactive crm configure shell
#crm configure commit
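
Note that the monitor operation of ocf:heartbeat:apache polls Apache's server-status page, so mod_status should be enabled and reachable from localhost in the configuration file referenced by $APACHECONF. A minimal sketch (Apache 2.2-style access control; adjust to your Apache version):

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>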

Troubleshooting

To stop a resource, use:

crm resource stop P_VIP

To see the running setup, including all configured resources, use:

crm_mon -r1
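
With the configuration above in place, the resource section of the output should look roughly like this (node and resource names taken from the example):

Online: [ Node1 Node2 ]

Full list of resources:

 P_VIP	(ocf::heartbeat:IPaddr2):	Started Node1
 P_APACHE	(ocf::heartbeat:apache):	Started Node1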

Problem with crmsh compiled against the wrong Python version (Python 3):

abort: No module named crmsh
(check your install and PYTHONPATH)

This can easily be resolved by switching the active Python interpreter back to Python 2 and recompiling sys-cluster/crmsh (see the Prerequisites section).
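
A minimal sketch, assuming Python 2.7 is entry 1 in the eselect list:

eselect python set 1
emerge -1v sys-cluster/crmsh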
