In this tutorial I set up a high-availability cluster with Pacemaker and Corosync.
Prerequisites #
My Setup #
I use 3 Ubuntu 22.04 servers with the following hostnames and IP addresses:
192.168.30.31 jkw-han-1
192.168.30.32 jkw-han-2
192.168.30.33 jkw-han-3
And the following floating IP address: 192.168.30.30
Hosts Files #
Make sure to remove the default 127.0.1.1 jkw-han-1 entry from the hosts file on each node, so that the hostname resolves to the node's real IP address!
# Open hosts file
sudo vi /etc/hosts
# Change hosts file
127.0.0.1 localhost
192.168.30.31 jkw-han-1
192.168.30.32 jkw-han-2
192.168.30.33 jkw-han-3
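To confirm that the hostnames resolve to the expected addresses, you can query the resolver on each node; this quick check is my addition, not part of the original walkthrough:
# Verify hostname resolution on each node
getent hosts jkw-han-1 jkw-han-2 jkw-han-3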
Nginx #
Install nginx on all nodes and set up a unique HTML file on each node so that you can tell them apart.
# Install nginx
sudo apt install nginx -y
# Define nginx website on node 1
echo '<h1>jkw-han-1</h1>' | sudo tee /var/www/html/index.html
# Define nginx website on node 2
echo '<h1>jkw-han-2</h1>' | sudo tee /var/www/html/index.html
# Define nginx website on node 3
echo '<h1>jkw-han-3</h1>' | sudo tee /var/www/html/index.html
# Stop nginx on all nodes
sudo systemctl stop nginx
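Since Pacemaker will start and stop nginx itself, you may also want to keep systemd from launching nginx at boot; this optional step is my addition:
# Optional: prevent systemd from starting nginx at boot (Pacemaker manages it)
sudo systemctl disable nginx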
PCS Version #
Install Pacemaker, Corosync and PCS #
Run the following commands on all nodes:
# Install packages
sudo apt install pacemaker corosync pcs resource-agents -y
# Enable startup
sudo systemctl enable pacemaker corosync pcsd
# Stop Corosync and Pacemaker
sudo systemctl stop corosync && sudo systemctl stop pacemaker
# Start PCS
sudo systemctl start pcsd
Packages Explanation #
- pacemaker
Pacemaker manages the cluster resources, like starting and stopping services, based on policies you define and on messages and events from Corosync.
- corosync
Corosync provides the messaging layer that allows the cluster nodes to communicate with each other. It ensures that each node knows about the other nodes and whether they are online or offline.
- crmsh or pcs
Command-line tools to manage and configure Pacemaker and Corosync clusters.
- resource-agents
Scripts that Pacemaker uses to manage various services and resources. Each resource type (e.g., nginx, Apache, Filesystem, Virtual IP, etc.) has a corresponding resource agent that knows how to start, stop, and monitor that specific resource. A quick way to list the installed agents follows this list.
- fence-agents
Used for "fencing" / STONITH, to ensure data integrity and prevent "split-brain" scenarios.
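If you want to check which resource agents the resource-agents package provides, you can list them; this sanity check is my addition:
# List the OCF heartbeat resource agents shipped by resource-agents
ls /usr/lib/ocf/resource.d/heartbeat/
# Or, once PCS is installed, list them via pcs
sudo pcs resource agents ocf:heartbeat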
Pacemaker Configuration System (PCS) #
During the installation of PCS, a user named "hacluster" is created. Set the same password for this user on all nodes:
# Set password for hacluster user: On all nodes
sudo passwd hacluster
# Shell output:
New password:
Retype new password:
passwd: password updated successfully
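If you prefer to script this step, the password can also be set non-interactively; replace YourSecurePassword with a password of your choosing (this variant is my addition, not part of the original walkthrough):
# Alternative: set the password non-interactively on all nodes
echo 'hacluster:YourSecurePassword' | sudo chpasswd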
# Authenticate / authorize the nodes: Run command on first node
sudo pcs host auth jkw-han-1 jkw-han-2 jkw-han-3 -u hacluster
# Shell Output:
Password:
jkw-han-2: Authorized
jkw-han-3: Authorized
jkw-han-1: Authorized
# Configure Cluster: Run command on first node
sudo pcs cluster setup jkw-cluster jkw-han-1 jkw-han-2 jkw-han-3 --force
Note: Use the --force
option to overwrite the existing (standard) configuration. Replace “jkw-cluster”
with the name of your cluster.
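The setup command writes the cluster definition to Corosync's configuration file. If you are curious what pcs generated, you can inspect it; an optional check I added:
# Optional: inspect the Corosync configuration generated by pcs
sudo cat /etc/corosync/corosync.conf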
# Start the Cluster: Run command on first node
sudo pcs cluster start --all
# Shell output:
jkw-han-1: Starting Cluster...
jkw-han-3: Starting Cluster...
jkw-han-2: Starting Cluster...
# Enable auto start
sudo pcs cluster enable --all
# Shell output:
jkw-han-1: Cluster Enabled
jkw-han-2: Cluster Enabled
jkw-han-3: Cluster Enabled
# Check Cluster Status:
sudo pcs cluster status
# Shell output:
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: jkw-han-3 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Sat Aug 19 22:35:06 2023
* Last change: Sat Aug 19 22:34:25 2023 by hacluster via crmd on jkw-han-3
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ jkw-han-1 jkw-han-2 jkw-han-3 ]
PCSD Status:
jkw-han-1: Online
jkw-han-2: Online
jkw-han-3: Online
# Configure Cluster Options: Run command on first node
sudo pcs property set stonith-enabled=false # Disable fencing / STONITH
sudo pcs property set no-quorum-policy=ignore # Keep resources running even without quorum
# Check Cluster Options
sudo pcs property config
# Shell output:
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: jkw-cluster
dc-version: 2.1.2-ada5c3b36e2
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
Cluster Resources #
# Add nginx as cluster resource
sudo pcs resource create webserver ocf:heartbeat:nginx configfile="/etc/nginx/nginx.conf" op monitor interval="30s"
# Create floating IP resource
sudo pcs resource create floatingip ocf:heartbeat:IPaddr2 ip=192.168.30.30 cidr_netmask=32 op monitor interval=30s
# Create resource group
sudo pcs resource group add nginx_group webserver floatingip
# Check the resources
sudo pcs resource status
# Shell output:
* Resource Group: nginx_group:
* webserver (ocf:heartbeat:nginx): Started jkw-han-1
* floatingip (ocf:heartbeat:IPaddr2): Started jkw-han-1
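In addition to the resource status, you can verify on the node that currently runs the group that the floating IP is actually bound to a network interface; a minimal sanity check:
# The floating IP 192.168.30.30 should show up on the active node
ip -brief addr show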
Cluster Status #
# List nodes status
sudo pcs status nodes
# Shell output:
Pacemaker Nodes:
Online: jkw-han-1 jkw-han-2 jkw-han-3
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
Pacemaker Remote Nodes:
Online:
Standby:
Standby with resource(s) running:
Maintenance:
Offline:
# List nodes IP addresses
sudo corosync-cmapctl | grep members
# Shell output:
runtime.members.1.config_version (u64) = 0
runtime.members.1.ip (str) = r(0) ip(192.168.30.31)
runtime.members.1.join_count (u32) = 1
runtime.members.1.status (str) = joined
runtime.members.2.config_version (u64) = 0
runtime.members.2.ip (str) = r(0) ip(192.168.30.32)
runtime.members.2.join_count (u32) = 1
runtime.members.2.status (str) = joined
runtime.members.3.config_version (u64) = 0
runtime.members.3.ip (str) = r(0) ip(192.168.30.33)
runtime.members.3.join_count (u32) = 1
runtime.members.3.status (str) = joined
# List Corosync members
sudo pcs status corosync
# Shell output:
Membership information
----------------------
Nodeid Votes Name
1 1 jkw-han-1 (local)
2 1 jkw-han-2
3 1 jkw-han-3
Useful Commands #
# Disable / enable resource
sudo pcs resource disable webserver
sudo pcs resource enable webserver
# Delete resource
sudo pcs resource delete webserver
# List resource details
sudo pcs resource config
# Stop / start the cluster
sudo pcs cluster stop --all
sudo pcs cluster start --all
# Pacemaker logs
sudo journalctl -u pacemaker
# Pacemaker status
sudo systemctl status pacemaker
# Corosync status
sudo systemctl status corosync
Testing #
# Open the floating IP in a browser
http://192.168.30.30/
# Browser output:
jkw-han-1
# Stop node 1
sudo pcs cluster stop jkw-han-1
# Shell output:
jkw-han-1: Stopping Cluster (pacemaker)...
jkw-han-1: Stopping Cluster (corosync)...
# Open the floating IP in a browser
http://192.168.30.30/
# Browser output:
jkw-han-2
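To finish the failover test you can bring node 1 back into the cluster. Note that whether the resources move back to node 1 depends on resource stickiness, which this tutorial leaves at the default:
# Start the cluster services on node 1 again
sudo pcs cluster start jkw-han-1
# Verify that the node is online again
sudo pcs status nodes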
CRMSH Version #
Install Pacemaker, Corosync and CRMSH #
Run the following commands on all nodes:
# Install packages
sudo apt install pacemaker corosync crmsh resource-agents -y
# Enable startup
sudo systemctl enable pacemaker corosync
# Stop Corosync and Pacemaker
sudo systemctl stop corosync && sudo systemctl stop pacemaker
Create Corosync Key #
# Install haveged for the Corosync key generation
sudo apt install haveged -y
# Create the Corosync key
sudo corosync-keygen
# Shell output:
Writing corosync key to /etc/corosync/authkey
# Check if key exists
ls -la /etc/corosync/
# Shell output:
-r-------- 1 root root 256 Aug 20 13:22 authkey
-rw-r--r-- 1 root root 2001 Jan 12 2022 corosync.conf
drwxr-xr-x 2 root root 4096 Jan 12 2022 uidgid.d
Corosync Configuration #
# Edit the corosync configuration
sudo vi /etc/corosync/corosync.conf
# Define Corosync Configuration
totem {
version: 2
cluster_name: jkw-cluster
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
ring0_addr: jkw-han-1
name: jkw-han-1
nodeid: 1
}
node {
ring0_addr: jkw-han-2
name: jkw-han-2
nodeid: 2
}
node {
ring0_addr: jkw-han-3
name: jkw-han-3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
}
logging {
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
timestamp: on
}
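Note that Corosync only writes to /var/log/corosync/corosync.log if the directory exists. To be on the safe side, create it on all nodes; a precaution I added, not part of the original configuration:
# Make sure the Corosync log directory exists: Run on all nodes
sudo mkdir -p /var/log/corosync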
Copy Corosync Configuration #
Copy the configuration to the other nodes:
# Copy the corosync configuration to jkw-han-2
sudo scp -r /etc/corosync/* ubuntu@jkw-han-2:/home/ubuntu
# Copy the corosync configuration to jkw-han-3
sudo scp -r /etc/corosync/* ubuntu@jkw-han-3:/home/ubuntu
# SSH into jkw-han-2 and jkw-han-3 and copy the configuration into place
cd
sudo cp -r * /etc/corosync
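The authkey must be byte-identical on all nodes, and it should keep its strict permissions after the copy. The following check and fix are my addition:
# Compare checksums across the nodes; they must match
sudo sha256sum /etc/corosync/authkey /etc/corosync/corosync.conf
# Restore the strict permissions on the key
sudo chown root:root /etc/corosync/authkey && sudo chmod 400 /etc/corosync/authkey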
Start Cluster #
# Start and enable Corosync on all nodes
sudo systemctl start corosync && sudo systemctl enable corosync
# Shell output:
Synchronizing state of corosync.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync
# Start and enable Pacemaker on all nodes
sudo systemctl start pacemaker && sudo systemctl enable pacemaker
# Shell output:
Synchronizing state of pacemaker.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable pacemaker
Cluster Status #
# Check Server status on all nodes
sudo crm status
# Shell output:
Cluster Summary:
* Stack: corosync
* Current DC: jkw-han-2 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Sun Aug 20 13:35:19 2023
* Last change: Sun Aug 20 13:34:51 2023 by hacluster via crmd on jkw-han-2
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ jkw-han-1 jkw-han-2 jkw-han-3 ]
Full List of Resources:
* No resources
# List node IPs
sudo corosync-cmapctl | grep members
# Shell output:
runtime.members.1.config_version (u64) = 0
runtime.members.1.ip (str) = r(0) ip(192.168.30.31)
runtime.members.1.join_count (u32) = 1
runtime.members.1.status (str) = joined
runtime.members.2.config_version (u64) = 0
runtime.members.2.ip (str) = r(0) ip(192.168.30.32)
runtime.members.2.join_count (u32) = 1
runtime.members.2.status (str) = joined
runtime.members.3.config_version (u64) = 0
runtime.members.3.ip (str) = r(0) ip(192.168.30.33)
runtime.members.3.join_count (u32) = 1
runtime.members.3.status (str) = joined
Cluster Resources #
# Configure Cluster Options: Run command on first node
sudo crm configure property stonith-enabled=false # Disable fencing / STONITH
sudo crm configure property no-quorum-policy=ignore # Keep resources running even without quorum
# Check the policy status
sudo crm configure show
# Shell output:
node 1: jkw-han-1
node 2: jkw-han-2
node 3: jkw-han-3
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=2.1.2-ada5c3b36e2 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore
# Create virtual IP resource for the floating IP 192.168.30.30
sudo crm configure primitive floatingip \
ocf:heartbeat:IPaddr2 params ip="192.168.30.30" \
cidr_netmask="32" op monitor interval="10s" \
meta migration-threshold="10"
# Shell output:
WARNING: (unpack_config) warning: Blind faith: not fencing unseen nodes
# Create webserver resource for nginx
sudo crm configure primitive webserver \
ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf \
op start timeout="40s" interval="0" \
op stop timeout="60s" interval="0" \
op monitor interval="10s" timeout="60s" \
meta migration-threshold="10"
# Shell output:
WARNING: (unpack_config) warning: Blind faith: not fencing unseen nodes
# Create resource group
sudo crm configure group jkw-cluster floatingip webserver
# Shell output:
WARNING: (unpack_config) warning: Blind faith: not fencing unseen nodes
# Check resources
sudo crm resource status
# Shell output:
Full List of Resources:
* Resource Group: jkw-cluster:
* floatingip (ocf:heartbeat:IPaddr2): Started
* webserver (ocf:heartbeat:nginx): Started
# Check status
sudo crm status
# Shell output:
Cluster Summary:
* Stack: corosync
* Current DC: jkw-han-2 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Sun Aug 20 13:44:10 2023
* Last change: Sun Aug 20 13:42:52 2023 by root via cibadmin on jkw-han-1
* 3 nodes configured
* 2 resource instances configured
Node List:
* Online: [ jkw-han-1 jkw-han-2 jkw-han-3 ]
Full List of Resources:
* Resource Group: jkw-cluster:
* floatingip (ocf:heartbeat:IPaddr2): Started jkw-han-1
* webserver (ocf:heartbeat:nginx): Started jkw-han-1
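The resource group already keeps the floating IP and nginx together and starts them in order. If you additionally want the group to prefer a specific node, you could add a location constraint; the constraint name prefer-node1 below is hypothetical, my own addition:
# Optional: prefer jkw-han-1 for the resource group with a score of 100
sudo crm configure location prefer-node1 jkw-cluster 100: jkw-han-1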
Testing #
# Open the floating IP in a browser
http://192.168.30.30/
# Browser output:
jkw-han-1
# Stop node 1: Run command on node 1
sudo crm cluster stop
# Shell output:
INFO: Cluster services stopped
# Open the floating IP in a browser
http://192.168.30.30/
# Browser output:
jkw-han-2
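As with the PCS variant, you can start the cluster services on node 1 again after the test:
# Start the cluster services again: Run command on node 1
sudo crm cluster start
# Check that the node rejoined
sudo crm status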