Overview #
My Setup #
I’m using the following setup based on Ubuntu 24.04 servers:
192.168.30.10 # HAProxy LoadBalancer
192.168.30.11 # Etcd Node
192.168.30.12 # Postgres & Patroni - Data-Node 1
192.168.30.13 # Postgres & Patroni - Data-Node 2
192.168.30.14 # Postgres & Patroni - Data-Node 3
Flow Chart #
```mermaid
flowchart TD
    App1(Application<br>read only) -.->|Port 5000| HAproxy["HAproxy"]
    App2(Application<br>read and write) -.->|Port 5001| HAproxy

    HAproxy -.-> Data-Node-1
    HAproxy -.-> Data-Node-2
    HAproxy -.-> Data-Node-3

    %% Postgres Cluster
    subgraph Cluster[PostgreSQL HA Cluster]
        %% Data-Node 1
        subgraph Data-Node-1[Data-Node 1]
            P1[Patroni] -.->|health check<br>decide role| PG1[PostgreSQL]
        end
        %% Data-Node 2
        subgraph Data-Node-2[Data-Node 2]
            P2[Patroni] -.->|health check<br>decide role| PG2[PostgreSQL]
        end
        %% Data-Node 3
        subgraph Data-Node-3[Data-Node 3]
            P3[Patroni] -.->|health check<br>decide role| PG3[PostgreSQL]
        end
    end

    P1 <-.->|Synchronize status| etcd
    P2 <-.->|Synchronize status| etcd
    P3 <-.->|Synchronize status| etcd
    etcd((etcd<br>Key Value Store))

    %% Styling for Bold and Larger Text
    classDef boldLargeText font-weight:bold
    class HAproxy,etcd,Cluster boldLargeText

    %% Bigger Font Size
    style HAproxy font-size:24px
    style etcd font-size:24px

    %% Dashed Border
    classDef themeColor2 stroke-dasharray:5 5;
    class Data-Node-1,Data-Node-2,Data-Node-3 themeColor2
```
Patroni #
Patroni is an open-source tool designed to provide high availability and automated failover for PostgreSQL clusters:
- Automatically manages primary and replica nodes.
- Performs automated failover when the primary node becomes unavailable.
- Uses a distributed consensus store (like etcd, Consul, or ZooKeeper) to elect the primary node and ensure cluster state consistency.
- PostgreSQL’s built-in streaming replication is used to synchronize data from the primary node to replicas.
Prerequisites #
Set Hostname #
# Set the hostname: "haproxy" node
sudo hostnamectl set-hostname haproxy &&
sudo sed -i 's/^127.0.1.1.*/127.0.1.1\t haproxy/' /etc/hosts &&
sudo reboot
# Set the hostname: "etcd" node
sudo hostnamectl set-hostname etcd &&
sudo sed -i 's/^127.0.1.1.*/127.0.1.1\t etcd/' /etc/hosts &&
sudo reboot
# Set the hostname: "data-node1" node
sudo hostnamectl set-hostname data-node1 &&
sudo sed -i 's/^127.0.1.1.*/127.0.1.1\t data-node1/' /etc/hosts &&
sudo reboot
# Set the hostname: "data-node2" node
sudo hostnamectl set-hostname data-node2 &&
sudo sed -i 's/^127.0.1.1.*/127.0.1.1\t data-node2/' /etc/hosts &&
sudo reboot
# Set the hostname: "data-node3" node
sudo hostnamectl set-hostname data-node3 &&
sudo sed -i 's/^127.0.1.1.*/127.0.1.1\t data-node3/' /etc/hosts &&
sudo reboot
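After the reboot, verify on each node that the new hostname took effect:
# Verify the static hostname
hostnamectl --static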
DNS Entry #
Make sure the five hosts can resolve each other’s hostnames; in this setup I’m using local hosts entries.
# Create the following DNS entries
192.168.30.10 haproxy
192.168.30.11 etcd
192.168.30.12 data-node1
192.168.30.13 data-node2
192.168.30.14 data-node3
Append the DNS entries on all 5 nodes:
# Append DNS entries to /etc/hosts
sudo tee -a /etc/hosts <<EOF
192.168.30.10 haproxy
192.168.30.11 etcd
192.168.30.12 data-node1
192.168.30.13 data-node2
192.168.30.14 data-node3
EOF
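To verify the entries, resolve the names on each node; getent uses the same lookup order as the system, including the /etc/hosts file:
# Verify name resolution (run on all 5 nodes)
getent hosts haproxy etcd data-node1 data-node2 data-node3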
HAproxy Setup #
HAproxy Installation #
# Install HAproxy on the HAProxy node
sudo apt -y install haproxy
HAproxy Configuration #
# Edit HAproxy configuration
sudo vi /etc/haproxy/haproxy.cfg
Replace the original configuration:
global
    maxconn 100
    log 127.0.0.1 local2

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

# Listener for general traffic (read-only and leader traffic)
listen postgres
    bind *:5000
    mode tcp
    option httpchk GET /health
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.30.12:5432 maxconn 100 check port 8008 # Node 1
    server node2 192.168.30.13:5432 maxconn 100 check port 8008 # Node 2
    server node3 192.168.30.14:5432 maxconn 100 check port 8008 # Node 3

# Listener for leader-only traffic (read and write)
listen patroni_leader
    bind *:5001 # Use a dedicated port for leader connections
    mode tcp
    option httpchk OPTIONS /leader
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.30.12:5432 maxconn 100 check port 8008 # Node 1
    server node2 192.168.30.13:5432 maxconn 100 check port 8008 # Node 2
    server node3 192.168.30.14:5432 maxconn 100 check port 8008 # Node 3
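Optionally validate the configuration before restarting the service; the "-c" flag only parses the file and reports errors:
# Check the HAproxy configuration for syntax errors
sudo haproxy -c -f /etc/haproxy/haproxy.cfg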
Restart HAproxy Service #
# Restart HAproxy service
sudo systemctl restart haproxy
Etcd Setup #
Etcd Installation #
# Download archive
wget https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz
# Extract the archive
tar -xvf etcd-v3.5.17-linux-amd64.tar.gz
# Move binaries
sudo mv etcd-v3.5.17-linux-amd64/etcd /usr/local/bin/
sudo mv etcd-v3.5.17-linux-amd64/etcdctl /usr/local/bin/
Verify the installation:
# Verify installation / check version
etcd --version
# Shell output:
etcd Version: 3.5.17
Git SHA: 507c0de
Go Version: go1.22.9
Go OS/Arch: linux/amd64
# Verify installation / check version
etcdctl version
# Shell output:
etcdctl version: 3.5.17
API version: 3.5
Create Etcd User & Group #
# Create a dedicated system user and group for running etcd
sudo useradd --system --no-create-home --shell /usr/sbin/nologin etcd
Create Etcd Directory #
# Create a directory for Etcd
sudo mkdir -p /var/lib/etcd
# Set owner
sudo chown -R etcd:etcd /var/lib/etcd
# Set permissions
sudo chmod 700 /var/lib/etcd
Create Systemd Service Unit #
# Create a Systemd service unit file
sudo vi /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io
After=network.target
[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \
--name=etcd0 \
--data-dir=/var/lib/etcd \
--listen-peer-urls=http://192.168.30.11:2380 \
--listen-client-urls=http://127.0.0.1:2379,http://192.168.30.11:2379 \
--initial-advertise-peer-urls=http://192.168.30.11:2380 \
--initial-cluster=etcd0=http://192.168.30.11:2380 \
--advertise-client-urls=http://192.168.30.11:2379 \
--initial-cluster-token=etcd-cluster \
--initial-cluster-state=new \
--enable-v2=true
Restart=always
RestartSec=5
User=etcd
Group=etcd
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
Start and Enable Etcd Service #
# Reload systemd daemon to pick up the new service file
sudo systemctl daemon-reload
# Start and enable the Etcd service
sudo systemctl start etcd && sudo systemctl enable etcd
# Verify the Etcd service status
systemctl status etcd
# Check logs
journalctl -xeu etcd.service -o cat | less -S
Verify Etcd #
# List Etcd nodes and status
etcdctl --endpoints=http://127.0.0.1:2379 member list
# Shell output:
b44f06e0650d7775, started, etcd0, http://192.168.30.11:2380, http://192.168.30.11:2379, false
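The endpoint can also be health-checked directly with etcdctl:
# Check the Etcd endpoint health
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health

# Show the endpoint status as a table
etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out=table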
Postgres Setup #
Postgres Installation #
Install PostgreSQL on the three Postgres nodes:
# Install the "postgresql-common" package, used for the helper scripts
sudo apt install postgresql-common -y
# Add the PostgreSQL (PGDG) repository for latest PostgreSQL versions (bypass installation prompt)
yes "" | sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
# Install PostgreSQL
sudo apt install postgresql -y
# Stop the PostgreSQL service after installation
sudo systemctl stop postgresql
# Disable the PostgreSQL service automatic start at boot
sudo systemctl disable postgresql
# Remove any existing PostgreSQL data directory, Patroni will handle data initialization
sudo rm -rf /var/lib/postgresql/17/main
- By default, the PostgreSQL installation initializes a standalone PostgreSQL instance in /var/lib/postgresql/<version>/main
- If this instance is running, it will conflict with Patroni, as Patroni uses its own managed data directory in /var/lib/postgresql/patroni-packages/data (see the check below)
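To double-check that the standalone instance is really gone, verify that nothing is listening on the PostgreSQL port; the following command should return no output:
# Confirm nothing is listening on port 5432
sudo ss -tlnp | grep 5432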
Verify the installation:
# Verify the installation / check version
psql --version
# Shell output:
psql (PostgreSQL) 17.2 (Ubuntu 17.2-1.pgdg24.04+1)
Create Symlink for Patroni #
Patroni expects the PostgreSQL binaries to be available in the default PATH:
# Create a symlink on all three Postgres nodes (adapt the PostgreSQL version)
sudo ln -s /usr/lib/postgresql/17/bin/* /usr/sbin
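Since Patroni invokes the PostgreSQL binaries by name, verify they are found via the new symlink:
# Verify the PostgreSQL binaries are on the PATH
initdb --version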
Patroni Setup #
Prerequisites Installation #
# Install Python, pip & venv
sudo apt install net-tools python3-pip python3-dev libpq-dev python3-venv -y
Patroni Installation #
# Switch to "postgres" user
sudo su postgres
# CD into home directory
cd $HOME
# Create and activate the Python venv
python3 -m venv patroni-packages
source patroni-packages/bin/activate
# Install packages
pip3 install --upgrade setuptools pip
pip install psycopg[binary] patroni python-etcd
# Create "data" directory
cd $HOME/patroni-packages
touch patroni.yml
mkdir -p data
chmod 700 data
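Verify the Patroni installation while the venv is still active:
# Verify the installation / check version
patroni --version

# Shell output (version may differ):
patroni version 4.0.4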
Patroni Configuration #
Postgres Node 1 #
Postgres Node 1: 192.168.30.12
# Create a configuration file for Patroni
vi patroni.yml
scope: postgres
namespace: Cluster
name: data-node1 # Define node name

restapi:
  listen: 192.168.30.12:8008 # Define node IP
  connect_address: 192.168.30.12:8008 # Define node IP

etcd:
  host: 192.168.30.11:2379 # Define Etcd node IP

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.30.12/32 md5 # Define node 1 IP
    - host replication replicator 192.168.30.13/32 md5 # Define node 2 IP
    - host replication replicator 192.168.30.14/32 md5 # Define node 3 IP
    - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.30.12:5432 # Define node IP
  connect_address: 192.168.30.12:5432 # Define node IP
  data_dir: /var/lib/postgresql/patroni-packages/data
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: 'my-secure-pw'
    superuser:
      username: postgres
      password: 'my-secure-pw'
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Postgres Node 2 #
Postgres Node 2: 192.168.30.13
# Create a configuration file for Patroni
vi patroni.yml
scope: postgres
namespace: Cluster
name: data-node2 # Define node name

restapi:
  listen: 192.168.30.13:8008 # Define node IP
  connect_address: 192.168.30.13:8008 # Define node IP

etcd:
  host: 192.168.30.11:2379 # Define Etcd node IP

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.30.12/32 md5 # Define node 1 IP
    - host replication replicator 192.168.30.13/32 md5 # Define node 2 IP
    - host replication replicator 192.168.30.14/32 md5 # Define node 3 IP
    - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.30.13:5432 # Define node IP
  connect_address: 192.168.30.13:5432 # Define node IP
  data_dir: /var/lib/postgresql/patroni-packages/data
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: 'my-secure-pw'
    superuser:
      username: postgres
      password: 'my-secure-pw'
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Postgres Node 3 #
Postgres Node 3: 192.168.30.14
# Create a configuration file for Patroni
vi patroni.yml
scope: postgres
namespace: Cluster
name: data-node3 # Define node name

restapi:
  listen: 192.168.30.14:8008 # Define node IP
  connect_address: 192.168.30.14:8008 # Define node IP

etcd:
  host: 192.168.30.11:2379 # Define Etcd node IP

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.30.12/32 md5 # Define node 1 IP
    - host replication replicator 192.168.30.13/32 md5 # Define node 2 IP
    - host replication replicator 192.168.30.14/32 md5 # Define node 3 IP
    - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.30.14:5432 # Define node IP
  connect_address: 192.168.30.14:5432 # Define node IP
  data_dir: /var/lib/postgresql/patroni-packages/data
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: 'my-secure-pw'
    superuser:
      username: postgres
      password: 'my-secure-pw'
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
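Recent Patroni releases can sanity-check a configuration file before the first start; run this on each node while the venv is active (it validates syntax and basic settings, not the full cluster):
# Validate the Patroni configuration file
patroni --validate-config patroni.yml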
Exit Postgres User #
# Exit the "postgres" user
exit
Create Systemd Service Unit #
# Create a service unit for Patroni
sudo vi /etc/systemd/system/patroni.service
[Unit]
Description=High availability PostgreSQL Cluster
After=syslog.target network.target
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/var/lib/postgresql/patroni-packages/bin/patroni /var/lib/postgresql/patroni-packages/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
Start Patroni Service #
# Start and enable the Patroni service
sudo systemctl restart patroni && sudo systemctl enable patroni
Verify PostgreSQL HA Cluster #
Verify Cluster Status #
Run the following command on any of the Data nodes:
# List PostgreSQL HA cluster status
sudo su postgres -c "/var/lib/postgresql/patroni-packages/bin/patronictl -c /var/lib/postgresql/patroni-packages/patroni.yml list"
# Shell output:
+ Cluster: postgres (7446863254965160475) ---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+---------------+---------+-----------+----+-----------+
| data-node1 | 192.168.30.12 | Replica | streaming | 1 | 0 |
| data-node2 | 192.168.30.13 | Replica | streaming | 1 | 0 |
| data-node3 | 192.168.30.14 | Leader | running | 1 | |
+------------+---------------+---------+-----------+----+-----------+
Test Failover #
SSH into the PostgreSQL master “data-node3” and stop the Patroni service:
# Stop Patroni service
sudo systemctl stop patroni
Verify the PostgreSQL HA cluster status; in this example, “data-node2” is now the master:
# List PostgreSQL HA cluster status
sudo su postgres -c "/var/lib/postgresql/patroni-packages/bin/patronictl -c /var/lib/postgresql/patroni-packages/patroni.yml list"
# Shell output:
+ Cluster: postgres (7446863254965160475) ---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+---------------+---------+-----------+----+-----------+
| data-node1 | 192.168.30.12 | Replica | streaming | 2 | 0 |
| data-node2 | 192.168.30.13 | Leader | running | 2 | |
| data-node3 | 192.168.30.14 | Replica | stopped | | unknown |
+------------+---------------+---------+-----------+----+-----------+
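Stopping Patroni simulates an unplanned failure; for a planned role change, patronictl also provides a graceful switchover that interactively prompts for the new leader:
# Perform a planned switchover (interactive prompts)
sudo su postgres -c "/var/lib/postgresql/patroni-packages/bin/patronictl -c /var/lib/postgresql/patroni-packages/patroni.yml switchover"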
Connect from External Host #
Install Postgres Client #
# Install the "postgresql-common" package, used for the helper scripts
sudo apt install postgresql-common -y
# Add the PostgreSQL (PGDG) repository for latest PostgreSQL versions (bypass installation prompt)
yes "" | sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
# Install Postgres client
sudo apt install postgresql-client -y
Connect to PostgreSQL (Read Only) #
# Connect to PostgreSQL via the HAproxy IP
psql -h haproxy -p 5000 -d postgres -U postgres
# Shell output
Password for user postgres: # Enter PW
Connect to PostgreSQL (Master Node) #
# Connect to PostgreSQL via the HAproxy IP
psql -h 192.168.30.10 -p 5001 -d postgres -U postgres
# Shell output
Password for user postgres: # Enter PW
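Once connected, confirm which backend node HAproxy selected and its current role; both functions are built into PostgreSQL:
# Show the backend node IP and whether it is in recovery (a replica)
SELECT inet_server_addr(), pg_is_in_recovery();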
Create Example DB #
Connect to the PostgreSQL master node:
# Create example database "example_database"
create database example_database;
# Connect to "example_database" database
\c example_database
# Create some tables
create table some_table_1(a varchar(100), b timestamp);
create table some_table_2(a varchar(100), b timestamp);
create table some_table_3(a varchar(100), b timestamp);
# Insert some values
insert into some_table_1 values('some value', '2024-01-01 01:01:01');
insert into some_table_2 values('some value', '2024-01-01 01:01:01');
insert into some_table_3 values('some value', '2024-01-01 01:01:01');
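To confirm streaming replication works, read the rows back from one of the replicas; adapt the IP to a current replica from the patronictl output:
# Connect directly to a replica node
psql -h 192.168.30.12 -p 5432 -d example_database -U postgres

# Read back a replicated row
select * from some_table_1;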
List Databases #
# List databases
\l
# Shell output:
Name | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges
------------------+----------+----------+-----------------+-------------+-------------+--------+-----------+-----------------------
example_database | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | |
postgres | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | |
template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
(4 rows)
More & Troubleshooting #
Verify PostgreSQL Node Role #
Verify if the current PostgreSQL node is the master or a replica node:
# Check whether the node is in recovery (replica) mode:
SELECT pg_is_in_recovery();
# Shell output:
pg_is_in_recovery
-------------------
t
(1 row)
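# "t" means the node is in recovery (a replica); the primary returns "f"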
Verify Patroni #
Verify the Patroni nodes are accessible from the HAproxy node:
# Curl Data-Node 1
curl http://192.168.30.12:8008/health
# Curl Data-Node 2
curl http://192.168.30.13:8008/health
# Curl Data-Node 3
curl http://192.168.30.14:8008/health
# Shell output # Curl Data-Node 1
{"state": "running", "postmaster_start_time": "2024-12-10 11:24:32.481713+00:00", "role": "primary", "server_version": 170002, "xlog": {"location": 67434680}, "timeline": 1, "replication": [{"usename": "replicator", "application_name": "data-node2", "client_addr": "192.168.30.13", "state": "streaming", "sync_state": "async", "sync_priority": 0}, {"usename": "replicator", "application_name": "data-node3", "client_addr": "192.168.30.14", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1733832092, "database_system_identifier": "7446742596774099079", "patroni": {"version": "4.0.4", "scope": "postgres", "name": "data-node1"}}
# Shell output # Curl Data-Node 2
{"state": "running", "postmaster_start_time": "2024-12-10 11:24:34.343277+00:00", "role": "replica", "server_version": 170002, "xlog": {"received_location": 67434680, "replayed_location": 67434680, "replayed_timestamp": "2024-12-10 11:26:32.593877+00:00", "paused": false}, "timeline": 1, "replication_state": "streaming", "dcs_last_seen": 1733832102, "database_system_identifier": "7446742596774099079", "patroni": {"version": "4.0.4", "scope": "postgres", "name": "data-node2"}}
# Shell output # Curl Data-Node 3
{"state": "running", "postmaster_start_time": "2024-12-10 11:24:35.093581+00:00", "role": "replica", "server_version": 170002, "xlog": {"received_location": 67434680, "replayed_location": 67434680, "replayed_timestamp": "2024-12-10 11:26:32.593877+00:00", "paused": false}, "timeline": 1, "replication_state": "streaming", "dcs_last_seen": 1733832112, "database_system_identifier": "7446742596774099079", "patroni": {"version": "4.0.4", "scope": "postgres", "name": "data-node3"}}
Connect to PostgreSQL Instances #
Directly access the PostgreSQL servers from the HAproxy node:
# Connect to PostgreSQL server 1
psql -h 192.168.30.12 -p 5432 -U postgres -d postgres
# Connect to PostgreSQL server 2
psql -h 192.168.30.13 -p 5432 -U postgres -d postgres
# Connect to PostgreSQL server 3
psql -h 192.168.30.14 -p 5432 -U postgres -d postgres
Verify Status & List Logs #
Patroni:
# Verify Patroni status
systemctl status patroni
# List Patroni logs
journalctl -u patroni | tail
# Restart Patroni
sudo systemctl restart patroni
HAproxy:
# Check the HAproxy logs when necessary
sudo tail -f /var/log/haproxy.log
Reinitialize Patroni #
# Verify Patroni status
sudo su postgres -c "/var/lib/postgresql/patroni-packages/bin/patronictl -c /var/lib/postgresql/patroni-packages/patroni.yml list"
# Shell output:
+ Cluster: postgres (7446742596774099079) ------------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+---------------+---------+--------------+----+-----------+
| data-node1 | 192.168.30.12 | Replica | start failed | | unknown |
| data-node2 | 192.168.30.13 | Replica | streaming | 3 | 0 |
| data-node3 | 192.168.30.14 | Leader | running | 3 | |
+------------+---------------+---------+--------------+----+-----------+
# Stop the Patroni service
sudo systemctl stop patroni
# Clear the data directory
sudo rm -rf /var/lib/postgresql/patroni-packages/data
# Start Patroni to reinitialize
sudo systemctl start patroni
# Verify Patroni status
sudo su postgres -c "/var/lib/postgresql/patroni-packages/bin/patronictl -c /var/lib/postgresql/patroni-packages/patroni.yml list"
# Shell output:
+ Cluster: postgres (7446742596774099079) ---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+---------------+---------+-----------+----+-----------+
| data-node1 | 192.168.30.12 | Replica | streaming | 3 | 0 |
| data-node2 | 192.168.30.13 | Replica | streaming | 3 | 0 |
| data-node3 | 192.168.30.14 | Leader | running | 3 | |
+------------+---------------+---------+-----------+----+-----------+
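Instead of clearing the data directory by hand, patronictl can also reinitialize a broken member; it re-clones the data directory from the current leader after a confirmation prompt:
# Reinitialize a cluster member (adapt the cluster and member name)
sudo su postgres -c "/var/lib/postgresql/patroni-packages/bin/patronictl -c /var/lib/postgresql/patroni-packages/patroni.yml reinit postgres data-node1"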