Overview #
I’m using the following storage layout on all three nodes:
# List block devices
lsblk
# Shell output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 1024M 0 rom
nvme0n1 259:0 0 20G 0 disk
├─nvme0n1p1 259:1 0 1G 0 part /boot
└─nvme0n1p2 259:2 0 19G 0 part
├─rl-root 253:0 0 17G 0 lvm /var/lib/containers/storage/overlay
│ /
└─rl-swap 253:1 0 2G 0 lvm [SWAP]
nvme0n2 259:3 0 20G 0 disk
nvme0n3 259:4 0 20G 0 disk
The VMs are based on Rocky Linux 9.4, with 4 CPU cores and 8 GB RAM.
192.168.30.100 rocky1 # Initial / Bootstrap Node
192.168.30.101 rocky2 # Node 2
192.168.30.102 rocky3 # Node 3
Prerequisites #
Add hosts entries and install the dependencies on all nodes.
Hosts Entry #
For Ceph it’s recommended to use simple hostnames rather than fully qualified domain names (FQDNs).
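If a node was installed with an FQDN, the short hostname can be set first; for example on the first node (adjust the name per node):
# Optional: Set the short hostname (run on each node with its own name)
sudo hostnamectl set-hostname rocky1
# Verify the hostname
hostname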
# Add hosts entry
sudo tee -a /etc/hosts <<EOF
192.168.30.100 rocky1
192.168.30.101 rocky2
192.168.30.102 rocky3
EOF
Install Dependencies #
# Upgrade packages
sudo dnf upgrade -y
# Install dependencies
sudo dnf install python3 lvm2 podman -y
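Ceph also expects the node clocks to be in sync. Rocky Linux 9 ships chrony by default, so this is usually just a quick check; a minimal sketch, assuming chronyd is used:
# Optional: Verify time synchronization
systemctl status chronyd --no-pager
chronyc tracking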
Initialize Ceph Cluster #
On the first node, install Cephadm and bootstrap the cluster.
Install Cephadm & Ceph-Common #
# Download Cephadm & change permission
curl --silent --remote-name --location https://download.ceph.com/rpm-18.2.2/el9/noarch/cephadm &&
chmod +x cephadm
# Add Ceph "reef" repository
./cephadm add-repo --release reef
# Optional: Install Cephadm (installs the binary to "/usr/sbin/cephadm")
./cephadm install
# Verify the installation
which cephadm
# Install Ceph-Common (For CLI usage)
dnf update -y && dnf install ceph-common -y
# Verify the Ceph CLI (Ceph-Command) / check version
ceph --version
Bootstrap the Ceph Cluster #
# Bootstrap the Cluster: Create initial monitor and manager node
cephadm bootstrap --mon-ip 192.168.30.100 \
--initial-dashboard-user admin \
--initial-dashboard-password my-secure-pw
# Shell output:
Ceph Dashboard is now available at:
URL: https://rocky1:8443/
User: admin
Password: my-secure-pw
Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/c21564f0-3200-11ef-85f9-000c29ad85a9/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
sudo /usr/sbin/cephadm shell --fsid c21564f0-3200-11ef-85f9-000c29ad85a9 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Or, if you are only running a single cluster on this host:
sudo /usr/sbin/cephadm shell
Please consider enabling telemetry to help improve Ceph:
ceph telemetry on
For more information see:
https://docs.ceph.com/en/latest/mgr/telemetry/
Bootstrap complete.
Verify the Cluster Status #
# Check the cluster status
ceph status
# Shell output:
cluster:
id: c21564f0-3200-11ef-85f9-000c29ad85a9
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum rocky1 (age 75s)
mgr: rocky1.ybgqbk(active, starting, since 0.0665046s)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
# List Ceph services
ceph orch ps
# Shell output:
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.rocky1 rocky1 *:9093,9094 running (9s) 6s ago 48s 15.5M - 0.25.0 c8568f914cd2 a275295ebed3
ceph-exporter.rocky1 rocky1 running (58s) 6s ago 58s 5855k - 18.2.2 3c937764e6f5 1e8865d0ca47
crash.rocky1 rocky1 running (57s) 6s ago 57s 6656k - 18.2.2 3c937764e6f5 cc41d87bec04
grafana.rocky1 rocky1 *:3000 running (7s) 6s ago 31s 39.4M - 9.4.7 954c08fa6188 907e90c75a03
mgr.rocky1.ybgqbk rocky1 *:9283,8765,8443 running (91s) 6s ago 91s 458M - 18.2.2 3c937764e6f5 fae8713cc499
mon.rocky1 rocky1 running (92s) 6s ago 94s 31.0M 2048M 18.2.2 3c937764e6f5 ec9e8e556943
node-exporter.rocky1 rocky1 *:9100 running (54s) 6s ago 54s 15.4M - 1.5.0 0da6a335fe13 2a631d670969
prometheus.rocky1 rocky1 *:9095 running (23s) 6s ago 23s 31.3M - 2.43.0 a07b618ecd1d 454f15c54703
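Besides the per-daemon view, the deployed services and the current health warning (no OSDs yet) can be inspected with the standard CLI:
# List services and their placements
ceph orch ls
# Show details for the HEALTH_WARN status
ceph health detail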
Ceph Dashboard #
Custom TLS Certificate #
As always I’m using Let’s Encrypt wildcard certificates.
# Upload custom TLS certificate
ceph dashboard set-ssl-certificate -i ./fullchain.pem &&
ceph dashboard set-ssl-certificate-key -i ./privkey.pem
# Shell output:
SSL certificate updated
SSL certificate key updated
# Restart Dashboard module
ceph mgr module disable dashboard &&
ceph mgr module enable dashboard
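To confirm the dashboard is reachable again after the module restart, the URLs served by the active manager can be listed:
# Show the URLs of the active manager modules (includes the dashboard)
ceph mgr services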
Hosts Entry #
# Create a hosts entry for the Ceph Dashboard (on the client used to access the dashboard)
192.168.30.100 ceph.jklug.work
Access the Dashboard #
# Access the Ceph Dashboard
https://ceph.jklug.work:8443
Use the user and password defined in the bootstrap command:
# User
admin
# Password
my-secure-pw
Add Cluster Resources #
Add OSDs (Object Storage Daemons) #
List Available Devices #
# List available storage devices
ceph orch device ls
# Shell output:
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
rocky1 /dev/nvme0n2 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G Yes 32s ago
rocky1 /dev/nvme0n3 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G Yes 32s ago
Add all Available Devices #
# Add any available and unused device
ceph orch apply osd --all-available-devices
# Shell output:
Scheduled osd.all-available-devices update...
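Note that this creates an OSD service spec that keeps consuming any new, unused device it finds. If that automatic behavior is not wanted, the spec can be marked as unmanaged:
# Optional: Stop the automatic creation of OSDs on new devices
ceph orch apply osd --all-available-devices --unmanaged=true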
Add specific Device #
# Add specific OSD
ceph orch daemon add osd rocky1:/dev/nvme0n2
ceph orch daemon add osd rocky1:/dev/nvme0n3
# Shell output:
Created osd(s) 0 on host 'rocky1'
Created osd(s) 1 on host 'rocky1'
List OSD Devices #
# List cluster OSDs
ceph osd tree
# Shell output
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.03897 root default
-3 0.03897 host rocky1
0 ssd 0.01949 osd.0 up 1.00000 1.00000
1 ssd 0.01949 osd.1 up 1.00000 1.00000
# List cluster OSDs: Details
ceph osd status
# Shell output:
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 rocky1 26.4M 19.9G 0 0 0 0 exists,up
1 rocky1 26.4M 19.9G 0 0 0 0 exists,up
# List details about specific OSD
ceph osd find 0
# Shell output:
{
"osd": 0,
"addrs": {
"addrvec": [
{
"type": "v2",
"addr": "192.168.30.100:6802",
"nonce": 3273157086
},
{
"type": "v1",
"addr": "192.168.30.100:6803",
"nonce": 3273157086
}
]
},
"osd_fsid": "43749334-1fa2-49a3-abbe-da71d9ab1b12",
"host": "rocky1",
"crush_location": {
"host": "rocky1",
"root": "default"
}
}
# List details about OSD performance stats
ceph osd perf
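Per-OSD capacity, weight and placement group counts can also be listed:
# List OSD utilization and PG count per OSD
ceph osd df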
Add Nodes to Cluster #
Copy SSH Key #
# Copy the Ceph SSH key to the other Ceph nodes
ssh-copy-id -f -i /etc/ceph/ceph.pub root@192.168.30.101
ssh-copy-id -f -i /etc/ceph/ceph.pub root@192.168.30.102
Add Additional Nodes #
# Add the other nodes to the Ceph cluster
ceph orch host add rocky2 192.168.30.101
ceph orch host add rocky3 192.168.30.102
# Shell output:
Added host 'rocky2' with addr '192.168.30.101'
Added host 'rocky3' with addr '192.168.30.102'
Verify the Cluster Nodes #
# List cluster nodes
ceph orch host ls
# Shell output:
HOST ADDR LABELS STATUS
rocky1 192.168.30.100 _admin
rocky2 192.168.30.101
rocky3 192.168.30.102
3 hosts in cluster
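Only rocky1 carries the “_admin” label, so only it receives the admin keyring and ceph.conf under “/etc/ceph”. Optionally, the label can be added to the other nodes so the Ceph CLI works there as well:
# Optional: Label the other nodes as admin hosts
ceph orch host label add rocky2 _admin
ceph orch host label add rocky3 _admin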
# List cluster nodes and roles
ceph node ls
# Shell output:
{
"mon": {
"rocky1": [
"rocky1"
],
"rocky2": [
"rocky2"
],
"rocky3": [
"rocky3"
]
},
"osd": {
"rocky1": [
0,
1
]
},
"mgr": {
"rocky1": [
"rocky1.ybgqbk"
],
"rocky2": [
"rocky2.megwzq"
]
}
}
Add OSD #
# List available devices
ceph orch device ls
# Shell output:
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
rocky1 /dev/nvme0n2 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G No 3m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
rocky1 /dev/nvme0n3 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G No 3m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
rocky2 /dev/nvme0n2 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G Yes 3m ago
rocky2 /dev/nvme0n3 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G Yes 3m ago
rocky3 /dev/nvme0n2 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G Yes 3m ago
rocky3 /dev/nvme0n3 ssd VMware_Virtual_NVMe_Disk_VMware_NVME_0000 20.0G Yes 3m ago
# Add OSDs from the other nodes: Manually define OSDs
ceph orch daemon add osd rocky2:/dev/nvme0n2
ceph orch daemon add osd rocky2:/dev/nvme0n3
ceph orch daemon add osd rocky3:/dev/nvme0n2
ceph orch daemon add osd rocky3:/dev/nvme0n3
# Alternative: Add all available devices as OSDs
ceph orch apply osd --all-available-devices
# List cluster OSDs: Details
ceph osd status
# Shell output:
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 rocky1 27.2M 19.9G 0 0 0 0 exists,up
1 rocky1 26.5M 19.9G 0 0 0 0 exists,up
2 rocky3 27.2M 19.9G 0 0 0 0 exists,up
3 rocky2 26.5M 19.9G 0 0 0 0 exists,up
4 rocky3 26.5M 19.9G 0 0 0 0 exists,up
5 rocky2 27.2M 19.9G 0 0 0 0 exists,up
Storage Pools #
Overview #
Number of Placement Groups “pg_num”:
- Number of placement groups in a Ceph pool. Each placement group is essentially a bucket that data objects are placed into.
- By using placement groups, Ceph can distribute and balance the data load across all OSDs in the cluster.
- The number of placement groups affects how well the data is distributed and balanced across the OSDs. Too few placement groups lead to suboptimal performance because the data is not evenly distributed; too many create overhead that can degrade performance because each OSD has to manage more placement groups.
Number of Placement Groups for Placement “pgp_num”:
- Defines how many of the placement groups are actively used to map data placements in the pool.
- Used to control the re-balancing of data when pg_num is changed (usually increased to scale with the cluster). It allows the cluster to adjust at a controlled pace without overwhelming the system with too much data movement at once.
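Recent Ceph releases enable the PG autoscaler by default (visible as “pg_autoscale_mode: on” in the pool details further below), so pg_num can also be left to the autoscaler; once pools exist, its recommendations can be checked:
# Check the autoscaler's recommended PG numbers per pool
ceph osd pool autoscale-status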
List Storage Pools #
# List storage pools
ceph osd lspools
Create Storage Pool #
# Create storage pool: With 64 placement groups
ceph osd pool create pool-1 64 64 replicated
# Set the replication factor
ceph osd pool set pool-1 size 3
Explanation:
- Pool name: pool-1
- Initial number of placement groups (pg_num): 64
- Number of placement groups for placement (pgp_num): 64
- Type: replicated (data is replicated across multiple OSDs for redundancy)
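Before the pool is used, it should be associated with an application. Since this pool will store RBD images, it can be initialized for RBD:
# Initialize the pool for RBD usage (sets the "rbd" application tag)
rbd pool init pool-1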
Adjust PG and PGP Number #
# If necessary adjust the number of placement groups for optimal performance
ceph osd pool set pool-1 pg_num 128
ceph osd pool set pool-1 pgp_num 128
List Pool Details #
# List pool details
ceph osd pool get pool-1 all
# Shell output:
size: 3
min_size: 2
pg_num: 64
pgp_num: 64
crush_rule: replicated_rule
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false
RBD Block Storage #
Create Image #
Create an RBD image on the previously created storage pool:
# Create RBD image: Syntax
rbd create --size {megabytes} {pool-name}/{image-name}
# Create RBD image: Example
rbd create --size 2048 --pool pool-1 image-1
List Images & Image Details #
# List images in a pool
rbd ls pool-1
# List image details
rbd info pool-1/image-1
# Shell output:
rbd image 'image-1':
size 2 GiB in 512 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: d3c53de5e1f
block_name_prefix: rbd_data.d3c53de5e1f
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Mon Jun 24 20:36:13 2024
access_timestamp: Mon Jun 24 20:36:13 2024
modify_timestamp: Mon Jun 24 20:36:13 2024
RBD Mapping #
Install Ceph Client #
Install the Ceph client on the Linux host where you want to use the RBD image. In this example the client is a Debian host (192.168.30.20), so apt is used:
# Install Ceph client
sudo apt install ceph-common -y
Configure Ceph access:
# Copy the cluster configuration and authentication credentials to the client
scp /etc/ceph/ceph.conf debian@192.168.30.20:~/
scp /etc/ceph/ceph.client.admin.keyring debian@192.168.30.20:~/
Note: This is just a homelab playground; copying the admin keyring to a client is not production ready.
# Move the files and set permissions
sudo mv ~/ceph.conf /etc/ceph/ceph.conf &&
sudo mv ~/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring &&
sudo chown root:root /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
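With the configuration and keyring in place, the cluster should be reachable from the client:
# Verify the cluster access from the client
sudo ceph -s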
Map the RBD Image #
Map the image to a local device file:
# Map the RBD to a local device
sudo rbd map image-1 --pool pool-1
# Shell output:
/dev/rbd0
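If the client kernel is older, the mapping can fail because the image uses features the kernel RBD driver does not support (the error message lists them). In that case the unsupported features can be disabled on the image, for example:
# Optional: Disable image features that the kernel client does not support
sudo rbd feature disable pool-1/image-1 object-map fast-diff deep-flatten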
Mount the RBD Image #
# Create a file system on the RBD image
sudo mkfs.ext4 /dev/rbd0
# Create a mount point directory
sudo mkdir -p /mnt/ceph-image-1
# Mount the RBD image
sudo mount /dev/rbd0 /mnt/ceph-image-1
# Verify the mount
df -h
# Shell output:
Filesystem Size Used Avail Use% Mounted on
udev 1.9G 0 1.9G 0% /dev
tmpfs 389M 736K 388M 1% /run
/dev/mapper/debian--vg-root 28G 2.6G 24G 10% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda1 455M 172M 259M 40% /boot
tmpfs 389M 0 389M 0% /run/user/1000
/dev/rbd0 2.0G 24K 1.8G 1% /mnt/ceph-image-1
# Unmount the RBD image
sudo umount /mnt/ceph-image-1
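The mapping does not survive a reboot. For a persistent setup, the image can be listed in “/etc/ceph/rbdmap” and the rbdmap service enabled; a minimal sketch, assuming the admin keyring from above is used:
# Add the image to "/etc/ceph/rbdmap" (format: pool/image id=user,keyring=path)
pool-1/image-1 id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
# Enable the rbdmap service so the image is mapped at boot
sudo systemctl enable rbdmap.service
The file system mount itself can then be added to “/etc/fstab”, typically with the “noauto” or “_netdev” option so it is only mounted once the RBD device is available.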
RBD Mapping with specific User #
Create User #
# Create user that can access "pool-1"
ceph auth get-or-create client.user1 osd "allow rwx pool=pool-1" mon "allow r" -o /etc/ceph/ceph.client.user1.keyring
Test User Authentication #
# Test Authentication
sudo ceph -s --id user1 --keyring /etc/ceph/ceph.client.user1.keyring
Add Keyring #
Copy the keyring of “client.user1” to the client where the image will be mapped (e.g. via SSH), or manually create the keyring file and paste the key:
# Create a keyring file for the user
sudo vi /etc/ceph/ceph.client.user1.keyring
# Paste the keyring
[client.user1]
key = AQBN7INmtgrMNhAARX6CPwUvFPJneO6gARoZhg==
Map Image #
# Map the image with the previously created user
sudo rbd --id user1 --keyring /etc/ceph/ceph.client.user1.keyring map image-1 --pool pool-1
# Shell output:
/dev/rbd0
Verify Image Mapping #
# Verify the mapped image
rbd showmapped
# Shell output:
id pool namespace image snap device
0 pool-1 image-1 - /dev/rbd0
Unmap Image #
# Unmap image
sudo rbd unmap /dev/rbd0
# Verify the unmapping
rbd showmapped
Links #
# Install Cephadm
https://docs.ceph.com/en/latest/cephadm/install/
https://docs.ceph.com/en/latest/cephadm/install/#cephadm-install-curl
# Ceph Dashboard
https://docs.ceph.com/en/quincy/mgr/dashboard/
# Add OSD
https://docs.ceph.com/en/latest/cephadm/services/osd/#cephadm-deploy-osds