Overview #
I’m using the following Ubuntu 24.04-based servers in this tutorial:
192.168.30.10 # Prometheus Docker Compose stack
192.168.30.11 # Host monitoring with Node Exporter
192.168.30.12 # Host monitoring with Node Exporter
Prerequisites #
Gmail App Password #
Create a Gmail app password, a unique password specifically for third-party applications like Alertmanager:
- Define an App-Name like “Alertmanager”
- Click “Create”
- Copy the generated App-Password:
kruc rlac oudn emld
Monitoring Nodes #
Node Exporter Installation #
Node Exporter Overview:
- The Node Exporter is used to monitor bare metal and virtual servers. It provides metrics such as CPU, memory, disk space, disk I/O, and network bandwidth.
- It does not provide metrics about individual processes or applications.
- Applications and services should be monitored directly and not together with the host.
The Node Exporter can be installed using a package manager like apt or run as a standalone binary in combination with a Systemd service unit.
Package Manager: Apt #
# Install Node Exporter with apt on Debian-based distros
sudo apt install prometheus-node-exporter
# Verify the installation
systemctl status prometheus-node-exporter
Binary with Systemd Service Unit #
Find the latest Node Exporter binary:
https://prometheus.io/download/#node_exporter
# Download the tar file
cd /tmp && sudo wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
# Unpack the tar file & move the binary
sudo tar xvfz node_exporter*
sudo mv node_exporter*/node_exporter /usr/local/bin/
# Create a dedicated user for Node Exporter
sudo useradd -rs /bin/false node_exporter
# Create Systemd Service Unit file
sudo vi /etc/systemd/system/node_exporter.service
# node_exporter.service
[Unit]
Description=Prometheus exporter for machine metrics
Documentation=https://github.com/prometheus/node_exporter
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
# Reload systemd to pick up the new unit file
sudo systemctl daemon-reload
# Start and enable the service
sudo systemctl start node_exporter && sudo systemctl enable node_exporter
# Verify status
sudo systemctl status node_exporter
Verify Node Exporter Metrics #
# Check the output from the exporter
curl http://localhost:9100/metrics | grep "node_"
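The raw output also contains "# HELP" and "# TYPE" comment lines that mention the metric names, so the grep above matches more than the actual samples. A minimal sketch for counting only the samples, assuming a POSIX shell; the function name count_node_metrics is made up for illustration:

```shell
# Count metric samples whose name starts with "node_".
# Reads Prometheus text-format output on stdin. "# HELP" and "# TYPE"
# comment lines start with "#", so the anchored pattern skips them.
count_node_metrics() {
  grep -c '^node_'
}

# Usage on a monitored host (not run here):
#   curl -s http://localhost:9100/metrics | count_node_metrics
```

A count of zero suggests the exporter is reachable but is not serving host metrics as expected.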
Prometheus Stack #
File and Folder Structure #
Create the folder structure and adapt the directory owners:
# Create folder structure
sudo mkdir -p /opt/prometheus/{alertmanager_conf,prometheus_conf,prometheus_data,grafana_data,grafana_provisioning/datasources} && cd /opt/prometheus
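# Change owner (Prometheus and Alertmanager run as UID 65534, Grafana as UID 472)
# prometheus_data is included because Prometheus must be able to write its TSDB there
sudo chown 65534:65534 /opt/prometheus/{alertmanager_conf,prometheus_conf,prometheus_data} &&
sudo chown 472:root /opt/prometheus/grafana_data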
The file and folder structure looks like this:
/opt/prometheus
├── alertmanager_conf
│   └── alertmanager.yml
├── docker-compose.yml
├── grafana_data
├── grafana_provisioning
│   └── datasources
│       └── prometheus_ds.yml
├── prometheus_conf
│   ├── prometheus.yml
│   └── rules.yml
└── prometheus_data
Docker Compose Manifest #
# Create Docker Compose File
sudo vi docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus_conf:/etc/prometheus
      - ./prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    restart: unless-stopped
  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      - ./alertmanager_conf/alertmanager.yml:/etc/alertmanager/config.yml
    command:
      - '--config.file=/etc/alertmanager/config.yml'
    ports:
      - "9093:9093"
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
    volumes:
      - ./grafana_data:/var/lib/grafana
      - ./grafana_provisioning:/etc/grafana/provisioning
    restart: unless-stopped
Configuration Files #
Prometheus Configuration: prometheus.yml #
# Create Prometheus configuration
sudo vi prometheus_conf/prometheus.yml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 10s
rule_files:
  - rules.yml
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets: [ 'alertmanager:9093' ]
scrape_configs:
  # Prometheus Server
  - job_name: 'Prometheus-Server'
    static_configs:
      - targets:
          - localhost:9090
  # Linux Servers / Node Exporter
  - job_name: 'Linux-Server'
    static_configs:
      - targets:
          - 192.168.30.11:9100
          - 192.168.30.12:9100
Prometheus Configuration: rules.yml #
# Create rules.yml
sudo vi prometheus_conf/rules.yml
# rules.yml
groups:
  - name: NodeExporter
    rules:
      - alert: InstanceDown
        expr: up{job="Linux-Server"} == 0
        for: 1m
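Because the Compose manifest starts Prometheus with --web.enable-lifecycle, later changes to prometheus.yml or rules.yml can be applied without restarting the container. The following is a minimal sketch, assuming the stack is already running, that the commands are issued from /opt/prometheus on the Prometheus host, and that promtool is available in the prom/prometheus image; the function name check_and_reload is made up for illustration:

```shell
# Validate prometheus.yml (and the rule files it references) inside the
# running container, then ask Prometheus to reload its configuration.
# The reload endpoint is only available because --web.enable-lifecycle is set.
check_and_reload() {
  sudo docker compose exec prometheus \
    promtool check config /etc/prometheus/prometheus.yml &&
  curl -fsS -X POST http://localhost:9090/-/reload
}

# Usage, from /opt/prometheus on the Prometheus host (not run here):
#   check_and_reload
```

The reload request is only sent if the check succeeds, so a syntax error in either file leaves the running configuration untouched.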
Alertmanager Configuration #
# create alertmanager.yml
sudo vi alertmanager_conf/alertmanager.yml
# alertmanager.yml
global:
  resolve_timeout: 5m
route:
  receiver: 'email'
  repeat_interval: 1h
  group_by: [ alertname ]
receivers:
  - name: 'email'
    email_configs:
      - smarthost: 'smtp.gmail.com:587'
        auth_username: 'jueklu85@gmail.com'
        auth_password: "krucrlacoudnemld"
        from: 'jueklu85@gmail.com'
        to: 'juergen@jklug.work'
- Define the Gmail App Password without spaces in the auth_password: section
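Google displays the app password in four space-separated groups, so the spaces have to be removed before the value goes into alertmanager.yml. A small sketch of that step, assuming a POSIX shell; the helper name strip_spaces is made up for illustration:

```shell
# Print the first argument with all space characters removed.
strip_spaces() {
  printf '%s' "$1" | tr -d ' '
}

# Usage with the example password from the Prerequisites section:
#   strip_spaces "kruc rlac oudn emld"
```

With the example password this yields the value used in the auth_password field above.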
Grafana Data Source #
# Create Grafana data source configuration
sudo vi grafana_provisioning/datasources/prometheus_ds.yml
datasources:
  - name: Prometheus
    access: proxy
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
Start and Verify Docker Containers #
# Create / start Docker container
sudo docker compose up -d
Verify the Prometheus container stack:
# List containers
docker ps
# Shell output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
697245a0fc4b grafana/grafana:latest "/run.sh" 2 hours ago Up 23 minutes 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp prometheus-grafana-1
985135f1dc80 prom/alertmanager:latest "/bin/alertmanager -…" 2 hours ago Up 23 minutes 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp prometheus-alertmanager-1
a365a67d3cbc prom/prometheus:latest "/bin/prometheus --c…" 2 hours ago Up 23 minutes 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp prometheus-prometheus-1
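A container listed as "Up" is not necessarily serving requests yet. As a rough additional check, each service exposes an HTTP health endpoint that can be probed with curl. This is a sketch under the assumption that the ports from the Compose manifest are reachable from where the check runs; the function name check_stack is made up for illustration:

```shell
# Probe the health endpoints of Prometheus, Alertmanager and Grafana.
# $1 is the host running the stack (defaults to localhost).
check_stack() {
  host="${1:-localhost}"
  for url in \
    "http://${host}:9090/-/healthy" \
    "http://${host}:9093/-/healthy" \
    "http://${host}:3000/api/health"; do
    # -f makes curl return a non-zero status on HTTP errors
    if curl -fsS -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}

# Usage (not run here):
#   check_stack 192.168.30.10
```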
Webinterfaces #
Prometheus Webinterface #
Open Webinterface #
# Open the Prometheus webinterface
http://192.168.30.10:9090
Verify Endpoints #
- Select “Status” > “Target health”
Alertmanager Webinterface #
# Open the Alertmanager webinterface
http://192.168.30.10:9093/#/alerts
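To exercise the email route without taking a real host offline, a synthetic alert can be posted to Alertmanager's v2 API. The sketch below only builds the JSON payload; the alert name "TestAlert", the "severity" label, and the function name test_alert_payload are assumptions chosen for illustration and are not part of the setup above:

```shell
# Print a JSON payload containing one alert for Alertmanager's v2 API.
# $1 is the value of the alertname label ("TestAlert" is a made-up default).
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"test"}}]\n' "${1:-TestAlert}"
}

# Usage (not run here): post the payload to Alertmanager.
#   test_alert_payload | curl -fsS -X POST -H 'Content-Type: application/json' \
#     -d @- http://192.168.30.10:9093/api/v2/alerts
```

The posted alert should then show up in the Alertmanager web interface and be routed to the "email" receiver like any other alert.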
Grafana Webinterface #
Login #
- Log in with the default credentials and change the password
# Grafana webinterface
http://192.168.30.10:3000
# Default user
admin
# Default password
admin
Import the “Node Exporter Full” Dashboard #
Import the “Node Exporter Full” dashboard with the ID “1860”:
- Go to: “Home” > “Dashboard”
- Click “+ Create dashboard”
- Select “Import dashboard”
- Paste the dashboard ID “1860” and click “Load”
- Select the “Prometheus” data source
- Click “Import”
The dashboard should look like this:
Example Alert #
Stop Node Exporter Service #
Stop the Node Exporter service on one of the monitoring nodes:
# Stop the Node Exporter service (the unit is named node_exporter if it was installed from the binary)
sudo systemctl stop prometheus-node-exporter
# Verify the Node Exporter service status
systemctl status prometheus-node-exporter
Verify the Prometheus Alert #
# Verify the alert in the Prometheus webinterface
http://192.168.30.10:9090/alerts
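Because the InstanceDown rule uses "for: 1m", the alert is first shown as pending and only switches to firing after the target has been down for a minute. The same state can be read from the Prometheus HTTP API. The sketch below counts firing alerts with plain grep and assumes the API returns compact JSON without whitespace between keys and values; the function name count_firing is made up for illustration:

```shell
# Count alerts in state "firing" in the JSON read from stdin.
# Assumes compact JSON, i.e. the literal text "state":"firing".
count_firing() {
  grep -o '"state":"firing"' | wc -l | tr -d ' '
}

# Usage (not run here):
#   curl -s http://192.168.30.10:9090/api/v1/alerts | count_firing
```

A JSON-aware tool would be more robust than grep if one is installed; the pattern match keeps the example free of extra dependencies.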
Verify the Alertmanager Alert #
# Verify the alert in the Alertmanager webinterface
http://192.168.30.10:9093/#/alerts
Verify Email Alert #
An email alert like this should be sent:
Links #
# Prometheus GitHub
https://github.com/prometheus
# Prometheus Alerts
https://samber.github.io/awesome-prometheus-alerts/
# Download Prometheus
https://prometheus.io/download/
# Download Node Exporter
https://prometheus.io/download/#node_exporter
# Official Documentation
https://prometheus.io/docs/introduction/overview/
# Exporters
https://prometheus.io/docs/instrumenting/exporters/
# Node Exporter
https://prometheus.io/docs/guides/node-exporter/
# Node Exporter GitHub
https://github.com/prometheus/node_exporter