Overview #
Kubernetes Setup #
In this tutorial I’m using the following K3s Kubernetes cluster:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ubuntu1 Ready control-plane,master 20d v1.30.5+k3s1 192.168.30.10 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
ubuntu2 Ready worker 20d v1.30.5+k3s1 192.168.30.11 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
ubuntu3 Ready worker 20d v1.30.5+k3s1 192.168.30.12 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
ubuntu4 Ready worker 20d v1.30.5+k3s1 192.168.30.13 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
Kube-Prometheus-Stack Installation #
Add Helm Repository #
# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts &&
helm repo update
Install Kube Prometheus Stack #
# Install Kube Prometheus Stack
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
# Shell output:
NAME: kube-prometheus-stack
LAST DEPLOYED: Mon Oct 28 10:07:25 2024
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
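The chart can also be customized with a values file. The following is only a minimal sketch, assuming the grafana.adminPassword and prometheus.prometheusSpec.retention chart values (verify them for your chart version with "helm show values prometheus-community/kube-prometheus-stack"):
# Optional: Create a custom values file
vi kube-prometheus-stack-values.yaml
# kube-prometheus-stack-values.yaml
grafana:
  adminPassword: "my-secure-password"   # Overrides the default "prom-operator" password
prometheus:
  prometheusSpec:
    retention: 15d                      # Metrics retention period
# Optional: Install / upgrade with the custom values
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values kube-prometheus-stack-values.yaml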
Verify Deployment Resources #
# List pods in the "monitoring" namespace
kubectl get pods -n monitoring
# Shell output:
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 72s
kube-prometheus-stack-grafana-6fb84bdc8c-ssslc 3/3 Running 0 78s
kube-prometheus-stack-kube-state-metrics-76bf68bd74-f9zv5 1/1 Running 0 78s
kube-prometheus-stack-operator-9c988c48b-jpksp 1/1 Running 0 78s
kube-prometheus-stack-prometheus-node-exporter-4rkvn 1/1 Running 0 78s
kube-prometheus-stack-prometheus-node-exporter-mv9p9 1/1 Running 0 78s
kube-prometheus-stack-prometheus-node-exporter-z77zm 1/1 Running 0 78s
kube-prometheus-stack-prometheus-node-exporter-zr2wb 1/1 Running 0 78s
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 72s
# List services in the "monitoring" namespace
kubectl get svc -n monitoring
# Shell output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 88s
kube-prometheus-stack-alertmanager ClusterIP 10.43.104.170 <none> 9093/TCP,8080/TCP 94s
kube-prometheus-stack-grafana ClusterIP 10.43.100.62 <none> 80/TCP 94s
kube-prometheus-stack-kube-state-metrics ClusterIP 10.43.61.118 <none> 8080/TCP 94s
kube-prometheus-stack-operator ClusterIP 10.43.82.86 <none> 443/TCP 94s
kube-prometheus-stack-prometheus ClusterIP 10.43.33.0 <none> 9090/TCP,8080/TCP 94s
kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.43.234.243 <none> 9100/TCP 94s
prometheus-operated ClusterIP None <none> 9090/TCP 88s
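Besides Grafana, the Prometheus and Alertmanager web UIs can be accessed the same way via port forwarding; a quick sketch based on the service names and ports listed above:
# Optional: Port forwarding for the Prometheus web UI
kubectl port-forward --address 0.0.0.0 -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Optional: Port forwarding for the Alertmanager web UI
kubectl port-forward --address 0.0.0.0 -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093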
Access Grafana #
Port Forwarding #
# Create a port forwarding for the Grafana service
kubectl port-forward --address 0.0.0.0 -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Access the Grafana web interface
http://192.168.30.10:3000
# Default user:
admin
# Retrieve admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
# Shell output:
prom-operator
Traefik Ingress #
TLS Certificate #
In this setup I’m using a Let’s Encrypt wildcard certificate.
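If no certificate is available yet, a wildcard certificate can for example be requested with Certbot and a DNS-01 challenge. This is only a sketch using a manual challenge; the exact command depends on your DNS provider and plugin:
# Example: Request a Let's Encrypt wildcard certificate (manual DNS-01 challenge)
sudo certbot certonly --manual --preferred-challenges dns \
  -d "jklug.work" -d "*.jklug.work"
# The resulting certificate files are typically located under:
# /etc/letsencrypt/live/jklug.work/fullchain.pem
# /etc/letsencrypt/live/jklug.work/privkey.pem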
# Create a Kubernetes secret for the TLS certificate (in the "monitoring" namespace, same as the ingress)
kubectl create secret tls grafana-tls --cert=./fullchain.pem --key=./privkey.pem -n monitoring
# Shell output:
secret/grafana-tls created
# Verify the secret
kubectl get secrets -n monitoring
# Shell output:
grafana-tls kubernetes.io/tls 2 11s
...
# List secret details
kubectl describe secret grafana-tls -n monitoring
# Shell output:
...
Data
====
tls.crt: 3578 bytes
tls.key: 1708 bytes
Deploy Ingress #
# Create a manifest for the ingress
vi grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ui
  namespace: monitoring
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - "grafana.jklug.work"
      secretName: grafana-tls
  rules:
    - host: "grafana.jklug.work"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port:
                  number: 80
# Deploy the ingress resource
kubectl apply -f grafana-ingress.yaml
# Verify the ingress resource
kubectl get ingress -n monitoring
# Shell output:
NAME CLASS HOSTS ADDRESS PORTS AGE
grafana-ui traefik grafana.jklug.work 192.168.30.10,192.168.30.11,192.168.30.12,192.168.30.13 80, 443 11s
DNS Entry #
# Create a DNS entry for the Grafana web interface
192.168.30.10 grafana.jklug.work
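With the DNS entry in place, the ingress and TLS certificate can be quickly verified from a client; Grafana should answer with a redirect to its login page:
# Verify the ingress from a client
curl -I https://grafana.jklug.work/
# Expected: an HTTP 302 redirect to /login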
Access the Grafana Web Interface #
# Open the Grafana web interface
https://grafana.jklug.work/
# Default user:
admin
# Retrieve admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
# Shell output:
prom-operator
Kuberhealthy Installation #
Create Namespace #
# Create the Kuberhealthy namespace
kubectl create ns kuberhealthy
Add Helm Chart #
# Add the Kuberhealthy Helm chart
helm repo add kuberhealthy https://kuberhealthy.github.io/kuberhealthy/helm-repos &&
helm repo update
Adapt Helm Chart Values #
Optional: Save the Helm chart values into a file and adapt them:
# Save the Kuberhealthy Helm chart values into a file
helm show values kuberhealthy/kuberhealthy > kuberhealthy-values.yaml
Alternatively, create a new file for the Kuberhealthy values:
# Create a manifest for the values
vi kuberhealthy-values.yaml
# kuberhealthy-values.yaml
prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
    release: kube-prometheus-stack
    namespace: monitoring
    endpoints:
      # https://github.com/kuberhealthy/kuberhealthy/issues/726
      bearerTokenFile: ''
  prometheusRule:
    enabled: true
    release: kube-prometheus-stack
    namespace: monitoring
check:
  daemonset:
    enabled: false
  deployment:
    enabled: false
  dnsInternal:
    enabled: false
- namespace: monitoring defines the Prometheus namespace.
- release: kube-prometheus-stack defines the release name that was used to install Prometheus: helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack
Install Kuberhealthy #
Note: It was necessary to define the Kuberhealthy image tag, otherwise the image was not found and the pods got stuck with the status “ImagePullBackOff”.
I found the correct container image tag by manually deploying Kuberhealthy with kubectl apply -f https://raw.githubusercontent.com/kuberhealthy/kuberhealthy/master/deploy/kuberhealthy-prometheus.yaml and then looking it up in the pod details.
# Install Kuberhealthy
helm install kuberhealthy kuberhealthy/kuberhealthy \
--values kuberhealthy-values.yaml \
--set image.tag=v2.8.0-rc2 \
--namespace kuberhealthy
# Shell output:
NAME: kuberhealthy
LAST DEPLOYED: Mon Oct 28 11:08:09 2024
NAMESPACE: kuberhealthy
STATUS: deployed
REVISION: 1
TEST SUITE: None
# Optional: Delete the Kuberhealthy release again
helm delete kuberhealthy \
--namespace kuberhealthy
Verify Deployment Resources #
# List pods in the "kuberhealthy" namespace
kubectl get pods -n kuberhealthy
# Shell output:
NAME READY STATUS RESTARTS AGE
kuberhealthy-7475db986d-cxwlv 1/1 Running 0 17s
kuberhealthy-7475db986d-swbdk 1/1 Running 0 17s
# List services in the "kuberhealthy" namespace
kubectl get svc -n kuberhealthy
# Shell output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kuberhealthy ClusterIP 10.43.228.204 <none> 80/TCP 34s
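To make sure the Prometheus integration is in place, verify that the Helm chart also created the ServiceMonitor and PrometheusRule resources; a quick check across all namespaces, since the target namespace depends on the values used:
# Verify the ServiceMonitor and PrometheusRule resources
kubectl get servicemonitors -A | grep kuberhealthy
kubectl get prometheusrules -A | grep kuberhealthy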
Test Kuberhealthy Metrics #
# Create a port forwarding for the Kuberhealthy service
kubectl port-forward -n kuberhealthy svc/kuberhealthy 8080:80
# JSON status of the Kuberhealthy cluster (Use new shell)
curl localhost:8080 | jq .
# Shell output:
{
"OK": true,
"Errors": [],
"CheckDetails": {},
"JobDetails": {},
"CurrentMaster": "kuberhealthy-7475db986d-cxwlv",
"Metadata": {}
}
# Verify "kuberhealthy_cluster_state" and "kuberhealthy_running"
curl 'localhost:8080/metrics'
# Shell output:
# HELP kuberhealthy_running Shows if kuberhealthy is running error free
# TYPE kuberhealthy_running gauge
kuberhealthy_running{current_master="kuberhealthy-7475db986d-cxwlv"} 1
# HELP kuberhealthy_cluster_state Shows the status of the cluster
# TYPE kuberhealthy_cluster_state gauge
kuberhealthy_cluster_state 1
# HELP kuberhealthy_check Shows the status of a Kuberhealthy check
# TYPE kuberhealthy_check gauge
# HELP kuberhealthy_check_duration_seconds Shows the check run duration of a Kuberhealthy check
# TYPE kuberhealthy_check_duration_seconds gauge
# HELP kuberhealthy_job Shows the status of a Kuberhealthy job
# TYPE kuberhealthy_job gauge
# HELP kuberhealthy_job_duration_seconds Shows the job run duration of a Kuberhealthy job
# TYPE kuberhealthy_job_duration_seconds gauge
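Optionally, verify that Prometheus actually scrapes the Kuberhealthy endpoint; a quick check via port forwarding, the target should show up on the Prometheus targets page:
# Create a port forwarding for the Prometheus service
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open the Prometheus targets page and look for a "kuberhealthy" target
http://localhost:9090/targets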
Configure Kuberhealthy Check #
Example Check: Ping #
Deploy Example Check #
# Create a manifest for the example check
vi example-ping-check.yaml
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: ping-check
  namespace: kuberhealthy
spec:
  runInterval: 30m
  timeout: 10m
  podSpec:
    containers:
      - env:
          - name: CONNECTION_TIMEOUT
            value: "10s"
          - name: CONNECTION_TARGET
            value: "tcp://google.com:443"
        image: kuberhealthy/network-connection-check:v0.2.0
        name: main
# Deploy the example check
kubectl create -f example-ping-check.yaml
# Shell output:
kuberhealthycheck.comcast.github.io/ping-check created
Verify the Example Check #
# List pods in the "kuberhealthy" namespace
kubectl get pods -n kuberhealthy
# Shell output:
NAME READY STATUS RESTARTS AGE
kuberhealthy-7475db986d-cxwlv 1/1 Running 0 13m
kuberhealthy-7475db986d-swbdk 1/1 Running 0 13m
ping-check-1730114447 0/1 Completed 0 56s
# List logs of the ping-check pod
kubectl logs -n kuberhealthy ping-check-1730114447
# Shell output:
time="2024-10-28T11:20:56Z" level=info msg="Found instance namespace: kuberhealthy"
time="2024-10-28T11:20:56Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2024-10-28T11:20:56Z" level=info msg="Check time limit set to: 9m45.702084383s"
time="2024-10-28T11:20:56Z" level=info msg="CONNECTION_TARGET_UNREACHABLE could not be parsed."
time="2024-10-28T11:20:56Z" level=info msg="Running network connection checker"
time="2024-10-28T11:20:56Z" level=info msg="Successfully reported success to Kuberhealthy servers"
time="2024-10-28T11:20:56Z" level=info msg="Done running network connection check for: tcp://google.com:443"
Example Check: TLS Cert Test #
Deploy Example Check #
# Create a manifest for the example check
vi example-tls-check.yaml
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: website-ssl-expiry-30d
  namespace: kuberhealthy
spec:
  runInterval: 24h
  timeout: 15m
  podSpec:
    containers:
      - env:
          - name: DOMAIN_NAME
            value: "jklug.work"
          - name: PORT
            value: "443"
          - name: DAYS
            value: "30"
          - name: INSECURE
            value: "false" # Switch to 'true' if using 'unknown authority' (intranet)
        image: kuberhealthy/ssl-expiry-check:v3.2.0
        imagePullPolicy: IfNotPresent
        name: main
# Deploy the example check
kubectl create -f example-tls-check.yaml
# Shell output:
kuberhealthycheck.comcast.github.io/website-ssl-expiry-30d created
Verify the Example Check #
# List pods in the "kuberhealthy" namespace
kubectl get pods -n kuberhealthy
# Shell output:
NAME READY STATUS RESTARTS AGE
kuberhealthy-7475db986d-cxwlv 1/1 Running 0 19m
kuberhealthy-7475db986d-swbdk 1/1 Running 0 19m
ping-check-1730114447 0/1 Completed 0 6m26s
ping-check-1730114825 0/1 Completed 0 8s
website-ssl-expiry-30d-1730114825 0/1 Completed 0 8s
# List logs of the tls-check pod
kubectl logs -n kuberhealthy website-ssl-expiry-30d-1730114825
# Shell output:
time="2024-10-28T11:27:14Z" level=info msg="Found instance namespace: kuberhealthy"
time="2024-10-28T11:27:14Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2024-10-28T11:27:14Z" level=info msg="Check time limit set to: 14m45.024017787s"
time="2024-10-28T11:27:14Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2024-10-28T11:27:14Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2024-10-28T11:27:14Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2024-10-28T11:27:14Z" level=info msg="Testing SSL expiration on host jklug.work over port 443"
time="2024-10-28T11:27:15Z" level=info msg="Certificate for jklug.work is valid from 2024-07-24 00:00:00 +0000 UTC until 2025-08-22 23:59:59 +0000 UTC"
time="2024-10-28T11:27:15Z" level=info msg="Certificate for domain jklug.work is currently valid and will expire in 298 days"
time="2024-10-28T11:27:15Z" level=info msg="Successfully reported success status to Kuberhealthy servers"
Verify Kuberhealthy Metrics in Grafana #
- Go to: “Explore” > “Metrics”
- Select “Data source”: “Prometheus”
- Search for: kuberhealthy
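For example, the following queries are based on the metric names from the /metrics output above and can be used in Explore or in dashboard panels:
# Overall cluster state reported by Kuberhealthy (1 = healthy)
kuberhealthy_cluster_state
# Status of the individual Kuberhealthy checks
kuberhealthy_check
# Run duration of the individual checks in seconds
kuberhealthy_check_duration_seconds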
Links #
# Kuberhealthy
https://github.com/kuberhealthy/kuberhealthy/blob/master/README.md
# Kuberhealthy Checks
https://github.com/kuberhealthy/kuberhealthy/blob/master/docs/CHECKS_REGISTRY.md