
Kuberhealthy with Kube-Prometheus-Stack, Example Health Checks


Overview
#

This tutorial covers the installation of the Kube-Prometheus-Stack and Kuberhealthy via Helm on a K3s cluster, exposes the Grafana web interface through a Traefik ingress with TLS, deploys two example Kuberhealthy checks (a network connection check and a TLS certificate expiry check) and verifies the Kuberhealthy metrics in Prometheus and Grafana.

Kubernetes Setup
#

In this tutorial I’m using the following K3s Kubernetes cluster:

NAME      STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
ubuntu1   Ready    control-plane,master   20d   v1.30.5+k3s1   192.168.30.10   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu2   Ready    worker                 20d   v1.30.5+k3s1   192.168.30.11   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu3   Ready    worker                 20d   v1.30.5+k3s1   192.168.30.12   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu4   Ready    worker                 20d   v1.30.5+k3s1   192.168.30.13   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2



Kube-Prometheus-Stack Installation
#

Add Helm Repository
#

# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts &&
helm repo update

Install Kube Prometheus Stack
#

# Install Kube Prometheus Stack
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
 --namespace monitoring \
 --create-namespace

# Shell output:
NAME: kube-prometheus-stack
LAST DEPLOYED: Mon Oct 28 10:07:25 2024
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

Verify Deployment Resources
#

# List pods in the "monitoring" namespace
kubectl get pods -n monitoring

# Shell output:
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          72s
kube-prometheus-stack-grafana-6fb84bdc8c-ssslc              3/3     Running   0          78s
kube-prometheus-stack-kube-state-metrics-76bf68bd74-f9zv5   1/1     Running   0          78s
kube-prometheus-stack-operator-9c988c48b-jpksp              1/1     Running   0          78s
kube-prometheus-stack-prometheus-node-exporter-4rkvn        1/1     Running   0          78s
kube-prometheus-stack-prometheus-node-exporter-mv9p9        1/1     Running   0          78s
kube-prometheus-stack-prometheus-node-exporter-z77zm        1/1     Running   0          78s
kube-prometheus-stack-prometheus-node-exporter-zr2wb        1/1     Running   0          78s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          72s

# List services in the "monitoring" namespace
kubectl get svc -n monitoring

# Shell output:
NAME                                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                            ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   88s
kube-prometheus-stack-alertmanager               ClusterIP   10.43.104.170   <none>        9093/TCP,8080/TCP            94s
kube-prometheus-stack-grafana                    ClusterIP   10.43.100.62    <none>        80/TCP                       94s
kube-prometheus-stack-kube-state-metrics         ClusterIP   10.43.61.118    <none>        8080/TCP                     94s
kube-prometheus-stack-operator                   ClusterIP   10.43.82.86     <none>        443/TCP                      94s
kube-prometheus-stack-prometheus                 ClusterIP   10.43.33.0      <none>        9090/TCP,8080/TCP            94s
kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.43.234.243   <none>        9100/TCP                     94s
prometheus-operated                              ClusterIP   None            <none>        9090/TCP                     88s

Access Grafana
#

Port Forwarding
#

# Create a port forwarding for the Grafana service
kubectl port-forward --address 0.0.0.0 -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Access the Grafana web interface
http://192.168.30.10:3000
# Default user:
admin

# Retrieve admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

# Shell output:
prom-operator

Traefik Ingress
#

TLS Certificate
#

In this setup I’m using a Let’s Encrypt wildcard certificate.
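
The wildcard certificate was issued beforehand. As a minimal sketch, it could be requested with Certbot via the manual DNS-01 challenge (the domain is from this setup; the resulting file paths depend on the Certbot configuration):

# Request a Let's Encrypt wildcard certificate via the DNS-01 challenge
sudo certbot certonly --manual --preferred-challenges dns \
  -d "jklug.work" -d "*.jklug.work"

# Certbot places fullchain.pem and privkey.pem under /etc/letsencrypt/live/jklug.work/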

# Create a Kubernetes secret for the TLS certificate (in the "monitoring" namespace, where the ingress is deployed)
kubectl create secret tls grafana-tls --cert=./fullchain.pem --key=./privkey.pem -n monitoring

# Shell output:
secret/grafana-tls created

# Verify the secret
kubectl get secrets -n monitoring

# Shell output:
grafana-tls                                                        kubernetes.io/tls    2      11s
...

# List secret details
kubectl describe secret grafana-tls -n monitoring

# Shell output:
...
Data
====
tls.crt:  3578 bytes
tls.key:  1708 bytes

Deploy Ingress
#

# Create a manifest for the ingress
vi grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ui
  namespace: monitoring
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik
  tls:
  - hosts:
    - "grafana.jklug.work"
    secretName: grafana-tls
  rules:
  - host: "grafana.jklug.work"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kube-prometheus-stack-grafana
            port:
              number: 80

# Deploy the ingress resource
kubectl apply -f grafana-ingress.yaml

# Verify the ingress resource
kubectl get ingress -n monitoring

# Shell output:
NAME         CLASS     HOSTS                ADDRESS                                                   PORTS     AGE
grafana-ui   traefik   grafana.jklug.work   192.168.30.10,192.168.30.11,192.168.30.12,192.168.30.13   80, 443   11s
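
To test the ingress before creating a DNS entry, the hostname can be pinned to one of the node IPs with curl (a quick sketch; any node IP from the ingress address list should work):

# Optional: test the ingress without a DNS entry
curl -k --resolve grafana.jklug.work:443:192.168.30.10 https://grafana.jklug.work/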

DNS Entry
#

# Create a DNS entry for the Grafana web interface
192.168.30.10 grafana.jklug.work

Access the Grafana Web Interface
#

# Open the Grafana web interface
https://grafana.jklug.work/
# Default user:
admin

# Retrieve admin password
kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

# Shell output:
prom-operator



Kuberhealthy Installation
#

Create Namespace
#

# Create the Kuberhealthy namespace
kubectl create ns kuberhealthy

Add Helm Chart
#

# Add the Kuberhealthy Helm chart
helm repo add kuberhealthy https://kuberhealthy.github.io/kuberhealthy/helm-repos &&
helm repo update

Adapt Helm Chart Values
#

Optionally, save and adapt the Helm chart values:

# Save the Kuberhealthy Helm chart values into a file
helm show values kuberhealthy/kuberhealthy > kuberhealthy-values.yaml

Alternatively, create a new file for the Kuberhealthy values:

# Create a manifest for the values
vi kuberhealthy-values.yaml
# kuberhealthy-values.yaml
prometheus:
  enabled: true

  serviceMonitor:
    enabled: true
    release: kube-prometheus-stack
    namespace: monitoring
    endpoints:
      # https://github.com/kuberhealthy/kuberhealthy/issues/726
      bearerTokenFile: ''
  prometheusRule:
    enabled: true
    release: kube-prometheus-stack
    namespace: monitoring

check:
  daemonset:
    enabled: false
  deployment:
    enabled: false
  dnsInternal:
    enabled: false

  • namespace: monitoring Defines the namespace where the Kube-Prometheus-Stack is deployed, so the ServiceMonitor and PrometheusRule resources are created there

  • release: kube-prometheus-stack Defines the release name that was used to install Prometheus: helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack

  • check Disables the built-in daemonset, deployment and dnsInternal checks; custom example checks are deployed later in this tutorial


Install Kuberhealthy
#

Note: It was necessary to explicitly define the Kuberhealthy image tag, otherwise the image was not found and the pods got stuck in the “ImagePullBackOff” status. The working image tag was determined by manually deploying Kuberhealthy with kubectl apply -f https://raw.githubusercontent.com/kuberhealthy/kuberhealthy/master/deploy/kuberhealthy-prometheus.yaml and then reading the image from the pod details.
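
One way to read the image from the pod details (a quick sketch, assuming the app=kuberhealthy label used by the manual deployment):

# Print the container image of the running Kuberhealthy pod
kubectl get pods -n kuberhealthy -l app=kuberhealthy \
  -o jsonpath='{.items[0].spec.containers[0].image}'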

# Install Kuberhealthy
helm install kuberhealthy kuberhealthy/kuberhealthy \
 --values kuberhealthy-values.yaml \
 --set image.tag=v2.8.0-rc2 \
 --namespace kuberhealthy


# Shell output:
NAME: kuberhealthy
LAST DEPLOYED: Mon Oct 28 11:08:09 2024
NAMESPACE: kuberhealthy
STATUS: deployed
REVISION: 1
TEST SUITE: None

# Optional: delete the Kuberhealthy release again
helm delete kuberhealthy \
 --namespace kuberhealthy

Verify Deployment Resources
#

# List pods in the "kuberhealthy" namespace
kubectl get pods -n kuberhealthy

# Shell output:
NAME                            READY   STATUS    RESTARTS   AGE
kuberhealthy-7475db986d-cxwlv   1/1     Running   0          17s
kuberhealthy-7475db986d-swbdk   1/1     Running   0          17s

# List services in the "kuberhealthy" namespace
kubectl get svc -n kuberhealthy

# Shell output:
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kuberhealthy   ClusterIP   10.43.228.204   <none>        80/TCP    34s
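
Optionally, verify that the Kuberhealthy ServiceMonitor and PrometheusRule were created in the "monitoring" namespace (a quick check; the exact resource names are set by the Helm chart):

# List the Kuberhealthy ServiceMonitor and PrometheusRule
kubectl get servicemonitor,prometheusrule -n monitoring | grep -i kuberhealthy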

Test Kuberhealthy Metrics
#

# Create a port forwarding for the Kuberhealthy service
kubectl port-forward -n kuberhealthy svc/kuberhealthy 8080:80

# JSON status of the Kuberhealthy cluster (use a new shell)
curl localhost:8080 | jq .

# Shell output:
{
  "OK": true,
  "Errors": [],
  "CheckDetails": {},
  "JobDetails": {},
  "CurrentMaster": "kuberhealthy-7475db986d-cxwlv",
  "Metadata": {}
}
# Verify "kuberhealthy_cluster_state" and "kuberhealthy_running"
curl 'localhost:8080/metrics'

# Shell output:
# HELP kuberhealthy_running Shows if kuberhealthy is running error free
# TYPE kuberhealthy_running gauge
kuberhealthy_running{current_master="kuberhealthy-7475db986d-cxwlv"} 1
# HELP kuberhealthy_cluster_state Shows the status of the cluster
# TYPE kuberhealthy_cluster_state gauge
kuberhealthy_cluster_state 1
# HELP kuberhealthy_check Shows the status of a Kuberhealthy check
# TYPE kuberhealthy_check gauge
# HELP kuberhealthy_check_duration_seconds Shows the check run duration of a Kuberhealthy check
# TYPE kuberhealthy_check_duration_seconds gauge
# HELP kuberhealthy_job Shows the status of a Kuberhealthy job
# TYPE kuberhealthy_job gauge
# HELP kuberhealthy_job_duration_seconds Shows the job run duration of a Kuberhealthy job
# TYPE kuberhealthy_job_duration_seconds gauge



Configure Kuberhealthy Check
#

Example Check: Ping
#

Deploy Example Check
#

# Create a manifest for the example check
vi example-ping-check.yaml
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: ping-check
  namespace: kuberhealthy
spec:
  runInterval: 30m
  timeout: 10m
  podSpec:
    containers:
      - env:
          - name: CONNECTION_TIMEOUT
            value: "10s"
          - name: CONNECTION_TARGET
            value: "tcp://google.com:443"
        image: kuberhealthy/network-connection-check:v0.2.0
        name: main

# Deploy the example check
kubectl create -f example-ping-check.yaml

# Shell output:
kuberhealthycheck.comcast.github.io/ping-check created

Verify the Example Check
#

# List pods in the "kuberhealthy" namespace
kubectl get pods -n kuberhealthy

# Shell output:
NAME                            READY   STATUS      RESTARTS   AGE
kuberhealthy-7475db986d-cxwlv   1/1     Running     0          13m
kuberhealthy-7475db986d-swbdk   1/1     Running     0          13m
ping-check-1730114447           0/1     Completed   0          56s

# List logs of the ping-check pod
kubectl logs -n kuberhealthy ping-check-1730114447

# Shell output:
time="2024-10-28T11:20:56Z" level=info msg="Found instance namespace: kuberhealthy"
time="2024-10-28T11:20:56Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2024-10-28T11:20:56Z" level=info msg="Check time limit set to: 9m45.702084383s"
time="2024-10-28T11:20:56Z" level=info msg="CONNECTION_TARGET_UNREACHABLE could not be parsed."
time="2024-10-28T11:20:56Z" level=info msg="Running network connection checker"
time="2024-10-28T11:20:56Z" level=info msg="Successfully reported success to Kuberhealthy servers"
time="2024-10-28T11:20:56Z" level=info msg="Done running network connection check for: tcp://google.com:443"
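
The check result should also appear in the Kuberhealthy status JSON (using the port forwarding from the previous section; the check is expected to be listed as "kuberhealthy/ping-check"):

# Show the check details in the Kuberhealthy status JSON
curl -s localhost:8080 | jq '.CheckDetails'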

Example Check: TLS Cert Test
#

Deploy Example Check
#

# Create a manifest for the example check
vi example-tls-check.yaml
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  name: website-ssl-expiry-30d
  namespace: kuberhealthy
spec:
  runInterval: 24h
  timeout: 15m
  podSpec:
    containers:
      - env:
          - name: DOMAIN_NAME
            value: "jklug.work"
          - name: PORT
            value: "443"
          - name: DAYS
            value: "30"
          - name: INSECURE
            value: "false"  # Switch to 'true' if using 'unknown authority' (intranet)
        image: kuberhealthy/ssl-expiry-check:v3.2.0
        imagePullPolicy: IfNotPresent
        name: main

# Deploy the example check
kubectl create -f example-tls-check.yaml

# Shell output:
kuberhealthycheck.comcast.github.io/website-ssl-expiry-30d created

Verify the Example Check
#

# List pods in the "kuberhealthy" namespace
kubectl get pods -n kuberhealthy

# Shell output:
NAME                                READY   STATUS      RESTARTS   AGE
kuberhealthy-7475db986d-cxwlv       1/1     Running     0          19m
kuberhealthy-7475db986d-swbdk       1/1     Running     0          19m
ping-check-1730114447               0/1     Completed   0          6m26s
ping-check-1730114825               0/1     Completed   0          8s
website-ssl-expiry-30d-1730114825   0/1     Completed   0          8s

# List logs of the tls-check pod
kubectl logs -n kuberhealthy website-ssl-expiry-30d-1730114825

# Shell output:
time="2024-10-28T11:27:14Z" level=info msg="Found instance namespace: kuberhealthy"
time="2024-10-28T11:27:14Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2024-10-28T11:27:14Z" level=info msg="Check time limit set to: 14m45.024017787s"
time="2024-10-28T11:27:14Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2024-10-28T11:27:14Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2024-10-28T11:27:14Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2024-10-28T11:27:14Z" level=info msg="Testing SSL expiration on host jklug.work over port 443"
time="2024-10-28T11:27:15Z" level=info msg="Certificate for jklug.work is valid from 2024-07-24 00:00:00 +0000 UTC until 2025-08-22 23:59:59 +0000 UTC"
time="2024-10-28T11:27:15Z" level=info msg="Certificate for domain jklug.work is currently valid and will expire in 298 days"
time="2024-10-28T11:27:15Z" level=info msg="Successfully reported success status to Kuberhealthy servers"



Verify Kuberhealthy Metrics in Grafana
#

  • Go to: “Explore” > “Metrics”

  • Select “Data source”: “Prometheus”

  • Search for kuberhealthy (for example with the queries below)
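
Example PromQL queries, based on the metric names from the /metrics output above (the exact label names can vary between Kuberhealthy versions):

# 1 = all Kuberhealthy checks are passing
kuberhealthy_cluster_state

# 1 = the Kuberhealthy master pod is running error free
kuberhealthy_running

# Status of a single check (1 = passing, 0 = failing)
kuberhealthy_check{check="kuberhealthy/ping-check"}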



Links
#

# Kuberhealthy
https://github.com/kuberhealthy/kuberhealthy/blob/master/README.md

# Kuberhealthy Checks
https://github.com/kuberhealthy/kuberhealthy/blob/master/docs/CHECKS_REGISTRY.md