Kubernetes Troubleshooting
Inspecting the Cluster
# List the addresses of the control plane and the cluster services
kubectl cluster-info
# Shell output:
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# Dump detailed cluster state for further debugging and diagnosis
kubectl cluster-info dump
Kubernetes Nodes
List Nodes
# List Kubernetes nodes
kubectl get nodes
# List Kubernetes nodes: More details
kubectl get nodes -o wide
# Shell output:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ubuntu1 Ready control-plane,master 42d v1.30.5+k3s1 192.168.30.10 <none> Ubuntu 24.04.1 LTS 6.8.0-48-generic containerd://1.7.21-k3s2
ubuntu2 Ready worker 42d v1.30.5+k3s1 192.168.30.11 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
ubuntu3 Ready worker 42d v1.30.5+k3s1 192.168.30.12 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
ubuntu4 Ready worker 42d v1.30.5+k3s1 192.168.30.13 <none> Ubuntu 24.04.1 LTS 6.8.0-45-generic containerd://1.7.21-k3s2
Node Status
Status | Explanation |
---|---|
Ready | The node is healthy and can run pods |
DiskPressure | Free disk space on the node is low |
MemoryPressure | Free memory on the node is low |
PIDPressure | Too many processes are running on the node |
NetworkUnavailable | The node's network is not correctly configured |
SchedulingDisabled | The node has been cordoned, so no new pods are scheduled on it |
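Note that kubectl get nodes only shows Ready, NotReady or Unknown in its STATUS column, with ",SchedulingDisabled" appended for cordoned nodes; the pressure and network conditions from the table are listed under Conditions in the kubectl describe node output. As a minimal sketch for scripted checks, the hypothetical shell function `unhealthy_nodes` below flags every node whose STATUS is not exactly Ready. It assumes its input is the output of `kubectl get nodes --no-headers`, where STATUS is the second column.

```shell
# Hypothetical helper: print name and STATUS of every node that is not plainly "Ready".
# Input (stdin): output of `kubectl get nodes --no-headers` (with or without -o wide).
# NotReady, Unknown and cordoned nodes (Ready,SchedulingDisabled) are all reported.
unhealthy_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

Usage: `kubectl get nodes --no-headers | unhealthy_nodes`. Empty output means every node reports Ready.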
Node Details
These are the most important parts of the node details:
- Conditions: Details of the node status, such as the pressure conditions
- Non-terminated Pods: Resource requests and limits of the pods running on the node
- Allocated resources: Sum of the CPU, memory and storage requests and limits of the pods on the node, as a percentage of the node's allocatable capacity (not the actual consumption)
# List details of a node
kubectl describe node ubuntu1
# Shell output snippet:
...
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Tue, 12 Nov 2024 12:10:58 +0000 Mon, 30 Sep 2024 17:40:37 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 12 Nov 2024 12:10:58 +0000 Mon, 30 Sep 2024 17:40:37 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 12 Nov 2024 12:10:58 +0000 Mon, 30 Sep 2024 17:40:37 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 12 Nov 2024 12:10:58 +0000 Mon, 30 Sep 2024 17:40:38 +0000 KubeletReady kubelet is posting ready status
...
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-7b98449c4-wmbsb 100m (2%) 0 (0%) 70Mi (1%) 170Mi (4%) 42d
kube-system csi-nfs-node-kwqxb 30m (0%) 0 (0%) 60Mi (1%) 500Mi (12%) 41d
kube-system local-path-provisioner-6795b5f9d8-vvdjl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 42d
kube-system metrics-server-cdcc87586-r9t67 100m (2%) 0 (0%) 70Mi (1%) 0 (0%) 42d
kube-system svclb-postgres-cluster-1-2a758881-x76kf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 41d
kube-system svclb-traefik-29cb166f-nhrkl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 42d
kube-system traefik-67f6c94c47-hgbv6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 42d
...
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 230m (5%) 0 (0%)
memory 200Mi (5%) 670Mi (17%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
# Optional: Output the node details in YAML format
kubectl get node ubuntu1 -o yaml
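Because the full describe output is long, a compact view of the node conditions can help. The sketch below assumes the conditions are printed as one "type<TAB>status" pair per line with kubectl's JSONPath output (shown in the usage line) and applies a simple rule: Ready should be True, and every other condition (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable) should be False. The function name `problem_conditions` is hypothetical, and clusters that add custom node conditions may need a different rule.

```shell
# Hypothetical helper: print node conditions that indicate a problem.
# Input (stdin): one "<type> <status>" pair per line, whitespace-separated.
problem_conditions() {
  awk '
    $1 == "Ready" { if ($2 != "True") print; next }   # Ready must be True
    $2 == "True"  { print }                           # all other conditions must be False
  '
}
```

Usage: `kubectl get node ubuntu1 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}' | problem_conditions`. An empty result means all reported conditions look healthy.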
List CPU, Memory, Processes & Available Storage
# Show CPU and memory usage and the running processes (run on the node itself)
top
# Show the used and available disk space of the mounted file systems
df -h
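top and df -h are interactive tools that run on the node itself. Since this cluster runs metrics-server, `kubectl top node` reports per-node CPU and memory usage from any machine with kubectl access. For disk space, a minimal scripted check is sketched below: the hypothetical function `disk_over` reads the POSIX output format of `df -P` and prints every mount point whose usage is at or above a threshold. The default of 85% is an arbitrary example value, not a Kubernetes setting.

```shell
# Hypothetical helper: print mount points whose usage is at or above a limit.
# Input (stdin): output of `df -P` (header line first, "Capacity" in column 5).
# Argument: usage limit in percent (default 85). Mount points containing
# spaces are not handled by this sketch.
disk_over() {
  local limit="${1:-85}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)                 # "96%" -> "96"
    if (use + 0 >= limit) print $6, $5
  }'
}
```

Usage: `df -P | disk_over 80`.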
Inspecting Kubernetes Components
Pod Details
# List Kubernetes system pods
kubectl get pods -n kube-system
# List Kubernetes system pods, including the node each pod runs on
kubectl get pods -n kube-system -o wide
# Show the details of a pod
kubectl describe pod pod-name -n kube-system
# Save the pod details in YAML format to a file
kubectl get pod pod-name -n kube-system -o yaml > pod-details.yaml
# Show the logs of a pod
kubectl logs pod-name -n kube-system
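When many pods run in the cluster, it helps to filter for the ones that need attention. The hypothetical function `unhealthy_pods` sketched below reads the output of `kubectl get pods -A --no-headers` (columns NAMESPACE, NAME, READY, STATUS, ...) and prints pods that are neither Completed nor fully ready and Running. The STATUS column is parsed here because a pod in CrashLoopBackOff usually still has the phase Running, so a field selector on status.phase would not catch it.

```shell
# Hypothetical helper: print pods that are not Completed and not fully ready.
# Input (stdin): output of `kubectl get pods -A --no-headers`
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
unhealthy_pods() {
  awk '{
    split($3, ready, "/")                 # READY column, e.g. "0/1"
    if ($4 == "Completed") next           # finished pods (e.g. Jobs) are fine
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $3, $4
  }'
}
```

Usage: `kubectl get pods -A --no-headers | unhealthy_pods`. For a container that keeps restarting, `kubectl logs pod-name -n kube-system --previous` shows the logs of the previous, crashed container instance.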
Pod Status Errors
# Example: Run a pod with a nonexistent image tag, so that the image pull fails
kubectl run new-pod --image=nginx:37
# List pods
kubectl get pods
# Shell output:
NAME READY STATUS RESTARTS AGE
new-pod 0/1 ErrImagePull 0 3s
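Status Errors:
- ImagePullBackOff: The image cannot be pulled, for example because the image name or tag does not exist (shown as ErrImagePull at first). Export the pod's YAML config, correct the image and recreate the pod.
- CrashLoopBackOff: The container starts, crashes and is restarted repeatedly. Use the kubectl describe and kubectl logs commands to find the cause. Sometimes the error is also caused by cluster components.
- Pending: The pod's containers are not running yet. Use the kubectl describe command. This can be a scheduling issue: no node is available, or the pod requests more resources than any node can provide. Check the node status and the allocated resources of the nodes, and use the top command to check the actual resource usage.
- Completed: The container has exited successfully. This is expected for one-off tasks, but not for long-running services. Use the kubectl describe command.
- Error: The container has exited with an error. Use the kubectl describe and kubectl logs commands.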
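For the ImagePullBackOff case, the "export the YAML config, update the image" workflow can be scripted. The hypothetical function `set_image_in_manifest` below is a sketch that rewrites one image reference in a saved manifest with sed; it assumes GNU sed (in-place editing with -i).

```shell
# Hypothetical helper: replace a container image in a saved manifest.
# Usage: set_image_in_manifest FILE OLD_IMAGE NEW_IMAGE
# Assumes GNU sed (-i edits the file in place without a backup suffix).
set_image_in_manifest() {
  local file="$1" old="$2" new="$3"
  # Match only a complete "image: OLD" value at the end of a line.
  sed -i "s|image: ${old}\$|image: ${new}|" "$file"
}
```

Usage sketch for the failing example pod: save it with `kubectl get pod new-pod -o yaml > new-pod.yaml`, run `set_image_in_manifest new-pod.yaml nginx:37 nginx:latest`, then recreate the pod with `kubectl replace --force -f new-pod.yaml`, which deletes and recreates it. Because the container image of an existing pod may be changed, `kubectl set image pod/new-pod new-pod=nginx:latest` is a shorter alternative; kubectl run names the container after the pod.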
Troubleshooting the Kubelet Agent
K8s
# Show the status of the kubelet service
systemctl status kubelet
# Show the logs of the kubelet service
journalctl -u kubelet.service
# Restart the Kubelet
sudo systemctl restart kubelet
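The kubelet logs in klog format: every message starts with a severity letter (I, W, E or F for info, warning, error and fatal) followed by the date as MMDD and the time, for example E1112 12:10:58.123456. The hypothetical filter `klog_errors` below is a minimal sketch that keeps only error and fatal lines from journal output, which makes long kubelet logs easier to scan.

```shell
# Hypothetical helper: keep only error (E) and fatal (F) klog lines.
# Input (stdin): journal output, e.g. from `journalctl -u kubelet --no-pager`.
klog_errors() {
  grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+'
}
```

Usage: `journalctl -u kubelet --since "1 hour ago" --no-pager | klog_errors`. Assuming the embedded kubelet writes to the K3s service journal, the same filter can be applied to `journalctl -u k3s` or `journalctl -u k3s-agent`.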
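K3s
K3s bundles the kubelet, the container runtime and the other Kubernetes components into a single binary. There is no separate kubelet service; instead, the binary runs as one systemd service: k3s on server (control-plane) nodes and k3s-agent on agent (worker) nodes:
# Check the K3s status: Server (control-plane) node
systemctl status k3s
# Check the K3s status: Agent (worker) node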
systemctl status k3s-agent
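Since the service name differs between server and agent nodes, a small helper can pick the right one. The hypothetical function `k3s_unit` below is a sketch that prints whichever of the two units is active on the current node, using `systemctl is-active --quiet`, which only sets an exit status. If neither unit is active, it prints an error and returns a non-zero status.

```shell
# Hypothetical helper: print the active K3s systemd unit on this node:
# "k3s" on a server (control-plane) node, "k3s-agent" on an agent (worker) node.
k3s_unit() {
  local unit
  for unit in k3s k3s-agent; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit"
      return 0
    fi
  done
  echo "no active k3s or k3s-agent unit found" >&2
  return 1
}
```

Usage: `sudo journalctl -u "$(k3s_unit)" --since "1 hour ago"` shows the recent K3s logs, and `sudo systemctl restart "$(k3s_unit)"` restarts the service, similar to restarting the kubelet on K8s.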