
Kubernetes Troubleshooting: Cluster, Nodes, Pods & Kubelet

Kubernetes Kubectl
Kubernetes-Components - This article is part of a series.
Part 27: This Article

Kubernetes Troubleshooting

Inspecting the Cluster

# List the addresses of the control plane and cluster services
kubectl cluster-info

# Shell output:
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# Dump detailed cluster state (resource details and pod logs) for debugging and diagnosis
kubectl cluster-info dump
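
The full dump is long. The sketch below assumes you write it to a directory with the `--output-directory` flag and then want a quick list of the files that mention errors. The function name `count_errors_in_dump` and the path `/tmp/cluster-state` are hypothetical choices for this example.

```shell
# Hypothetical helper: list the files of a cluster-info dump that contain
# the word "error" (case-insensitive), sorted by number of matching lines.
count_errors_in_dump() {
  dump_dir="$1"
  # -r: recurse into the directory, -i: ignore case, -c: count matching lines per file.
  # grep -c also prints files with 0 matches, so drop those before sorting.
  grep -ric 'error' "$dump_dir" | grep -v ':0$' | sort -t: -k2,2nr
}

# Usage (requires access to a cluster; the directory is an example path):
#   kubectl cluster-info dump --all-namespaces --output-directory=/tmp/cluster-state
#   count_errors_in_dump /tmp/cluster-state
```

The keyword match is deliberately crude. It only points you to the namespaces and pods worth opening first.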

Kubernetes Nodes

List Nodes

# List Kubernetes nodes
kubectl get nodes

# List Kubernetes nodes: More details
kubectl get nodes -o wide
# Shell output:
NAME      STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
ubuntu1   Ready    control-plane,master   42d   v1.30.5+k3s1   192.168.30.10   <none>        Ubuntu 24.04.1 LTS   6.8.0-48-generic   containerd://1.7.21-k3s2
ubuntu2   Ready    worker                 42d   v1.30.5+k3s1   192.168.30.11   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu3   Ready    worker                 42d   v1.30.5+k3s1   192.168.30.12   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu4   Ready    worker                 42d   v1.30.5+k3s1   192.168.30.13   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
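
In a larger cluster it helps to filter this table down to the nodes that need attention. The sketch below assumes the default column layout of `kubectl get nodes`, where STATUS is the second column. `not_ready_nodes` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# every node whose STATUS is anything other than exactly "Ready"
# (e.g. NotReady or Ready,SchedulingDisabled).
not_ready_nodes() {
  # NR > 1 skips the header line; $1 is NAME, $2 is STATUS
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
```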

Node Status

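Status               Explanation
Ready                The kubelet is healthy and the node can accept pods
DiskPressure         Free disk space on the node is low
MemoryPressure       Free memory on the node is low
PIDPressure          Too many processes are running on the node
NetworkUnavailable   The network of the node is not correctly configured
SchedulingDisabled   The node is cordoned (kubectl cordon or kubectl drain), so no new pods are scheduled on it

Only Ready (or NotReady) and SchedulingDisabled appear in the STATUS column of kubectl get nodes, e.g. Ready,SchedulingDisabled. The pressure and network conditions are listed under Conditions in kubectl describe node.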

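The node conditions can also be read directly from the node object. In the sketch below, the jsonpath expression prints one `Type=Status` pair per line. The hypothetical `problem_conditions` helper then flags every condition that is not in its healthy state: `Ready` should be `True`, and the pressure and network conditions should be `False`.

```shell
# Hypothetical helper: read "Type=Status" lines on stdin and print the
# conditions that indicate a problem.
problem_conditions() {
  awk -F= '
    # Skip empty lines, e.g. a trailing newline from the jsonpath output
    NF == 0 { next }
    # Ready is the only condition whose healthy value is "True"
    $1 == "Ready" && $2 != "True" { print; next }
    # All other conditions (DiskPressure, MemoryPressure, ...) should be "False"
    $1 != "Ready" && $2 != "False" { print }
  '
}

# Usage (ubuntu1 is the node from the example above):
#   kubectl get node ubuntu1 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | problem_conditions
```
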
Node Details

These are the most important parts of the node details:

  • Conditions: More details of the node status

  • Non-terminated Pods: Resource requests and limits of the pods running on this node

  • Allocated resources: Sum of the CPU, memory and storage requests and limits of the pods on this node, as a share of its allocatable capacity (this is not the actual consumption)

# List details of a node
kubectl describe node ubuntu1

# Shell output snippet:
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:37 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:37 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:37 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:38 +0000   KubeletReady                 kubelet is posting ready status

...
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                 coredns-7b98449c4-wmbsb                    100m (2%)     0 (0%)      70Mi (1%)        170Mi (4%)     42d
  kube-system                 csi-nfs-node-kwqxb                         30m (0%)      0 (0%)      60Mi (1%)        500Mi (12%)    41d
  kube-system                 local-path-provisioner-6795b5f9d8-vvdjl    0 (0%)        0 (0%)      0 (0%)           0 (0%)         42d
  kube-system                 metrics-server-cdcc87586-r9t67             100m (2%)     0 (0%)      70Mi (1%)        0 (0%)         42d
  kube-system                 svclb-postgres-cluster-1-2a758881-x76kf    0 (0%)        0 (0%)      0 (0%)           0 (0%)         41d
  kube-system                 svclb-traefik-29cb166f-nhrkl               0 (0%)        0 (0%)      0 (0%)           0 (0%)         42d
  kube-system                 traefik-67f6c94c47-hgbv6                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         42d

...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                230m (5%)   0 (0%)
  memory             200Mi (5%)  670Mi (17%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)

# Optional: Output the node details in YAML format
kubectl get node ubuntu1 -o yaml
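
The scheduler places pods based on their resource requests, not their live usage. The Requests percentages in the Allocated resources section are therefore a good first check when pods stay Pending. The sketch below flags resources whose requests reach a threshold. The `high_requests` name and the 80% example threshold are assumptions, and the parsing relies on the `resource  request (pct%)  limit (pct%)` layout shown above.

```shell
# Hypothetical helper: read `kubectl describe node` output on stdin and print
# the resources in the "Allocated resources" section whose requests are at
# or above the given percentage of the node's allocatable capacity.
high_requests() {
  awk -v t="$1" '
    /^Allocated resources:/ { in_section = 1; next }
    # The section ends at the next unindented line (e.g. "Events:")
    in_section && /^[^ ]/ { in_section = 0 }
    # Data lines look like: "  cpu   230m (5%)   0 (0%)"
    in_section && $3 ~ /^[(][0-9]+%[)]$/ {
      pct = $3
      gsub(/[()%]/, "", pct)
      if (pct + 0 >= t + 0) print $1, $2, pct "%"
    }
  '
}

# Usage:
#   kubectl describe node ubuntu1 | high_requests 80
```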

List CPU, Memory, Processes & Available Storage

# List CPU, memory & processes of the node (run directly on the node)
top

# List available disk storage
df -h
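
DiskPressure is reported when free space on the node's filesystems drops below the kubelet's eviction thresholds. In upstream Kubernetes on Linux, the default hard threshold for `nodefs.available` is 10%. The sketch below lists filesystems above a usage threshold, using the portable `df -P` format instead of `df -h`. The `full_filesystems` name and the 85% example value are assumptions.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount points
# whose usage is at or above the given percentage.
full_filesystems() {
  awk -v t="$1" '
    # Skip the header; in POSIX format column 5 is "Capacity" (e.g. "42%")
    # and column 6 is the mount point.
    NR > 1 {
      used = $5
      sub(/%$/, "", used)
      if (used + 0 >= t + 0) print $6, $5
    }
  '
}

# Usage (run directly on the node):
#   df -P | full_filesystems 85
```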

Inspecting Kubernetes Components

Pod Details

# List Kubernetes system related pods
kubectl get pods -n kube-system

# List Kubernetes system related pods: Show the nodes the pods are running on
kubectl get pods -n kube-system -o wide

# List pod details
kubectl describe pod pod-name -n kube-system

# Save the pod details in YAML format to a file
kubectl get pod pod-name -n kube-system -o yaml > pod-details.yaml

# List the logs of a pod
kubectl logs pod-name -n kube-system
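
Application logs can be long. The sketch below keeps only the lines that look like errors. The `log_errors` name and the keyword list are assumptions, so adjust the pattern to the log format of the application. For a container that has already restarted, `kubectl logs --previous` shows the logs of the previous instance.

```shell
# Hypothetical helper: print log lines (with line numbers) that contain
# common error keywords, ignoring case. "|| true" keeps the exit status 0
# when nothing matches.
log_errors() {
  grep -niE 'error|fatal|panic|fail' || true
}

# Usage:
#   kubectl logs pod-name -n kube-system --tail=500 | log_errors
#   kubectl logs pod-name -n kube-system --previous | log_errors
```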

Pod Status Errors

# Example: Run a pod with a non-existent image tag, so the image pull fails
kubectl run new-pod --image=nginx:37

# List pods
kubectl get pods

# Shell output:
NAME                                    READY   STATUS         RESTARTS       AGE
new-pod                                 0/1     ErrImagePull   0              3s
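
In a namespace with many pods, the failing ones can be filtered out of the list. A pod can also be Running while not all of its containers are ready (e.g. `0/1`), so the sketch below reports those as well. It assumes the default `kubectl get pods` layout without `--all-namespaces`, where READY is the second column and STATUS the third. `unhealthy_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print pods
# that are neither Completed nor Running with all containers ready.
unhealthy_pods() {
  awk 'NR > 1 {
    # $1 = NAME, $2 = READY (e.g. "1/1"), $3 = STATUS
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods | unhealthy_pods
```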

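Status Errors:

  • ErrImagePull / ImagePullBackOff: The image cannot be pulled, e.g. because of a wrong image name or tag or missing registry credentials. After repeated failures, ErrImagePull turns into ImagePullBackOff. Check the events with kubectl describe, then fix the image: export the YAML config, update the image and apply it again.

  • CrashLoopBackOff: The container starts but keeps crashing and is restarted with an increasing delay. Use the kubectl describe and kubectl logs commands. Occasionally the crash is caused by a cluster component rather than by the application itself.

  • Pending: The pod has not been scheduled or started yet. Use the kubectl describe command. This is usually a scheduling issue: no node is available, or no node has enough free resources for the pod's requests. Check the node status and compare the pod's requests with the Allocated resources of the nodes. The top command shows the actual usage.

  • Completed: The containers exited successfully (exit code 0). If the pod is supposed to keep running, check its command and arguments with the kubectl describe command.

  • Error: A container exited with a non-zero exit code. Use the kubectl describe and kubectl logs commands.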

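The ImagePullBackOff fix (export the YAML config, update the image) can be sketched as follows. This assumes the pod was created with `kubectl run` as above, so its container is also called `new-pod`, and that `nginx:1.27` is a valid replacement tag. The `replace_image` helper is hypothetical. For a single pod, `kubectl set image pod/new-pod new-pod=nginx:1.27` achieves the same without editing a file.

```shell
# Hypothetical helper: rewrite every "image:" line of a YAML manifest read on
# stdin to the given image and write the result to stdout.
replace_image() {
  new_image="$1"
  # Matches both "image: x" and list items like "- image: x" and keeps the
  # original indentation; "imagePullPolicy:" and "imageID:" are not matched.
  sed -E "s#^([[:space:]]*(- )?image: ).*#\1${new_image}#"
}

# Usage:
#   kubectl get pod new-pod -o yaml > new-pod.yaml
#   replace_image nginx:1.27 < new-pod.yaml | kubectl replace -f -
```

The container image is one of the few pod fields that can be changed in place, which is why a plain replace is enough here. Most other pod spec changes require deleting and recreating the pod.
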

Troubleshooting Kubelet Agent

K8s

# Show the kubelet status
systemctl status kubelet

# List the logs of the kubelet service
journalctl -u kubelet.service

# Restart the kubelet
sudo systemctl restart kubelet
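
The kubelet logs in the klog format. Each entry starts with a severity letter (`I` info, `W` warning, `E` error, `F` fatal) followed by the month and day, e.g. `E1112 12:10:58.123456 ...`. The sketch below keeps only error and fatal entries. `kubelet_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: read kubelet log lines on stdin (plain klog or
# journalctl output) and print only error (E) and fatal (F) entries.
kubelet_errors() {
  # klog header: severity letter + MMDD + space, either at the start of the
  # line or after the "kubelet[PID]: " prefix added by journalctl.
  grep -E '(^|: )[EF][0-9]{4} ' || true
}

# Usage:
#   journalctl -u kubelet.service --since "1 hour ago" --no-pager | kubelet_errors
```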

K3s

K3s bundles the kubelet, the container runtime and other Kubernetes components into a single binary. It runs as the systemd service k3s on control plane (server) nodes and as k3s-agent on worker (agent) nodes:

# Check K3s status: Control plane (server) node
systemctl status k3s

# Check K3s status: Worker node
systemctl status k3s-agent
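
The service name differs between server and agent nodes. A small helper can pick the right unit before you read its logs or restart it. In this sketch, `k3s_unit` is a hypothetical name. The check relies on `systemctl is-active --quiet`, which exits with status 0 only when the unit is active.

```shell
# Hypothetical helper: print the name of the active K3s systemd unit
# ("k3s" on control plane nodes, "k3s-agent" on worker nodes).
k3s_unit() {
  if systemctl is-active --quiet k3s; then
    echo k3s
  elif systemctl is-active --quiet k3s-agent; then
    echo k3s-agent
  else
    echo "no active k3s or k3s-agent unit found" >&2
    return 1
  fi
}

# Usage:
#   journalctl -u "$(k3s_unit)" --since "1 hour ago" --no-pager
#   sudo systemctl restart "$(k3s_unit)"
```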