Kubernetes Troubleshooting: Cluster, Nodes, Pods & Kubelet

Inspecting the Cluster
#

# List control plane addresses and services
kubectl cluster-info

# Shell output:
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# List further details for debugging and diagnosis
kubectl cluster-info dump
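
The dump prints a large amount of data to the terminal, so it is often easier to write it to a directory and search the files afterwards. The following is a minimal sketch: the helper name `dump_cluster_state` and the default directory are assumptions for illustration, while `--all-namespaces` and `--output-directory` are flags of `kubectl cluster-info dump`.

```shell
# Write the cluster dump of all namespaces into a directory instead of stdout.
# "dump_cluster_state" is a hypothetical helper name, not a kubectl command.
dump_cluster_state() {
  outdir="${1:-./cluster-dump}"
  mkdir -p "$outdir"
  kubectl cluster-info dump --all-namespaces --output-directory="$outdir"
}
```

For example, `dump_cluster_state /tmp/cluster-dump` followed by `grep -ril error /tmp/cluster-dump` lists the dumped files that mention errors.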

Kubernetes Nodes
#

List Nodes
#

# List Kubernetes nodes
kubectl get nodes

# List Kubernetes nodes: More details
kubectl get nodes -o wide

# Shell output:
ubuntu1   Ready    control-plane,master   42d   v1.30.5+k3s1   192.168.30.10   <none>        Ubuntu 24.04.1 LTS   6.8.0-48-generic   containerd://1.7.21-k3s2
ubuntu2   Ready    worker                 42d   v1.30.5+k3s1   192.168.30.11   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu3   Ready    worker                 42d   v1.30.5+k3s1   192.168.30.12   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2
ubuntu4   Ready    worker                 42d   v1.30.5+k3s1   192.168.30.13   <none>        Ubuntu 24.04.1 LTS   6.8.0-45-generic   containerd://1.7.21-k3s2

Node Status
#

Status	Explanation
Ready	The node is healthy and can accept pods
DiskPressure	The available disk space on the node is low
MemoryPressure	The available memory on the node is low
PIDPressure	Too many processes are running on the node
NetworkUnavailable	The network of the node is not configured correctly
SchedulingDisabled	The node was cordoned; no new pods are scheduled on it
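
In a larger cluster it can help to print only the nodes that need attention. A small sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name first, status second); `not_ready_nodes` is a hypothetical helper name:

```shell
# Read the output of "kubectl get nodes --no-headers" on stdin and print
# every node whose STATUS column is not exactly "Ready".
# This also catches cordoned nodes ("Ready,SchedulingDisabled").
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

It is meant to be used as `kubectl get nodes --no-headers | not_ready_nodes`; no output means every node reports a plain Ready status.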

Node Details
#

These are the most important parts of the node details:

Conditions: Detailed health conditions behind the node status
Non-terminated Pods: Resource requests and limits of the pods running on the node
Allocated resources: Sum of the CPU, memory and storage requests and limits of these pods (not the actual consumption)

# List details of a node
kubectl describe node ubuntu1

# Shell output snippet:
...
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:37 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:37 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:37 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 12 Nov 2024 12:10:58 +0000   Mon, 30 Sep 2024 17:40:38 +0000   KubeletReady                 kubelet is posting ready status

...
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                 coredns-7b98449c4-wmbsb                    100m (2%)     0 (0%)      70Mi (1%)        170Mi (4%)     42d
  kube-system                 csi-nfs-node-kwqxb                         30m (0%)      0 (0%)      60Mi (1%)        500Mi (12%)    41d
  kube-system                 local-path-provisioner-6795b5f9d8-vvdjl    0 (0%)        0 (0%)      0 (0%)           0 (0%)         42d
  kube-system                 metrics-server-cdcc87586-r9t67             100m (2%)     0 (0%)      70Mi (1%)        0 (0%)         42d
  kube-system                 svclb-postgres-cluster-1-2a758881-x76kf    0 (0%)        0 (0%)      0 (0%)           0 (0%)         41d
  kube-system                 svclb-traefik-29cb166f-nhrkl               0 (0%)        0 (0%)      0 (0%)           0 (0%)         42d
  kube-system                 traefik-67f6c94c47-hgbv6                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         42d

...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                230m (5%)   0 (0%)
  memory             200Mi (5%)  670Mi (17%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)

# Optional: Output the node details in YAML format
kubectl get node ubuntu1 -o yaml
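
The conditions can also be read without the rest of the describe output. The sketch below uses the `jsonpath` output of `kubectl get node` to print one condition per line and then filters the ones that point to a problem. Both function names are hypothetical helpers, and the filter rule (Ready must be True, every other condition must be False) is an assumption that fits the conditions shown above.

```shell
# Print type, status and reason of every condition of one node, tab-separated.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Keep only conditions that indicate a problem: "Ready" must be True,
# every other condition (pressure, network) must be False.
unhealthy_conditions() {
  awk -F '\t' '($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 != "False") {
    print $1 "=" $2 " (" $3 ")"
  }'
}
```

Running `node_conditions ubuntu1 | unhealthy_conditions` prints nothing for a healthy node.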

List CPU, Memory, Processes & Available Storage
#

# List CPU, memory & processes of the node (run on the node itself)
top

# List available disk storage
df -h
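
To spot a filesystem that could trigger DiskPressure, the df output can be filtered by usage. A minimal sketch, assuming the POSIX output format of `df -P` (usage in the fifth column, mount point in the sixth); `disk_usage_over` is a hypothetical helper name and the default threshold of 80 percent is an arbitrary choice:

```shell
# Read "df -P" output on stdin and print mount point and usage of every
# filesystem at or above the given percentage (default: 80).
disk_usage_over() {
  awk -v limit="${1:-80}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)          # strip the percent sign before comparing
    if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

It is used as `df -P | disk_usage_over 85`. If the metrics server is installed, as in the cluster-info output at the top, `kubectl top nodes` shows the CPU and memory usage of all nodes without logging in to them.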

Inspecting Kubernetes Components
#

Pods Details
#

# List Kubernetes system related pods
kubectl get pods -n kube-system

# List Kubernetes system related pods: List on which nodes the pods are running
kubectl get pods -n kube-system -o wide

# List details of a pod
kubectl describe pod pod-name -n kube-system

# Save pod details in YAML format into file
kubectl get pod pod-name -n kube-system -o yaml > pod-details.yaml

# List logs of a pod
kubectl logs pod-name -n kube-system
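
When a pod has to be analyzed later or shared with someone else, the commands above can be collected into files. A sketch under stated assumptions: `collect_pod_debug` is a hypothetical helper name, the output directory and the limit of 200 log lines are arbitrary, and `--previous` (logs of the previous container instance) only returns data after a container restart.

```shell
# Collect describe output, current logs and previous-container logs of one
# pod into a directory. "collect_pod_debug" is a hypothetical helper name.
collect_pod_debug() {
  pod="$1"
  namespace="${2:-default}"
  outdir="${3:-./pod-debug}"
  mkdir -p "$outdir"
  kubectl describe pod "$pod" -n "$namespace" > "$outdir/describe.txt"
  kubectl logs "$pod" -n "$namespace" --tail=200 > "$outdir/logs.txt"
  # --previous fails if no container was restarted; ignore that error
  kubectl logs "$pod" -n "$namespace" --previous --tail=200 \
    > "$outdir/logs-previous.txt" 2>/dev/null || true
}
```

For example: `collect_pod_debug pod-name kube-system /tmp/pod-name-debug`.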

Pod Status Errors
#

# Example: Run a failing pod (the image tag nginx:37 does not exist)
kubectl run new-pod --image=nginx:37

# List pods
kubectl get pods

# Shell output:
NAME                                    READY   STATUS         RESTARTS       AGE
new-pod                                 0/1     ErrImagePull   0              3s

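Status Errors:

ErrImagePull / ImagePullBackOff: The image could not be pulled. Check the image name, tag and registry access with the kubectl describe command, then export the YAML config and update the image.
CrashLoopBackOff: The container starts and crashes repeatedly. Use the kubectl describe and kubectl logs commands. Sometimes the error is caused by cluster components.
Pending: Use the kubectl describe command. This can be a scheduling issue caused by no available nodes or by resource requests that exceed the free resources. Check the node status and use the top command to check the resource usage.
Completed: Not an error; all containers exited successfully. Use the kubectl describe command if the pod was expected to keep running.
Error: A container exited with a non-zero exit code. Use the kubectl describe and kubectl logs commands.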
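
To find pods with one of these statuses across all namespaces, the pod list can be filtered. A small sketch, assuming the column layout of `kubectl get pods -A --no-headers` (namespace, name, ready, status); `problem_pods` is a hypothetical helper name:

```shell
# Read "kubectl get pods -A --no-headers" on stdin and print namespace/name
# and status of every pod that is neither Running nor Completed.
problem_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

It is used as `kubectl get pods -A --no-headers | problem_pods`. For the `new-pod` example, one possible fix is `kubectl set image pod/new-pod new-pod=nginx:1.27`; this assumes that the container carries the pod name, which is what `kubectl run` sets, and that `nginx:1.27` is an available tag in your registry.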

Troubleshooting Kubelet Agent
#

K8s
#

# Check the kubelet status
systemctl status kubelet

# Show the logs of the kubelet service
journalctl -u kubelet.service

# Restart the Kubelet
sudo systemctl restart kubelet
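
The kubelet log is usually long, so limiting the time range and filtering for typical error words helps to find the relevant lines. A rough sketch: `unit_errors` is a hypothetical helper name, and the grep pattern and the default of one hour are only starting points, not a complete list of kubelet error messages.

```shell
# Print recent log lines of a systemd unit that look like errors.
# "unit_errors" is a hypothetical helper; adjust the pattern as needed.
unit_errors() {
  unit="${1:-kubelet.service}"
  since="${2:-1 hour ago}"
  journalctl -u "$unit" --since "$since" --no-pager | grep -iE 'error|fail|unable'
}
```

For example, `unit_errors kubelet.service "30 minutes ago"` shows the matching lines of the last half hour.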

K3s
#

K3s bundles the kubelet, the container runtime and other Kubernetes components into a single binary. It is managed by one systemd service per node, called k3s on controller (server) nodes and k3s-agent on worker (agent) nodes:

# Check K3s status: Controller node
systemctl status k3s

# Check K3s status: Worker node
systemctl status k3s-agent
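
Because the unit name differs between controller and worker nodes, a script that runs on both can detect the active unit first. A minimal sketch, assuming the default unit names k3s and k3s-agent shown above; `k3s_unit` is a hypothetical helper name:

```shell
# Print the name of the K3s systemd unit that is active on this node.
k3s_unit() {
  for unit in k3s k3s-agent; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit"
      return 0
    fi
  done
  echo "no active K3s unit found" >&2
  return 1
}
```

The result can be passed to the usual tools, for example `journalctl -u "$(k3s_unit)" --since "1 hour ago" --no-pager` to read the logs, or `sudo systemctl restart "$(k3s_unit)"` to restart the service.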

Kubernetes-Components - This article is part of a series.

Part 27: This Article
