Kubernetes troubleshooting guides
Diagnose Kubernetes pod and workload failures (CrashLoopBackOff, OOMKilled, ImagePullBackOff, and others) and trace each one to the change that caused it.
New here? Start with the field guide: Debugging Kubernetes pod failures →
See how Intellira automates this: Kubernetes root cause analysis →
ContainerCreating
A Pod stuck in ContainerCreating is waiting on a volume, image pull, secret, or network attachment. Here is how to find which one and fix it.
CrashLoopBackOff
A pod in CrashLoopBackOff keeps crashing and restarting. Find the real reason (application error, failed probe, failing init container, or a clean exit with code 0) and fix it.
CreateContainerConfigError
Why a Pod is stuck in CreateContainerConfigError and how to fix the missing ConfigMap, Secret, or key reference behind it.
DNSResolution
Pods cannot resolve Service or external names. Six causes diagnosed and fixed: CoreDNS down, dnsPolicy, ndots, NetworkPolicy, resolv.conf loops, name forms.
ImagePullBackOff
ImagePullBackOff means the kubelet cannot pull the image. Read the exact pull error and fix it: wrong tag, auth, rate limit, wrong arch, or pull policy.
NetworkPolicyBlocked
Connections time out or are refused because a NetworkPolicy denies them. How to confirm isolation, find the missing allow rule, and unblock DNS.
NodeNotReady
A NotReady node has stopped reporting healthy to the control plane. Its pods get evicted and rescheduled. Here is how to find why the kubelet went unhealthy.
OOMKilled
OOMKilled (exit code 137) means a container exceeded its memory limit and the kernel killed it. Here is how to confirm it and fix the real cause.
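As a rough illustration of the confirmation step, the sketch below scans a pod object (the JSON printed by `kubectl get pod <name> -o json`, parsed into a dict) for containers whose current or previous termination reason is OOMKilled. Exit code 137 is 128 + 9, meaning the process was ended by SIGKILL. The helper name `oom_killed_containers` and the sample pod are hypothetical and exist only for this example.

```python
def oom_killed_containers(pod: dict) -> list:
    """Return names of containers whose current or last termination was OOMKilled."""
    names = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        # A restarted container reports the kill under lastState;
        # a container that has not restarted yet reports it under state.
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(cs["name"])
                break
    return names


# Hypothetical, heavily trimmed pod object for illustration only.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {
                "name": "sidecar",
                "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}},
                "lastState": {},
            },
        ]
    }
}

print(oom_killed_containers(sample_pod))  # prints ['api']
```

Checking both `state` and `lastState` matters because a container that has already restarted shows as waiting or running, with the OOM kill recorded only in its last state.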
PodPending
A pod stuck in Pending has not been scheduled to a node. The FailedScheduling event says why: insufficient resources, taints, affinity, topology spread, or an unbound volume.
PVCPending
A PVC stuck in Pending has not been bound to a volume. Causes: a missing StorageClass, WaitForFirstConsumer binding with no pod using the claim yet, no provisioner, no matching PV, or an access-mode mismatch.
ServiceNoEndpoints
Connections to a Service with no endpoints are refused or time out. Causes: selector mismatch, unready pods, port mismatch, or no running pods.
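To make the selector-mismatch cause concrete, here is a minimal sketch of the matching rule: a Service selects a pod only when every key/value pair in the Service's `spec.selector` appears in the pod's labels. The function names and sample values are hypothetical. In practice the two dicts would come from `kubectl get service <name> -o json` and `kubectl get pod <name> -o json`.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True when every selector key/value pair is present in the pod labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(labels.get(key) == value for key, value in selector.items())


def mismatched_keys(selector: dict, labels: dict) -> dict:
    """Map each failing selector key to (wanted value, value found on the pod)."""
    return {
        key: (value, labels.get(key))
        for key, value in selector.items()
        if labels.get(key) != value
    }


# Hypothetical values: the pod label has a stray hyphen.
service_selector = {"app": "web", "tier": "frontend"}
pod_labels = {"app": "web", "tier": "front-end", "pod-template-hash": "abc123"}

print(selector_matches(service_selector, pod_labels))  # prints False
print(mismatched_keys(service_selector, pod_labels))   # prints {'tier': ('frontend', 'front-end')}
```

Extra labels on the pod, such as `pod-template-hash`, do not prevent a match. Only the keys named in the selector are compared.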