Kubernetes · high severity

DNSResolution

Pods cannot resolve Service or external names. Six causes diagnosed and fixed: CoreDNS down, dnsPolicy, ndots, NetworkPolicy, resolv.conf loops, name forms.

Written by Intellira Engineering, Editorial team

What a DNS resolution failure looks like

A pod cannot turn a name into an IP. nslookup inside the pod returns can't resolve, SERVFAIL, or times out, and your app logs name-lookup errors. The trap is assuming there is one cause. There are at least six independent ones, and they need different fixes: CoreDNS pods being down or crashlooping, the wrong dnsPolicy/dnsConfig, the default ndots:5 slowing or breaking external lookups, a NetworkPolicy blocking egress to kube-dns, an upstream/resolv.conf forwarding loop, and querying the wrong Service name form. Start by deciding whether the failure is cluster-wide (every pod) or scoped to one workload — that split points you at the right half of this page.

The two authoritative references are Debugging DNS Resolution and DNS for Services and Pods.

First triage: cluster-wide or one workload

Run a clean debug pod so you are not debugging through your app's own quirks. The official dnsutils pod is the canonical tool:

# In-cluster name (Server should be the kube-dns Service IP; the answer is the kubernetes Service IP)
kubectl exec -i -t dnsutils -- nslookup kubernetes.default

# What the pod was actually handed
kubectl exec -ti dnsutils -- cat /etc/resolv.conf

A healthy kubernetes.default lookup returns the DNS server and the answer:

Server:    10.0.0.10
Address 1: 10.0.0.10

Name:      kubernetes.default
Address 1: 10.0.0.1

A failure prints the server but no answer (Debugging DNS Resolution):

Server:    10.0.0.10
Address 1: 10.0.0.10

nslookup: can't resolve 'kubernetes.default'

A correct /etc/resolv.conf for a pod in the default namespace looks like this — note the cluster search domains, the kube-dns Service IP as nameserver, and options ndots:5 (Debugging DNS Resolution):

search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5

If a fresh dnsutils pod also fails, the problem is cluster-wide: work causes 1 and 5 below. If only your workload fails, work causes 2, 3, 4, and 6.
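The split can be scripted. This is a minimal sketch, not an official tool: the function name triage_dns is hypothetical, and it assumes the dnsutils pod is already running in the current namespace and that nslookup exits non-zero when the lookup fails.

```shell
# Hypothetical helper: report which half of the triage applies.
# Assumes a "dnsutils" pod exists in the current namespace.
triage_dns() {
  if kubectl exec dnsutils -- nslookup kubernetes.default >/dev/null 2>&1; then
    echo "debug pod resolves: failure is scoped to the workload"
  else
    echo "debug pod fails: failure is cluster-wide"
  fi
}
```

Run triage_dns once before touching any configuration, so the result reflects the cluster as it failed.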

Cause 1: CoreDNS pods are down or crashlooping

DNS in current clusters is served by the CoreDNS add-on; both CoreDNS and the older kube-dns use the label k8s-app=kube-dns (Debugging DNS Resolution). If the resolver itself is unhealthy, every pod loses DNS.

Diagnose:

# Are the DNS pods Running and Ready?
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns

# Does the Service exist with a clusterIP?
kubectl get svc --namespace=kube-system

# Are there backing endpoints? Zero endpoints = nothing answers on :53
kubectl get endpointslice -l kubernetes.io/service-name=kube-dns --namespace=kube-system

# What is CoreDNS logging?
kubectl logs --namespace=kube-system -l k8s-app=kube-dns

A healthy startup log shows the version banner and a config hash (Debugging DNS Resolution). If the pods are CrashLoopBackOff, read the logs for the reason — a common one is the forwarding loop covered in Cause 5. If the pods are Running but the EndpointSlice is empty, the Service has no backends and no query is answered; that is the same shape as a missing-endpoints Service problem.
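To make the empty-EndpointSlice case easy to spot, the endpoint check can be reduced to a number. A minimal sketch, with a hypothetical function name; it counts every address listed in the kube-dns EndpointSlices, including any not marked ready, so zero is conclusive while a non-zero count still needs the pod readiness check.

```shell
# Hypothetical helper: count addresses behind the kube-dns Service.
# 0 means the Service has no backends, even if the pods show Running.
kube_dns_endpoint_count() {
  kubectl get endpointslice --namespace=kube-system \
    -l kubernetes.io/service-name=kube-dns \
    -o jsonpath='{.items[*].endpoints[*].addresses[*]}' | wc -w | tr -d ' '
}
```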

Fix: restore the pods (resolve the crash cause, fix resource limits, or scale the Deployment back up). Blast radius note: CoreDNS is a cluster-wide singleton path — restarting it briefly affects every pod's lookups, so expect transient SERVFAILs during a roll.

Cause 2: wrong dnsPolicy

A pod's dnsPolicy decides which resolver it even uses. The four values and their exact behavior (DNS for Services and Pods):

  • Default — the pod inherits name resolution from the node. This does NOT use cluster DNS, so in-cluster Service names will not resolve.
  • ClusterFirst — queries that do not match the cluster domain are forwarded upstream by the cluster DNS; in-cluster names resolve via CoreDNS.
  • ClusterFirstWithHostNet — required for hostNetwork pods. The docs are explicit: a hostNetwork pod left on ClusterFirst "will fallback to the behavior of the Default policy."
  • None — ignores cluster DNS entirely; you must supply everything via dnsConfig.

Critically, "Default" is not the default: if dnsPolicy is unset, the value used is ClusterFirst (DNS for Services and Pods).

Diagnose:

kubectl get pod <pod> -o jsonpath='{.spec.dnsPolicy}{"\n"}'
kubectl get pod <pod> -o jsonpath='{.spec.hostNetwork}{"\n"}'

If a pod that needs Service discovery shows Default, or a hostNetwork: true pod shows ClusterFirst, that is your bug. Fix by setting the right policy:

spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
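The policy rules above reduce to a small lookup. The sketch below is illustrative only: effective_dns is a hypothetical name, and the labels it prints ("node", "cluster", "dnsConfig-only") are shorthand invented here, not Kubernetes terms.

```shell
# Hypothetical helper: map (dnsPolicy, hostNetwork) to the resolver a pod ends up with.
# An empty policy is treated as ClusterFirst, the value used when dnsPolicy is unset.
effective_dns() (
  policy="${1:-ClusterFirst}"
  host_network="${2:-false}"
  case "$policy" in
    Default) echo "node" ;;                     # inherits the node's resolution
    None) echo "dnsConfig-only" ;;              # everything comes from dnsConfig
    ClusterFirstWithHostNet) echo "cluster" ;;
    ClusterFirst)
      # A hostNetwork pod on ClusterFirst falls back to Default behavior.
      if [ "$host_network" = "true" ]; then echo "node"; else echo "cluster"; fi ;;
    *) echo "unknown" ;;
  esac
)
```

Feed it the two jsonpath values from the diagnose step, for example effective_dns ClusterFirst true, which prints node and flags the hostNetwork bug.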

Cause 3: ndots:5 breaking or slowing external lookups

This is the most misdiagnosed DNS problem, because in-cluster names work fine while external names are slow or intermittently fail. For pods that use cluster DNS, kubelet writes options ndots:5 into resolv.conf (Debugging DNS Resolution). With ndots:5, any name containing fewer than 5 dots is first tried against every search domain before the name is tried as-is. The docs spell out the default value of 5 and that "any hostname with fewer than 5 dots will trigger search domain expansion, causing extra queries" (DNS for Services and Pods).

So a lookup for api.example.com (two dots, under the threshold of five) first tries api.example.com.default.svc.cluster.local, then ...svc.cluster.local, then ...cluster.local, each returning NXDOMAIN, before finally querying the real name. That is several extra round trips per resolution, and under DNS packet loss it shows up as slow or flaky external calls.
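The expansion order can be reproduced without a cluster. The function below is a sketch with a hypothetical name; it models the glibc-style order (search list first when the name has fewer dots than ndots, the name as-is first otherwise), and other resolver libraries may differ, especially in whether they fall back to the search list.

```shell
# Hypothetical helper: print the names a resolver tries, in order.
# Usage: dns_candidates NAME NDOTS SEARCH_DOMAIN...
dns_candidates() (
  name="$1"; ndots="$2"; shift 2
  # A trailing dot marks the name as fully qualified: no search expansion.
  case "$name" in
    *.) echo "$name"; exit 0 ;;
  esac
  # Count the dots by deleting every other character.
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name."            # enough dots: tried as-is first
  fi
  for domain in "$@"; do
    echo "$name.$domain."    # one query per search domain
  done
  if [ "$dots" -lt "$ndots" ]; then
    echo "$name."            # too few dots: tried as-is last
  fi
)
```

Calling dns_candidates api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local prints the three cluster-suffixed names followed by api.example.com. on the last line, which is the four-query sequence described above.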

Diagnose by enabling query logging on CoreDNS and watching the expansion. Add log to the Corefile (Debugging DNS Resolution):

kubectl -n kube-system edit configmap coredns

.:53 {
    log
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

You will see the cluster-suffixed variants return NXDOMAIN ahead of the real query. Two fixes, with tradeoffs:

  • Per-pod dnsConfig lowering ndots (smallest blast radius — affects only this workload):
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  • Or query a fully qualified name with a trailing dot (api.example.com.), which skips search expansion for that call. Tradeoff: it only fixes the call sites you edit, not the whole workload.

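Do not blanket-lower ndots for everything without thought: short in-cluster names rely on search-domain expansion to resolve. A bare mysvc has no dots, so it still expands under any ndots of 1 or more, but a namespace-qualified mysvc.my-namespace has one dot, and ndots:1 makes the resolver try it as an absolute name first (some resolver libraries then skip the search list entirely). ndots:2 is the common compromise because it keeps both short forms expanding.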

Cause 4: a NetworkPolicy blocking egress to kube-dns

An egress NetworkPolicy is an allow-list. The docs are explicit: "When a pod is isolated for egress, the only allowed connections from the pod are those allowed by the egress list of some NetworkPolicy that applies to the pod for egress" (Network Policies). So the moment any egress policy selects your pod, DNS is dropped unless you also allow port 53 to kube-dns. The classic symptom is a workload that resolved fine until a security policy landed, then resolves nothing.

Diagnose:

# Which egress policies select this pod?
kubectl get networkpolicy -n <namespace> -o wide

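# Test whether :53 to kube-dns is reachable from the affected pod itself
# (a dnsutils pod with different labels or namespace may not be selected by the policy)
kubectl exec -ti <pod> -n <namespace> -- nslookup kubernetes.default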
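The policy listing gets long in busy namespaces, so it can help to filter it down to policies that declare Egress. A minimal sketch with a hypothetical function name; it relies on the policyTypes field being populated on the stored object, and it prints each policy's podSelector so you can compare it with the pod's labels by eye.

```shell
# Hypothetical helper: list NetworkPolicies in a namespace that declare Egress.
# Prints "name<TAB>podSelector" for each match.
egress_policies() {
  kubectl get networkpolicy -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.policyTypes}{"\t"}{.spec.podSelector}{"\n"}{end}' \
    | awk -F '\t' '$2 ~ /Egress/ { print $1 "\t" $3 }'
}
```

An empty podSelector ({}) in the output means the policy selects every pod in the namespace.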

Fix by explicitly allowing egress to the kube-dns pods on both UDP and TCP (TCP matters for large or truncated answers, and NetworkPolicy supports TCP, UDP, and SCTP; Network Policies):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: <namespace>
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

Blast radius note: confirm your CNI actually enforces NetworkPolicy (some default configs do not), and confirm the kube-dns label in your cluster — a selector that matches nothing silently blocks DNS.

Cause 5: a resolv.conf / upstream forwarding loop

CoreDNS forwards non-cluster queries to whatever is in /etc/resolv.conf via forward . /etc/resolv.conf. On hosts running systemd-resolved, that file points at the loopback stub 127.0.0.53, so CoreDNS forwards to itself and the loop plugin halts the server. The CoreDNS loop plugin "detects simple forwarding loops and halts the server," logging (CoreDNS loop plugin):

plugin/loop: Loop (127.0.0.1:55953 -> :1053) detected for zone ".",
see https://coredns.io/plugins/loop#troubleshooting.

The documented cause is "CoreDNS forwarding requests directly to itself. e.g. via a loopback address such as 127.0.0.1, ::1 or 127.0.0.53," and in Kubernetes specifically "systemd-resolved will put the loopback address 127.0.0.53 as a nameserver into /etc/resolv.conf" (CoreDNS loop plugin). The Kubernetes docs echo that some distros (e.g. Ubuntu) use systemd-resolved, which can replace /etc/resolv.conf (Debugging DNS Resolution).

Diagnose: this shows up as CoreDNS CrashLoopBackOff (Cause 1) with the Loop line in kubectl logs -n kube-system -l k8s-app=kube-dns.
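You can also confirm the precondition on the node itself by checking the file CoreDNS inherits for loopback nameservers. A minimal sketch, assuming shell access to the node; the function name is hypothetical and it only recognizes 127.x.x.x and ::1 addresses.

```shell
# Hypothetical helper: print loopback nameservers found in a resolv.conf-style file.
# Any output means "forward . /etc/resolv.conf" would send queries back to the local host.
loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' \
    "${1:-/etc/resolv.conf}"
}
```

On a systemd-resolved host, loopback_nameservers /etc/resolv.conf typically prints 127.0.0.53, while the same check against /run/systemd/resolve/resolv.conf should print nothing.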

Fix, in order of preference (CoreDNS loop plugin):

  • Point kubelet at the real upstream file rather than the systemd stub, via the kubelet config resolvConf set to a path to the actual nameserver file (commonly /run/systemd/resolve/resolv.conf on systemd-resolved hosts).
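  • Or restore the host's original /etc/resolv.conf and disable the local DNS cache on nodes.
  • Or, as a quick and dirty fix, replace forward . /etc/resolv.conf with a concrete upstream, e.g. forward . 8.8.8.8. This repairs CoreDNS only: kubelet still hands the stub resolv.conf to pods that use the Default dnsPolicy.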

Cause 6: querying the wrong Service name form

If resolution "fails" but DNS is healthy, you may be asking for a name that does not exist. The record formats are exact (DNS for Services and Pods):

  • A normal Service: my-svc.my-namespace.svc.cluster-domain.example resolves to the Service's cluster IP.
  • A headless Service (clusterIP: None): the same name resolves to the set of IPs of all selected pods, not a single VIP. Clients consume the set or round-robin over it.
  • SRV records use _port-name._port-protocol.my-svc.my-namespace.svc.cluster-domain.example. For a headless Service these return one answer per pod, of the form hostname.my-svc.my-namespace.svc.cluster-domain.example.

Two failure modes here. First, cross-namespace: "DNS queries that don't specify a namespace are limited to the Pod's namespace. To access Services in other namespaces, specify the namespace" — query data.prod, not bare data (DNS for Services and Pods). Second, expecting a single IP from a headless Service: an app that picks the "first" answer and caches it will miss the other pods and behave like a partial outage.

Diagnose by resolving each form explicitly and comparing:

kubectl exec -ti dnsutils -- nslookup my-svc.my-namespace.svc.cluster.local
kubectl exec -ti dnsutils -- nslookup my-svc.my-namespace

Fix: use the namespace-qualified name (or the full FQDN), and for headless Services design the client to consume all returned addresses.
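Building the names programmatically avoids typos in the record forms. The helpers below are a sketch with hypothetical names; the cluster.local default is an assumption about your cluster domain, so pass the real one as the last argument if it differs.

```shell
# Hypothetical helpers: build the record names described above.
# Usage: svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]
svc_fqdn() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

# Usage: srv_name PORT_NAME PROTOCOL SERVICE NAMESPACE [CLUSTER_DOMAIN]
srv_name() {
  echo "_$1._$2.$(svc_fqdn "$3" "$4" "${5:-cluster.local}")"
}
```

For example, kubectl exec -ti dnsutils -- nslookup "$(svc_fqdn data prod)" queries data.prod.svc.cluster.local regardless of which namespace the debug pod runs in.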

Work the split first — cluster-wide versus single-workload — because it halves the search space before you touch anything. From a fresh dnsutils pod, if kubernetes.default fails, go straight to Cause 1 (and Cause 5 if CoreDNS is crashlooping). If only your workload fails, read its dnsPolicy and /etc/resolv.conf, list the NetworkPolicies selecting it, then compare the name forms it queries. Change one variable at a time, and prefer per-pod dnsConfig over cluster-wide CoreDNS edits — the blast radius of a Corefile mistake is every pod in the cluster.

Validation

# In-cluster resolution
kubectl exec -i -t dnsutils -- nslookup kubernetes.default

# External resolution (and check it is not slow)
kubectl exec -i -t dnsutils -- nslookup kubernetes.io

# Confirm the resolver config the pod actually has
kubectl exec -ti dnsutils -- cat /etc/resolv.conf

# Confirm CoreDNS is healthy and has endpoints
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get endpointslice -l kubernetes.io/service-name=kube-dns -n kube-system

A fix is confirmed when both in-cluster and external names resolve from a fresh debug pod, CoreDNS shows no Loop lines, and your workload's resolv.conf and NetworkPolicy match intent.
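The two lookups can be wrapped into a single pass/fail check for use after a change. A minimal sketch: validate_dns is a hypothetical name, it assumes the dnsutils pod exists and that nslookup exits non-zero on failure, and it does not cover the resolv.conf or NetworkPolicy review.

```shell
# Hypothetical helper: run the validation lookups and report each result.
# Returns non-zero if any lookup fails.
validate_dns() (
  failures=0
  for name in kubernetes.default kubernetes.io; do
    if kubectl exec dnsutils -- nslookup "$name" >/dev/null 2>&1; then
      echo "ok   $name"
    else
      echo "FAIL $name"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
)
```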

Prevention

  • Pin ndots deliberately. The default ndots:5 is a latency trap for external-heavy workloads; standardize on a per-workload dnsConfig where it matters rather than discovering it under load.
  • Treat "allow DNS egress" as a required baseline in every default-deny egress policy template, so security rollouts never silently sever name resolution.
  • Alert on CoreDNS SERVFAIL rate and pod restarts. CoreDNS exposes Prometheus metrics on :9153 when the Corefile includes the prometheus :9153 plugin line, which is the natural signal source.
  • Keep node resolv.conf off the systemd-resolved loopback stub for CoreDNS's upstream, so a node rebuild does not reintroduce the forwarding loop.
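As a starting point for the SERVFAIL alert, the counter can be read straight from the metrics endpoint. This is a sketch under assumptions: the metric name coredns_dns_responses_total with an rcode label matches recent CoreDNS releases but should be checked against your version, the function name is hypothetical, and a real alert would watch the rate of change rather than a single snapshot.

```shell
# Hypothetical helper: sum SERVFAIL responses from Prometheus text read on stdin.
# Assumes the metric name coredns_dns_responses_total with an rcode label.
servfail_total() {
  awk 'index($0, "coredns_dns_responses_total{") == 1 && index($0, "rcode=\"SERVFAIL\"") > 0 { sum += $NF }
       END { print sum + 0 }'
}
```

With a port-forward to a CoreDNS pod on 9153 in place, curl -s localhost:9153/metrics | servfail_total prints the running total; sample it twice to see whether it is climbing.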

How Intellira diagnoses this

DNS failures are a correlation problem: the symptom (an app can't reach a dependency) sits one or two hops from the cause (a NetworkPolicy merged an hour ago, a CoreDNS roll, a systemd-resolved change on a node). Intellira's read-only RCA approach is to line up the timeline across those systems — the recent NetworkPolicy and ConfigMap changes, CoreDNS pod/endpoint state, and the failing pod's dnsPolicy/resolv.conf — and surface the most likely cause with the evidence attached, rather than running the six manual checks above by hand. It does not mutate cluster state; the remediation steps stay with you.

Sources

By Intellira Engineering. AI-assisted draft, reviewed by the Intellira engineering team; claims cited inline; last verified 2026-06-02.

Frequently asked questions

Why are my external DNS lookups slow but in-cluster ones fast?
The default ndots:5 makes any name with fewer than 5 dots try every search domain first, so a name like api.example.com runs several failing cluster lookups before the real one. Set a per-pod ndots:2 in dnsConfig or use a fully qualified name with a trailing dot.

My pod resolves nothing at all after I added a NetworkPolicy. Why?
An egress NetworkPolicy is default-deny once it selects a pod. If it does not explicitly allow egress to kube-dns on port 53 (UDP and TCP), every DNS query is dropped. Add an egress rule to the kube-system kube-dns pods.

How do I tell a CoreDNS problem from a per-pod config problem?
If every pod across namespaces fails, suspect CoreDNS pods, endpoints, or an upstream loop. If only one workload fails, suspect its dnsPolicy, dnsConfig, NetworkPolicy, or the name form it is querying.
