Certified Kubernetes Administrator (CKA) #20 Networking 3: CoreDNS, NetworkPolicy

In #19 Networking 2 we used Ingress to split incoming external traffic by host and path. But inside the cluster, how do Pods find each other? A Pod IP changes whenever the Pod is recreated, so you can’t hardcode it. A Service’s ClusterIP stays fixed for the life of the Service, but it is assigned when the Service is created and differs from cluster to cluster, so it is a poor value to bake into a manifest too. That’s why Kubernetes ships a built-in in-cluster DNS that lets resources find each other by name.

The two topics of this post are CoreDNS, which handles that name resolution, and NetworkPolicy, which controls who can send traffic to whom between Pods. One answers “how do I find it,” the other “who am I allowed to talk to.” Both show up often in the exam’s Services and Networking domain, and both are frequent starting points for chasing down outages in production, so we’ll put the weight on how they work and how to debug them.

CoreDNS: the cluster’s name resolver

CoreDNS is the default DNS server of a Kubernetes cluster. In a cluster installed with kubeadm, CoreDNS runs as a Deployment in the kube-system namespace, fronted by a Service named kube-dns that holds a fixed ClusterIP. Each Pod’s /etc/resolv.conf points its nameserver at this ClusterIP, so every name resolution request from a Pod flows into CoreDNS.

# Check the CoreDNS Deployment and Pods
k get deploy coredns -n kube-system
k get pods -n kube-system -l k8s-app=kube-dns

# The Service fronting CoreDNS (the name stays kube-dns)
k get svc kube-dns -n kube-system

If the CoreDNS Pods are down or the kube-dns Service loses its ClusterIP (for example, because it was deleted and recreated with a different one), name resolution stops across the whole cluster. That’s why DNS is always the prime suspect for the symptom “some app can’t reach another app.”

DNS names for Services and Pods

CoreDNS assigns regular, predictable names to cluster objects. The one you’ll use most often is the Service’s DNS name.

<service>.<namespace>.svc.cluster.local

For example, the web Service in the default namespace resolves as web.default.svc.cluster.local. From a Pod in the same namespace you can reach it with just the short name web, and to call a Service in another namespace you append the namespace, as in web.default. This shortening works because the Pod’s resolv.conf contains search domains (<pod’s namespace>.svc.cluster.local, svc.cluster.local, cluster.local), which the resolver appends to short names in turn.

Name form                        Meaning
web                              The web Service in the same namespace
web.default                      The web Service in the default namespace
web.default.svc.cluster.local    FQDN (fully qualified domain name). Resolves identically from anywhere in the cluster

Pods also get DNS names, but in a different form. They take the shape 10-244-1-5.default.pod.cluster.local, with the dots of the Pod IP replaced by hyphens, and you rarely use them directly in practice. That said, StatefulSet Pods tied to a Headless Service get a stable name of <pod>.<service>.<namespace>.svc.cluster.local, which lets you address an individual Pod by that name.
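As a minimal sketch of the StatefulSet case, the manifests below pair a Headless Service with a StatefulSet. The name cache, the app=cache label, the redis:7 image, and the port are hypothetical, chosen only for illustration. What matters is clusterIP: None on the Service and serviceName on the StatefulSet pointing at it.

```yaml
# Headless Service: clusterIP None means DNS returns the Pod IPs directly
apiVersion: v1
kind: Service
metadata:
  name: cache               # hypothetical name
  namespace: default
spec:
  clusterIP: None
  selector:
    app: cache
  ports:
  - port: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cache
  namespace: default
spec:
  serviceName: cache        # must match the Headless Service name
  replicas: 2
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      containers:
      - name: cache
        image: redis:7      # placeholder image
        ports:
        - containerPort: 6379
```

With these applied, the Pods would be named cache-0 and cache-1, and cache-0.cache.default.svc.cluster.local should resolve to the first Pod’s IP once it is ready.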

Corefile: the CoreDNS configuration

CoreDNS behavior is defined by a configuration called the Corefile, and that Corefile lives in the coredns ConfigMap in the kube-system namespace. When you want to view or change the configuration, you work with this ConfigMap.

# View the Corefile contents
k get configmap coredns -n kube-system -o yaml

The Corefile inside the ConfigMap looks roughly like this.

.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf       # send domains outside the cluster to the upstream DNS
    cache 30
    loop
    reload
    loadbalance
}

Two directives matter most. kubernetes cluster.local ... is the plugin that resolves Service and Pod names under the cluster.local domain by watching the Kubernetes API, and forward . /etc/resolv.conf hands off names outside the cluster domain (e.g. google.com) to the upstream DNS servers listed in the node’s resolv.conf. After you edit the Corefile, CoreDNS picks up the change on its own through the reload plugin, though only after a short delay. To apply it immediately, restart the CoreDNS Pods.

# To apply a Corefile change immediately, roll out a restart of CoreDNS
k rollout restart deploy coredns -n kube-system
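Because the Corefile is just a key in the ConfigMap, an edit amounts to changing the data.Corefile string. The sketch below assumes you want names under a hypothetical internal domain, example.internal, sent to a hypothetical DNS server at 10.0.0.53. The extra server block at the end is the only addition to the Corefile shown above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    # Assumption: example.internal and 10.0.0.53 are placeholders
    example.internal:53 {
        errors
        cache 30
        forward . 10.0.0.53
    }
```

Applying a manifest like this replaces the whole Corefile, so the default block has to stay intact. Your cluster’s actual Corefile may differ from the rough version above, so it is safer to start from the output of k get configmap coredns -n kube-system -o yaml, or to use k edit configmap coredns -n kube-system, than to copy this sketch as-is.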

Debugging DNS

In both the exam and production, the fastest way to settle “the name won’t resolve” is to spin up a throwaway Pod and query directly. The nslookup in the busybox image is the standard tool.

# Look up a Service name with a one-off Pod (auto-deleted when done)
k run -it --rm test --image=busybox:1.28 --restart=Never -- nslookup web

# Look up a Service in another namespace by FQDN
k run -it --rm test --image=busybox:1.28 --restart=Never -- \
  nslookup web.default.svc.cluster.local

The busybox image behaves differently for nslookup depending on the version. The 1.28 tag causes the fewest problems with DNS lookups, so it’s used as the de facto standard for debugging.

When a lookup fails, narrow it down in this order.

  • CoreDNS Pod status: confirm it’s Running with k get pods -n kube-system -l k8s-app=kube-dns. If it’s in CrashLoopBackOff, check the cause with k logs.
  • kube-dns Service: with k get svc kube-dns -n kube-system, check that it has a ClusterIP, and with k get endpoints kube-dns -n kube-system, check that its endpoints aren’t empty.
  • The Pod’s resolv.conf: go in with k exec and check cat /etc/resolv.conf. The nameserver should point at the kube-dns ClusterIP.
  • The name itself: you may have dropped the namespace or misspelled the FQDN. If it’s not the same namespace, call it as <service>.<namespace>.

DNS troubleshooting is wrapped up again together with certificates and RBAC in #25.

NetworkPolicy: controlling Pod-to-Pod traffic

In a default Kubernetes cluster, every Pod can talk to every other Pod freely. A different namespace or different labels make no difference; if you know the IP, you can reach it. The tool that tightens this all-allow state is NetworkPolicy.

There is one key rule to how NetworkPolicy works. As long as no policy selects a Pod, that Pod stays all-allow. But the moment even a single policy selects a Pod, that Pod switches, for the direction the policy covers, to a whitelist model where only explicitly allowed traffic gets through. In other words, NetworkPolicy is an allow list, not a block list.

State                              Behavior
No policy on the Pod               All inbound/outbound allowed
One or more policies on the Pod    For that direction (ingress/egress), only what the policies list is allowed; the rest is blocked

The structure of a NetworkPolicy

The key fields are as follows.

  • podSelector: selects the target Pods this policy applies to by label. Leaving it empty ({}) targets every Pod in the namespace.
  • policyTypes: declares whether this policy handles Ingress (incoming traffic), Egress (outgoing traffic), or both.
  • ingress.from: whom to allow inbound from. You specify the source with podSelector, namespaceSelector, or ipBlock.
  • egress.to: where to allow outbound to. Likewise, you specify the destination with the same three selectors.
  • ports: narrows down the allowed ports and protocols. It sits alongside from or to inside each rule.

Put together, a policy looks like this.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: default
spec:
  podSelector:              # target of this policy: app=api Pods
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:          # allow only app=frontend Pods in the same namespace
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

The policy above applies to app=api Pods and allows only inbound traffic that app=frontend Pods in the same namespace send to TCP port 8080. All other inbound is blocked. Egress is not in policyTypes, so it’s untouched and stays all-allow.
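The same structure works in the outbound direction through egress.to. As a sketch, assuming hypothetical app=db Pods listening on TCP 5432 in the same namespace, the policy below lets app=api Pods open connections only to them. The policy name, the app=db label, and the port are illustrative assumptions, not part of the earlier example.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db      # hypothetical name
  namespace: default
spec:
  podSelector:               # target of this policy: app=api Pods
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:           # destination: app=db Pods in the same namespace (assumed label)
        matchLabels:
          app: db
    ports:
    - protocol: TCP
      port: 5432             # assumed database port
```

Because Egress is listed in policyTypes here, every other outbound connection that app=api Pods initiate, DNS lookups included, is blocked unless another policy allows it.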

The three selectors for from/to

The selectors inside from and to choose the source or destination in three ways. Distinguishing these three precisely is a point that often decides the outcome on the exam.

Selector             Meaning
podSelector          Selects Pods by label within the policy’s own namespace
namespaceSelector    Selects all Pods in the namespaces that match the label
ipBlock              Selects IPs by CIDR range (e.g. sources outside the cluster)

For example, the rule below lists two sources as separate items.

  ingress:
  - from:
    - namespaceSelector:     # allow traffic from namespaces labeled team=prod
        matchLabels:
          team: prod
    - ipBlock:               # allow traffic from a specific CIDR (except 10.0.5.0/24)
        cidr: 10.0.0.0/16
        except:
        - 10.0.5.0/24

Here is a subtle but recurring exam trap. Putting podSelector and namespaceSelector together inside a single from item makes it an AND condition: “the Pods with that label in that namespace.” Splitting them into separate items with two - entries, on the other hand, makes it an OR condition: “all Pods in that namespace” or “the Pods with that label in the policy’s own namespace.”

  ingress:
  - from:
    - namespaceSelector:     # AND: only app=frontend Pods in the team=prod namespace
        matchLabels:
          team: prod
      podSelector:           # (under the same -, grouped by indentation)
        matchLabels:
          app: frontend

Note that there is no - in front of podSelector: it sits under the same item as namespaceSelector. The presence or absence of a single hyphen separates AND from OR, so after writing the manifest, always confirm with k describe networkpolicy that it’s grouped the way you intended.
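For contrast, here is a sketch of the OR form using the same labels. The only difference from the AND example is the extra - in front of podSelector, which turns it into a second item in the from list.

```yaml
  ingress:
  - from:
    - namespaceSelector:     # item 1: every Pod in namespaces labeled team=prod
        matchLabels:
          team: prod
    - podSelector:           # item 2: app=frontend Pods in the policy's own namespace
        matchLabels:
          app: frontend
```

Traffic is allowed if it matches either item, so this version is considerably more permissive than the AND form.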

The default deny pattern

The most common pattern in production is to first block the whole namespace, then open only the necessary communication with separate policies. You build a default deny policy that blocks everything by leaving podSelector empty and including no ingress rule.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}            # applies to every Pod in the namespace
  policyTypes:
  - Ingress
  # no ingress rule = all inbound blocked

With podSelector: {} you target every Pod in the namespace, and by including no ingress item at all, the allow list is empty and all inbound is blocked. To block egress too, add Egress to policyTypes and define no egress rules (omit the egress field). Be careful not to confuse this with a rule consisting of an empty item, - {}, which means the opposite and allows everything. A complete block of both inbound and outbound looks like this.

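apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all     # illustrative name
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress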

With default deny laid down like this, you can add allow policies one at a time, like the earlier allow-frontend-to-api, to build a namespace where only the necessary paths are open. When multiple NetworkPolicies select the same Pod, their allow rules combine as a union (OR). There is no such thing as a deny rule, so when a default deny policy and an allow policy coexist, the path the allow policy opens lets traffic through.

In a namespace where default deny covers egress, even DNS queries to CoreDNS (UDP/TCP 53) are blocked. An ingress-only default deny does not affect them. If you blocked egress, you must separately allow egress on port 53 toward kube-dns in kube-system for name resolution to work. This is why NetworkPolicy is also a suspect when you hit a “name won’t resolve” symptom.
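A minimal sketch of such a DNS allow rule follows. It assumes the cluster sets the kubernetes.io/metadata.name label on namespaces automatically (recent Kubernetes versions do) and that the CoreDNS Pods carry the k8s-app=kube-dns label used earlier. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress     # hypothetical name
  namespace: default
spec:
  podSelector: {}            # every Pod in this namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:     # AND: the kube-system namespace ...
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:           # ... and only the CoreDNS Pods in it
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Since policies combine as a union, this can sit next to a default deny policy and open only the DNS path while everything else outbound stays blocked.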

It only works if the CNI supports it

Here is the fact people most often miss about NetworkPolicy. Even after you create a NetworkPolicy object, nothing happens unless the CNI plugin enforces it. A NetworkPolicy is only a rule declaration; what actually blocks packets is the CNI.

  • Enforces: Calico, Cilium, Weave Net, Antrea, and so on
  • Does not enforce: CNIs that don’t support NetworkPolicy, like Flannel (default configuration)

So in a cluster with only Flannel installed, no matter how precisely you write a NetworkPolicy, traffic flows right through. The first thing to check for the symptom “I made a policy but it doesn’t block” is the CNI. The Pod networking model and the role of the CNI were covered in #3 Cluster Architecture 2.

Verification and troubleshooting

After creating a NetworkPolicy, verify with a throwaway Pod whether traffic is actually blocked or allowed.

# Check the policy, its target, and rules
k get networkpolicy
k describe networkpolicy allow-frontend-to-api

# Try connecting from a Pod with the allowed source label (should get through)
# -it --rm attaches so the result is visible, then deletes the Pod
# (assumes a Service named api in front of the app=api Pods)
k run probe -it --rm --image=busybox:1.28 --restart=Never --labels=app=frontend \
  -- wget -qO- --timeout=2 api:8080

# Try connecting from a non-allowed Pod (should be blocked)
k run probe2 -it --rm --image=busybox:1.28 --restart=Never \
  -- wget -qO- --timeout=2 api:8080

When blocked, wget fails with a timeout; when allowed, a response comes back. If intent and result diverge, check the following.

  • Target Pod labels: cross-check with k get pods --show-labels that the label podSelector chose is actually on the target Pods.
  • Source labels/namespace: check that the from selector matches the actual source Pod and namespace labels.
  • AND vs OR: re-confirm that the hyphen structure of the from items is the condition you intended.
  • CNI: if nothing is blocked at all, check whether the CNI enforces NetworkPolicy.
  • DNS egress: if it’s a policy that blocked egress, check that port 53 is open.

Exam points

In CKA’s Services and Networking domain (20%), CoreDNS and NetworkPolicy are often tested together. Make sure you can do the following from memory.

  • A Service’s DNS name is <service>.<namespace>.svc.cluster.local. Same namespace uses the short name, another namespace calls it as <service>.<namespace>.
  • The standard for DNS debugging is k run -it --rm test --image=busybox:1.28 --restart=Never -- nslookup <name>. CoreDNS is a Deployment in kube-system, and its configuration is the Corefile in the coredns ConfigMap.
  • NetworkPolicy is a whitelist. A Pod with a policy only accepts what’s explicitly allowed. A Pod with no policy is all-allow.
  • podSelector: {} + no rules = default deny. You choose the ingress/egress direction with policyTypes.
  • Distinguish podSelector, namespaceSelector, and ipBlock in from/to, and tell apart the AND inside a single item from the OR of separate items.
  • NetworkPolicy works only when the CNI enforces it. It’s ignored in Flannel’s default configuration.

NetworkPolicy cannot be created imperatively, so writing a manifest is practically mandatory. Quickly copying an example from the official docs and changing only the labels and ports is the fastest move in the exam room. For the broader context of in-cluster communication and DNS, it’s worth reviewing alongside K8s Intermediate #7.

Summary

What this post locked in:

  • CoreDNS is the cluster’s default DNS. It runs as a Deployment in kube-system, the fronting Service is named kube-dns, and Services resolve as <service>.<namespace>.svc.cluster.local.
  • The CoreDNS configuration is the Corefile in the coredns ConfigMap. The kubernetes plugin handles cluster names, and forward handles external names.
  • DNS debugging starts by spinning up busybox 1.28’s nslookup as a one-off Pod.
  • NetworkPolicy is a whitelist. With no policy, everything is allowed; once a policy applies, only the allow list gets through for that direction.
  • You build rules with podSelector/policyTypes/ingress.from/egress.to/namespaceSelector/ipBlock/ports, block with default deny, then open only the necessary paths.
  • NetworkPolicy works only if the CNI enforces it. It has no effect in Flannel’s default configuration.

Next: Helm and Kustomize

With three networking posts, we’ve mapped out the cluster’s communication layer from Service and Ingress down to DNS and NetworkPolicy. Next come the tools that manage all these manifests efficiently.

In #21 Helm and Kustomize: manifest management, we’ll compare Helm (charts, releases, helm install/upgrade), which treats manifests as templates and values, with Kustomize (kustomization.yaml, overlays), which layers patches over a base to produce per-environment variants. We’ll look at both from an operations point of view.
