Certified Kubernetes Administrator (CKA) #3 Cluster Architecture 2: Node (kubelet/kube-proxy/CRI), the Pod Networking Model
In #2 Cluster Architecture 1, we looked at how the four control plane components make the cluster’s decisions. The apiserver becomes the gateway for all communication, etcd stores state, the scheduler decides where Pods go, and the controller-manager runs the reconciliation loops. But every one of these decisions is just a decision. Actually launching containers, pushing traffic, and attaching disks and networks all happens on the worker nodes.
This post looks inside that node from an operations point of view. We’ll cover what each of the three components running on a node (kubelet, kube-proxy, and the container runtime) does, what the CRI standard that connects kubelet to the runtime is, and how the Pod networking model lets every Pod call every other Pod directly, down to the CNI plugin that actually implements it.
The node executes the decisions the control plane makes #
Splitting Kubernetes into two layers makes it click fast. The control plane is the brain that decides “what to run and where,” and the worker nodes are the hands and feet that take that decision and “actually launch the containers.” When the scheduler decides “place this Pod on node01,” that decision is merely stored in etcd — no container has come up yet. The workload only runs the moment node01’s kubelet reads that decision and tells the container runtime to launch the container.
That’s why understanding the node components solves half of troubleshooting. If a Pod is stuck in Pending, it may be a scheduler-side problem, but if it’s stuck in ContainerCreating, you need to look at kubelet, the runtime, or the CNI. When a node goes NotReady, every Pod on that node is affected. The three node components are as follows.
| Component | Role | Where it runs |
|---|---|---|
| kubelet | Node agent. Actually runs Pods and reports their state | Every node (including control plane nodes) |
| kube-proxy | Implements a Service’s virtual IP as routing rules on the node | Every node (usually a DaemonSet) |
| Container runtime | Takes a container image and launches the actual container | Every node |
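The Pending versus ContainerCreating rule of thumb above can be written down as a tiny lookup. This is only a sketch: first_check is a hypothetical helper name, not a kubectl feature, and the mapping is a starting point for where to look rather than a diagnosis.

```shell
# Hypothetical helper: map a symptom to the first place to look,
# following the rule of thumb in the text. Not part of kubectl.
first_check() {
  case "$1" in
    Pending)           echo "scheduler" ;;
    ContainerCreating) echo "kubelet, container runtime, or CNI" ;;
    NotReady)          echo "kubelet on that node" ;;
    *)                 echo "unknown" ;;
  esac
}

first_check Pending            # prints: scheduler
first_check ContainerCreating  # prints: kubelet, container runtime, or CNI
```

The input is the value you read from the STATUS column of k get pods or k get nodes.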
Control plane nodes run the node components too. In a kubeadm-built cluster the control plane components themselves run as static Pods, so kubelet and the runtime run on control plane nodes as well. The apiserver and etcd we saw running as static Pods in #2 are, in the end, launched by that node’s kubelet.
kubelet: the node’s agent #
kubelet is the single agent that runs on every node, and it’s the most central of the node components. Unlike kube-proxy and the control plane components, which run as Pods, kubelet runs directly as a systemd service on the node. The reason is clear: kubelet is the thing that launches Pods, so if it were itself a Pod, you’d fall into a chicken-and-egg problem.
Here’s what kubelet does, summarized.
- Running Pods. It watches the apiserver, and when a Pod scheduled to its own node appears, it tells the container runtime to launch the containers. It translates the PodSpec (the image, volumes, environment variables, and so on) into requests to the runtime; the probe settings stay with kubelet, which runs the probes itself.
- Reporting state. It periodically reports its own node’s state (whether it’s Ready, available resources) and the state of each Pod to the apiserver. The status you see in k get nodes and k get pods is, in the end, what kubelet reported.
- Running probes. The thing that actually runs livenessProbe, readinessProbe, and startupProbe is kubelet. When liveness fails, kubelet restarts the container itself.
- Managing static Pods. Even without the apiserver, kubelet reads the manifests placed in /etc/kubernetes/manifests and launches Pods. These are static Pods, and this is exactly how the control plane components are bootstrapped.
Static Pods matter especially in CKA. Even when the apiserver is down, kubelet can launch Pods just by looking at this directory, which makes it possible for the control plane to bootstrap by “bringing itself up.” A situation where you’ve botched the apiserver’s manifest and the apiserver won’t come up is one of the signature troubleshooting scenarios covered in #24.
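The static Pod directory is a kubelet setting rather than a fixed constant. As a minimal sketch, assuming a kubeadm-style node where the kubelet config lives at /var/lib/kubelet/config.yaml and sets the staticPodPath field, you can read the directory straight from that file. static_pod_path is a hypothetical helper name.

```shell
# Print the staticPodPath value from a kubelet config file.
# The default path below is the kubeadm convention (an assumption);
# pass a different file as $1 if your node is set up differently.
static_pod_path() {
  awk '$1 == "staticPodPath:" { print $2 }' "${1:-/var/lib/kubelet/config.yaml}"
}

# Example on a node: list the manifests kubelet launches without the apiserver.
# dir=$(static_pod_path) && ls "$dir"
```

If the function prints nothing, the field is not set in that file and kubelet may be configured some other way on that node.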
# kubelet runs as a systemd service
systemctl status kubelet
# kubelet logs (the #1 place to trace a NotReady cause)
journalctl -u kubelet -f
# static Pod manifest directory
ls /etc/kubernetes/manifests/
When kubelet dies, that node can no longer report its state to the apiserver, so after a short while the node flips to NotReady. That said, Pods that were already running don’t immediately disappear just because kubelet died. When kubelet comes back, it starts managing those Pods again. Tracing the cause of NotReady is covered in earnest in #23.
kube-proxy: implementing Service on the node #
kube-proxy is the component that turns a Service’s virtual IP into actual routing. What a Service is gets covered in detail in #18, but here’s the key point: a ClusterIP Service has a virtual IP like 10.96.0.10, yet in the default iptables mode no network interface anywhere in the cluster holds that IP. It’s just an agreed-upon address.
So how does traffic sent to that virtual IP reach an actual Pod? It gets there because kube-proxy lays down routing rules on every node. kube-proxy watches the apiserver, and when a Service and the Pods behind it (endpoints) change, it updates the node’s kernel rules. The result is a rule on every node that says “packets going to this virtual IP get sent to one of the actual Pod IPs.”
kube-proxy is usually deployed as a DaemonSet, running one copy per node. Since traffic sent from any node has to pass through that node’s rules to reach the destination Pod, the rules must exist on every node.
iptables mode and IPVS mode #
There are broadly two ways kube-proxy lays down rules.
| Mode | How it works | Characteristics |
|---|---|---|
| iptables | DNAT handled with the kernel’s iptables rules | The default. Stable. As Services grow, rule evaluation increases linearly and performance drops |
| IPVS | Handled with the kernel’s IPVS (a hash table) | Favorable for large clusters with many Services. Supports various load-balancing algorithms |
The default is iptables mode. In large clusters where Services grow to the thousands, the cost of evaluating iptables rules sequentially becomes significant, so the hash-based IPVS mode delivers better performance. For CKA, it’s enough to know the difference between the two modes and which one favors scale. You can check which mode is in use from kube-proxy’s ConfigMap or its logs.
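Checking the mode from the ConfigMap can be sketched like this. The assumptions: the cluster was built with kubeadm, so the ConfigMap is named kube-proxy in kube-system and keeps the configuration under the config.conf key, and an empty mode value means kube-proxy uses its default, which is iptables on Linux. kube_proxy_mode is a hypothetical helper name.

```shell
# Read a kube-proxy configuration on stdin and print the proxy mode.
# An empty or missing mode means kube-proxy uses its default (iptables on Linux).
kube_proxy_mode() {
  awk '$1 == "mode:" { gsub(/"/, "", $2); m = $2 }
       END { print (m == "" ? "iptables (default)" : m) }'
}

# On a cluster (ConfigMap name and key follow the kubeadm convention):
# kubectl -n kube-system get configmap kube-proxy \
#   -o jsonpath='{.data.config\.conf}' | kube_proxy_mode
```

On the node itself, iptables -t nat -L KUBE-SERVICES -n shows the rules in iptables mode, and ipvsadm -Ln shows the virtual servers in IPVS mode, provided you have root and the ipvsadm tool installed.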
The container runtime and CRI #
kubelet does not launch containers directly. kubelet only gives the order “launch one container from this image”; the actual work of pulling the image, creating namespaces and cgroups, and starting the process is done by the container runtime. The most widely used runtime today is containerd, and CRI-O is also common.
CRI: the standard interface between kubelet and the runtime #
The reason kubelet can talk to any runtime in the same way is that a standard called CRI (Container Runtime Interface) exists. CRI is a gRPC-based standard interface that sits between kubelet and the runtime. kubelet sends requests like “launch a container” and “pull an image” according to the CRI contract, and any runtime that implements that contract will handle the request.
In the past, kubelet contained an adapter called dockershim that called Docker directly. But Docker doesn’t implement CRI itself, so it needed a separate adapter, and this dockershim was removed in Kubernetes 1.24. As a result, today’s standard path is kubelet → CRI → containerd (or CRI-O) → container. Images built with Docker (OCI images) can be used as-is, so there’s no image compatibility problem. It’s just that the thing launching containers on the node became containerd rather than the Docker daemon.
crictl: inspecting containers at the CRI level #
Now that the runtime has shifted to containerd, when you want to inspect containers directly on a node you use crictl instead of docker. crictl is a debugging tool that talks to the runtime through CRI.
# inspect containers running on the node (the CRI version of docker ps)
crictl ps
# list container images
crictl images
When the Pods kubelet reports and the containers crictl shows diverge (for example, kubelet says it’s alive but the container won’t come up), you start to suspect a runtime-level problem. It’s the tool you reach for when troubleshooting takes you one level deeper inside the node.
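crictl has to know which CRI socket to talk to, and that is a quick way to see which runtime a node is really using. The sketch below assumes crictl’s default config file, /etc/crictl.yaml, with a runtime-endpoint entry. Both function names are hypothetical, and the socket-path patterns are common defaults rather than guarantees.

```shell
# Print the runtime-endpoint that crictl is configured to use.
# /etc/crictl.yaml is crictl's default config file; pass another path as $1 if needed.
crictl_endpoint() {
  awk '$1 == "runtime-endpoint:" { gsub(/"/, "", $2); print $2 }' "${1:-/etc/crictl.yaml}"
}

# Guess the runtime from the socket path (patterns are common defaults, not guarantees).
runtime_from_endpoint() {
  case "$1" in
    *containerd*) echo "containerd" ;;
    *crio*)       echo "CRI-O" ;;
    *)            echo "unknown" ;;
  esac
}

# Example on a node:
# runtime_from_endpoint "$(crictl_endpoint)"
```

If the file is missing, you can pass the socket explicitly with crictl’s --runtime-endpoint flag instead.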
Node registration and status checks #
When a node joins the cluster (we’ll do this join process hands-on with kubeadm in #4), that node’s kubelet registers itself with the apiserver and then periodically reports its state. The first command an administrator looks at is k get nodes.
# node list and status
k get nodes
# node IP, OS, kernel, container runtime — all at a glance
k get nodes -o wide
Adding -o wide shows the internal IP, operating system, kernel version, and even the container runtime version (e.g., containerd://1.7.x). You can see at a glance which node uses which runtime.
A node’s STATUS is usually Ready, but it can become NotReady for reasons like these.
- kubelet died or failed to start. The most common cause. systemctl status kubelet and journalctl -u kubelet are the first check points
- The container runtime died. If kubelet can’t reach the runtime over CRI, it can’t report the node as healthy
- CNI is not ready. If the network plugin isn’t installed yet or is broken, the node stays NotReady
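- Resource pressure. Conditions like DiskPressure or MemoryPressure get attached and block normal scheduling. Strictly speaking these are separate conditions from Ready, so the node can still show Ready while one of them is set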
Here we’ll just get the big picture of “when you see NotReady, where do you start,” and leave the per-cause recovery for #23, which covers it step by step.
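To get from “something is NotReady” to a list of node names, a small filter over the node list is enough. This is a sketch: not_ready_nodes is a hypothetical helper, and it assumes the default column layout of kubectl get nodes, where the second column is STATUS.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print the names
# of nodes whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Example on a cluster:
# kubectl get nodes --no-headers | not_ready_nodes
```

For each name it prints, kubectl describe node with that name shows the Conditions section, which is where the kubelet, CNI, or pressure causes from the list above show up.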
The Pod networking model #
The starting point of Kubernetes networking is a single promise: every Pod must be able to communicate directly with every other Pod’s IP, without NAT. Spelled out, this model is as follows.
- Every Pod has a unique IP. The same holds whether they’re on the same node or different nodes.
- One Pod can send packets directly to another Pod’s IP without NAT.
- Agents on the node (kubelet and the like) can also communicate directly with that node’s Pods.
Thanks to this promise, developers can design communication purely by IP without caring which node a Pod is on. The port mapping and NAT you commonly saw in traditional virtualization environments don’t exist between Pods.
Pod CIDR and per-node partitioning #
To implement this model, IPs must not overlap. So the cluster sets aside a large IP range for all Pods, the Pod CIDR (e.g., 10.244.0.0/16), and slices this range up per node. For example, node01 gets 10.244.1.0/24, node02 gets 10.244.2.0/24, and so on. Each node hands out Pod IPs only within the subnet assigned to it, so Pod IPs never overlap across the whole cluster.
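The slicing can be made concrete with a few lines of shell. This is illustrative arithmetic only: it assumes the example values from the text (a 10.244.0.0/16 cluster range, one /24 per node), and both function names are hypothetical. Real allocations depend on the cluster and the CNI plugin.

```shell
# Illustrative only: cluster Pod CIDR 10.244.0.0/16, one /24 per node.
# $1 = node index (0-255)
node_pod_cidr() {
  echo "10.244.$1.0/24"
}

# Return the /24 subnet an IPv4 Pod address falls in.
# $1 = Pod IP
subnet_of() {
  echo "${1%.*}.0/24"
}

node_pod_cidr 1        # prints: 10.244.1.0/24
subnet_of 10.244.2.17  # prints: 10.244.2.0/24
```

On a real cluster, kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}' prints the range recorded for each node. Some CNI plugins manage their own address pools, so that field may be empty or unused.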
So how does a packet sent from a Pod on node01 to a Pod on node02 cross the node boundary? What solves this with inter-node routing is the CNI plugin we’ll look at next.
The pause container #
The reason multiple containers inside a Pod share the same IP and the same network namespace is that for each Pod, an invisible container called the pause container comes up first. The pause container holds the network namespace, and the other containers of the same Pod join that namespace. That’s why containers within one Pod can call each other over localhost.
The CNI plugin implements the actual network #
Kubernetes itself only sets the rule, the “Pod networking model,” and delegates the work of implementing that rule as an actual network to the CNI (Container Network Interface) plugin. When kubelet asks the runtime to set up a Pod, the runtime calls the CNI plugin to attach an IP to that Pod and connect it to the network. A node with no CNI installed can’t give Pods a network, so it stays NotReady.
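Whether a node has a CNI configured at all is something you can check from the filesystem. A minimal sketch, assuming the conventional config directory /etc/cni/net.d (your distribution or runtime may use another path); cni_config_present is a hypothetical helper name.

```shell
# Succeed if the given directory holds at least one CNI network config file.
# /etc/cni/net.d is the conventional default config directory (an assumption).
cni_config_present() {
  cni_dir="${1:-/etc/cni/net.d}"
  for cni_file in "$cni_dir"/*.conf "$cni_dir"/*.conflist "$cni_dir"/*.json; do
    [ -f "$cni_file" ] && return 0
  done
  return 1
}

# Example on a node:
# cni_config_present || echo "no CNI config found: expect the node to stay NotReady"
```

The plugin binaries themselves conventionally live in /opt/cni/bin, so an empty directory there is a second thing worth checking.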
The representative CNI plugins are as follows.
| Plugin | Characteristics |
|---|---|
| Calico | BGP-based routing, widely used for its rich NetworkPolicy support |
| Cilium | eBPF-based. High performance and fine-grained security policy |
| Flannel | A simple overlay network. Easy to configure, good for learning and small setups |
To go deeper on the structure of CNI plugins and how NetworkPolicy works, I recommend reading the CNI installment of the Kubernetes advanced series alongside this post. Within the CKA scope, the priority is to grasp the causality that “the CNI implements the Pod networking model, and without it a node goes NotReady.”
The core commands for checking nodes #
Here are the commands for actually checking the components covered in this post, gathered in one place.
# node status and runtime at a glance
k get nodes -o wide
# kubelet status (the #1 check for NotReady)
systemctl status kubelet
journalctl -u kubelet -f
# containers running on the node (CRI level)
crictl ps
# kube-proxy and CNI are usually checked as Pods in the kube-system namespace
k get pods -n kube-system -o wide
With the last command, you can check that kube-proxy and the CNI plugin (e.g., calico-node) are up and healthy on each node. If only one node is NotReady, checking whether that node’s kube-proxy or CNI Pod is up is a useful clue.
Exam points #
- kubelet runs as a systemd service, not a Pod. The #1 step for tracing NotReady is systemctl status kubelet and journalctl -u kubelet.
- Static Pods are launched by kubelet from /etc/kubernetes/manifests without the apiserver. The control plane components are bootstrapped this way.
- kube-proxy implements a Service’s virtual IP as routing rules on the node. Distinguish the two modes: iptables (default) and IPVS (favorable at scale).
- CRI is the standard interface between kubelet and the runtime. After dockershim was removed in 1.24, containerd/CRI-O are the standard, and on the node you check with crictl ps, not docker.
- The Pod networking model is “every Pod communicates directly without NAT.” The Pod CIDR is partitioned per node, and the actual implementation is done by a CNI plugin (Calico/Cilium/Flannel). Without a CNI, a node stays NotReady.
- k get nodes -o wide lets you check even the container runtime version per node.
Wrap-up #
What this post locked in:
- The node is the hands and feet that execute the control plane’s decisions. kubelet takes the placement the scheduler decided and has the runtime launch the container.
- kubelet. The node agent. Running Pods, reporting state, running probes, managing static Pods. Runs as a systemd service.
- kube-proxy. Implements a Service’s virtual IP as iptables/IPVS rules. Usually a DaemonSet.
- The container runtime and CRI. kubelet talks to containerd/CRI-O through CRI. dockershim was removed in 1.24. You check with crictl.
- The Pod networking model. NAT-free Pod-to-Pod communication, a per-node Pod CIDR, the pause container, with the CNI plugin doing the actual implementation.
Next — kubeadm Cluster Install #
We’ve now swept the architecture across both layers — the control plane (#2) and the node (this post). Now it’s time to stand these components up with our own hands.
In #4 kubeadm Cluster Install, we’ll install the container runtime and kubeadm on a bare Linux machine, bootstrap a single control plane with kubeadm init, apply a CNI plugin, and join worker nodes with kubeadm join — following the process from start to finish. It’s the post where you confirm with your own hands how the kubelet, CRI, and CNI we saw here all mesh together in one place.