Kubernetes and Cloud Native Associate (KCNA) #4: Container Orchestration (22%) — Runtime, Security, Networking, Storage, Service Mesh

If everything up to #3 covered Kubernetes’ own core resources and API, this post is about the layer beneath Kubernetes that actually holds your containers up. It deals with what runtime the containers inside a Pod run on, which plugin network packets pass through, where data is stored, and who has permission to call all of this.

One idea runs through this entire layer: standard interfaces. Kubernetes does not implement runtime, networking, or storage itself. It defines only the standard boundaries called CRI, CNI, and CSI, then swaps the implementations in as plugins. The KCNA Container Orchestration domain carries a weight of 22%, the second largest after Fundamentals, and telling these three interfaces apart is a recurring exam item.

Container Runtime #

A container runtime is the software that turns an image into an actual process and isolates it. In Kubernetes, when the kubelet on a worker node brings up a Pod, it does not create the container directly — it hands the work off to this runtime. Runtimes split into two layers by level of abstraction.

High-level and low-level runtimes #

  • High-level runtime. It pulls images, unpacks and stores them, and manages the container lifecycle. containerd and CRI-O are the prime examples. The high-level runtime is what the kubelet talks to directly.
  • Low-level runtime. The lowest layer, which actually sets up the Linux kernel’s namespaces and cgroups to isolate and run the process. runc is the de facto standard, and high-level runtimes call runc internally.

To sum up: the kubelet calls containerd, and containerd calls runc to start the container process. Think of containerd as the “manager” and runc as the “executor,” each handling its own role.

The OCI standard #

If runtimes and images each go their own way, compatibility breaks. The standard that prevents this is the OCI (Open Container Initiative). OCI consists of two specifications.

  • image-spec. Defines the format of a container image (layer structure, manifest, and so on).
  • runtime-spec. Defines how an unpacked image should be run (filesystem bundle, lifecycle).

Thanks to OCI, an image built with any tool runs identically on any OCI-compliant runtime. runc is the reference implementation of the OCI runtime-spec.
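To make the image-spec side more concrete, here is a minimal sketch that assembles a manifest-shaped document in Python. The media type strings follow the OCI image-spec, but the blobs are placeholder bytes rather than a real config or a real gzipped layer tarball, so treat this as an illustration of the structure and not as a buildable image.

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build a content descriptor: media type, sha256 digest, and size in bytes."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


# Placeholder bytes standing in for a real image config and a real layer tarball.
config_blob = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
layer_blob = b"placeholder-layer-bytes"

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": descriptor("application/vnd.oci.image.config.v1+json", config_blob),
    "layers": [descriptor("application/vnd.oci.image.layer.v1.tar+gzip", layer_blob)],
}

print(json.dumps(manifest, indent=2))
```

The manifest only points at content by digest, which is why any compliant tool can fetch and verify the same layers.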

CRI: the boundary between kubelet and runtime #

The CRI (Container Runtime Interface) is the standard gRPC API between the kubelet and the container runtime. The kubelet talks to the runtime only through this agreed-upon interface, so whether containerd or CRI-O sits behind it, the kubelet code does not change. This boundary is why you can swap the runtime without changing Kubernetes itself.
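The idea of a stable boundary can be sketched with a plain Python interface. This is a toy model only: the real CRI is a gRPC API, and the class and method names below (ContainerRuntime, pull_image, start_container, Kubelet.run_pod) are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Toy stand-in for the CRI boundary (the real one is a gRPC API)."""

    @abstractmethod
    def pull_image(self, image: str) -> str: ...

    @abstractmethod
    def start_container(self, image: str) -> str: ...


class Containerd(ContainerRuntime):
    def pull_image(self, image: str) -> str:
        return f"containerd: pulled {image}"

    def start_container(self, image: str) -> str:
        return f"containerd: started {image} via runc"


class CriO(ContainerRuntime):
    def pull_image(self, image: str) -> str:
        return f"cri-o: pulled {image}"

    def start_container(self, image: str) -> str:
        return f"cri-o: started {image} via runc"


class Kubelet:
    """Depends only on the interface, never on a concrete runtime."""

    def __init__(self, runtime: ContainerRuntime) -> None:
        self.runtime = runtime

    def run_pod(self, images: list[str]) -> list[str]:
        log = []
        for image in images:
            log.append(self.runtime.pull_image(image))
            log.append(self.runtime.start_container(image))
        return log


# Swapping the runtime changes nothing in the Kubelet class.
print(Kubelet(Containerd()).run_pod(["nginx:1.27"]))
print(Kubelet(CriO()).run_pod(["nginx:1.27"]))
```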

Early Kubernetes kept a conversion layer called dockershim in the kubelet in order to use Docker Engine as the runtime, but as standard CRI runtimes took hold, dockershim was deprecated in v1.20 and removed in v1.24. Images built with Docker follow the OCI standard, so they run just fine on top of containerd. The key point is that the build tool and the runtime are separate things.

The three standard interfaces: CRI, CNI, CSI #

The most confusing point in this domain — and the most frequently tested — is telling the three interfaces apart. All three share the trait of being “standard boundaries that Kubernetes does not implement directly but delegates to plugins,” yet what they delegate differs.

Interface | Full name | What it pluginizes | Representative implementations
CRI | Container Runtime Interface | Container runtime (execution) | containerd, CRI-O
CNI | Container Network Interface | Pod networking (connectivity) | Calico, Cilium, Flannel
CSI | Container Storage Interface | Persistent storage (storing) | drivers such as EBS, Ceph

Spell out the acronyms as “R is Runtime, N is Network, S is Storage” and the answer choices split apart instantly. On the exam, a question like “Which standard interface attaches a persistent volume to a Pod?” asks for CSI, while “Which standard swaps out the runtime?” asks for CRI.

Security #

RBAC #

RBAC (Role-Based Access Control) is the permission model that defines “who can do what.” Four kinds of resources pair up.

  • Role. A bundle of permissions tying together the allowed actions (verbs) and targets (resources) within a specific Namespace.
  • ClusterRole. A bundle of permissions scoped to the entire cluster. It is not bound to a Namespace.
  • RoleBinding / ClusterRoleBinding. Connects a Role or ClusterRole to a user, group, or ServiceAccount. The bridge that joins a permission bundle to a subject.

Here, a ServiceAccount is the identity used when a process running inside a Pod calls the Kubernetes API. The distinguishing point is that it is the identity of a workload, not of a human user. RBAC is purely additive (an allow-list model): there are no deny rules, so any permission not explicitly granted is denied by default.
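The default-deny behavior is easy to see in a small model. The sketch below is not the Kubernetes authorizer; the dataclasses and the is_allowed function are hypothetical simplifications that cover only namespaced Roles and RoleBindings. The subject string uses the system:serviceaccount:<namespace>:<name> form under which a ServiceAccount authenticates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    verbs: frozenset
    resources: frozenset


@dataclass(frozen=True)
class Role:
    name: str
    namespace: str  # a Role only applies inside one Namespace
    rules: tuple


@dataclass(frozen=True)
class RoleBinding:
    role: Role
    subjects: frozenset


def is_allowed(bindings, subject, verb, resource, namespace) -> bool:
    """Return True only if some binding explicitly grants the request."""
    for binding in bindings:
        if subject not in binding.subjects or binding.role.namespace != namespace:
            continue
        for rule in binding.role.rules:
            if verb in rule.verbs and resource in rule.resources:
                return True
    return False  # nothing granted it, so it is denied


pod_reader = Role(
    name="pod-reader",
    namespace="dev",
    rules=(Rule(frozenset({"get", "list"}), frozenset({"pods"})),),
)
sa = "system:serviceaccount:dev:ci-bot"  # hypothetical ServiceAccount
bindings = [RoleBinding(role=pod_reader, subjects=frozenset({sa}))]

print(is_allowed(bindings, sa, "list", "pods", "dev"))    # granted by the Role
print(is_allowed(bindings, sa, "delete", "pods", "dev"))  # verb never granted
print(is_allowed(bindings, sa, "list", "pods", "prod"))   # Role is scoped to dev
```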

NetworkPolicy #

In its default state, Kubernetes lets all Pods communicate freely with one another (default allow). NetworkPolicy lays firewall rules over this flat plane to restrict ingress (incoming traffic) and egress (outgoing traffic).

The core behavior is this. A Pod that no NetworkPolicy selects can send and receive all traffic. Once at least one policy selects a Pod for a given direction (ingress or egress), the Pod is isolated in that direction: only traffic that some policy explicitly allows gets through, and everything else is blocked. The other direction stays open unless a policy covers it too. Policies are additive, so the allow lists of all policies that select the Pod are combined. That said, NetworkPolicy only takes effect if the CNI plugin in use enforces it (for example, Calico or Cilium). On a plugin that does not support it, the policy you create simply does nothing.
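The allow-by-default, then allow-list-only behavior can be sketched as a small decision function. This models ingress only and uses a simplified policy shape: the allowFrom key is a hypothetical stand-in for the peer selectors in a real policy's ingress rules, and namespaces, ports, and IP blocks are left out.

```python
def matches(selector: dict, labels: dict) -> bool:
    """An empty selector matches every Pod, like an empty podSelector."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies: list, dst_labels: dict, src_labels: dict) -> bool:
    # Only policies whose podSelector selects the destination Pod matter.
    selecting = [p for p in policies if matches(p["podSelector"], dst_labels)]
    if not selecting:
        return True  # no policy selects the Pod: default allow
    # Policies are additive: one matching allow entry is enough.
    return any(
        matches(peer, src_labels)
        for policy in selecting
        for peer in policy["allowFrom"]
    )


policies = [
    {"podSelector": {"app": "db"}, "allowFrom": [{"app": "api"}]},
]

print(ingress_allowed(policies, {"app": "db"}, {"app": "api"}))   # on the allow list
print(ingress_allowed(policies, {"app": "db"}, {"app": "web"}))   # isolated, not listed
print(ingress_allowed(policies, {"app": "web"}, {"app": "db"}))   # no policy selects web
```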

SecurityContext and Pod Security #

SecurityContext specifies the privileges under which a container or Pod runs. It blocks running as root with runAsNonRoot: true, or narrows privileges with settings like readOnlyRootFilesystem and allowPrivilegeEscalation: false. It is the mechanism that minimizes runtime privileges so a container cannot threaten the host.
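As a rough illustration, the three settings named above sit under a container's securityContext. The sketch builds the manifest as a Python dict and prints it as JSON, which the API accepts alongside YAML. The Pod name and image reference are hypothetical placeholders.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "hardened-demo"},  # hypothetical name
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "registry.example.com/app:1.0",  # hypothetical image
                "securityContext": {
                    "runAsNonRoot": True,               # refuse to start as UID 0
                    "readOnlyRootFilesystem": True,     # container cannot write to its root fs
                    "allowPrivilegeEscalation": False,  # process cannot gain more privileges
                },
            }
        ]
    },
}

print(json.dumps(pod, indent=2))
```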

At the cluster level, Pod Security Admission (PSA), a built-in admission controller, acts as the gate. It is configured per Namespace through labels, and when a Pod is created it checks the Pod against a three-tier policy called Pod Security Standards.

  • Privileged. No restrictions. For trusted system workloads.
  • Baseline. The minimal restriction that blocks known privilege escalation.
  • Restricted. The strictest tier, enforcing best practices hard.

It is enough to know the overall arc: the older PodSecurityPolicy (PSP) was deprecated in v1.21 and removed in v1.25, and Pod Security Admission took its place.
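Pod Security Admission is switched on per Namespace with a label. The sketch below builds such a Namespace manifest, using the pod-security.kubernetes.io/enforce label key and one of the three levels as its value. The helper name namespace_with_psa and the Namespace name are hypothetical.

```python
import json

LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_psa(name: str, enforce: str) -> dict:
    """Return a Namespace manifest that enforces one Pod Security Standards level."""
    if enforce not in LEVELS:
        raise ValueError(f"unknown level {enforce!r}, expected one of {LEVELS}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"pod-security.kubernetes.io/enforce": enforce},
        },
    }


print(json.dumps(namespace_with_psa("payments", "restricted"), indent=2))
```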

The hands-on application of RBAC and NetworkPolicy is covered with manifests in Hands-on Track Intermediate #7.

Networking #

The Kubernetes network model #

The starting point of Kubernetes networking is a simple rule: every Pod can communicate with every other Pod directly by IP, with no NAT. That is, every Pod appears to belong to one flat network, and each Pod has a unique IP. The responsibility for actually implementing this model lies with the CNI plugin.

CNI plugins #

The CNI (Container Network Interface) is the standard that attaches a network interface and assigns an IP when a Pod is created. The representative implementations are as follows.

  • Flannel. A simple overlay network. Easy to set up and lightweight.
  • Calico. Supports NetworkPolicy strongly and provides BGP routing.
  • Cilium. eBPF-based, providing high performance and sophisticated policy and observability.
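The "assign an IP when a Pod is created" part can be pictured with a toy address allocator. This is only a sketch of one common layout (one subnet per node carved out of a cluster-wide range); the CIDR, node names, and the allocate function are assumptions for illustration, and real plugins differ in how they manage addresses.

```python
import ipaddress

# Hypothetical cluster-wide Pod range, split into one /24 per node.
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
node_subnets = dict(zip(["node-a", "node-b"], cluster_cidr.subnets(new_prefix=24)))

used = set()


def allocate(node: str) -> ipaddress.IPv4Address:
    """Hand out the first free host address in the node's subnet."""
    for ip in node_subnets[node].hosts():
        if ip not in used:
            used.add(ip)
            return ip
    raise RuntimeError(f"no free addresses left on {node}")


first = allocate("node-a")
second = allocate("node-a")
third = allocate("node-b")
print(first, second, third)  # every Pod IP is unique across the cluster
```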

CNI plugin choice and how it works are covered in more depth in Hands-on Track Advanced #1.

Service and kube-proxy #

A Pod’s IP changes every time the Pod is recreated. The stable touchpoint that hides this unstable IP is the Service. A Service provides a fixed virtual IP (ClusterIP) and a name, and distributes traffic to a set of Pods grouped by a label selector. The component that actually applies these distribution rules on each node is kube-proxy. You need to distinguish three Service types.

Type | Exposure scope | Use
ClusterIP | Cluster-internal only (default) | Communication between internal services
NodePort | Externally exposed on a specific port of each node | Simple external access and testing
LoadBalancer | Externally exposed via a cloud load balancer | The external entry point in production

The three types are easy to grasp as a containment relationship. A NodePort Service also gets a ClusterIP, and a LoadBalancer Service by default also gets a NodePort (and therefore a ClusterIP).
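How a Service picks its backends by label selector can be sketched in a few lines. The Pod names, IPs, and labels are made up, and the round-robin rotation is a simplification: how kube-proxy actually spreads connections depends on its mode.

```python
import itertools

# Hypothetical Pods with their current (unstable) IPs.
pods = [
    {"name": "web-1", "ip": "10.244.0.5", "labels": {"app": "web"}},
    {"name": "web-2", "ip": "10.244.1.7", "labels": {"app": "web"}},
    {"name": "db-1", "ip": "10.244.1.9", "labels": {"app": "db"}},
]


def endpoints(selector: dict, pods: list) -> list:
    """Return the IPs of all Pods whose labels satisfy the selector."""
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    ]


web_endpoints = endpoints({"app": "web"}, pods)

# Clients only ever see the Service; each request lands on one backend.
backends = itertools.cycle(web_endpoints)
picks = [next(backends) for _ in range(4)]
print(web_endpoints, picks)
```

When a Pod is recreated with a new IP, only the endpoint list changes; the Service's ClusterIP and name stay the same.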

CoreDNS and service discovery #

To reach a Service by name, you need DNS to resolve that name into an IP. CoreDNS takes on that role. It is a DNS server running inside the cluster that translates a name like my-svc.my-namespace.svc.cluster.local into the Service’s ClusterIP. A Pod can find another service by name alone, without knowing its IP, and this name-based lookup is called service discovery.
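The naming scheme is regular enough to build by hand. The helper below is a hypothetical convenience function, and cluster.local is the default cluster domain, which a cluster may override.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("my-svc", "my-namespace"))
```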

Ingress #

The Service LoadBalancer type requires standing up one load balancer per service, which is costly. Ingress is an L7 layer that routes HTTP/HTTPS requests from a single entry point to multiple Services according to host and path rules. However, creating the Ingress resource alone does nothing — an Ingress Controller (for example, nginx or Traefik) that interprets it and actually handles the traffic must be running in the cluster.
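Host and path routing boils down to a lookup that a controller performs for every request. The sketch below is a simplified model of prefix matching where the longest matching prefix wins; the hostnames, paths, and Service names are hypothetical, and real controllers add TLS, exact-match paths, and their own extensions.

```python
# (host, path prefix, backend Service), all hypothetical.
rules = [
    ("shop.example.com", "/", "storefront"),
    ("shop.example.com", "/api", "api"),
    ("admin.example.com", "/", "admin"),
]


def path_matches(prefix: str, path: str) -> bool:
    """Match on whole path segments, so /api matches /api/x but not /apix."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str):
    """Return the backend Service for a request, or None if no rule matches."""
    candidates = [
        (prefix, service)
        for rule_host, prefix, service in rules
        if rule_host == host and path_matches(prefix, path)
    ]
    if not candidates:
        return None
    # The longest matching prefix is the most specific rule.
    return max(candidates, key=lambda item: len(item[0]))[1]


print(route("shop.example.com", "/api/orders"))
print(route("shop.example.com", "/cart"))
print(route("unknown.example.com", "/"))
```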

Storage #

Ephemeral vs. persistent #

A container’s filesystem disappears along with the container. Storage splits into two by lifetime.

  • Ephemeral volume. Tied to the Pod’s lifetime. The prime example is emptyDir: when the Pod is deleted, the data is gone too. Used for caching or temporary sharing between containers.
  • Persistent volume. The data survives even when the Pod is gone. Needed for workloads that must keep state, such as databases.

Volume / PV / PVC / StorageClass #

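Persistent storage is organized through the relationship of four concepts.

  • Volume. The storage a Pod declares in its spec and mounts into its containers. A volume can be ephemeral (such as emptyDir) or can reference a PVC.
  • PersistentVolume (PV). The actual storage resource. Supplied by a cluster administrator or by dynamic provisioning.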
  • PersistentVolumeClaim (PVC). The request form in which a user asks for “this much storage.” A Pod uses a volume through a PVC.
  • StorageClass. The template that defines what kind and what performance of PV to create. It is the basis for dynamic provisioning.

The flow is that a user submits a PVC, that request is bound to a suitable PV, and it is mounted into the Pod. The hands-on use of PV and PVC is covered with manifests in Hands-on Track Intermediate #2.
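The claim-to-volume matching step can be sketched as follows. This is a simplified model, not the actual binding controller: it compares only capacity and storage class (access modes and selectors are ignored) and picks the smallest PV that fits. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    storage_class: str
    claimed_by: Optional[str] = None


@dataclass
class PVC:
    name: str
    request_gi: int
    storage_class: str


def bind(pvc: PVC, pvs: list) -> Optional[PV]:
    """Bind the claim to the smallest free PV that satisfies it, if any."""
    candidates = [
        pv
        for pv in pvs
        if pv.claimed_by is None
        and pv.storage_class == pvc.storage_class
        and pv.capacity_gi >= pvc.request_gi
    ]
    if not candidates:
        return None  # the claim stays unbound until a suitable PV appears
    best = min(candidates, key=lambda pv: pv.capacity_gi)
    best.claimed_by = pvc.name
    return best


pvs = [PV("pv-small", 5, "standard"), PV("pv-large", 20, "standard"), PV("pv-mid", 10, "standard")]

first = bind(PVC("data-a", 8, "standard"), pvs)
second = bind(PVC("data-b", 8, "standard"), pvs)
third = bind(PVC("data-c", 8, "standard"), pvs)
print(first.name, second.name, third)
```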

CSI and dynamic provisioning #

The CSI (Container Storage Interface) is the standard for connecting external storage systems to Kubernetes. When a cloud block storage or distributed filesystem vendor provides a CSI driver, Kubernetes attaches and detaches volumes through it. When CSI and StorageClass work together, dynamic provisioning becomes possible. That is, when a PVC comes in, a PV is created automatically and bound according to the StorageClass definition, without an administrator having pre-created a PV.

What happens to a PV when the PVC bound to it is deleted is decided by the reclaim policy. Delete removes the PV and the underlying storage as well, while Retain keeps the data and leaves the PV in a Released state for manual handling.
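Dynamic provisioning and the two reclaim policies can be tied together in one toy model. Everything here is an in-memory simulation: the StorageClass names, the csi.example.com provisioner, the PV naming scheme, and the two functions are hypothetical, while provisioner and reclaimPolicy are the actual StorageClass field names.

```python
storage_classes = {
    "fast": {"provisioner": "csi.example.com", "reclaimPolicy": "Delete"},
    "archive": {"provisioner": "csi.example.com", "reclaimPolicy": "Retain"},
}

backend = {}  # stands in for the external storage system behind the CSI driver
pvs = {}      # stands in for PV objects in the cluster


def provision(pvc_name: str, storage_class: str, size_gi: int) -> str:
    """Create backing storage and a bound PV on demand for a new PVC."""
    policy = storage_classes[storage_class]["reclaimPolicy"]
    pv_name = f"pv-for-{pvc_name}"
    backend[pv_name] = {"size_gi": size_gi}
    pvs[pv_name] = {"claim": pvc_name, "reclaimPolicy": policy, "phase": "Bound"}
    return pv_name


def delete_pvc(pv_name: str) -> None:
    """Apply the reclaim policy once the claim is gone."""
    pv = pvs[pv_name]
    if pv["reclaimPolicy"] == "Delete":
        del backend[pv_name]  # underlying storage is removed too
        del pvs[pv_name]
    else:  # Retain
        pv["claim"] = None
        pv["phase"] = "Released"  # data kept, waits for manual handling


scratch = provision("scratch", "fast", 10)
ledger = provision("ledger", "archive", 100)
delete_pvc(scratch)
delete_pvc(ledger)
print(sorted(backend), pvs)
```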

Service Mesh #

What it is and why it is used #

As microservices multiply, cross-cutting concerns — retries, timeouts, encryption, authentication, traffic distribution, and call tracing — recur in every service-to-service communication path. Implementing these in each application’s code produces inconsistent results across languages and teams. Service Mesh is the approach of pulling this service-to-service communication management out into an infrastructure layer outside the app code.

The sidecar pattern #

A traditional Service Mesh uses the sidecar pattern. It attaches a proxy container (for example, Envoy) to each Pod, so that all traffic that Pod sends and receives passes through the proxy. The application stays unaware of communication control, while a data plane of these gathered proxies and a control plane that governs them manage the mesh as a whole. The representative implementations are Istio and Linkerd.

What you gain #

What a Service Mesh provides outside the code falls into three broad areas.

  • mTLS. Automatically mutually authenticates and encrypts service-to-service communication.
  • Traffic management. Handles canary deployments, weighted routing, and retries and timeouts declaratively.
  • Observability. Automatically collects metrics and traces of service-to-service calls.
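The point of "outside the code" can be sketched with a wrapper that adds retries and request counting around an unmodified call. This is a toy in-process model, not how Envoy or any mesh is implemented: a real sidecar is a separate proxy container intercepting network traffic, and the SidecarProxy class and make_flaky helper are hypothetical.

```python
class SidecarProxy:
    """Wraps calls to an upstream with retries and basic metrics."""

    def __init__(self, upstream, retries: int = 2) -> None:
        self.upstream = upstream
        self.retries = retries
        self.metrics = {"requests": 0, "failures": 0}

    def call(self, request: str) -> str:
        last_error = None
        for _ in range(self.retries + 1):
            self.metrics["requests"] += 1
            try:
                return self.upstream(request)
            except ConnectionError as exc:
                self.metrics["failures"] += 1
                last_error = exc
        raise last_error


def make_flaky(failures: int):
    """Build an upstream that fails a fixed number of times, then succeeds."""
    state = {"left": failures}

    def upstream(request: str) -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("upstream unavailable")
        return f"ok: {request}"

    return upstream


proxy = SidecarProxy(make_flaky(2), retries=2)
result = proxy.call("GET /orders")
print(result, proxy.metrics)
```

The upstream function contains no retry or metrics logic of its own, which mirrors how the application stays unaware of the mesh.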

KCNA asks about Service Mesh only at a conceptual level. One sentence — “the layer that manages service-to-service communication outside the app with sidecar proxies and provides mTLS, traffic management, and observability” — plus the names Istio and Linkerd is enough.

Exam points #

  • Distinguishing CRI, CNI, CSI. Runtime, Network, Storage. This is the most frequently tested pattern in this domain.
  • High-level vs. low-level runtime. containerd and CRI-O (high-level) call runc (low-level). dockershim was removed, and Docker images run as-is because they follow the OCI standard.
  • Default behavior of NetworkPolicy. With no policy, everything is allowed; with a policy attached, everything outside the explicit allow list is blocked. A supporting CNI is required for it to work.
  • Service types. ClusterIP (internal) ⊂ NodePort (node port) ⊂ LoadBalancer (cloud). CoreDNS handles service discovery.
  • PV, PVC, StorageClass, CSI. The user requests with a PVC, and CSI and StorageClass handle dynamic provisioning.
  • RBAC components. Role/ClusterRole + RoleBinding/ClusterRoleBinding + ServiceAccount (workload identity).

Wrap-up #

What we pinned down in this post:

  • Runtime. The kubelet calls a high-level runtime (containerd, CRI-O) over CRI, and that calls a low-level runtime (runc). OCI (image-spec, runtime-spec) guarantees compatibility.
  • The three interfaces. CRI (runtime), CNI (network), and CSI (storage) each pluginize a different layer.
  • Security. RBAC (permissions), NetworkPolicy (traffic isolation), and SecurityContext with Pod Security Standards (restricting runtime privileges).
  • Networking. The NAT-free Pod communication model, CNI plugins, Service types (ClusterIP, NodePort, LoadBalancer), CoreDNS, and Ingress.
  • Storage. Ephemeral (emptyDir) vs. persistent (PV, PVC, StorageClass), CSI dynamic provisioning, and reclaim policy.
  • Service Mesh. Managing service-to-service communication outside the app with sidecar proxies. mTLS, traffic management, observability. Istio, Linkerd.

Next: Cloud Native Architecture #

We have now pinned down the layer that holds containers up. Next we broaden our view beyond Kubernetes into cloud native design thinking.

In #5 Cloud Native Architecture (16%): Autoscaling, Serverless, Community, Open Standards, we will cover autoscaling (HPA, VPA, Cluster Autoscaler), serverless and FaaS, the CNCF community and project maturity levels, and open standards.
