Kubernetes

K8s Advanced #5: Observability — Prometheus / Grafana / Loki / OpenTelemetry
10 min read

Observability for a production cluster rests on three axes: metrics, logs, and traces. The standard K8s stack for each axis is nearly settled: Prometheus with kube-state-metrics and node-exporter for metrics, Loki (or EFK) for logs, OpenTelemetry for traces, Grafana for visualization, and Alertmanager for alerting. This post covers the three-axis model, the standard components for each axis, and operational principles such as cardinality, retention period, and alert design, all in one pass.

K8s Advanced #4: CRD and the Operator Pattern — controller-runtime
10 min read

One reason K8s is powerful is that its API itself can be extended. Defining new object kinds with CustomResourceDefinition and writing a reconcile loop for them with controller-runtime lets domain objects live as standard resources on top of K8s. Objects with names like PostgresCluster, RedisFailover, and KafkaBroker are the result. This post covers the CRD model, an Operator skeleton based on controller-runtime, and ownerReference, finalizer, and the status subresource, all in one pass.

K8s Advanced #3: Admission Controller — OPA Gatekeeper / Kyverno
10 min read

The K8s API server has a stage that can inspect and mutate manifests right before they are stored in etcd. This stage, run by admission controllers, is the entry point for a production cluster's policy engine. Policies like "reject containers without limits," "force specific labels," and "restrict image origins" are enforced at the manifest level without changing a line of application code. This post covers where the admission stage sits, the built-in controllers, ValidatingWebhook and MutatingWebhook, and the models of two policy engines, OPA Gatekeeper and Kyverno, all in one pass.
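
As a rough illustration of the "reject containers without limits" policy, here is a minimal sketch of the check logic in plain Python. It is not Rego, not a Kyverno policy, and not a real webhook (which would receive an AdmissionReview over HTTPS); the function names `check_limits` and `admit` and the simplified return shape are assumptions made for this example.

```python
def check_limits(pod):
    """Return a list of violation messages for containers missing limits.

    `pod` is a Pod manifest already parsed into a dict. Only regular
    containers are inspected; init containers are ignored in this sketch.
    """
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        missing = [r for r in ("cpu", "memory") if r not in limits]
        if missing:
            name = container.get("name", "<unnamed>")
            violations.append(f"container {name} is missing limits: {', '.join(missing)}")
    return violations


def admit(pod):
    """Hypothetical, simplified verdict: allowed only when nothing is violated."""
    violations = check_limits(pod)
    return {"allowed": not violations, "message": "; ".join(violations)}


pod = {
    "spec": {
        "containers": [
            {"name": "web", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
            {"name": "sidecar", "resources": {"limits": {"cpu": "100m"}}},
        ]
    }
}
print(admit(pod))
```

A real policy engine runs the same kind of check inside the admission stage, so the manifest is rejected before it reaches etcd.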

K8s Advanced #2: RBAC / ServiceAccount in Depth — Aggregated ClusterRole / Impersonation / IRSA / Workload Identity
11 min read

[Intermediate #7](/en/posts/k8s-intermediate-7) covered the four RBAC objects and the ServiceAccount model. Production clusters call for more depth on top of that: Aggregated ClusterRole, which makes ClusterRoles extensible by composing them via labels; Impersonation, which lets a request temporarily act with another user's permissions; the shift of ServiceAccount tokens from legacy Secrets to projected tokens; and EKS's IRSA and GKE's Workload Identity, which tie K8s ServiceAccounts to cloud IAM. Together they form one more layer of the permission model.

K8s Advanced #1: CNI in Depth — Calico / Cilium / eBPF
14 min read

The first post in the K8s Advanced series. While covering NetworkPolicy, [Intermediate #7](/en/posts/k8s-intermediate-7) left one line unexplored: "the manifest is K8s standard, but actually blocking traffic is the CNI plugin's job." This post unpacks that one line: what CNI is, how the same K8s manifest behaves differently on Calico and Cilium, and how eBPF redraws the data plane, all in one pass.

K8s Intermediate #7: RBAC / NetworkPolicy / ResourceQuota — Security and Resource Policy
22 min read

The final post in the K8s Intermediate series. Through [#6](/en/posts/k8s-intermediate-6) we covered the workload operations model: controllers, persistent data, external entry points, the resource model, health checks, and autoscaling. This post covers the three objects `RBAC`, `NetworkPolicy`, and `ResourceQuota`, which fill the last gap of multi-tenant operation, where multiple teams and environments share one cluster. They express who can create objects, what traffic may flow, and how much can be created, all as namespace-level policy, and they bring out the real value of Namespace noted briefly in [Basics #7](/en/posts/k8s-basics-7). Since this is the last post in the series, a 7-post retrospective and a preview of the next track (K8s Advanced) are also included.

K8s Intermediate #6: Autoscaling — HPA / VPA / Cluster Autoscaler
22 min read

The model covered through [#5](/en/posts/k8s-intermediate-5) operated at the level of a single Pod's resources and health signals. But production load swings with time, user patterns, and events, and having a person adjust `replicas` by hand each time quickly hits a wall. This post walks through the three dimensions of autoscaling that fill that gap in one pass: `HPA`, which scales the Pod count; `VPA`, which recommends and adjusts a Pod's resource requests and limits; and `Cluster Autoscaler`, which adds and removes nodes themselves. It also covers the metrics-server precondition, HPA's `autoscaling/v2` manifest and algorithm, the asymmetric `behavior` of scale up and scale down, custom metrics and KEDA, VPA's three components, HPA/VPA conflict, and Karpenter.
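
The core of the HPA algorithm mentioned above can be sketched in a few lines. The version below follows the documented formula, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the default tolerance of 0.1; the function name and parameters are assumptions for illustration, and the real controller additionally handles min/max bounds, missing metrics, not-yet-ready Pods, and `behavior` policies, all omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Sketch of the HPA scaling formula (hypothetical helper, not controller code)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 Pods averaging 90% CPU against a 60% target scale out to 6.
print(desired_replicas(4, 90, 60))
# 10 Pods averaging 50% against a 100% target scale in to 5.
print(desired_replicas(10, 50, 100))
```

Because the result is rounded up, the formula leans toward keeping slightly more capacity than the ratio alone would suggest.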

K8s Intermediate #5: Health Checks — liveness / readiness / startup probes
20 min read

[#4](/en/posts/k8s-intermediate-4) covered the Pod's resource model; this post covers how K8s judges whether a container is "alive" and "ready to take traffic." The three kinds of probes, liveness, readiness, and startup, each play a different role, and misconfiguring them leads directly to production incidents such as infinite restart loops, lost traffic, and failed startups. This post walks through the `httpGet` / `tcpSocket` / `exec` check methods, common parameters like `initialDelaySeconds` / `periodSeconds` / `failureThreshold`, the cascading failure that follows when external dependencies are put into a liveness check, and the graceful shutdown shaped by `terminationGracePeriodSeconds` and the PreStop hook, all in one pass.

K8s Intermediate #4: resources.requests / limits — Pod Resource Requests and Limits
17 min read

[#3](/en/posts/k8s-intermediate-3) covered the path of external traffic into the cluster. This post moves the viewpoint back inside the Pod, to the model of how a container requests CPU and memory and how its usage is capped. `resources.requests` is what the scheduler sees when picking a node; `resources.limits` is the runtime cap kubelet enforces. This post walks through the separation of the two, QoS classes (Guaranteed / Burstable / BestEffort), the difference between CPU throttling and OOMKilled, JVM/Go runtime cgroup awareness, and the pattern of setting namespace defaults via `LimitRange`, all in one pass.
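
How requests and limits map to the three QoS classes can be sketched as a small classifier. This is a simplified illustration, not kubelet's code: the function name `qos_class` is an assumption, only regular containers and the `cpu` and `memory` resources are considered, and quantities are compared as plain strings (so `"500m"` and `"0.5"` would wrongly count as different).

```python
def qos_class(containers):
    """Classify a Pod's QoS class from its containers' resources (simplified sketch)."""
    resources = ("cpu", "memory")
    any_set = False
    guaranteed = True
    for container in containers:
        res = container.get("resources", {})
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for r in resources:
            if r in requests or r in limits:
                any_set = True
            limit = limits.get(r)
            # A request that is not set defaults to the limit.
            request = requests.get(r, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"resources": {"requests": {"cpu": "100m"}}}]))
```

Guaranteed requires every container to have CPU and memory limits equal to its requests; a Pod with no requests or limits at all is BestEffort; everything in between is Burstable.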

K8s Intermediate #3: Ingress and Ingress Controller — The External Entry Point
18 min read

[K8s Basics #5](/en/posts/k8s-basics-5) covered LoadBalancer as the standard external entry point, but when dozens of Services need external exposure, spinning up one cloud LoadBalancer per Service quickly inflates cost and management overhead. Routing by domain or path also can't be solved with a single LoadBalancer. This post follows the object that consolidates that burden, `Ingress`, and the Ingress Controller (nginx / Traefik / GKE Ingress / AWS ALB Controller) that turns those manifests into actual traffic. It walks through the two-layer model, host/path-based routing, `pathType`, TLS termination, and `IngressClass` in one pass.

K8s Intermediate #2: PV / PVC / StorageClass — The Persistent Data Model
18 min read

Through [K8s Basics #6](/en/posts/k8s-basics-6) we pulled config and secrets out of the manifest into external objects, but one dimension remains: the data itself. The container filesystem disappears with the container, but DB data, user uploads, and metric time series have to outlive the Pod. This post fills that gap with the triangle of PersistentVolume, PersistentVolumeClaim, and StorageClass. It covers static and dynamic provisioning, accessModes, reclaimPolicy, volumeBindingMode, and what StatefulSet's volumeClaimTemplates from [#1](/en/posts/k8s-intermediate-1) actually produces on top of all this.

K8s Intermediate #1: StatefulSet / DaemonSet / Job / CronJob — Controllers Beyond Deployment
16 min read

The [Deployment](/en/posts/k8s-basics-4) from K8s Basics #4 sits on a stateless model: multiple identical Pods that come back the same way when they die. But databases that need identity and disks, agents that need exactly one per node, migrations that should run once, and daily backups do not fit Deployment. This post covers the four controllers that fill those gaps in one pass: StatefulSet, DaemonSet, Job, CronJob.