#Kubernetes
136 posts

Observability
We lay out the three pillars that give a production cluster visibility: metrics (Prometheus + kube-state-metrics + node-exporter), logs (Loki), and traces (OpenTelemetry + Tempo), together with the standard visualization layer (Grafana) and alerting (Alertmanager). We cover the ServiceMonitor and PrometheusRule resources of kube-prometheus-stack, PromQL and LogQL examples, and the operational guardrails: cardinality, retention, alert signal-to-noise ratio, and the golden signals.

Operations Checklist
The last chapter of Part 4 (EKS in Production). Standing up a cluster reliably and operating it safely for a year are different kinds of work. We lay out the EKS minor-version upgrade cycle, the node-group replacement pattern, RDS PITR and quarterly recovery drills, cost control with Karpenter + Spot, and how to make security checks routine with kube-bench, Trivy, and Kyverno. We close with a retrospective on the 6 chapters of Part 4 (Chapters 21 ~ 26) and on all 26 chapters of Parts 1 ~ 4.

RBAC / NetworkPolicy / ResourceQuota
A walkthrough of the three policy objects that provide isolation in multi-tenant operations, where several teams and environments share one cluster. We cover RBAC's Role, ClusterRole, ServiceAccount, and RoleBinding model, NetworkPolicy's default-deny pattern and its dependency on the CNI, and the pairing of ResourceQuota and LimitRange, all in one chapter that closes Part 2.

RBAC / ServiceAccount in Depth
Building on the RBAC basics from Chapter 14, we add the further depth you meet in a production cluster. We lay out aggregated ClusterRoles, which merge ClusterRoles by label; impersonation, which makes calls with another subject's permissions; how ServiceAccount tokens moved from permanent Secrets to projected tokens with expiry, audience, and rotation; and the model that ties a Kubernetes ServiceAccount to cloud IAM through EKS's IRSA and GKE's Workload Identity.

Secret Operations
The third chapter of Part 5. Starting from the limits of base64 encoding in a K8s Secret and what etcd encryption at rest means, it covers the secret lifecycle along four axes: storage, rotation, injection, and audit. It turns the following into a practical operations manual: a comparison of sealed-secrets, external-secrets, and SOPS; zero-password operation combined with IRSA (IRSA for the AWS API, RDS IAM auth for the DB); how rotation differs between envFrom and volume mounts; per-namespace separation with RBAC; and auditing with the Audit log and GuardDuty.

The CRD and Operator Pattern
We cover the two axes of extending the K8s API with objects from your own domain. A CustomResourceDefinition defines a new object kind, and a controller-runtime-based Operator runs the reconcile loop from Chapter 1 against that object, extending K8s's declarative model all the way to your domain. We lay out the three standard patterns (ownerReference, finalizer, and the status subresource) and the build tools Kubebuilder and Operator SDK.

Upgrade Strategy
The last chapter of Part 5. An operations manual for safely keeping up with Kubernetes minor releases (14 months of support). It covers the upgrade order (control plane → data plane (nodes) → add-ons); deprecated-API detection (pluto, kubent, the apiserver metric); API-version migration of manifests, Helm charts, and Operator CRs; the EKS node group and Karpenter NodePool drift flow; node-drain safeguards (PDB, terminationGracePeriodSeconds); minimizing the blast radius; rollback scenarios; choosing a backup per RPO / RTO; and checklists for the week before, the day of, and the week after the upgrade.

Autoscaling
A walkthrough of the three dimensions of autoscaling that absorb a production cluster's load swings without human intervention. We cover the roles of HPA (Pod count), VPA (Pod resources), and Cluster Autoscaler (node count); the metrics-server prerequisite; HPA's autoscaling/v2 manifest and proportional algorithm; the asymmetry between scale-up and scale-down; custom metrics and KEDA; VPA's updateMode and the HPA/VPA conflict; and Karpenter.
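As a rough illustration of the proportional algorithm mentioned above, here is a minimal Python sketch of the documented HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance band. The function name and parameters are hypothetical, and the real controller also applies min/max replica bounds, stabilization windows, and readiness handling that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA proportional rule (hypothetical helper, not the controller code)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 Pods at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The same ratio drives scale-down: 10 Pods at 30% against a 60% target would ask for 5, before any scale-down stabilization is applied.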
Certified Kubernetes Administrator (CKA) #10 Workloads 1: Deployment in Depth, ReplicaSet, Rolling Update and Rollback
The tenth post in the Certified Kubernetes Administrator (CKA) series. We look closely at the Deployment, the workload an operator handles most often. We walk through the Deployment → ReplicaSet → Pod hierarchy and the label selector that binds them, how to create and scale with kubectl, the conditions under which the rollingUpdate strategy (maxSurge/maxUnavailable) achieves a zero-downtime update, and how to track revisions and roll back with kubectl rollout, all drilled until they are second nature.
Certified Kubernetes Application Developer (CKAD) #5 Workloads 1: Deployment, ReplicaSet, Rolling Update, and Rollback
The fifth post in the Certified Kubernetes Application Developer (CKAD) series. We imperatively create a Deployment, the heart of app delivery, and lay out how Deployment, ReplicaSet, and Pod relate and scale. We get hands-on with the meaning of rollingUpdate's maxSurge and maxUnavailable, shipping a new version with kubectl set image, and a rollback scenario: tracking state with kubectl rollout and reverting a failed version with undo.
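To make maxSurge and maxUnavailable concrete, the following Python sketch computes the Pod-count bounds a rolling update stays within when both are given as percentages. It assumes the documented rounding (maxSurge rounds up, maxUnavailable rounds down) and the default of 25% for each; the function name is hypothetical, and this is an arithmetic illustration rather than the Deployment controller's code.

```python
def rolling_update_bounds(replicas: int,
                          max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> tuple[int, int]:
    """Return (max total Pods, min available Pods) during a rolling update."""
    # maxSurge as a percentage is rounded up (integer ceiling division).
    surge = -(-replicas * max_surge_pct // 100)
    # maxUnavailable as a percentage is rounded down.
    unavailable = replicas * max_unavailable_pct // 100
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    max_total, min_available = rolling_update_bounds(10)
    print(max_total, min_available)
```

With 10 replicas and the defaults, at most 13 Pods exist at once and at least 8 stay available. Setting maxUnavailable to 0 keeps the available count at the full replica count throughout, at the cost of needing room for the surge Pods.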
Certified Kubernetes Security Specialist (CKS) #3: CIS benchmark (kube-bench), component security, Ingress TLS, binary verification
The third post in the Certified Kubernetes Security Specialist (CKS) series. It covers the remaining half of the Cluster Setup domain: hardening the cluster itself. Using commands and manifests, we work through what the CIS Kubernetes benchmark is; how to inspect the control plane and nodes with kube-bench, read the PASS/FAIL/WARN results, and apply remediation; how to change dangerous apiserver and kubelet flags to safe values; how to attach TLS to an Ingress; and how to verify a downloaded binary with sha256sum.
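The binary-verification step amounts to comparing a locally computed digest against the published one. The post itself uses sha256sum; the Python sketch below shows the same check with the standard library, using a hypothetical helper name and in-memory bytes standing in for the downloaded file.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True when the SHA-256 digest of data matches the published hex digest."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # Normalize whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(actual_hex, expected_hex.strip().lower())


if __name__ == "__main__":
    payload = b"example binary contents"  # stand-in for the downloaded file
    published = hashlib.sha256(payload).hexdigest()  # stand-in for the vendor's checksum
    print(verify_sha256(payload, published))
    print(verify_sha256(payload + b"tampered", published))
```

In practice the expected digest should come from the project's release page over a trusted channel, not from the same location as the binary.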

ConfigMap and Secret
Separate config and passwords from the manifest with ConfigMap and Secret. We cover how Kubernetes implements 12-factor's "store config in the environment" principle, the three injection methods (env, envFrom, and volume), the fact that a Secret's base64 is not encryption, and why a Pod restart is needed when config changes.
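The point that base64 is not encryption can be seen in a few lines of Python: anyone who can read the Secret's data field can reverse it with no key. The helper names below are hypothetical and only mirror what the encoding in a Secret manifest does.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under a Secret's data field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    stored = encode_secret_value("password")
    print(stored)
    print(decode_secret_value(stored))
```

Because decoding needs nothing but the string itself, protection has to come from elsewhere, such as RBAC on who can read Secrets and encryption at rest in etcd.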