#Infrastructure

300 posts

Kubernetes and Cloud Native Associate (KCNA) #6: Cloud Native Observability (8%) — Telemetry, Prometheus, Cost Management
11 min read

A walk through KCNA Domain 4: the three pillars of telemetry (metrics, logs, traces); Prometheus pull-based metric collection with PromQL, Alertmanager, and Grafana; OpenTelemetry and distributed tracing; SLI/SLO/SLA and the golden signals; and FinOps cost management.

Red Hat Certified Engineer (RHCE) #14 RHCSA Automation 1: Users/Groups, Packages/Repositories
9 min read

The fourteenth post in the Red Hat Certified Engineer (RHCE) series. We use Ansible modules to automate the user/group creation and package/repository management you did by hand in RHCSA. We work through the user and group modules, passwords handled safely with password_hash and Vault, the dnf module and module streams, the yum_repository module, and the exam-favorite pattern of creating many users at once with loop.

Red Hat Certified System Administrator (RHCSA) #11 Users/Groups: UID/GID, sudo, ACL, password policy
11 min read

The eleventh post in the Red Hat Certified System Administrator (RHCSA) series. We organize it around the exact tasks RHCSA puts on the practical exam: creating users with useradd and usermod and assigning UID/GID, groupadd and supplementary groups, granting sudo rights through /etc/sudoers and visudo, setting per-file ACLs with setfacl, and pinning down password expiry policy with chage.

AWS Certified CloudOps Engineer - Associate (SOA-C03) #6 Domain 2-2 Reliability — Backup, Restore, and Disaster Recovery (DR)
6 min read

The sixth post of the SOA-C03 series covers data protection, the second axis of the reliability domain. It walks through EBS snapshots and AMIs, RDS automated backups and snapshots, centrally managed backup policies with AWS Backup, the meaning of RPO and RTO, and the DR strategies that progress from backup to pilot light to warm standby to multi-site.

AWS Certified Developer - Associate (DVA-C02) #12 Domain 4-1 Troubleshooting and Optimization — Observability
4 min read

The first post of the DVA-C02 troubleshooting domain. It covers, at the exam level, CloudWatch Logs (log groups, streams, Logs Insights) and Metrics (standard, custom, high-resolution), Alarms, X-Ray distributed tracing (segments, subsegments, service map, sampling), and how to extract metrics from logs with EMF (Embedded Metric Format). The focus is on the tools that trace failures and narrow down their cause.

Certified Kubernetes Administrator (CKA) #21 Helm and Kustomize: Managing Manifests
9 min read

The twenty-first post in the Certified Kubernetes Administrator (CKA) series. We learn the two tools for managing manifests, Helm and Kustomize, with a focus on operational commands. For Helm, we cover repo add/update, install/upgrade/rollback, value injection, and template rendering; for Kustomize, the base/overlays structure, patchesStrategicMerge, configMapGenerator, and kubectl apply -k. We lay out the difference between the two (template vs. overlay) in a table and pin down the CKA exam points.

Certified Kubernetes Application Developer (CKAD) #16 Resource Management: requests/limits, QoS Class, LimitRange
8 min read

The sixteenth post in the Certified Kubernetes Application Developer (CKAD) series. It nails down requests and limits — which decide how much a Pod asks for and how much it may use — right down to the units, and shows how CPU throttling and memory OOMKilled diverge. We also work through the three QoS classes and eviction priority, plus LimitRange that enforces namespace defaults and ResourceQuota that caps the total, all with YAML examples.

Certified Kubernetes Security Specialist (CKS) #14: Image scan — Trivy, Kubesec, KubeLinter
9 min read

The fourteenth post in the Certified Kubernetes Security Specialist (CKS) series. We cover image vulnerability scanning, the heart of supply chain security. A table compares the roles of three tools: Trivy, whose image/filesystem/repo scans find CVEs embedded in a container image's OS packages and language libraries, and which supports severity filtering and exit-code-based CI gates; Kubesec, which scores a manifest's securityContext settings; and KubeLinter, which statically analyzes manifests to catch anti-patterns. We also walk through, with command examples, the exam staple of finding and replacing an image that has a vulnerability of a given severity.

Kubernetes and Cloud Native Associate (KCNA) #5: Cloud Native Architecture (16%) — Autoscaling, Serverless, Community, Open Standards
13 min read

The fifth post in the KCNA series. It walks through cloud native design philosophy (the CNCF definition, self-healing, resilience), autoscaling (HPA, VPA, Cluster Autoscaler, KEDA), serverless (Knative, FaaS), the CNCF community and project maturity levels, open standards (OCI, CRI, CNI, CSI, OpenTelemetry), and finishes with zero-downtime rollouts and immutable infrastructure.

Red Hat Certified Engineer (RHCE) #13: System roles (rhel-system-roles)
8 min read

The thirteenth post in the Red Hat Certified Engineer (RHCE) series. We cover how rhel-system-roles, a set of validated roles Red Hat ships, abstracts away RHCSA tasks. We walk through installation (dnf and ansible-galaxy collection), where the docs live (/usr/share/doc/rhel-system-roles) and the example-playbook copy pattern, the timesync/firewall/selinux/storage/network/postfix roles and their variables, and worked examples of the exam regulars: automating NTP, firewall, and SELinux.

Red Hat Certified System Administrator (RHCSA) #10 Basic Networking: NetworkManager (nmcli), hostname, /etc/hosts
9 min read

The tenth post in the Red Hat Certified System Administrator (RHCSA) series. We cover how NetworkManager manages networking on RHEL 9, how to create connections and set a static IP permanently with nmcli, how to change the hostname with hostnamectl and resolve names through /etc/hosts, and how to verify the result with the ip command — all typed out by hand. Setting a static IP so it survives a reboot is a perennial RHCSA exam task.

AWS Certified CloudOps Engineer - Associate (SOA-C03) #5 Domain 2-1 Reliability: Multi-AZ, Auto Scaling, and ELB Health Checks
5 min read

The fifth post of the SOA-C03 series covers availability operations, the first topic of the Reliability domain (22%). It covers redundancy across Availability Zones; Auto Scaling group capacity, policies, and lifecycle hooks; health checks and connection draining per ELB type; and Route 53 health-check-based failover.