Hardware

Thursday, June 18, 2026 8 min read

Hardware Advanced #7: Firmware, BMC, and the Lifecycle — The Other Computer Inside Your Server

A look at the BMC, the management computer that stays on independently of the main CPU. It covers remote console and power control, IPMI and Redfish, the firmware stack and update operations, failure prediction with SMART and ECC counters, management-network security, and the lifecycle from warranty expiry to disk disposal — closing out the Hardware Advanced series.

Infrastructure Hardware

Wednesday, June 17, 2026 9 min read

Hardware Advanced #6: Data Center Cooling and Racks — Electricity Always Becomes Heat

Nearly all the power that enters a server comes back out as heat. Starting from the basic airflow contract of front intake and rear exhaust, this post maps out data center cooling end to end: hot/cold aisle containment, rack density and the limits of air cooling, liquid cooling with direct-to-chip (D2C) and immersion, and how ASHRAE temperature guidelines tie into PUE.

Infrastructure Hardware

Tuesday, June 16, 2026 9 min read

Hardware Advanced #5: Datacenter Power — The Real Reason You Can't Rack More Servers

Even with empty slots in the rack, new servers get rejected because of the power budget. This post walks through the power environment a server lives in, from an operator's point of view: PSU redundancy and A/B feeds, per-rack kW contracts, PDUs and UPS, generators and ATS, PUE, and the power density that GPU servers have driven up.

Infrastructure Hardware

Monday, June 15, 2026 9 min read

Hardware Advanced #4: ZFS Deep Dive — When RAID and the Filesystem Become One

ZFS merged RAID, volume management, and the filesystem into a single layer, solving the structural problems of the traditional stack. This post walks through it all from an operations point of view: copy-on-write that eliminates the write hole, checksums that verify every read and enable self-healing, resilver that copies only live data, RAIDZ and the ARC, snapshots with send/recv, and lz4 compression.

Infrastructure Hardware

Sunday, June 14, 2026 9 min read

Hardware Advanced #3: Memory Deep Dive — Page Cache, THP, and Bandwidth

A tour inside the kernel memory machinery: the read and write paths through the page cache, the latency spikes THP creates, explicit hugepages and the TLB, how swappiness is actually implemented and where zswap fits in, and the memory bandwidth bottleneck that keeps throughput flat even when cores sit idle.

Infrastructure Hardware

Saturday, June 13, 2026 8 min read

Hardware Advanced #2: eBPF Observability — Seeing the Tail the Average Hides

eBPF is a technology for tracing system events directly with small programs that run safely inside the kernel. This post covers using biolatency and runqlat to read the latency distributions and tails that averages hide, a map of the BCC tools, and the overhead caveats for production use.

Infrastructure Hardware

Friday, June 12, 2026 7 min read

Hardware Advanced #1: CPU Microarchitecture and perf — Why the Same 100% Isn't the Same

Two CPUs can both read 100% utilization while getting very different amounts of work done. This post uses IPC, cache misses, and branch mispredictions to read the microarchitecture behind the utilization number, and shows how to tell memory stalls from genuine compute saturation in perf stat output.

Infrastructure Hardware

Thursday, June 11, 2026 6 min read

Hardware Intermediate #9: Hands-On: Diagnosing a Slow Server — Series Finale

A diagnostic walkthrough that starts from a "the service is slow" report and narrows down through the four resources one by one. Define the symptom, check each resource, confirm the hypothesis, apply a fix, and re-measure. We close the Hardware Intermediate series with the principles of tuning.

Infrastructure Hardware

Wednesday, June 10, 2026 6 min read

Hardware Intermediate #8: GPUs and Accelerators — The Fifth Resource of the AI Era

The bottleneck of AI workloads often lies beyond the four resources. From an operator's perspective, this post covers how a GPU works differently from a CPU, the VRAM and HBM that determine model capacity, how to read nvidia-smi output, and how to share a GPU with passthrough, vGPU, and MIG.

Infrastructure Hardware

Tuesday, June 9, 2026 6 min read

Hardware Intermediate #7: Storage Networking — iSCSI, FC, NVMe-oF, Multipath

Once the disk leaves the server, storage becomes a network problem. This post covers the trade-offs between iSCSI and FC, NVMe-oF for the NVMe era, multipath operation for path redundancy, and the connection to cloud block storage.

Infrastructure Hardware

Monday, June 8, 2026 6 min read

Hardware Intermediate #6: RAID in Operation — Rebuild, Scrub, and Backups

The real test of RAID begins after a disk dies. This post explains why the rebuild is the most dangerous window, how unrecoverable read errors (UREs) make RAID5 risky in the era of large disks, what hot spares and scrubs do, the role of the write cache and its battery, and why RAID is not a backup.

Infrastructure Hardware

Sunday, June 7, 2026 6 min read

Hardware Intermediate #5: Measuring Storage Performance — fio, Queue Depth, Inside SSDs

Catalog IOPS figures only make sense under specific conditions. This post covers how to measure under the conditions of your own workload with fio, the trade-off between queue depth and latency, and the internals (write amplification and TRIM) that make the same SSD perform differently today than it did yesterday.