All posts

Advanced RAG #2: Chunking Strategies That Decide Retrieval Quality
5 min read

The root of retrieval failures often lies not in retrieval itself but in the step before it: chunking. We cover the limits of fixed-size splitting, structure-based chunking, handling tables and code, metadata, and parent-child chunking that searches small and feeds large.

Advanced RAG #1: Start by Finding Where RAG Goes Wrong
5 min read

When RAG gives a strange answer, blindly tweaking the prompt is gambling. We start by splitting failures into retrieval failures and generation failures, then build a golden set and a baseline so every improvement becomes measurable.

Hardware Advanced #7: Firmware, BMC, and the Lifecycle — The Other Computer Inside Your Server
8 min read

A look at the BMC, the management computer that stays on independently of the main CPU. It covers remote console and power control, IPMI and Redfish, the firmware stack and update operations, failure prediction with SMART and ECC counters, management-network security, and the lifecycle from warranty expiry to disk disposal. This post closes out the Hardware Advanced series.

AI Agent Development #7: Capstone Project — An Issue Triage Agent
6 min read

Tie together every piece from the series and finish a triage agent that classifies GitHub issues and proposes labels and replies. We cover read tools, write tools behind an approval gate, and evaluation with a golden set.

Hardware Advanced #6: Data Center Cooling and Racks — Electricity Always Becomes Heat
9 min read

Nearly all the power that enters a server comes back out as heat. Starting from the basic airflow contract of front intake and rear exhaust, this post maps out data center cooling end to end: hot/cold aisle containment, rack density and the limits of air cooling, liquid cooling with D2C and immersion, and how ASHRAE temperature guidelines tie into PUE.

AI Agent Development #6: Building Your Own MCP Server
5 min read

In Part 11 of LLM App Development we connected to MCP servers someone else built. This time we build our own tools as an MCP server. We cover writing a server with FastMCP, wiring it into our agent loop, and the criteria for splitting tools out into a server.

Hardware Advanced #5: Datacenter Power — The Real Reason You Can't Rack More Servers
9 min read

Even with empty slots in the rack, new servers get rejected because of the power budget. This post walks through the power environment a server lives in, from an operator's point of view: PSU redundancy and A/B feeds, per-rack kW contracts, PDUs and UPS, generators and ATS, PUE, and the power density that GPU servers have driven up.

AI Agent Development #5: Dividing Work with Subagents
5 min read

When one agent does everything, both its context and its responsibilities bloat. We cover why you delegate work to subagents, a delegate tool, the orchestrator-worker pattern with parallel execution, and rules to keep delegation from going too far.

Hardware Advanced #4: ZFS Deep Dive — When RAID and the Filesystem Become One
9 min read

ZFS merged RAID, volume management, and the filesystem into a single layer, solving the structural problems of the traditional stack. This post walks through it all from an operations point of view: copy-on-write that eliminates the write hole, checksums that verify every read and enable self-healing, resilver that copies only live data, RAIDZ and the ARC, snapshots with send/recv, and lz4 compression.

AI Agent Development #4: Context Management for Long-Running Work
6 min read

The longer an agent runs, the closer its conversation grows to the context limit. We cover techniques for surviving long-running work: capping tool results, clearing old results, summary compression and server-side compaction, and a file-based scratchpad.

Gin Basics #7 Project Structure and a Mini REST API
5 min read

Split the code accumulated in one file into layers, separate out configuration, and finish the series with a mini REST API.

Hardware Advanced #3: Memory Deep Dive — Page Cache, THP, and Bandwidth
9 min read

A tour inside the kernel memory machinery: the read and write paths through the page cache, the latency spikes THP creates, explicit hugepages and the TLB, how swappiness is actually implemented along with zswap, and the memory bandwidth bottleneck that keeps throughput flat even when cores sit idle.