All posts
AI Agent Development #3: Planning and Self-Correction
An agent that takes on multi-step work needs rules of behavior and a plan. We cover system prompt design, getting the agent to plan first, mid-task verification and retries, and tuning thinking depth with adaptive thinking.
Gin Basics #6 Database Integration (GORM)
Connect GORM and build a CRUD API that handles real data. Covers everything from model definition through create, read, update, and delete.
Hardware Advanced #2: eBPF Observability — Seeing the Tail the Average Hides
eBPF is a technology for tracing system events directly with small programs that run safely inside the kernel. This post covers reading the latency distributions and tails that averages hide with biolatency and runqlat, a map of the BCC tools, and the overhead caveats for production use.
AI Agent Development #2: Designing Good Tools
Most of the quality gap between agents comes from their tools. We cover the principles of tool design: the description the model reads, schema design, error messages, and classifying dangerous tools with a confirmation step.
Gin Basics #5 Middleware
How to bundle common processing applied across many handlers — logging, recovery, authentication — into middleware.
Hardware Advanced #1: CPU Microarchitecture and perf — Why the Same 100% Isn't the Same
Two CPUs can both read 100% utilization while getting very different amounts of work done. This post uses IPC, cache misses, and branch mispredictions to read the microarchitecture behind the utilization number, and shows how to tell memory stalls from genuine compute saturation in perf stat output.
How Does Google Maps Know Where Traffic Is? The Secret Behind Real-Time Traffic Data
Google Maps and Waze reroute you around a crash before it even makes the radio traffic report — and the secret is the cars themselves, each one reporting its speed. Probe data, segment speed averages, how ETAs are predicted, routing by time instead of distance, and why sending everyone down the same detour backfires, explained for non-developers.
AI Agent Development #1: Building a Robust Agent Loop
Take the minimal agent loop from LLM App Development up to production level. Handle every stop_reason, return tool errors as results, and add retries and logging. The starting point of this series.
Gin Basics #4 Responses — JSON, Status Codes, Errors
Covers response formats beyond JSON, how to handle status codes, and how to build consistent error responses.
Hardware Intermediate #9: Hands-On: Diagnosing a Slow Server — Series Finale
A diagnostic walkthrough that starts from a "the service is slow" report and narrows the cause down by checking the four resources one by one. Define the symptom, check each resource, confirm the hypothesis, apply a fix, and re-measure. We close the Hardware Intermediate series with the principles of tuning.
How Do Google Translate and DeepL Work? Three Generations of Machine Translation
Point your phone camera at a foreign menu and the words change into your language right on the screen. How Google Translate and DeepL pull this off — the rule-based, statistical, and neural generations of machine translation, what camera and voice translation really are, and why translators still get things wrong, explained for non-developers.
LLM App Development #13: A Real-World Project — Internal Document Q&A Bot
Bring all the pieces together and build a Q&A bot that answers from internal documents, start to finish. A finale combining RAG, streaming, grounding prompts, and conversation memory.