AI

Wednesday, July 1, 2026 5 min read

LLM App Operations #7: Capstone — Taking the Document Q&A Bot to Production

We tie the five pillars of this series into an operations checklist and apply it to the document Q&A bot. Turning on instrumentation, routing, caching, batching, reliability, and security one by one, we watch how per-request cost and stability change, and close out the AI track that spans four series.

AI LLM Claude

Tuesday, June 30, 2026 6 min read

LLM App Operations #6: Security — Prompt Injection and Data Boundaries

Prompt injection is an attempt to change an app's behavior through input text, and in the era of RAG and agents it rides in through documents and tool results. We cover layered defenses instead of a single line, minimizing tool permissions, output validation, and the data boundaries of logging.

AI LLM Claude

Monday, June 29, 2026 6 min read

LLM App Operations #5: Reliability — Rate Limits, Retries, Fallbacks

429 and 529 are not outages; they are daily life. We build a structure that keeps running: how rate limits work (RPM and token limits), retries that respect retry-after, timeouts and streaming, and fallbacks for when nothing else works (model downgrade, queuing, graceful failure).

AI LLM Claude

Sunday, June 28, 2026 5 min read

LLM App Operations #4: Batching — Half Price for Non-Urgent Work

Are you still sending work that does not need an immediate answer through the real-time API? The Batches API processes bulk requests asynchronously in exchange for a 50% discount on every token. We cover picking batch-worthy work, submitting and collecting, and operational patterns.

AI LLM Claude

Saturday, June 27, 2026 6 min read

LLM App Operations #3: Prompt Caching in Practice

Caching the system prompt and tool definitions that repeat on every request cuts the input cost of that portion to roughly one tenth on cache hits. We cover the prefix-matching principle, stable prefix design, cache_control placement, and an audit for silent cache invalidation.

AI LLM Claude

Friday, June 26, 2026 6 min read

LLM App Operations #2: Cost — Token Accounting and Model Routing

The biggest lever for cutting cost is model choice. We cover the order of operations for lowering cost while protecting quality: measuring with count_tokens before sending, putting output on a diet, designing model routing by task difficulty, and tuning effort.

AI LLM Claude

Thursday, June 25, 2026 5 min read

LLM App Operations #1: Between Demo and Production — A Map of Operations

An LLM app that works and an LLM app you can operate are different things. We draw a map of operations along five axes — cost, latency, reliability, quality, and security — and build per-request instrumentation, the starting point for everything.

AI LLM Claude

Wednesday, June 24, 2026 6 min read

Advanced RAG #7: Capstone Project — Upgrading the Document Q&A Bot

We upgrade the internal document Q&A bot from LLM App Development Part 13 step by step with the techniques from this series. From measuring the baseline through swapping the chunking strategy and adding hybrid search, reranking, and citations, we watch how the metrics move at every step.

AI LLM Claude

Tuesday, June 23, 2026 6 min read

Advanced RAG #6: Building a RAG Evaluation Pipeline

We grow Part 1's baseline into a full evaluation system. Retrieval is scored with recall@k and MRR, generation with an LLM judge, and a single evaluation script that also measures hallucination rate runs as the regression test for every change.

AI LLM Claude

Monday, June 22, 2026 5 min read

Advanced RAG #5: Reducing Hallucinations with Citations

We tackle generation failures, where answers go wrong even though the right chunks were provided. We implement prompts that keep answers inside the evidence, the right to answer "I do not know," and per-sentence source attribution with the citations feature in Claude.

AI LLM Claude

Sunday, June 21, 2026 6 min read

Advanced RAG #4: Query Transformation and Reranking

User questions are not good search queries. We reinforce both ends of retrieval: query rewriting that folds in conversation context, multi-query that asks from several angles, and reranking that precisely narrows a wide pool of candidates.

AI LLM Claude

Saturday, June 20, 2026 2 min read

Advanced RAG #3: Hybrid Search — Combining Vectors and Keywords

Semantic search is weak on product codes and proper nouns, and keyword search is weak on synonyms. We build BM25 keyword search and fuse it with vector search via RRF, implementing hybrid search where each side covers the weaknesses of the other.