Advanced RAG #2: Chunking Strategies That Decide Retrieval Quality

In Part 1 we built a baseline. Now we fix retrieval failures. The root of a retrieval failure, though, often lies not in the retrieval algorithm but in the step before it: chunking, the act of cutting documents into the pieces that become the unit of search. However good the retrieval algorithm is, it cannot bring back good results if the pieces themselves are bad.

The limits of fixed-size splitting

In LLM App Development Part 8 we used fixed-size splitting: cut every 300 characters with a 50-character overlap. It is fine as a starting point, but its limits are clear. It cuts without regard for the document’s units of meaning.

For example, in a refund policy document, if the heading of the “Refund fees” section and the fee rate table end up in different chunks, neither chunk can answer the question on its own. Retrieval fetches those half-formed chunks, and the diagnosis from Part 1 comes back as “the answer is not in the chunk.” This is the typical way a chunking failure shows up as a retrieval failure.
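To make the failure concrete, here is a minimal sketch of fixed-size splitting applied to a made-up refund policy excerpt. The function name `chunk_fixed`, the sample text, and the small `size` and `overlap` values are assumptions chosen so the effect is visible on a short sample; they are not the code from the earlier series.

```python
def chunk_fixed(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Cut every `size` characters, repeating the last `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


# Hypothetical excerpt: a heading followed by a fee table.
doc = (
    "## Refund fees\n\n"
    "| Days since payment | Fee |\n"
    "| --- | --- |\n"
    "| 0-7 | 0% |\n"
    "| 8-30 | 10% |\n"
)

# A small size stands in for 300 characters on a real document.
chunks = chunk_fixed(doc, size=40, overlap=10)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

In this sample the chunk that holds the 10% row carries neither the “Refund fees” heading nor the table's column names, so it cannot answer a fee question by itself.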

Structure-based chunking: cutting along the document’s units

The basic direction of improvement is to cut along the document’s structure rather than by character count. For Markdown, headings are the natural boundary; for plain documents, paragraphs.

structural_chunking.py
import re

def chunk_by_heading(markdown: str, max_chars: int = 1500) -> list[str]:
    """Split by heading; only re-split overly large sections by paragraph."""
    # Zero-width split (Python 3.7+): each piece starts at a level 1-3 heading.
    # Note: a "# " line inside a code fence would also match this pattern.
    sections = re.split(r'(?=^#{1,3} )', markdown, flags=re.M)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        if len(sec) <= max_chars:
            chunks.append(sec)
        else:
            # split_by_paragraph is a helper that is not part of this listing.
            chunks.extend(split_by_paragraph(sec, max_chars))
    return chunks

The key is the mapping “one section, one chunk.” The heading, the body, and the table live together in one chunk, so that single chunk can answer the question. Chunk sizes become uneven, and that is fine. Intact meaning matters far more to retrieval quality than uniform size.
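The listing calls `split_by_paragraph` without defining it. One possible version is sketched below, assuming paragraphs are separated by blank lines and that a single paragraph longer than the cap should stay whole rather than be cut mid-sentence. Both are assumptions, not the author's implementation.

```python
import re


def split_by_paragraph(section: str, max_chars: int) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into pieces of at most max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", section) if p.strip()]
    pieces: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate          # still fits (or is the first paragraph)
        else:
            pieces.append(current)       # close the current piece
            current = para               # start a new one
    if current:
        pieces.append(current)
    return pieces
```

With this sketch only the first piece of an oversized section contains the heading, so later pieces depend on the heading being restored some other way, for example from metadata.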

Set the size cap (max_chars) by balancing two failure modes. Too large, and multiple topics mix in one chunk and blur the embedding; too small, and context breaks. As a reference point, look at how long “the unit that answers one question” typically runs in your documents, then tune by comparing before and after with the golden set from Part 1.

Tables and code stay whole

The biggest victims of fixed-size splitting are tables and code blocks. When a table is cut in the middle, the header and the values are separated, and both chunks become useless. The rule is simple: do not cut tables or code blocks; keep each one whole in a single chunk. When structure-based chunking falls back to splitting by paragraph, never treat anything inside a table or a code fence as a split boundary.
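A minimal sketch of that rule, assuming Markdown input where code is wrapped in triple-backtick fences and table rows are consecutive lines with no blank line between them. The name `split_blocks` is hypothetical. The function splits on blank lines but ignores blank lines inside a fence, so each code block comes out as a single unit; tables stay whole because they contain no blank lines.

```python
FENCE = "`" * 3  # the triple-backtick marker that opens and closes a code fence


def split_blocks(text: str) -> list[str]:
    """Split on blank lines, but never on a blank line inside a code fence."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence      # entering or leaving a fence
        if not line.strip() and not in_fence:
            if current:                  # a blank line outside a fence ends the block
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks
```

The resulting blocks can then be packed up to the size cap the same way paragraphs are, with the difference that a fenced block or table is never opened up.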

If a table is so large that it exceeds the cap, do not cut it blindly. Split it between rows and repeat the header row in every resulting chunk. Whichever chunk gets retrieved, the column names and values arrive together.
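The header duplication can be sketched as follows, assuming a Markdown table whose first two lines are the header row and the separator row. The function name `split_table` is hypothetical, and the cap is measured in characters to match `max_chars`.

```python
def split_table(table: str, max_chars: int) -> list[str]:
    """Split a Markdown table between rows, repeating the header in every piece."""
    lines = table.strip().splitlines()
    header, rows = lines[:2], lines[2:]          # header row + separator row
    header_len = sum(len(line) + 1 for line in header)
    pieces: list[str] = []
    current: list[str] = []
    size = header_len
    for row in rows:
        # Close the piece when adding this row would exceed the cap.
        if current and size + len(row) + 1 > max_chars:
            pieces.append("\n".join(header + current))
            current, size = [], header_len
        current.append(row)
        size += len(row) + 1
    if current:
        pieces.append("\n".join(header + current))
    return pieces or ["\n".join(header)]
```

A row is never cut, so a single very long row can still push a piece over the cap; that trade-off favors intact rows over strict size.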

Metadata: writing each chunk’s origin on it

Along with the body, store each chunk’s origin: which section of which document it came from.

chunk_with_metadata.py
{
    "text": "The refund fee is 10% of the payment amount. ...",
    "metadata": {
        "source": "refund-policy.md",
        "section": "Refund fees",
        "updated": "2026-05-01",
    },
}

Metadata is useful in three ways. First, prefixing the section path to the body when embedding (“Refund policy > Refund fees: …”) raises retrieval quality for short chunks. Second, it serves as a filter at retrieval time — conditions like “latest documents only” or “HR policies only.” Third, it is used as-is when showing sources in the answer. We meet it again in the citations of Part 5.
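The three uses can be sketched on plain dictionaries shaped like the chunk above. The `section_path` key and the function names are assumptions added for illustration, and the filter runs in memory here, whereas a real vector store would normally apply it inside the query.

```python
def embedding_text(chunk: dict) -> str:
    """Return the text to embed: section path prefix plus body."""
    meta = chunk["metadata"]
    # "section_path" is a hypothetical key; fall back to the single "section" field.
    path = meta.get("section_path") or [meta.get("section", "")]
    prefix = " > ".join(p for p in path if p)
    return f"{prefix}: {chunk['text']}" if prefix else chunk["text"]


def updated_since(chunks: list[dict], cutoff: str) -> list[dict]:
    """Keep chunks whose ISO 8601 "updated" date is on or after cutoff."""
    # Dates in YYYY-MM-DD form sort correctly as plain strings.
    return [c for c in chunks if c["metadata"].get("updated", "") >= cutoff]


def citation(chunk: dict) -> str:
    """Format the chunk's origin for display next to an answer."""
    meta = chunk["metadata"]
    return f'{meta["source"]}, section "{meta["section"]}"'


chunk = {
    "text": "The refund fee is 10% of the payment amount.",
    "metadata": {
        "source": "refund-policy.md",
        "section": "Refund fees",
        "section_path": ["Refund policy", "Refund fees"],
        "updated": "2026-05-01",
    },
}
print(embedding_text(chunk))
print(citation(chunk))
```

Only the embedded text gets the prefix; the stored body stays unchanged, so the text shown to the user and passed to generation is not polluted by it.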

Parent-child chunking: search small, feed large

Retrieval and generation make different demands on chunk size. Retrieval is accurate with small chunks whose topic is sharp and singular, while generation produces good answers when given large chunks that carry the surrounding context. The way to resolve this tension is parent-child chunking.

  • Split the document into large units (parents, whole sections), then split each parent into small units (children, paragraphs).
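  • Embed and search with the children, and when a child is retrieved, feed its parent into the context.

parent_child.py
def search_with_parent(question: str, top_k: int = 5) -> list:
    # vector_search and parents are assumed to be defined elsewhere:
    # a search over child chunks and a dict mapping parent_id to parent chunk.
    children = vector_search(question, top_k=top_k)   # search with small chunks
    # dict.fromkeys dedupes while keeping rank order (a set would lose it)
    parent_ids = dict.fromkeys(c.metadata["parent_id"] for c in children)
    return [parents[pid] for pid in parent_ids]        # return large chunks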

The implementation burden grows a little, but it has a direct effect on the failure type “retrieval was right, but the chunk was too short to build an answer.” If the diagnosis in Part 1 showed that pattern, this is worth moving up the priority list.
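The indexing side is not shown above. A minimal sketch follows, assuming each parent is a whole section, each child is a blank-line-separated paragraph, and children carry a `parent_id` in their metadata. The `Child` class, the `p0`, `p1` id scheme, and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    text: str
    metadata: dict = field(default_factory=dict)


def build_parent_child(sections: list[str]) -> tuple[dict[str, str], list[Child]]:
    """Return (parents keyed by id, children that point back to their parent)."""
    parents: dict[str, str] = {}
    children: list[Child] = []
    for i, section in enumerate(sections):
        parent_id = f"p{i}"
        parents[parent_id] = section
        for para in section.split("\n\n"):
            para = para.strip()
            if para:
                children.append(Child(para, {"parent_id": parent_id}))
    return parents, children


def parents_for(hits: list[Child], parents: dict[str, str]) -> list[str]:
    """Map retrieved children to their parents, deduplicated in rank order."""
    ordered_ids = dict.fromkeys(h.metadata["parent_id"] for h in hits)
    return [parents[pid] for pid in ordered_ids]
```

Only the children would be embedded and indexed. The parents can live in any key-value store, since they are fetched by id and never searched directly.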

Re-chunk, then re-measure

Changing the chunking means the embeddings must be regenerated, so rebuild the entire index. And always compare before and after with the golden set from Part 1. Chunking changes have a large effect, and that effect cuts both ways. Some question groups improve while others get worse, so changing without numbers leaves you with nothing but the illusion of improvement.
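One way to see both directions of the effect is to score each question group separately. The sketch below assumes a golden set where every item has a `question`, a `group`, and an `expected` substring that a correct chunk must contain, and a `search` callable that returns chunk texts. The actual golden set format from Part 1 may differ, so treat these field names as placeholders.

```python
from collections import defaultdict
from typing import Callable


def hit_rate_by_group(
    golden: list[dict],
    search: Callable[[str, int], list[str]],
    top_k: int = 5,
) -> dict[str, float]:
    """Share of questions per group whose expected text appears in the top_k chunks."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for item in golden:
        results = search(item["question"], top_k)
        totals[item["group"]] += 1
        if any(item["expected"] in text for text in results):
            hits[item["group"]] += 1
    return {group: hits[group] / totals[group] for group in totals}


def compare(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-group change in hit rate; negative values are regressions."""
    groups = sorted(set(before) | set(after))
    return {g: after.get(g, 0.0) - before.get(g, 0.0) for g in groups}
```

Run `hit_rate_by_group` once against the old index and once against the rebuilt one, then pass both results to `compare`. A positive average can hide a group that got worse, which is why the per-group breakdown is worth keeping.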

Where people commonly trip up

  • Obsessing over uniform size — chunks neatly cut every N characters look tidy but have their meaning severed. Even if sizes are uneven, units of meaning come first.
  • Throwing away the heading — if you store only the cut paragraphs, the referents of pronouns like “it” or “this policy” disappear. Include the section heading in the chunk, or prefix it via metadata.
  • Postponing the index rebuild — if you change only the chunking code and test against the old index, nothing changes. A chunking change always comes bundled with re-indexing.

Wrapping up

In this post we covered chunking, the foundation of retrieval quality.

  • Fixed-size splitting severs units of meaning. Cut along document structure such as headings and paragraphs, and keep tables and code whole.
  • Attach metadata like source and section to each chunk, and use it for embedding enrichment, filters, and source display.
  • Small chunks favor retrieval; large chunks favor generation. Parent-child chunking captures both.

Now that the chunks are better, the next topic is how we search. Semantic search alone misses questions about product codes and proper nouns. In the next post, “Advanced RAG #3: Hybrid Search — Combining Vectors and Keywords,” we combine the two kinds of search.
