Advanced RAG #5: Reducing Hallucinations with Citations
Through Part 4 we refined retrieval. Now it is time for the other half of the failures we split out in Part 1: generation failures. These are the cases where the answer is wrong even though the correct chunk was in the context and, nastier still, the cases where the model invents plausible content that is not in the chunks. The latter is called hallucination. In RAG, hallucination is a matter of trust: a service can answer well ten times and still lose its users' confidence over one plausible lie.
Keeping answers inside the evidence
The first line of defense is the system prompt. The key is to make two things explicit: do not use knowledge from outside the evidence, and if the evidence does not contain the answer, say so.

    SYSTEM = """You are a Q&A bot grounded in internal company documents.
    Rules:
    - Base every answer strictly on the content of the provided document chunks. Do not supplement with general knowledge.
    - If the chunks do not contain the answer, do not guess; reply "I could not find that in the provided documents."
    - If the chunks contradict each other, report the contradiction as is.
    """

The second rule matters most. If “I don’t know” is not allowed, the model will fill the blank even if it has to make something up. Explicitly granting the right to say it does not know is half of hallucination suppression. The rule also has a useful side effect for diagnosis: if “could not find” answers increase, that is a signal to fix retrieval, not generation.
Citations: attaching a source to every sentence
You can prompt the model to “show your sources,” but that creates a paradox: the model may hallucinate the source labels themselves. Claude has a citations feature that solves this structurally. Pass the chunks in as document blocks instead of plain text and turn citations on, and the API returns structured data showing which passage of which document each sentence of the response is grounded in.

    import anthropic

    # Assumed setup: the client reads ANTHROPIC_API_KEY from the environment.
    client = anthropic.Anthropic()

    def answer_with_citations(question: str, chunks: list):
        # Wrap each chunk in a document block so the API can cite it.
        documents = [
            {
                "type": "document",
                "source": {"type": "text", "media_type": "text/plain", "data": c["text"]},
                "title": f'{c["metadata"]["source"]} - {c["metadata"]["section"]}',
                "citations": {"enabled": True},
            }
            for c in chunks
        ]
        # The documents come first in the user turn, followed by the question.
        response = client.messages.create(
            model="claude-opus-4-8",
            max_tokens=2048,
            system=SYSTEM,
            messages=[{"role": "user", "content": documents + [{"type": "text", "text": question}]}],
        )
        return response

The metadata we attached to chunks in Part 2 serves here as each document's title. In the response, every text block carries its own citation information.

    def render(response) -> str:
        """Render the answer text, with a source line after each cited block."""
        out = []
        for block in response.content:
            if block.type != "text":
                continue
            out.append(block.text)
            # citations is None (or empty) for text blocks with no grounding.
            for cite in (block.citations or []):
                out.append(f' [Source: {cite.document_title} - "{cite.cited_text[:50]}…"]')
        return "\n".join(out)

cited_text is not a label the model made up but the actual passage, taken verbatim from the document. For users it makes the answer verifiable; for operators it works as a hallucination detector. An assertive sentence with no citation attached is suspect.
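As a minimal sketch of that operator-side check, the helper below collects the text blocks that arrive without any citation. The name uncited_blocks and the SimpleNamespace stand-ins are hypothetical, used only so the example runs without an API call; in practice you would pass response.content.

```python
from types import SimpleNamespace


def uncited_blocks(content) -> list[str]:
    """Return the text of every non-empty text block that has no citations."""
    suspects = []
    for block in content:
        if block.type != "text":
            continue
        if not block.text.strip():
            continue  # ignore whitespace-only blocks
        if not getattr(block, "citations", None):
            suspects.append(block.text)
    return suspects


# Stand-in blocks shaped like response.content (illustrative data only).
demo_content = [
    SimpleNamespace(
        type="text",
        text="Refunds take five business days.",
        citations=[SimpleNamespace(document_title="Policy",
                                   cited_text="Refunds are processed within five business days.")],
    ),
    SimpleNamespace(type="text", text=" Shipping is always free.", citations=None),
]

suspects = uncited_blocks(demo_content)
print(suspects)
```

Connective text between cited passages may also arrive without citations, so the result is better read as a list of candidates for review than as confirmed hallucinations.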
Using citations as a quality gate
Citation data is not just for display. You can use it as a check before sending an answer out.

    def is_grounded(response, min_ratio: float = 0.5) -> bool:
        """Treat the answer as suspect if the ratio of cited text falls below the threshold."""
        cited, total = 0, 0
        for block in response.content:
            if block.type != "text":
                continue
            total += len(block.text)
            if block.citations:
                cited += len(block.text)
        # An answer with no text at all is passed through rather than divided by zero.
        return total == 0 or cited / total >= min_ratio

Rather than sending out a below-threshold answer as is, you can replace it with “I could not find enough relevant documents,” or add a branch that regenerates with a more conservative prompt. It is a blunt instrument, but even this much is effective at filtering out the most dangerous kind of answer: assertions with no grounding.
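One way to wire the gate into a branch is sketched below. It is an illustration under assumptions, not the series' implementation: gate, cited_ratio, REFUSAL, and FALLBACK are hypothetical names, the refusal string is assumed to match the one in the system prompt, and the stand-in blocks only mimic the shape of response.content. The sketch also exempts an honest "not found" reply from the gate, since such a reply naturally carries no citations.

```python
from collections import Counter
from types import SimpleNamespace

# Assumed to match the refusal sentence used in the system prompt.
REFUSAL = "I could not find that in the provided documents."
FALLBACK = "I could not find enough relevant documents."


def cited_ratio(content) -> float:
    """Share of answer characters that sit in text blocks with citations."""
    cited = total = 0
    for block in content:
        if block.type != "text":
            continue
        total += len(block.text)
        if block.citations:
            cited += len(block.text)
    # Unlike is_grounded, an empty answer scores 0.0 and ends up as a fallback.
    return cited / total if total else 0.0


def gate(content, min_ratio: float = 0.5) -> tuple[str, str]:
    """Return (decision, text) where decision is 'refusal', 'send', or 'fallback'."""
    text = "".join(b.text for b in content if b.type == "text")
    if REFUSAL in text:
        return "refusal", text  # an honest "not found" needs no citation
    if cited_ratio(content) >= min_ratio:
        return "send", text
    return "fallback", FALLBACK


def make_block(text: str, cited: bool):
    """Build a stand-in text block (illustrative data only)."""
    cites = [SimpleNamespace(cited_text="...")] if cited else None
    return SimpleNamespace(type="text", text=text, citations=cites)


answers = [
    [make_block("Refunds take five business days.", True)],
    [make_block(REFUSAL, False)],
    [make_block("Shipping is always free.", False)],
]
decisions = Counter(gate(content)[0] for content in answers)
print(decisions)
```

Counting the decisions gives two separate signals: a rising refusal count points at retrieval, while a rising fallback count points at generation.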
A caution when there are many chunks
Another cause of generation failure is the number of chunks placed in the context. Stuff in low-relevance chunks and the model may find plausible (but wrong) grounding among them. The reranking from Part 4 matters here too: for generation quality, a few accurate chunks beat many mediocre ones. Turning citations on makes this problem observable as well: if irrelevant chunks keep getting cited, it is time to reduce retrieval candidates or raise the reranking bar.
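To make that observable, one option is to tally which of the supplied chunks each answer actually cites. The sketch below assumes each citation exposes a document_index field pointing at the position of the document in the request; citation_counts and never_cited are hypothetical helper names, and the stand-in objects are illustrative only.

```python
from collections import Counter
from types import SimpleNamespace


def citation_counts(content) -> Counter:
    """Count how many citations point at each document index."""
    counts = Counter()
    for block in content:
        if block.type != "text":
            continue
        for cite in (block.citations or []):
            counts[cite.document_index] += 1
    return counts


def never_cited(content, num_chunks: int) -> list[int]:
    """Indices of supplied chunks that the answer did not cite at all."""
    counts = citation_counts(content)
    return [i for i in range(num_chunks) if counts[i] == 0]


# Stand-in answer: four chunks were supplied, chunks 0 and 2 were cited.
demo_content = [
    SimpleNamespace(type="text", text="Claim A.", citations=[
        SimpleNamespace(document_index=0), SimpleNamespace(document_index=2)]),
    SimpleNamespace(type="text", text=" Claim B.", citations=[
        SimpleNamespace(document_index=0)]),
    SimpleNamespace(type="text", text=" Closing remark.", citations=None),
]

counts = citation_counts(demo_content)
unused = never_cited(demo_content, num_chunks=4)
print(counts, unused)
```

Logged over many requests, chunks at ranks that are almost never cited suggest the candidate list can be shortened. Spotting chunks that are cited but irrelevant still needs relevance labels or manual review.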
Where people commonly trip up
- Blocking “I don’t know” answers: an instruction to “always answer” is a hallucination factory. Leave the path open to say it does not know, and use the frequency of those answers as a signal for retrieval improvement.
- Relying on the prompt alone for source labels: a “[Source: …]” string written by the model can itself be a hallucinated source. When you need verifiable sources, use a structured feature like citations.
- Exposing citation data raw: cited_text and position information are raw material. On the user-facing screen, present them in a readable form such as footnotes or a collapsible source list.
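For the last point, a footnote-style rendering might look like the sketch below. It is one possible presentation, not a prescribed one: render_with_footnotes is a hypothetical name, and the stand-in blocks only imitate the document_title and cited_text fields of real citation objects.

```python
from types import SimpleNamespace


def render_with_footnotes(content) -> str:
    """Render answer text with [n] markers and a numbered source list."""
    body = []
    sources = []  # unique (title, passage) pairs in first-seen order
    for block in content:
        if block.type != "text":
            continue
        markers = ""
        for cite in (block.citations or []):
            key = (cite.document_title, cite.cited_text.strip())
            if key not in sources:
                sources.append(key)
            markers += f"[{sources.index(key) + 1}]"
        body.append(block.text + markers)
    lines = ["".join(body)]
    if sources:
        lines += ["", "Sources:"]
        for n, (title, passage) in enumerate(sources, start=1):
            lines.append(f'[{n}] {title}: "{passage}"')
    return "\n".join(lines)


# Stand-in blocks shaped like response.content (illustrative data only).
demo_content = [
    SimpleNamespace(
        type="text",
        text="Refunds take five business days.",
        citations=[SimpleNamespace(document_title="Policy",
                                   cited_text="Refunds are processed within five business days.")],
    ),
    SimpleNamespace(type="text", text=" Contact support for exceptions.", citations=None),
]

rendered = render_with_footnotes(demo_content)
print(rendered)
```

Deduplicating on the (title, passage) pair keeps the source list short when several sentences rest on the same passage.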
Wrapping up
In this post we covered generation failures and hallucination.
- Banning outside knowledge and granting the right to say “I don’t know” in the system prompt is the basic line of defense.
- The citations feature returns the actual source passage for each sentence. It is a verification tool for users and a hallucination detector for operators.
- Using the citation ratio as a quality gate filters out ungrounded assertions before they go out.
The improvement tools are now all in hand. What remains is the foundation that lets you repeat all these changes with confidence: systematic evaluation. In the next post, “Advanced RAG #6: Building a RAG Evaluation Pipeline,” we grow the baseline from Part 1 into a full evaluation system.