Lesson 7 · 10 min
The five most common RAG failure modes
Diagnosing a broken RAG is half the job. Here's the field guide.
1. Retrieval failure: the right chunk isn't there
The top-k retrieved chunks don't contain the answer. Causes:
- Bad chunking: relevant fact got split across chunk boundaries.
- Bad embeddings: domain language drifts from the embedding model's training distribution.
- Bad query: the user phrased the question in a way the embeddings can't match (fix: query rewriting or hypothetical document embeddings).
Diagnostic: log the retrieved chunks. Inspect them by hand for 20 known-good queries.
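This diagnostic can be automated as a hit-rate check. A minimal sketch, assuming your stack exposes a `retrieve(query, k)` function returning chunk IDs and you maintain a `gold` map from each known-good query to the chunk ID that contains its answer (both names are placeholders for your own code):

```python
def hit_rate(retrieve, gold, k=5):
    """Fraction of queries whose gold chunk appears in the top-k results."""
    hits = 0
    for query, gold_chunk_id in gold.items():
        retrieved_ids = retrieve(query, k)
        if gold_chunk_id in retrieved_ids:
            hits += 1
        else:
            # Log the miss so you can inspect the retrieved chunks by hand.
            print(f"MISS: {query!r} -> {retrieved_ids}")
    return hits / len(gold)
```

Run this on your 20 known-good queries after every chunking or embedding change; a drop in hit rate points at retrieval, not generation.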
2. Generation failure: chunk is there but the LLM ignores it
The right chunk is in the context, but the answer is wrong or hallucinated. Causes:
- Prompt isn't strict enough about "use only the provided context".
- Chunks are too long; the LLM "loses the middle".
- Retrieved chunks contradict each other; the LLM picks the wrong one.
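The first cause above is the cheapest to fix. A sketch of a grounding-strict prompt template; the chunk separator and the exact refusal phrasing are illustrative choices, not a library API:

```python
def build_prompt(question, chunks):
    """Assemble a prompt that forbids answering from outside the context."""
    context = "\n\n---\n\n".join(chunks)  # visually separate the chunks
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, reply exactly: "
        "\"I don't know based on the provided documents.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The explicit refusal string matters: it gives the model a permitted exit other than guessing, and it gives you a string you can match on downstream.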
3. The classic "lost in the middle"
LLMs attend more to the beginning and end of long contexts. If you stuff 20 chunks and the answer is at position 11, you may get wrong answers. Fix: rerank before stuffing, deduplicate, and put the best chunks at top and bottom.
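The "best chunks at top and bottom" reordering can be sketched in a few lines. Given chunks already reranked best-first, alternate them between the front and the back of the context so the weakest land in the middle:

```python
def reorder_for_attention(chunks_ranked_best_first):
    """Place the strongest chunks at the start and end of the context,
    pushing the weakest into the middle, where attention is lowest."""
    front, back = [], []
    for i, chunk in enumerate(chunks_ranked_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # best is first, second-best is last
```

For ranked chunks `[1, 2, 3, 4, 5]` this yields `[1, 3, 5, 4, 2]`: rank 1 at the top, rank 2 at the bottom, rank 5 buried in the middle.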
4. Stale data
The corpus changed; the embeddings didn't. Add a re-ingest schedule and a TTL on embeddings.
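A minimal sketch of the TTL side, assuming each document record stores an `embedded_at` Unix timestamp (the field name and the one-day cutoff are illustrative):

```python
import time

TTL_SECONDS = 24 * 3600  # re-embed anything older than a day

def stale_docs(docs, now=None):
    """Return the documents whose embeddings have outlived the TTL."""
    now = time.time() if now is None else now
    return [d for d in docs if now - d["embedded_at"] > TTL_SECONDS]
```

A scheduled job that re-embeds `stale_docs(...)` output keeps the index from silently drifting away from the corpus.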
5. Out-of-scope queries
The user asks something your corpus can't answer. Without a guardrail, the LLM hallucinates politely. Fix: add a "do you have enough context?" check, or a relevance threshold.
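The relevance-threshold version of that guardrail is a one-liner. A sketch assuming each retrieved chunk comes back with a similarity score in [0, 1]; the 0.35 cutoff is a placeholder you should calibrate on held-out in-scope and out-of-scope queries:

```python
def answerable(scored_chunks, threshold=0.35):
    """Refuse when even the best-matching chunk is only weakly relevant.

    scored_chunks: list of (chunk, similarity_score) pairs.
    """
    return bool(scored_chunks) and max(s for _, s in scored_chunks) >= threshold
```

If `answerable(...)` is False, skip generation entirely and return a canned "not covered by our documents" response instead of letting the LLM improvise.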