Lesson 6 · 12 min

RAG evaluation — retrieval and answer quality

RAG systems fail in two different places: the retrieval step and the generation step. Evaluating each one separately, with metrics suited to that step, is what separates a stable RAG feature from one that degrades with no visible cause.

RAG fails twice

A RAG pipeline has two failure modes that look identical to the user: a wrong answer.

Failure 1: Retrieval failure. The right document was not retrieved, so the generator has nothing relevant to ground its answer in and either guesses or hallucinates.

Failure 2: Generation failure. The right documents were retrieved, but the generator ignored them or misread them.

If you only measure final answer quality, you can't tell which step broke, so you can't fix it efficiently. Evaluate retrieval and generation separately.
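The split between the two failure modes can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `EvalCase` fields, the `diagnose` function, and the default `k=5` are all hypothetical names and values chosen for the example. It assumes each test question has hand-labeled relevant document IDs and that a human or a grader has already judged the final answer.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One labeled test question (hypothetical structure for illustration)."""
    question: str
    relevant_ids: set[str]    # hand-labeled documents that contain the answer
    retrieved_ids: list[str]  # what the retriever returned, in rank order
    answer_correct: bool      # verdict on the final answer (human or grader)


def retrieved_hit(case: EvalCase, k: int) -> bool:
    """True if at least one relevant document appears in the top-k results."""
    return bool(case.relevant_ids & set(case.retrieved_ids[:k]))


def diagnose(case: EvalCase, k: int = 5) -> str:
    """Attribute a wrong answer to the retrieval step or the generation step."""
    if case.answer_correct:
        return "ok"
    if not retrieved_hit(case, k):
        # The generator never saw a relevant document.
        return "retrieval_failure"
    # A relevant document was in context, yet the answer is still wrong.
    return "generation_failure"


cases = [
    EvalCase("What is the refund window?", {"d1"}, ["d7", "d9"], False),
    EvalCase("Who approves expenses?", {"d2"}, ["d2", "d4"], False),
    EvalCase("Where is the office?", {"d3"}, ["d3"], True),
]
labels = [diagnose(c) for c in cases]
```

Counting the labels over a whole test set shows which step accounts for most wrong answers. The sketch treats any single relevant document in the top k as a retrieval success, which is a simplification: questions that need several documents would call for a stricter check.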