Lesson 2 · 11 min
The trace — what to capture per request
A complete LLM trace covers prompt assembly, retrieval, tool calls, generation, and output validation. This lesson lays out the schema that makes incident triage possible.
What goes in a trace
A single user request often produces 5-20 spans. The pattern that makes incidents debuggable:
- Top-level request span — feature ID, tenant ID, request ID, user ID (hashed), timing.
- Context-assembly span — system prompt version, tool definitions hash, prefix-cache hit/miss.
- Retrieval span(s) — query (or query rewrite), top-k chunks, reranker scores, retrieval-precision estimate if available.
- Generation span(s) — model + version, input tokens, output tokens, cost, finish reason, time-to-first-token, total latency.
- Tool-call spans — tool name, args, result, success/error, latency. One per call.
- Validation span — output passed schema validation? Refused? Filter triggered?
- User outcome — did the user follow up? Click through? Thumbs-up?
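
To make the list above concrete, here is a minimal sketch of how those spans might be represented in code. It is an illustration under stated assumptions, not a prescribed schema: the `Span` dataclass, the helper names (`hash_user_id`, `build_example_trace`, `summarize`), the attribute keys, and every value (model name, token counts, cost, IDs) are hypothetical placeholders invented for this example. A real system would more likely emit these through a tracing library than through hand-rolled dataclasses.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Span:
    """One unit of work inside a request trace (hypothetical schema)."""
    kind: str                         # "request", "retrieval", "generation", ...
    trace_id: str                     # shared by every span of one request
    parent_id: Optional[str] = None   # None only for the top-level request span
    duration_ms: float = 0.0
    attrs: Dict[str, Any] = field(default_factory=dict)
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def hash_user_id(user_id: str, salt: str = "demo-salt") -> str:
    """Hash the user ID so the raw identifier never lands in the trace."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def build_example_trace() -> List[Span]:
    """Build one trace with made-up values, one span per item in the list above."""
    trace_id = uuid.uuid4().hex
    root = Span("request", trace_id, duration_ms=2350.0, attrs={
        "feature_id": "order-assistant",
        "tenant_id": "tenant-7",
        "request_id": "req-001",
        "user_id_hash": hash_user_id("user-42"),
    })

    def child(kind: str, duration_ms: float, **attrs: Any) -> Span:
        # Every child hangs off the top-level request span.
        return Span(kind, trace_id, parent_id=root.span_id,
                    duration_ms=duration_ms, attrs=attrs)

    return [
        root,
        child("context_assembly", 12.0, system_prompt_version="v14",
              tool_defs_hash="ab12cd", prefix_cache_hit=True),
        child("retrieval", 85.0, query="late order refund",
              top_k_chunk_ids=["c1", "c7", "c9"],
              reranker_scores=[0.91, 0.74, 0.52]),
        child("generation", 1900.0, model="example-model",
              model_version="2024-01", input_tokens=1200, output_tokens=310,
              cost_usd=0.0123, finish_reason="stop", ttft_ms=420.0),
        child("tool_call", 240.0, tool="search_orders",
              args={"order_id": "A-17"}, result=None,
              success=False, error="timeout"),
        child("validation", 3.0, schema_valid=True,
              refused=False, filter_triggered=False),
        child("user_outcome", 0.0, followed_up=False,
              clicked_through=True, thumbs_up=None),
    ]


def summarize(spans: List[Span]) -> Dict[str, Any]:
    """Roll a trace up into the fields an on-call engineer checks first."""
    gens = [s for s in spans if s.kind == "generation"]
    return {
        "input_tokens": sum(s.attrs["input_tokens"] for s in gens),
        "output_tokens": sum(s.attrs["output_tokens"] for s in gens),
        "cost_usd": sum(s.attrs["cost_usd"] for s in gens),
        "failed_tools": [s.attrs["tool"] for s in spans
                         if s.kind == "tool_call" and not s.attrs["success"]],
        "schema_valid": all(s.attrs["schema_valid"] for s in spans
                            if s.kind == "validation"),
    }


if __name__ == "__main__":
    print(summarize(build_example_trace()))
```

Keeping free-form details in `attrs` while fixing `kind`, `trace_id`, and `parent_id` as top-level fields is one possible design choice: the fixed fields let you reassemble the request tree, and the per-kind attributes can evolve without a schema migration.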