
Lesson 2 · 11 min

The trace — what to capture per request

A complete LLM trace covers prompt assembly, retrieval, tool calls, generation, and output validation. This lesson lays out the schema that makes incident triage possible.

What goes in a trace

A single user request often produces 5–20 spans. The following pattern makes incidents debuggable:

  1. Top-level request span — feature ID, tenant ID, request ID, user ID (hashed), timing.
  2. Context-assembly span — system prompt version, tool definitions hash, prefix-cache hit/miss.
  3. Retrieval span(s) — query (or query rewrite), top-k chunks, reranker scores, retrieval-precision estimate if available.
  4. Generation span(s) — model + version, input tokens, output tokens, cost, finish reason, time-to-first-token, total latency.
  5. Tool-call spans — tool name, args, result, success/error, latency. One per call.
  6. Validation span — output passed schema validation? Refused? Filter triggered?
  7. User outcome — did the user follow up? Click through? Thumbs-up?
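The seven items above can be sketched as a small span tree. This is a minimal, standard-library-only Python sketch, assuming a hand-rolled span model rather than a real tracing SDK. The classes `Span` and `Trace`, the helpers `hash_user_id`, `rollup`, and `first_failure`, and every attribute key (`user_id_hash`, `prefix_cache_hit`, `ttft_ms`, and so on) are hypothetical names chosen to mirror the list. They are not a standard schema. The example values are invented.

```python
from __future__ import annotations

import hashlib
import hmac
import time
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """One unit of work inside a request (hypothetical schema, not a standard)."""
    name: str                     # "request", "retrieval", "generation", "tool_call", ...
    trace_id: str                 # shared by every span of the same user request
    parent_id: str | None = None  # None only for the top-level request span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    end: float | None = None
    attributes: dict[str, Any] = field(default_factory=dict)

    def finish(self, **attrs: Any) -> Span:
        # Attach the span's results (tokens, scores, errors) and stop its timer.
        self.attributes.update(attrs)
        self.end = time.monotonic()
        return self

    @property
    def latency_ms(self) -> float | None:
        return None if self.end is None else (self.end - self.start) * 1000.0


def hash_user_id(user_id: str, salt: str) -> str:
    # Keyed hash (HMAC-SHA256) so raw user IDs never land in the trace store.
    return hmac.new(salt.encode(), user_id.encode(), hashlib.sha256).hexdigest()[:16]


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)

    def start_span(self, name: str, parent: Span | None = None, **attrs: Any) -> Span:
        span = Span(name, self.trace_id, parent.span_id if parent else None,
                    attributes=dict(attrs))
        self.spans.append(span)
        return span

    def rollup(self) -> dict[str, Any]:
        # Per-request totals: the first numbers to check during triage.
        gens = [s.attributes for s in self.spans if s.name == "generation"]
        tools = [s.attributes for s in self.spans if s.name == "tool_call"]
        return {
            "span_count": len(self.spans),
            "input_tokens": sum(g.get("input_tokens", 0) for g in gens),
            "output_tokens": sum(g.get("output_tokens", 0) for g in gens),
            "cost_usd": round(sum(g.get("cost_usd", 0.0) for g in gens), 6),
            "tool_errors": sum(1 for t in tools if t.get("success") is False),
        }

    def first_failure(self) -> Span | None:
        # Earliest span (by start time) that errored or failed schema validation.
        for s in sorted(self.spans, key=lambda s: s.start):
            a = s.attributes
            if a.get("success") is False or a.get("schema_valid") is False:
                return s
        return None


# Example: one request whose tool call timed out (all values are made up).
trace = Trace()
root = trace.start_span("request", feature_id="support-chat", tenant_id="t-42",
                        request_id="req-001", user_id_hash=hash_user_id("alice", "demo-salt"))
trace.start_span("context_assembly", root).finish(
    system_prompt_version="v7", tool_defs_hash="9f2c", prefix_cache_hit=True)
trace.start_span("retrieval", root).finish(
    query="refund policy", top_k=["doc-12", "doc-7"], reranker_scores=[0.91, 0.64])
trace.start_span("generation", root).finish(
    model="example-model@2024-06", input_tokens=1200, output_tokens=150,
    cost_usd=0.0042, finish_reason="tool_calls", ttft_ms=310.0)
trace.start_span("tool_call", root).finish(
    tool="lookup_order", args={"order_id": "A1"}, success=False, error="timeout")
trace.start_span("generation", root).finish(
    model="example-model@2024-06", input_tokens=1500, output_tokens=90,
    cost_usd=0.0051, finish_reason="stop", ttft_ms=280.0)
trace.start_span("validation", root).finish(
    schema_valid=True, refused=False, filter_triggered=False)
root.finish()
# The user outcome usually arrives later and is joined back on the request/trace ID.
root.attributes["user_outcome"] = {"thumbs_up": False, "followed_up": True}

print(trace.rollup())
print(trace.first_failure().name)  # tool_call
```

The rollup and `first_failure` helpers show why per-span fields matter. Because each generation span records tokens and cost and each tool-call span records success, a responder can tell from one trace that this request made two model calls and that the first failure was the `lookup_order` timeout. In practice you would more likely attach these fields as attributes on spans emitted by an existing tracing library. The hand-rolled classes here only keep the sketch self-contained.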