Lesson 7 · 10 min

Observability for LLM applications

LLM-specific observability goes beyond request logs: you need traces that capture the full prompt, token budgets, tool call chains, and quality signals — all without leaking PII.
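Capturing full prompts while avoiding PII leaks usually means redacting before the trace is written. A minimal sketch, assuming simple regex-based detection (real deployments need a much fuller pattern set or a dedicated PII-detection service; all names here are illustrative):

```python
import re

# Hypothetical redaction patterns -- illustrative only; production systems
# also cover names, addresses, account numbers, API keys, etc.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace PII matches with typed placeholders before a prompt is logged."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(redact("Contact jane@example.com or +1 555-123-4567"))
```

Typed placeholders like `<EMAIL>` keep the trace useful for debugging (you can still see *that* an email was present) without storing the value itself.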

What standard observability misses

Traditional APM tools (Datadog, New Relic) track request latency, error rate, and throughput. That's necessary but not sufficient for LLM applications.

The LLM-specific signals that matter:

| Signal | Why it matters |
|---|---|
| Full prompt + response | Reproducible incident triage; you can't debug a wrong answer without seeing what was sent |
| Token breakdown | Input/output/cache tokens per call; find the 10x cost outliers |
| Latency by stage | Gateway → retrieval → LLM → formatting; pinpoint where latency lives |
| Tool call chain | In agent flows: what tools were called, in what order, with what arguments |
| Quality signals | LLM-as-judge scores, refusal rate, user thumbs-down rate |
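The signals above can be collected in a single trace record per call. A minimal sketch, assuming a hand-rolled dataclass rather than any specific tracing library (all field names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-call trace record -- field names are illustrative,
# not taken from any particular observability SDK.
@dataclass
class LLMTrace:
    prompt: str                      # redact PII before storing
    response: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    stage_latency_ms: dict = field(default_factory=dict)  # e.g. {"retrieval": 120}
    tool_calls: list = field(default_factory=list)        # [(tool_name, args), ...] in call order
    judge_score: Optional[float] = None                   # LLM-as-judge quality signal

    def total_latency_ms(self) -> float:
        """Sum of per-stage latencies; compare against end-to-end wall time."""
        return sum(self.stage_latency_ms.values())

trace = LLMTrace(prompt="What is our refund policy?")
trace.stage_latency_ms = {"gateway": 15, "retrieval": 120, "llm": 900, "formatting": 10}
trace.tool_calls.append(("search_docs", {"query": "refund policy"}))
trace.input_tokens, trace.output_tokens = 450, 120
print(trace.total_latency_ms())
```

Keeping stage latencies as a dict rather than a single number is what lets you answer "where does the latency live?" per request instead of only in aggregate.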