Lesson 6 · 9 min
Choosing an observability stack
Phoenix, Helicone, Braintrust, Langfuse, OpenTelemetry. The honest comparison and when each is the right pick.
The 2026 landscape
Phoenix (Arize) — open-source, self-hostable, OpenTelemetry-compatible. Strong on traces and evals. The default choice when you want to keep data on-prem.
Helicone — drop-in proxy. Change your client's base URL and you get traces, cost tracking, and prompt management. The lowest-friction option for a small team.
Braintrust — eval-first. Datasets, scorers, and regression diffs as the primary surface. Best when your team has eval discipline already and wants ergonomic tooling around it.
Langfuse — open-source, self-hostable, OTEL-friendly. Solid on tracing and prompt management. Strong community.
OpenTelemetry + your existing stack — if you already run Datadog, Honeycomb, or a similar platform, you can build on the OTEL semantic conventions for GenAI, which now exist. Your existing platform becomes the observability layer; you only need to emit the right spans.
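To make "emit the right spans" concrete, here is a minimal sketch of the span name and attributes the GenAI semantic conventions describe for a single model call. The helper `genai_span`, the model name `example-model`, and the token counts are hypothetical and exist only for illustration. The attribute keys follow the conventions as published at the time of writing; they are still marked as in development, so check the current spec before depending on them.

```python
from typing import Any


def genai_span(
    operation: str, model: str, input_tokens: int, output_tokens: int
) -> tuple[str, dict[str, Any]]:
    """Build a span name and attributes for one model call.

    Hypothetical helper: it only assembles data and does not talk to
    any tracing backend.
    """
    # The conventions suggest naming the span "<operation> <model>".
    name = f"{operation} {model}"
    attributes = {
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    return name, attributes


# Illustrative values, not real measurements.
name, attrs = genai_span("chat", "example-model", input_tokens=120, output_tokens=48)
print(name)
print(attrs)
```

With the `opentelemetry-api` package installed, the result could be passed to `tracer.start_as_current_span(name, attributes=attrs)` around the model call. Which backend receives the span is then a matter of exporter configuration, not application code.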