Lesson 6 · 9 min

Choosing an observability stack

Phoenix, Helicone, Braintrust, Langfuse, OpenTelemetry. The honest comparison and when each is the right pick.

The 2026 landscape

Phoenix (Arize) — open-source, self-hostable, OpenTelemetry-compatible. Strong on traces and evals. The default choice when you want to keep data on-prem.

Helicone — drop-in proxy. Change the base URL your client points at and you get traces, cost tracking, and prompt management. The lowest-friction option for a small team.
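To make "base URL change" concrete, here is a minimal sketch of what the proxy setup might look like for an OpenAI-style client. The proxy URL and the `Helicone-Auth` header are assumptions taken from Helicone's commonly documented OpenAI integration and should be checked against the current docs. `helicone_client_kwargs` and the placeholder keys are hypothetical names used only for illustration.

```python
# Assumed proxy endpoint; verify against Helicone's current documentation.
HELICONE_OPENAI_BASE_URL = "https://oai.helicone.ai/v1"


def helicone_client_kwargs(provider_api_key: str, helicone_api_key: str) -> dict:
    """Build keyword arguments for an OpenAI-style client routed through the proxy.

    Hypothetical helper: the only differences from a direct client are the
    base URL and one extra auth header (header name is an assumption).
    """
    return {
        "api_key": provider_api_key,
        "base_url": HELICONE_OPENAI_BASE_URL,
        "default_headers": {"Helicone-Auth": f"Bearer {helicone_api_key}"},
    }


# Placeholder keys, not real credentials.
kwargs = helicone_client_kwargs("sk-provider-placeholder", "sk-helicone-placeholder")

# With the official SDK installed, this would become:
#   client = openai.OpenAI(**kwargs)
print(kwargs["base_url"])
```

The application code that calls the model stays the same; only the client construction changes, which is why this option is cheap to try and cheap to remove.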

Braintrust — eval-first. Datasets, scorers, and regression diffs as the primary surface. Best when your team has eval discipline already and wants ergonomic tooling around it.

Langfuse — open-source, self-hostable, OTEL-friendly. Solid on tracing and prompt management. Strong community.

OpenTelemetry + your existing stack — if you already run Datadog, Honeycomb, or a similar platform, you can build on the OTEL semantic conventions for GenAI, which now exist. Your existing platform becomes the observability layer; you just need to emit the right spans.
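As a rough illustration of "the right spans", the sketch below builds a span name and attribute set for one model call. The attribute keys follow the OTEL GenAI semantic conventions as I understand them at the time of writing; those conventions are still evolving, so confirm the names against the current spec. `genai_span` and `example-model` are hypothetical names, not part of any library.

```python
def genai_span(operation: str, model: str, input_tokens: int, output_tokens: int):
    """Return (span_name, attributes) for one LLM call.

    Hypothetical helper. Keys are assumed from the OTEL GenAI semantic
    conventions; check the current spec before relying on them.
    """
    # The conventions suggest naming the span "<operation> <model>".
    name = f"{operation} {model}"
    attributes = {
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    return name, attributes


name, attrs = genai_span("chat", "example-model", 120, 48)
print(name)
for key, value in attrs.items():
    print(f"{key}={value}")
```

With the `opentelemetry-api` package installed, these values could be passed to `tracer.start_as_current_span(name, attributes=...)`. In practice the token counts are only known after the response arrives, so they would more likely be added with `span.set_attribute(...)` once the call returns.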