Lesson 6 · 10 min

Streaming structured outputs

Stream JSON token by token and parse it progressively: why it matters for UX, and how to do it in Python and Next.js.

Why stream structured output?

Waiting for a complete JSON blob before doing anything costs you latency. For a 500-token response at ~50 tokens/second, the user waits about 10 seconds (500 / 50) before seeing anything. Streaming lets you:

  • Show partial results as fields arrive (e.g. render a loading card that fills in)
  • Start processing the first array items while the rest are still generating
  • Detect early failures: if the first 50 tokens are garbled, abort and retry rather than waiting for the full response
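
To make the three points above concrete, here is a minimal sketch of progressive parsing using only the Python standard library. The names `parse_partial_json` and `stream_snapshots` are hypothetical helpers invented for this illustration, not part of any SDK, and the hard-coded chunks stand in for whatever your model client actually yields. The sketch assumes the response is expected to be a JSON object or array. It repairs a truncated prefix by closing any open string and containers, and simply waits for more text when the prefix still cannot be parsed.

```python
import json


def parse_partial_json(prefix: str):
    """Best-effort parse of a truncated JSON prefix.

    Returns the parsed value, or None when the prefix cannot be repaired yet
    (for example, it ends right after a colon). Note: a literal JSON `null`
    also comes back as None, which is acceptable for this sketch.
    """
    closers = []       # closing brackets owed for containers still open
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()

    candidate = prefix
    if in_string:
        if escaped:
            candidate = candidate[:-1]  # drop a dangling backslash
        candidate += '"'                # close the unfinished string
    candidate += "".join(reversed(closers))

    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # not enough text yet; keep the previous snapshot


def stream_snapshots(chunks):
    """Yield a parsed snapshot each time the accumulated text becomes parseable."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        head = buffer.lstrip()
        # Early failure check (assumes an object or array is expected).
        if head and head[0] not in "{[":
            raise ValueError("stream does not look like JSON; abort and retry")
        snapshot = parse_partial_json(buffer)
        if snapshot is not None:
            yield snapshot


if __name__ == "__main__":
    # Simulated token chunks; a real client would yield these over time.
    demo_chunks = ['{"title": "Str', 'eaming", "tags": ["a"', ', "b"]}']
    for snapshot in stream_snapshots(demo_chunks):
        print(snapshot)
```

Each snapshot can drive a partially filled UI card. Values in a snapshot may still be truncated (a string cut mid-word, or a number missing its last digits), so treat a field as final only once a later field has started or the stream has ended.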