Lesson 9 · 10 min

Reasoning models & test-time compute

Why "thinking out loud" before answering makes the model smarter — and when it doesn't.

The shift

For years, scaling LLMs meant training-time compute: bigger models, more data, longer training. Around 2024, a new frontier opened: *test-time* compute — letting the model think for longer at inference time.

Two flavors:

Chain-of-thought (CoT): visible reasoning the model writes out before its answer.
Extended thinking / reasoning models: Claude Opus's extended thinking, OpenAI's o-series. The model produces a separate reasoning trace before its final answer. Depending on the provider, that trace is hidden, summarized, or returned alongside the answer. Either way, the model spends output tokens on reasoning, which adds latency and cost.
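
To make "burning tokens on reasoning" concrete, here is a minimal sketch of the token accounting. The `Usage` class, its field names, and the token counts are hypothetical and chosen only for illustration; they are not any provider's real API or measured figures.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one response (hypothetical structure, not a real API type)."""
    prompt_tokens: int
    reasoning_tokens: int  # generated before the answer, whether shown or hidden
    answer_tokens: int

    @property
    def generated_tokens(self) -> int:
        # Test-time compute scales with everything the model generates,
        # not just the part the user ends up reading.
        return self.reasoning_tokens + self.answer_tokens


def reasoning_overhead(usage: Usage) -> float:
    """Ratio of generated tokens to answer tokens (1.0 means no reasoning)."""
    return usage.generated_tokens / usage.answer_tokens


# Illustrative numbers only: same prompt, same answer length, different thinking.
direct = Usage(prompt_tokens=50, reasoning_tokens=0, answer_tokens=20)
with_thinking = Usage(prompt_tokens=50, reasoning_tokens=980, answer_tokens=20)

print(reasoning_overhead(direct))         # 1.0
print(reasoning_overhead(with_thinking))  # 50.0
```

The point of the sketch is that two responses with the same visible answer can differ greatly in how many tokens were generated, which is where the extra test-time compute goes.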