Lesson 9 · 10 min
Reasoning models & test-time compute
Why "thinking out loud" before answering makes the model smarter — and when it doesn't.
The shift
For years, scaling LLMs meant training-time compute: bigger models, more data, longer training. Around 2024, a new frontier opened: *test-time* compute — letting the model think for longer at inference time.
Two flavors:
- Chain-of-thought (CoT) — visible reasoning the model writes out before its answer.
- Extended thinking / reasoning models — Claude Opus's extended thinking, OpenAI's o-series. The model produces a hidden reasoning trace and only the final answer is returned. Internally, the model burns tokens on reasoning.