DeepSeek released a series of distilled reasoning models in March, ranging from 7B to 70B parameters. The 32B variant performs roughly on par with Claude Sonnet 4.5 on math, code, and reasoning benchmarks.
For self-hosting reasoning workloads at scale, this is meaningful: the 32B variant fits on a single H100 with quantization, and throughput reaches 50-70 tokens/sec for typical reasoning chains.
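The fit-on-one-H100 claim is easy to sanity-check with back-of-envelope math. A sketch, assuming 4-bit weight quantization and the 50-70 tok/s range quoted above (the specific bit width and chain length are illustrative assumptions, not from the release notes):

```python
# Back-of-envelope sizing for a quantized 32B model.
PARAMS = 32e9          # 32B parameters
BITS_PER_WEIGHT = 4    # assumed 4-bit quantization (e.g. AWQ/GPTQ)

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")   # ~16 GB, well under an H100's 80 GB

# Latency for a typical reasoning chain at the low end of throughput.
chain_tokens = 2000    # assumed chain length for illustration
tokens_per_sec = 50
print(f"latency: ~{chain_tokens / tokens_per_sec:.0f} s per chain")
```

Note the remaining ~60 GB of headroom goes to KV cache and activations, which is what makes batched serving viable on a single card.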
Caveat: distilled reasoning models inherit the bias and refusal patterns of their teacher, and a few independent evals have shown surprising overconfidence on out-of-distribution inputs. As always, run your own eval before swapping it in.
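A pre-swap eval does not need to be elaborate to catch regressions. A minimal sketch, where `ask_model` is a hypothetical stand-in for your inference call and the eval set is a toy example (both are assumptions, not part of any real harness):

```python
# Minimal pre-swap eval sketch: score a candidate model on a small
# labeled set and gate the swap on a threshold you trust.

def ask_model(prompt: str) -> str:
    # Hypothetical stub; in practice, call your serving endpoint here.
    return "42"

# Tiny labeled set; include out-of-distribution probes, since that is
# where distilled models have shown overconfidence.
EVAL_SET = [
    ("What is 6 * 7?", "42"),
    ("What is 13 * 17?", "221"),  # OOD-style arithmetic probe
]

def run_eval(model) -> float:
    correct = sum(model(q).strip() == a for q, a in EVAL_SET)
    return correct / len(EVAL_SET)

accuracy = run_eval(ask_model)
print(f"accuracy: {accuracy:.2f}")
# Gate the swap, e.g.:
# assert accuracy >= 0.95, "candidate regressed; keep the incumbent"
```

The point is the shape, not the questions: use prompts drawn from your own traffic, and compare the candidate's score against the incumbent's on the same set before switching.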