DeepSeek released a series of distilled reasoning models in March, ranging from 7B to 70B parameters. The 32B variant performs roughly on par with Claude Sonnet 4.5 on math, code, and reasoning benchmarks.
For self-hosting reasoning workloads at scale, this is meaningful: the 32B variant fits on a single H100 with quantization, and throughput reaches 50-70 tokens/sec for typical reasoning chains.
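The fit-on-one-H100 claim is easy to sanity-check with back-of-envelope math. A sketch, assuming 4-bit weight quantization and the 50-70 tok/s range quoted above (the specific bit width and chain length are illustrative assumptions, not from the release notes):

```python
# Back-of-envelope sizing for a quantized 32B model.
PARAMS = 32e9          # 32B parameters
BITS_PER_WEIGHT = 4    # assumed 4-bit quantization (e.g. AWQ/GPTQ)

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")   # ~16 GB, well under an H100's 80 GB

# Latency for a typical reasoning chain at the low end of throughput.
chain_tokens = 2000    # assumed chain length for illustration
tokens_per_sec = 50
print(f"latency: ~{chain_tokens / tokens_per_sec:.0f} s per chain")
```

Note the remaining ~60 GB of headroom goes to KV cache and activations, which is what makes batched serving viable on a single card.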
Caveat: distilled reasoning models inherit the bias and refusal patterns of their teacher, and a few independent evals have shown surprising overconfidence on out-of-distribution inputs. As always, run your own eval before swapping it in.
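A pre-swap eval does not need to be elaborate to catch regressions. A minimal sketch, where `ask_model` is a hypothetical stand-in for your inference call and the eval set is a toy example (both are assumptions, not part of any real harness):

```python
# Minimal pre-swap eval sketch: score a candidate model on a small
# labeled set and gate the swap on a threshold you trust.

def ask_model(prompt: str) -> str:
    # Hypothetical stub; in practice, call your serving endpoint here.
    return "42"

# Tiny labeled set; include out-of-distribution probes, since that is
# where distilled models have shown overconfidence.
EVAL_SET = [
    ("What is 6 * 7?", "42"),
    ("What is 13 * 17?", "221"),  # OOD-style arithmetic probe
]

def run_eval(model) -> float:
    correct = sum(model(q).strip() == a for q, a in EVAL_SET)
    return correct / len(EVAL_SET)

accuracy = run_eval(ask_model)
print(f"accuracy: {accuracy:.2f}")
# Gate the swap, e.g.:
# assert accuracy >= 0.95, "candidate regressed; keep the incumbent"
```

The point is the shape, not the questions: use prompts drawn from your own traffic, and compare the candidate's score against the incumbent's on the same set before switching.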