Apple Intelligence's on-device 3B-parameter model is now comparable to GPT-3.5 on most consumer tasks (summarization, rewriting, simple Q&A). The implications:
- Privacy-first AI features can ship without API calls.
- Latency is sub-100ms for simple tasks.
- Battery cost is real but tolerable for non-continuous use.
This matters most for teams building products that must work offline, that operate in regulated industries (healthcare, legal, finance), or that serve consumers at a scale where per-request API costs would be prohibitive.
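To make the cost argument concrete, here is a back-of-envelope sketch. Every number in it is an illustrative assumption (user count, request rate, token counts, API price), not a measured figure; the point is only that marginal cloud cost scales linearly with usage while on-device inference does not.

```python
# Back-of-envelope: cloud API cost at consumer scale vs. on-device.
# All constants below are assumptions for illustration, not measurements.

DAILY_ACTIVE_USERS = 100_000_000        # assumed consumer-scale app
REQUESTS_PER_USER_PER_DAY = 10          # assumed light usage
TOKENS_PER_REQUEST = 500                # assumed prompt + completion
API_COST_PER_MILLION_TOKENS = 0.50     # assumed blended $/1M tokens

daily_tokens = DAILY_ACTIVE_USERS * REQUESTS_PER_USER_PER_DAY * TOKENS_PER_REQUEST
daily_api_cost = daily_tokens / 1_000_000 * API_COST_PER_MILLION_TOKENS
monthly_api_cost = daily_api_cost * 30

print(f"Daily tokens:     {daily_tokens:,}")        # 500,000,000,000
print(f"Daily API cost:   ${daily_api_cost:,.0f}")   # $250,000
print(f"Monthly API cost: ${monthly_api_cost:,.0f}")  # $7,500,000

# On-device inference drops this marginal cost to ~$0; the price is
# paid instead in battery, thermals, and model quality.
```

Even with these deliberately modest assumptions, the cloud bill runs into millions per month, which is the scale at which a "free" on-device 3B model changes the product calculus.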
For everyone else: cloud frontier models still dominate on quality. But the gap is no longer "10x better"; it is closer to "30% better on hard tasks," and it shrinks every quarter.