What worked
- Constrained domains. Voice agents that handle 3-5 specific tasks (booking, order status, billing question) ship and work. Open-ended voice agents are still rough.
- Fast interruption handling. The user can cut off the bot mid-sentence; the bot recovers. This is the biggest UX upgrade since 2024.
- Hybrid escalation. "Let me transfer you to a human" working seamlessly when the agent is stuck. The fallback path is the product.
What failed
- Open-ended assistants. Without constraints, voice agents wander. Users lose patience by turn 4.
- Latency >800ms. Round-trip time matters more than text. Above 800ms, the conversation feels off; users hang up.
- Pretending to be human. Users figure it out. The teams that disclose "I'm an AI assistant; I can transfer you to a human" up front have higher CSAT than the ones who don't.
Stack consensus
Most production deployments converged on: realtime API (OpenAI / Anthropic / Hume), small-talk filler tokens, async tool calls during pauses, hard 3-second timeout on tool calls, escalation to human after N failed turns.