From a retrospective on ~100 production agent deployments that didn't hit their goals:
Top failure modes
- Should have been a workflow (32%). The "agent" was re-deciding steps that never varied between runs; a fixed pipeline would have been cheaper, faster, and more reliable.
- Prompt injection on retrieved content (18%). The agent read a doc with hidden instructions and obediently exfiltrated data.
- Runaway loops on tool errors (15%). A flaky API caused the agent to retry indefinitely, burning budget.
- Tool authorization gaps (12%). The agent called a destructive tool because nothing stopped it.
- Eval theater (11%). Tested only on "works in the demo" examples; production traffic exposed the brittleness.
- Latency UX collapse (8%). 30-second responses where users expected 5-second answers.
- Cost overrun (4%). No budget cap. One bad day cost more than the project saved in a month.
Fixes
- Build the workflow first. Reach for an agent only when the workflow gets uglier than the agent. (Rough sketches of each fix follow this list.)
- Tool privilege scoping (read-only by default).
- Hard step and dollar caps; halt and alert on breach.
- Prompt-injection filtering on retrieved content before it reaches the model.
- Real eval set, run weekly, track trends.
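
To make the workflow-first fix concrete, here is a minimal sketch of the shape: if the steps never vary, encode them as a fixed pipeline and keep the model calls inside deterministic control flow. The ticket-handling task and the `call_llm` wrapper are illustrative assumptions, not from the retrospective.

```python
# A minimal sketch of "workflow first": if the steps never change,
# encode them as a fixed pipeline instead of letting an agent choose them.
# `call_llm` is a hypothetical wrapper around whatever model client you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")

def handle_ticket(ticket_text: str) -> str:
    # Three steps an "agent" was re-deciding on every run,
    # even though the order never varied in practice.
    category = call_llm(f"Classify this support ticket:\n{ticket_text}")
    context = call_llm(f"Summarize the relevant facts for a {category} ticket:\n{ticket_text}")
    return call_llm(f"Draft a reply using this context:\n{context}")
```

The point is the control flow: the program, not the model, decides what happens next.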
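For tool privilege scoping, one workable shape is a registry where every tool is read-only unless explicitly marked otherwise, and any non-read-only call has to pass an approval hook. The `ToolRegistry` design and the `approve` callback below are assumptions for illustration, not an API from the retrospective.

```python
# Sketch of read-only-by-default tool scoping. The registry design and
# the `approve` hook are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    fn: Callable[..., object]
    read_only: bool = True  # default: tools cannot mutate anything

class ToolRegistry:
    def __init__(self, approve: Callable[[str], bool]):
        self._tools: dict[str, Tool] = {}
        self._approve = approve  # human-in-the-loop or policy check

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs):
        tool = self._tools[name]
        # Destructive tools never run on the agent's say-so alone.
        if not tool.read_only and not self._approve(name):
            raise PermissionError(f"tool {name!r} is not read-only; approval denied")
        return tool.fn(**kwargs)
```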
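For hard caps, a small guard object charged on every model or tool call is enough to turn "retry forever" into "halt and page someone". The `BudgetGuard` class and `alert` hook here are hypothetical stand-ins for whatever notification channel you already have.

```python
# Sketch of hard step + dollar caps with halt + alert. The class name
# and `alert` hook are illustrative assumptions.

class BudgetExceeded(RuntimeError):
    pass

class BudgetGuard:
    def __init__(self, max_steps: int, max_dollars: float, alert):
        self.max_steps, self.max_dollars = max_steps, max_dollars
        self.steps, self.spent = 0, 0.0
        self.alert = alert

    def charge(self, dollars: float) -> None:
        self.steps += 1
        self.spent += dollars
        if self.steps > self.max_steps or self.spent > self.max_dollars:
            self.alert(f"agent halted: {self.steps} steps, ${self.spent:.2f} spent")
            raise BudgetExceeded  # halt the loop; never retry past the cap
```

Inside the agent loop, call `guard.charge(estimated_cost)` after each model or tool call; the raised exception is the halt.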
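For injection filtering, even a crude pattern scan over retrieved documents catches the obvious "ignore previous instructions" class before it reaches the model. The pattern list below is an illustrative assumption and will miss determined attackers; treat it as one defensive layer alongside tool scoping, not the whole defense.

```python
# Sketch of filtering retrieved content for injection-shaped text before it
# enters the model's context. Patterns are deliberately crude examples;
# treat a hit as "quarantine and review", not proof of attack.

import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
    re.compile(r"send .* to .*@", re.I),  # exfiltration-shaped requests
]

def quarantine_if_suspicious(doc: str) -> tuple[str, bool]:
    for pattern in INJECTION_PATTERNS:
        if pattern.search(doc):
            return "", True  # drop the doc, flag it for human review
    return doc, False
```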
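For the eval set, the minimum viable harness is a fixed list of cases drawn from production traffic, a pass/fail check per case, and a pass rate appended to a log so week-over-week trends are visible. `run_agent` and the per-case `check` predicates are assumptions about your setup.

```python
# Sketch of a real eval harness: fixed cases from production traffic,
# run on a schedule, pass rate appended to a log so trends are visible.
# `run_agent` and the grading predicates are illustrative assumptions.

import json, time

def run_evals(cases, run_agent, results_path="eval_log.jsonl"):
    passed = 0
    for case in cases:  # each case: {"input": ..., "check": callable}
        output = run_agent(case["input"])
        passed += bool(case["check"](output))
    rate = passed / len(cases)
    with open(results_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "pass_rate": rate}) + "\n")
    return rate
```

Run it weekly from cron or CI; the declining trend in the log is the signal, not any single run.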