Lesson 8 · 11 min

Agent safety and guardrails

Agents are LLMs with the ability to act, so a mistake has a bigger blast radius than a bad completion. No single control covers every failure, so defense has to be layered.

What can go wrong

Agents add attack surface beyond that of a plain LLM:

  • Prompt injection escalation. A malicious doc the agent reads tells it to email your secrets out. A plain LLM could only say it would; an agent with an email tool can actually do it.
  • Runaway loops. The agent gets stuck repeating the same step and burns through your budget.
  • Privilege escalation. An agent meant to be read-only finds a write tool in its toolset and uses it.
  • Tool side-effect amplification. A single bad decision triggers an action that cannot be undone.
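
To make the layering concrete, here is a minimal sketch of a tool wrapper that puts one check in front of each failure mode above. Everything in it is hypothetical and for illustration only: `GuardedToolbox`, `GuardrailError`, the tool names, and the default budget are assumptions, not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class GuardrailError(Exception):
    """Raised when a guardrail blocks a tool call."""


def deny_all(name: str, args: dict) -> bool:
    """Default approval hook (hypothetical): refuse unless a human hook is supplied."""
    return False


@dataclass
class GuardedToolbox:
    tools: Dict[str, Callable[..., object]]  # every tool that exists
    allowed: Set[str]                        # least privilege: tools this agent may call
    irreversible: Set[str] = field(default_factory=set)  # tools needing approval
    max_attempts: int = 10                   # hard budget against runaway loops
    approve: Callable[[str, dict], bool] = deny_all
    attempts: int = 0

    def call(self, name: str, **args):
        # Layer 1: budget. Blocked attempts count too, so a stuck agent
        # cannot retry a forbidden tool forever.
        if self.attempts >= self.max_attempts:
            raise GuardrailError("attempt budget exhausted")
        self.attempts += 1
        # Layer 2: allowlist. An injected instruction cannot reach a tool
        # the agent was never granted.
        if name not in self.allowed:
            raise GuardrailError(f"tool {name!r} is not allowlisted")
        # Layer 3: approval gate in front of actions that cannot be undone.
        if name in self.irreversible and not self.approve(name, args):
            raise GuardrailError(f"tool {name!r} requires approval")
        return self.tools[name](**args)


# Usage: a read-only agent that can read docs but not send email.
tools = {
    "read_doc": lambda path: f"contents of {path}",
    "send_email": lambda to, body: "sent",
}
box = GuardedToolbox(tools=tools, allowed={"read_doc"}, max_attempts=3)
print(box.call("read_doc", path="notes.txt"))
```

The checks are deliberately independent: the allowlist still holds if the approval hook is misconfigured, and the budget still holds if the allowlist is too generous. A real deployment would likely add further layers this sketch omits, such as output filtering and sandboxed execution.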