
Lesson 4 · 11 min

State and memory architecture

An LLM API is stateless — the model retains nothing between requests, so everything it needs to know must be sent with each one. Choosing where to store what (context window vs. cache vs. database vs. vector store) determines your application's cost, latency, and correctness.
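A minimal sketch of what statelessness means in practice: each turn's request must replay the entire conversation. The role/content message shape below is illustrative (most chat APIs use something similar); the helper name is ours, not from any SDK.

```python
def build_request(system_prompt, history, user_message):
    """Assemble the full payload for one turn.

    Nothing survives on the server between calls, so the entire
    prior conversation must be resent every single turn.
    """
    return (
        [{"role": "system", "content": system_prompt}]
        + history  # every prior turn, replayed verbatim
        + [{"role": "user", "content": user_message}]
    )

history = [
    {"role": "user", "content": "What's our refund policy?"},
    {"role": "assistant", "content": "30 days, no questions asked."},
]
request = build_request("You are a support bot.", history, "And for sale items?")
# The model sees all four messages; drop one and it "forgets" that turn.
```

If your app answers coherently on turn one but loses the thread on turn two, the usual culprit is a `history` that was never persisted and replayed.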

The five storage tiers

AI applications have five places to keep state, each with different trade-offs:

| Tier | What lives here | Latency | Cost | Persistence |
|---|---|---|---|---|
| Context window | Current turn, retrieved docs, instructions | 0 ms | Per-token | Ephemeral |
| Prompt cache | Stable prefix, system docs | 0 ms (hit) | ~10% of normal | 5 min TTL |
| In-memory / Redis | Session state, rate-limit counters, job queue | <1 ms | Low | Hours/days |
| Database | User profile, conversation history, preferences | 1–5 ms | Lowest | Permanent |
| Vector store | Semantic knowledge base | 5–50 ms | Low | Permanent |
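The tiers compose in a single turn: stable content goes first (so the prompt-cache prefix stays identical across requests), and volatile per-turn content goes last. A sketch with the stores stubbed as dicts and functions — in production these would be Redis, a relational database, and a vector database; all names here are illustrative.

```python
# Stable system prefix -> eligible for the prompt cache
SYSTEM_DOCS = "You are a support bot. Policy: refunds within 30 days."

db = {"user:42": {"name": "Ada", "plan": "pro"}}   # database tier
redis = {"session:42": {"turns": 3}}               # in-memory tier

def vector_search(query):
    # Stand-in for semantic retrieval against a vector store (5-50 ms).
    return ["Doc: sale items are refundable within 14 days."]

def build_context(user_id, query):
    profile = db[f"user:{user_id}"]        # 1-5 ms lookup, permanent
    session = redis[f"session:{user_id}"]  # <1 ms lookup, short-lived
    docs = vector_search(query)
    # Stable-first ordering keeps the cached prefix byte-identical
    # across requests; anything per-user or per-turn comes after it.
    return "\n".join(
        [SYSTEM_DOCS]
        + docs
        + [f"User {profile['name']} ({profile['plan']}), turn {session['turns']}",
           f"Question: {query}"]
    )

prompt = build_context(42, "Can I return a sale item?")
```

Note the ordering constraint: putting the user's name or turn counter before the system docs would change the prefix on every request and defeat the prompt cache entirely.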

Most bugs in production AI apps come from putting the wrong thing in the wrong tier.