Lesson 4 · 13 min

Long document processing patterns

Process documents that exceed the context window with chunking, map-reduce, and hierarchical summarization — without losing coherence.

When the document is bigger than the window

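A 500-page legal contract is roughly 400k tokens. A large codebase can run to millions of tokens. Even with a 1M-token context window, naively dumping whole documents into the prompt degrades answer quality and gets expensive, since input is typically billed per token on every request.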
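
To make the sizing concrete, here is a minimal back-of-the-envelope sketch. The figure of 800 tokens per page is only what the contract example implies (400k tokens over 500 pages); the function names and the 8,000-token chunk budget are hypothetical choices for illustration. For real documents, count tokens with your model's own tokenizer.

```python
import math


def estimate_tokens(pages: int, tokens_per_page: int = 800) -> int:
    """Rough token estimate. 800 tokens/page is an assumption implied by
    the example above (500 pages is about 400k tokens), not a measurement."""
    return pages * tokens_per_page


def chunks_needed(total_tokens: int, chunk_budget: int) -> int:
    """Number of chunks required if each chunk holds at most chunk_budget tokens."""
    if chunk_budget <= 0:
        raise ValueError("chunk_budget must be positive")
    return math.ceil(total_tokens / chunk_budget)


contract_tokens = estimate_tokens(500)
print(contract_tokens)                        # 400000
print(chunks_needed(contract_tokens, 8_000))  # 50
```

An estimate like this is enough to tell whether a document fits in a single request or has to be split before processing.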

Three patterns handle large documents reliably: