What is a model's context window and why does it matter?

A model's context window is the maximum number of tokens (input plus output) it can process in a single request. It determines how much text, code, or conversation history the model can "see" at once. Larger context windows (100K–1M+ tokens) make it possible to process entire codebases or long documents in one request, but because pricing and much of the latency scale roughly linearly with input tokens, long prompts are correspondingly slower and more expensive.
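Since the window covers input plus output, a request must reserve room for the response. Here is a minimal sketch of that budget check, using the common "~4 characters per token" heuristic; the function name, the 200K default, and the heuristic itself are illustrative assumptions, and a real application should count tokens with the model's actual tokenizer.

```python
# Rough token-budget check. The ~4 chars/token ratio is a heuristic only;
# real tokenizers vary by model and by language of the text.
def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 200_000) -> bool:
    """Return True if the estimated input tokens plus the reserved
    output budget fit within the context window."""
    est_input_tokens = len(prompt) // 4 + 1
    return est_input_tokens + max_output_tokens <= context_window

# A ~6K-character prompt easily fits when 4K output tokens are reserved.
print(fits_in_context("hello " * 1000, max_output_tokens=4096))
```

The key point the sketch encodes is that `max_output_tokens` is part of the budget: a prompt that "fits" the window can still fail if it leaves no room for the reply.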

Key Considerations

  • Context window size ≠ effective recall — most models degrade on information buried in the middle of very long contexts
  • RAG is often more cost-effective than stuffing everything into a massive context window
  • Pricing is per-token: a 200K-token prompt costs 20× more than a 10K-token prompt at the same rate
  • For production apps, measure "needle in a haystack" performance at your actual context lengths
  • Caching (prompt caching, KV-cache) can dramatically reduce cost for repeated long-context calls
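The per-token pricing point above is simple arithmetic, sketched below; the $3-per-million-input-tokens rate is a hypothetical example, not any provider's actual price.

```python
# Per-token cost arithmetic. The rate is a made-up example figure;
# check your provider's pricing page for real numbers.
RATE_PER_MILLION = 3.00  # hypothetical $/1M input tokens

def prompt_cost(tokens: int, rate_per_million: float = RATE_PER_MILLION) -> float:
    """Cost of a prompt at a flat per-token rate."""
    return tokens / 1_000_000 * rate_per_million

small = prompt_cost(10_000)    # $0.03 at the example rate
large = prompt_cost(200_000)   # $0.60 at the example rate
print(f"{large / small:.0f}x") # the 200K prompt costs 20x the 10K prompt
```

Because cost is linear in tokens, the ratio between two prompt sizes is independent of the rate, which is why the 20× figure holds regardless of the provider.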