Explainer 26 Feb 2026 7 min read

Context Windows Explained:
Why Bigger Is Not Always Better

Every AI model has a context window — a hard limit on how much text it can process in a single conversation. Some models handle 4,000 tokens. Others claim 1 million or more. Marketing suggests bigger is always better, but the reality is more nuanced.

What Is a Context Window?

A context window is measured in tokens. One token is roughly three-quarters of a word in English — so 128,000 tokens is approximately 96,000 words, or a 300-page book.
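That conversion is simple enough to sketch. The constants below are the article's own heuristics (0.75 words per token, roughly 300 words per printed page), not exact figures; real tokenizers vary by language and text.

```python
# Rough token arithmetic, assuming ~0.75 words per token and
# ~300 words per printed page (both heuristic, not exact).
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token count."""
    return int(tokens * WORDS_PER_TOKEN)

def tokens_to_pages(tokens: int) -> int:
    """Estimate printed pages for a given token count."""
    return tokens_to_words(tokens) // WORDS_PER_PAGE

print(tokens_to_words(128_000))  # → 96000
print(tokens_to_pages(128_000))  # → 320
```

By this arithmetic a 128K window holds about 96,000 words, or a book of roughly 300 pages.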

The context window includes everything: the system prompt, the conversation history, any documents you paste in, and the model's own response. It all has to fit within that limit.

Think of it as the model's working memory. Outside this window, the model has no access to what came before. It cannot scroll back. Once text falls out of the window, it is gone.
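A minimal sketch of what "falling out of the window" looks like in practice, assuming a client that drops the oldest turns first while always keeping the system prompt. The word-split token counter is a crude stand-in for a real tokenizer:

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per word.
    return len(text.split())

def fit_to_window(system: str, turns: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus as many recent turns as fit the budget."""
    kept = []
    used = count_tokens(system)
    # Walk newest-to-oldest, keeping turns while they still fit.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))

history = ["a b c", "d e", "f g h i"]
print(fit_to_window("sys", history, budget=7))  # → ['sys', 'd e', 'f g h i']
```

The oldest turn never reaches the model at all — which is exactly the sense in which text outside the window is gone.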

The Current Landscape

Model            Context       Roughly equivalent to
GPT-4.1          1M tokens     ~6 novels
Gemini 2.5 Pro   1M tokens     ~6 novels
Claude Opus 4    200K tokens   ~1 novel
DeepSeek V3      128K tokens   ~300 pages

The "Lost in the Middle" Problem

Research has shown that most language models struggle with information placed in the middle of long contexts. They handle the beginning well. They handle the end well. But details buried in the middle of a 100,000-token prompt are more likely to be missed or misrepresented.

This is not a theoretical concern. It means that a model with a 1 million token context window may not actually be processing all 1 million tokens with equal attention. Having the capacity to accept a long input is not the same as reliably using every piece of information in it.

Bigger Context = Higher Cost

Context window usage directly affects your bill. Most AI APIs charge per token — both input and output. Dumping an entire codebase into the context when you only need a single file means paying for tokens the model probably will not use effectively anyway.

A 1 million token prompt on GPT-4.1 costs around $2 in input tokens alone. Do that 100 times a day and you are spending $200/day just on context — before the model has generated a single word of output.
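The arithmetic above is easy to run yourself. The $2-per-million input price is the figure the article quotes, used here as an illustrative constant; check current pricing before relying on it.

```python
# Back-of-envelope input cost, assuming $2 per 1M input tokens
# (the article's quoted figure; prices change, verify before use).
PRICE_PER_M_INPUT = 2.00

def input_cost(tokens: int, requests: int = 1) -> float:
    """Dollar cost of input tokens across a number of requests."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT * requests

print(input_cost(1_000_000))       # one full-window prompt → 2.0
print(input_cost(1_000_000, 100))  # 100 such prompts/day → 200.0
print(input_cost(5_000, 100))      # ~$1/day if 5K tokens suffice
```

The last line is the point: trimming the prompt from 1M to 5K tokens cuts the same daily workload from $200 to about a dollar.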

When Big Context Windows Genuinely Help

  • Analysing entire documents. Legal contracts, research papers, earnings reports — where you need the model to cross-reference different sections.
  • Long conversations. Extended back-and-forth where losing earlier context would break the flow.
  • Multi-file code review. When the model needs to understand how files relate to each other.
  • Translation of long documents. Where splitting the document would lose coherence.

When to Use Smaller Context Instead

  • Single-turn tasks. Summarise this paragraph. Fix this function. Translate this sentence. No context history needed.
  • When RAG is a better fit. Instead of pasting your entire knowledge base into the prompt, retrieve only the relevant chunks.
  • High-volume workloads. If you are processing 10,000 requests/day, smaller context = lower cost and faster responses.
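The RAG bullet above can be sketched in a few lines. Word-overlap scoring here is a deliberately naive stand-in for real embedding similarity, and the knowledge-base strings are invented examples; the structure (score every chunk, send only the top-k) is the point.

```python
def score(query: str, chunk: str) -> int:
    # Naive relevance: count shared lowercase words.
    # A real system would use embedding similarity instead.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]

kb = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests must include the order number.",
]
top = retrieve("how do I request a refund", kb)
```

Only `top` goes into the prompt — a few hundred tokens instead of the entire knowledge base.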

How to Think About Context Windows

  1. Match the window to your task. 128K is enough for almost everything most people do.
  2. Put critical information at the start or end. Not the middle.
  3. Watch the "needle in a haystack" benchmarks. They test whether models can find a specific fact buried in a long context. These scores vary widely between models.
  4. Consider cost per task, not just capability. A model with a smaller context window at a lower price per token may be the better practical choice.
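Guideline 2 translates directly into how you assemble a prompt. A minimal sketch: state the critical instruction first, put the bulk reference material in the middle (where attention is weakest), and repeat the instruction at the end.

```python
def build_prompt(instruction: str, reference: str) -> str:
    """Place the critical instruction at both edges of the prompt,
    with bulk reference material in the weaker middle position."""
    return "\n\n".join([
        instruction,                 # start: reliably attended to
        reference,                   # middle: most likely to be missed
        "Reminder: " + instruction,  # end: reliably attended to
    ])

prompt = build_prompt("Extract all dates.", "<long pasted document>")
```

The repetition costs a handful of tokens and hedges against the lost-in-the-middle effect described earlier.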

Compare Context Windows

See context window sizes across all models on our dedicated ranking page.