Models & Architecture

Context window

Definition
The maximum number of tokens a model can process in a single request, including both the prompt and the response. Larger context windows (100K-2M tokens) let models ingest entire codebases or documents at once.
Why it matters
Context window size determines what your AI can see and reason about in one shot. A 4K-token window limits you to a few pages; a 1M-token window lets you ingest entire codebases, legal documents, or research corpora. But bigger is not always better: model quality degrades when the context is packed with irrelevant information (the 'lost in the middle' problem), and longer contexts cost more in both compute and money. The strategic question is not just how much context a model supports, but how well it uses that context. A model that reliably retrieves and reasons over 200K tokens beats one that theoretically supports 1M but misses key details.
In practice
Google's Gemini 1.5 Pro launched with a 1M-token context window in early 2024 and later expanded to 2M tokens. Anthropic's Claude offers 200K tokens standard, with extended context up to 1M for enterprise customers. OpenAI's GPT-4 Turbo moved from 8K to 128K tokens. In practice, most enterprise applications use 10-50K tokens per request, combining system prompts, retrieved documents, and conversation history. The 'Needle in a Haystack' test became the standard benchmark for measuring how well models actually use their full context window, revealing significant quality differences between providers.
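Combining system prompts, retrieved documents, and conversation history under a token budget can be sketched as a simple pre-flight check. This is a minimal illustration, assuming the common rough heuristic of ~4 characters per token (real tokenizers such as tiktoken give exact counts); the function names and the 4K reserve for the model's reply are illustrative choices, not any provider's API.

```python
# Rough context-window budgeting sketch.
# Assumption: ~4 characters per token (a crude heuristic;
# use a real tokenizer for exact counts).

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(system_prompt: str,
                    documents: list[str],
                    history: list[str],
                    context_window: int = 128_000,
                    reserve_for_output: int = 4_000) -> bool:
    """Check whether a request's parts fit the window,
    leaving headroom for the model's response."""
    used = estimate_tokens(system_prompt)
    used += sum(estimate_tokens(d) for d in documents)
    used += sum(estimate_tokens(m) for m in history)
    return used + reserve_for_output <= context_window

# A short prompt with no documents easily fits a 128K window.
print(fits_in_context("You are a helpful assistant.", [], []))
```

A check like this is why most enterprise pipelines trim or rank retrieved documents before sending a request, rather than filling the window to capacity.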
