
Gemini for developers: a million tokens of context in practice

Every frontier model can read your code. Gemini's distinctive lever is how much of it at once: a context window on the order of a million tokens. That's not a spec-sheet flex. Past a certain size, a longer window is a qualitatively different tool.

What a million tokens actually unlocks

The everyday workflow with most agents is retrieval: the tool decides which files are relevant and pulls those into the prompt. That works, but the retrieval step is where a lot of failures hide — pull the wrong files and the model reasons confidently about the wrong code.

A big enough window lets you sidestep retrieval for a whole class of problems:

  • Whole-module reasoning. Drop an entire legacy subsystem in and ask "where does this leak file handles?" — no chunking, no guessing which files matter.
  • Spec + code + tickets, together. Put the design doc, the implementation, and the bug reports in the same window and let the model cross-reference them. The contradictions fall out.
  • Giant artifacts. A 200k-line log, a massive generated schema, a year of changelog — things you'd never paginate through by hand.

A bigger window doesn't make the model smarter. It removes the lossy step where something else decided what the model gets to see.
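
To make "drop the whole subsystem in" concrete, here is a minimal sketch of packing a directory into one prompt body. Everything in it is an assumption for illustration: the function names, the "=== FILE: ... ===" delimiter, and above all the four-characters-per-token estimate, which is a rough heuristic and not a real tokenizer. For actual budgeting, use the provider's own token counter.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Crude size estimate; swap in the provider's token counter for real budgeting."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_directory(root: str, suffixes: tuple[str, ...], budget_tokens: int) -> str:
    """Concatenate every matching file under root into one prompt body.

    Raises ValueError instead of silently truncating, so nothing quietly
    decides on the model's behalf which files get dropped.
    """
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            body = path.read_text(encoding="utf-8", errors="replace")
            rel = path.relative_to(root).as_posix()
            # Label each file so the model can cite paths in its answer.
            parts.append(f"=== FILE: {rel} ===\n{body}")
    prompt = "\n\n".join(parts)
    used = estimate_tokens(prompt)
    if used > budget_tokens:
        raise ValueError(f"~{used} tokens exceeds budget of {budget_tokens}")
    return prompt
```

The design choice worth noticing is the hard failure on overflow: if the subsystem does not fit, you want to know and decide what to cut yourself, not have a helper trim files behind your back.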

The catch nobody puts on the slide

Long context is not free, and "it fits" is not the same as "it's used well":

  • Attention isn't uniform. Models tend to recall material near the start and end of a long window more reliably than material in the middle (the "lost in the middle" effect). Bury the critical function at token 480,000 and it may well get skimmed. Put what matters where the model looks.
  • Cost scales with what you stuff in. A million-token prompt is a million-token bill, every turn. Cheap-per-token does not mean cheap-per-task. Cache aggressively; don't resend the unchanged world each turn.
  • Latency scales too. Big prompts are slow prompts. For an interactive loop that hurts; for a batch "analyze this whole thing once" job it's fine.
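
The cost point is easier to feel with arithmetic. The sketch below is a back-of-the-envelope model, not a pricing calculator: the price, the cached fraction, and the cache discount are placeholder numbers chosen for illustration, and it ignores output tokens and any cache storage fee a provider may charge.

```python
def session_cost(
    prompt_tokens: int,
    turns: int,
    usd_per_million: float,
    cached_fraction: float = 0.0,
    cached_price_ratio: float = 1.0,
) -> float:
    """Input-token cost of a session that resends the same prompt every turn.

    cached_fraction: share of the prompt served from cache after the first turn.
    cached_price_ratio: price of a cached token relative to a fresh one
        (1.0 means no discount).
    """
    if turns <= 0:
        return 0.0
    full_turn = prompt_tokens / 1_000_000 * usd_per_million
    # After the first turn, the cached share is billed at the discounted rate.
    later_turn = full_turn * (
        (1 - cached_fraction) + cached_fraction * cached_price_ratio
    )
    return full_turn + (turns - 1) * later_turn


# Placeholder numbers: a 1M-token prompt, 20 turns, $1.00 per million input tokens.
uncached = session_cost(1_000_000, 20, 1.00)
cached = session_cost(1_000_000, 20, 1.00, cached_fraction=0.99, cached_price_ratio=0.25)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
# prints "uncached: $20.00, cached: $5.89"
```

The per-token price never changed between the two runs. What changed is how many times the unchanged world got billed at full rate.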

How to use it well

  • Reach for it when the problem is the context — a sprawling unfamiliar codebase, a cross-document investigation, an artifact too big to chunk sanely.
  • Don't reach for it for the small stuff. Finishing a function or fixing one file doesn't need a million tokens; you're paying latency and money for headroom you won't use. That's Copilot or Claude Code territory.
  • Curate even when you don't have to. "I can fit everything" tempts you to skip thinking about relevance. The model still reads better when the signal is dense and the key material is near the edges.
  • Cache the stable parts. The codebase doesn't change between turns; your question does. Prompt caching is the difference between a workable bill and a shocking one — see agent architecture.
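
The last two points can be combined in how the prompt is laid out. This is a sketch under stated assumptions: the section markers and function names are invented for illustration, and it assumes the provider's cache can reuse an unchanged leading portion of the prompt, which you should confirm in its documentation. The stable corpus goes first and stays byte-identical across turns; the hand-picked excerpts and the question go last, at the edge of the window.

```python
import os


def build_prompt(
    task_brief: str, corpus: str, key_excerpts: list[str], question: str
) -> str:
    """Lay out a long prompt as a stable prefix plus a per-turn tail."""
    # Stable part: identical on every turn, so it is a candidate for caching.
    stable_prefix = (
        f"{task_brief}\n\n=== CORPUS ===\n{corpus}\n=== END CORPUS ===\n"
    )
    # Volatile part: curated excerpts and the question sit at the end of the
    # window, not buried in the middle.
    focus = "\n\n".join(key_excerpts)
    volatile_tail = (
        f"\n=== LOOK HERE FIRST ===\n{focus}\n\n=== QUESTION ===\n{question}\n"
    )
    return stable_prefix + volatile_tail


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))
```

Calling shared_prefix_len on the prompts for two consecutive turns is a cheap sanity check: if the shared prefix is much shorter than the corpus, something upstream (a timestamp, a reordered file list) is changing the stable part and defeating the cache.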

Where it lands

Gemini's pitch is the inverse of Codex's long-horizon autonomy: this is long-context breadth. One is "let it work for a long time," the other is "let it see everything at once." For investigations and whole-system reasoning, the giant window is the right tool — just respect that fitting the data in the window is the easy half. Using all of it well is the half that decides whether the answer is any good. The benchmark tracks context size next to capability for exactly this reason.
