Stop burning tokens in GitHub Copilot
GitHub Copilot bills the agent and chat surfaces in premium requests, and the agent's appetite for context is where quota quietly evaporates. The good news: most of the waste is behavioral, and a handful of habits recover the bulk of it without giving up the tool. Here's the diet.
Where the tokens actually go
Copilot has three surfaces with very different cost profiles:
- Inline completions — cheap and fast; basically free-flowing. Not your problem.
- Chat — moderate, but it accumulates context: every follow-up resends the growing conversation plus whatever files it pulled in.
- The agent — the heavy hitter. It reads files, runs steps, iterates, and every loop is tokens. One sprawling agent task can cost more than a day of completions.
The bill is dominated by the agent and by long, unfocused chats. Fix those two and the rest is noise.
The model picker is a cost dial
Copilot is model-agnostic — the dropdown is a price/capability dial, and most people leave it on the most expensive option for everything.
- Cheap/base model for boilerplate, explanations, simple edits, test scaffolding. This is most requests.
- Frontier model only for the gnarly multi-file refactor or the subtle bug.
Premium-request weighting means the frontier model can cost several times a base request. Matching model to task difficulty — the same routing lever from the cost architecture — is the single biggest quota win.
Don't pay the frontier model's premium to rename a variable. The model picker is the cheapest cost control Copilot gives you, and it's one click.
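To put rough numbers on that, here's a minimal sketch of the arithmetic. The multipliers below are made-up placeholders, not GitHub's actual rates (check your plan's current model multipliers), and the 85/15 split is just an assumed workload.

```python
# Hypothetical premium-request multipliers; real values vary by model and plan.
MULTIPLIER = {"base": 1.0, "frontier": 5.0}


def premium_requests(counts: dict[str, int]) -> float:
    """Total premium requests consumed for a mix of per-model request counts."""
    return sum(MULTIPLIER[model] * n for model, n in counts.items())


# 100 requests, all sent to the frontier model.
all_frontier = premium_requests({"frontier": 100})

# The same 100 requests, with only the 15 hard ones routed to the frontier model.
routed = premium_requests({"base": 85, "frontier": 15})

print(all_frontier, routed)  # 500.0 160.0
```

Under these assumed numbers, routing cuts quota use by roughly two thirds without touching a single prompt.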
Scope the agent like you scope a ticket
The agent's cost scales with how much it has to read and how many loops it takes. Both are downstream of how you brief it.
- Be specific about files. "Add validation to routes/auth.ts and routes/user.ts" beats "add validation to the API"; the latter makes the agent crawl the repo to figure out scope, on your dime.
- Keep tasks small. A tightly-scoped change finishes in a few loops; a vague epic wanders for dozens. Small tasks are cheaper and review faster.
- Give acceptance criteria. "Tests in auth.test.ts pass" gives the agent a stop condition, so it stops looping the moment it's done instead of gold-plating.
- Lean on .github/copilot-instructions.md. A short repo conventions file means the agent doesn't burn a loop rediscovering how your project runs tests every single task.
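If it helps to make the habit mechanical, a scoped brief can be treated as a small template: the change, the files, and the stop condition. The helper below is purely illustrative. `build_brief` and its field labels are hypothetical, not anything Copilot requires or parses specially.

```python
def build_brief(change: str, files: list[str], done_when: str) -> str:
    """Assemble a scoped agent brief: what to change, where, and when to stop."""
    if not files:
        # An empty file list is exactly the vague brief that sends the agent crawling.
        raise ValueError("name at least one file")
    lines = [
        f"Task: {change}",
        "Only touch these files:",
        *[f"- {path}" for path in files],
        f"Done when: {done_when}",
    ]
    return "\n".join(lines)


brief = build_brief(
    change="Add validation",
    files=["routes/auth.ts", "routes/user.ts"],
    done_when="Tests in auth.test.ts pass",
)
print(brief)
```

The point is not the helper itself but the three fields: if you can't fill them in, the task probably isn't scoped yet.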
Starve the chat of stale context
- Start fresh threads. A long-running chat resends its entire history every turn. When you switch topics, open a new conversation — don't drag 40 turns of unrelated context into a new question.
- Attach precisely. Add the two files that matter, not the whole folder. Every attached file is input tokens on every subsequent turn of that thread.
- Trust inline for the small stuff. If a completion will do, don't open chat. The cheapest request is the one you didn't escalate.
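A toy model shows why fresh threads matter. It assumes every turn resends all attachments plus the full history, and it lumps each question and answer into one flat `tokens_per_turn` figure. Real behavior will differ (caching, truncation, or summarization may apply), so treat the numbers as an illustration of the growth pattern, not a measurement.

```python
def thread_input_tokens(turns: int, tokens_per_turn: int, attached_tokens: int = 0) -> int:
    """Cumulative input tokens if each turn resends attachments plus all prior turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn          # the new exchange joins the history
        total += attached_tokens + history  # the whole thread is sent again
    return total


# One 40-turn thread versus four separate 10-turn threads (same 40 turns overall).
one_long = thread_input_tokens(turns=40, tokens_per_turn=500)
four_short = 4 * thread_input_tokens(turns=10, tokens_per_turn=500)

print(one_long, four_short)  # 410000 110000
```

In this model input cost grows with the square of thread length, which is why splitting by topic pays off so quickly. Attachments add a fixed charge on every turn on top of that.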
When you're still over budget
If you've matched models, scoped tasks, and trimmed chats and the bill is still high, you're into the architecture levers: a context-compression proxy in front of the agent (Copilot CLI is supported), and terser output habits. The capstone stacks all of it.
But start with behavior. Model picker, task scope, fresh threads — three free habits that recover most of the waste before you install anything. The tool is worth its premium; the trick is not paying premium prices for base-model work.