Caveman: why use many token when few token do trick
Output tokens cost 3–5× input, and a chatty agent spends most of them on words that carry no information: "Great question! Let me help you with that. First, I'll need to..." Caveman is a Claude Code skill with a blunt fix — make the agent talk like a caveman: drop the filler, keep the substance, use fragments instead of full sentences.
The pitch, and the surprising part
The tagline is the whole philosophy: "why use many token when few token do trick." It installs as a skill that instructs the agent to cut connective tissue while preserving every technical fact.
The surprising part is that brevity doesn't cost accuracy — it can improve it. The repo cites a 2026 paper finding that brevity constraints reverse performance hierarchies in language models: forced to be terse, models drop the hedging and waffle and get to the point, and the point is more often correct. You're not trading quality for cost; on a lot of tasks you're getting both.
Terseness isn't just cheaper. Stripped of room to ramble, the model commits to an answer instead of narrating its way around one.
The numbers
Across 10 real API tasks, Caveman reports an average 65% reduction in output tokens (range 22–87%):
| Task | Output saved |
|---|---|
| React re-render bug | 87% |
| Auth middleware fix | 83% |
| PostgreSQL connection pool | 84% |
And a second lever most people miss: caveman-compress rewrites your memory files to be terser, cutting ~46% of the input tokens those files cost on every future session. Your CLAUDE.md is read every turn — compressing it pays forever.
How you use it
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
Then it's levels, set per session:
/caveman lite— drop filler only./caveman full— default caveman mode./caveman ultra— telegraphic./caveman wenyan— classical Chinese, even denser.
Plus /caveman-commit (commits ≤50 chars), /caveman-review (one-line PR comments), /caveman-stats (real token + USD savings), and /caveman-compress <file>. Say "talk like caveman" to trigger, "normal mode" to stop. It works natively in Claude Code, Codex, and Gemini, and via manual install in Copilot, Cursor, and 30+ others.
Where it fits
Caveman is the output-side lever in the cost stack — the complement to Headroom's input compression. They're orthogonal: one shrinks what you send, the other shrinks what comes back, and they stack cleanly.
The honest caveat: caveman output is for you and your agent, not your end users — telegraphic style reads as terse to a human expecting prose. Use it for the working channel (the agent's reasoning, commits, reviews, internal notes), keep a normal voice for anything customer-facing. Within that boundary it's close to free money: ~65% off your most expensive tokens, and frequently better answers. The capstone pairs it with Ponytail — terse prose plus lazy code — for the full output-side cut.