# ai-learning-bits

> A professional blog on AI coding tools — Claude Code, GitHub Copilot, Codex, Gemini — agent architecture, local models, cost/token savings, and a sortable LLM benchmark.

Opinionated, practical writing on AI coding tools, agent architecture, local models, and cost/token discipline. Author: Jakub Jirák (ai@jakubjirak.com).

## Key pages

- [LLM coding benchmark](https://ai.jakubjirak.com/benchmark): sortable comparison of frontier and open coding models by context window, SWE-bench Verified score, and price.
- [About](https://ai.jakubjirak.com/about): what this blog is and who writes it.

## Articles

- [Claude Code, Copilot, Codex, Gemini: picking your pair-programmer in 2026](https://ai.jakubjirak.com/p/ai-coding-tools-2026): Four agents now sit between you and your editor. They are not interchangeable. A field guide to what each is actually good at — and where the seams show.
- [Claude Code: agentic coding from the terminal](https://ai.jakubjirak.com/p/claude-code): A planning loop, multi-file edits, and your test suite as the oracle. What the terminal-native agent gets right, and how to drive it.
- [GitHub Copilot in 2026: from autocomplete to background agent](https://ai.jakubjirak.com/p/github-copilot): Ghost-text was the gateway drug. The interesting Copilot now is the one that opens pull requests while you're at lunch.
- [Codex and GPT-5: OpenAI's autonomous coding stack](https://ai.jakubjirak.com/p/codex-gpt5): A CLI and a cloud agent tuned for long, unattended runs in a sandbox. What 'let it grind' actually buys you.
- [Gemini for developers: a million tokens of context in practice](https://ai.jakubjirak.com/p/gemini): The 1M-token window isn't a bigger version of the same tool. It changes what 'give it the codebase' means — and what breaks when you do.
- [AI agent architectures that don't fall over](https://ai.jakubjirak.com/p/agent-architecture): Context, tools, memory, and evals — the boring scaffolding that decides whether your agent is a product or a demo.
- [Running capable code models locally: Ollama, llama.cpp, vLLM](https://ai.jakubjirak.com/p/local-models): When the code can't leave the building, or you just want zero marginal cost. What's realistic on a laptop, a workstation, and a server in 2026.
- [What hardware actually runs these models — decently](https://ai.jakubjirak.com/p/hardware-for-local-llms): VRAM is the gate, quantization is the key, and Apple's unified memory quietly changed the math. A buyer's guide by model size, not by hype.
- [GLM-5.2 shipped without benchmarks — and that's the story](https://ai.jakubjirak.com/p/glm-5-2-no-benchmarks): Z.ai released GLM-5.2 the day after the US forced Anthropic to pull Fable 5 globally. A reaction: no-data is not good news, but the withdrawal is the lesson.
- [Apple Silicon, MLX, and Core ML for on-device LLMs](https://ai.jakubjirak.com/p/apple-mlx-coreml): Unified memory made the Mac a serious local-inference box. MLX and Core ML are the two ways to actually use it — and they're for different jobs.
- [RAG that actually retrieves the right thing](https://ai.jakubjirak.com/p/rag-that-retrieves): Most RAG systems fail at retrieval, not generation. The fixes are unglamorous: chunk with intent, rerank, and evaluate the retriever on its own.
- [Agentic architectures: the four topologies and where they break](https://ai.jakubjirak.com/p/agentic-architecture-patterns): Single agent, orchestrator-worker, evaluator loop, multi-agent. Most teams reach for the most complex one first. Here's when each earns its keep.
- [The architecture that cuts 99% of your LLM bill](https://ai.jakubjirak.com/p/99-percent-cost-architecture): Not one trick — five multiplicative levers. Cache, route, batch, compress, and shape output, and an order-of-magnitude bill becomes a rounding error.
- [Stop burning tokens in GitHub Copilot](https://ai.jakubjirak.com/p/copilot-token-diet): Premium requests, model pickers, and a chat that hoards context. A practical diet for getting Copilot's value without torching your quota.
- [Headroom: a compression layer between your agent and the model](https://ai.jakubjirak.com/p/headroom): Tool outputs, logs, and RAG chunks are mostly filler. Headroom compresses them before they hit the model — 60–95% fewer tokens, accuracy preserved.
- [Caveman: why use many token when few token do trick](https://ai.jakubjirak.com/p/caveman): A skill that makes your agent talk like a caveman — drop filler, keep substance. ~65% fewer output tokens, and the accuracy often goes up, not down.
- [Ponytail: the lazy senior dev inside your agent](https://ai.jakubjirak.com/p/ponytail): He looks at your fifty lines, says nothing, replaces them with one. Ponytail forces the laziest solution that works — 80–94% less code, 47–77% cheaper.
- [Stacking it all: ultra token savings at the same quality](https://ai.jakubjirak.com/p/ultra-token-savings): Caching, routing, compression, terse prose, lazy code. Wire all of them together and a real agent bill drops by an order of magnitude — without giving up output quality.
- [Vibe coding, honestly: what changes when the agent writes the code](https://ai.jakubjirak.com/p/vibe-coding-honestly): Strip the hype and 'vibe coding' is a real workflow shift with a real set of new failure modes. What actually changes, what doesn't, and why the harness beats the model.
- [Sandboxing the agent: letting AI run code without losing the building](https://ai.jakubjirak.com/p/sandboxing-coding-agents): An agent that can run a command can run the wrong command. Isolation, least privilege, and approval gates are the line between a teammate and an incident.
- [Is a subscription the wrong business model for AI coding tools?](https://ai.jakubjirak.com/p/subscription-wrong-for-ai): Flat-rate pricing assumes a human-sized appetite for compute. Agents don't have one. Why usage is eating subscriptions — and what pricing survives.
- [Observability for agents: you can't operate what you can't see](https://ai.jakubjirak.com/p/agent-observability): A coding agent in production is a nondeterministic, multi-step, tool-calling system. Traces, token accounting, and eval dashboards are how you keep it honest.
- [Governing skills at scale: progressive disclosure and software as memory](https://ai.jakubjirak.com/p/governing-skills-at-scale): Skills turn a general agent into a specialist. But a folder of prompts per developer is chaos. Central management, progressive disclosure, and institutional memory.
- [Long-running autonomous agents: letting it work while you sleep](https://ai.jakubjirak.com/p/long-running-autonomous-agents): The frontier of agentic coding isn't a smarter chat — it's an agent you can trust to grind unattended for an hour. Budgets, checkpoints, and knowing when to walk away.
- [Export controls and the geopolitics of your AI coding stack](https://ai.jakubjirak.com/p/ai-export-controls): The model behind your agent is also a geopolitical artifact. Export rules, open weights, and why where a model comes from is now an architecture decision.
- [Knowledge graphs vs vector RAG: when relationships beat similarity](https://ai.jakubjirak.com/p/knowledge-graphs-vs-rag): Vector search finds chunks that look like your query. Some questions need chunks that are connected to each other. A practical comparison — and the hybrid that wins.
- [Using AI to learn faster, not just to type faster](https://ai.jakubjirak.com/p/ai-for-learning): The biggest gain from these tools isn't the code they write — it's how fast they get you to competence in something you didn't understand yesterday. If you let them.
- [Advanced agent architecture: context is the scarce resource](https://ai.jakubjirak.com/p/advanced-agent-architecture): Past the basics, every hard agent problem is a context problem. Compaction, context editing, memory tiers, sub-agent isolation, and keeping intermediate results out of the window.
- [Local-first, last-mile-paid: the model cascade that runs mostly free](https://ai.jakubjirak.com/p/local-first-cascade): Do the bulk of the work on a free local model; escalate to Haiku, then Sonnet, then Opus only at the last mile where it's actually needed. The architecture and the triggers.