// featured · landscape

Claude Code, Copilot, Codex, Gemini: picking your pair-programmer in 2026

Four agents now sit between you and your editor. They are not interchangeable. A field guide to what each is actually good at — and where the seams show.

$ Jakub Jirák · Jun 21 · 11 min

// latest

Claude Code: agentic coding from the terminal

A planning loop, multi-file edits, and your test suite as the oracle. What the terminal-native agent gets right, and how to drive it.

Jakub Jirák · 9 min

GitHub Copilot in 2026: from autocomplete to background agent

Ghost-text was the gateway drug. The interesting Copilot now is the one that opens pull requests while you're at lunch.

Jakub Jirák · 8 min

Codex and GPT-5: OpenAI's autonomous coding stack

A CLI and a cloud agent tuned for long, unattended runs in a sandbox. What 'let it grind' actually buys you.

Jakub Jirák · 8 min

Gemini for developers: a million tokens of context in practice

The 1M-token window isn't a bigger version of the same tool. It changes what 'give it the codebase' means — and what breaks when you do.

Jakub Jirák · 8 min

AI agent architectures that don't fall over

Context, tools, memory, and evals — the boring scaffolding that decides whether your agent is a product or a demo.

Jakub Jirák · 10 min

Running capable code models locally: Ollama, llama.cpp, vLLM

When the code can't leave the building, or you just want zero marginal cost. What's realistic on a laptop, a workstation, and a server in 2026.

Jakub Jirák · 9 min

What hardware actually runs these models — decently

VRAM is the gate, quantization is the key, and Apple's unified memory quietly changed the math. A buyer's guide by model size, not by hype.

Jakub Jirák · 9 min

Apple Silicon, MLX, and Core ML for on-device LLMs

Unified memory made the Mac a serious local-inference box. MLX and Core ML are the two ways to actually use it — and they're for different jobs.

Jakub Jirák · 10 min

RAG that actually retrieves the right thing

Most RAG systems fail at retrieval, not generation. The fixes are unglamorous: chunk with intent, rerank, and evaluate the retriever on its own.

Jakub Jirák · 10 min

Agentic architectures: the four topologies and where they break

Single agent, orchestrator-worker, evaluator loop, multi-agent. Most teams reach for the most complex one first. Here's when each earns its keep.

Jakub Jirák · 11 min

The architecture that cuts 99% of your LLM bill

Not one trick but five multiplicative levers: cache, route, batch, compress, and shape output. Stack them and an outsized bill becomes a rounding error.

Jakub Jirák · 11 min

Stop burning tokens in GitHub Copilot

Premium requests, model pickers, and a chat that hoards context. A practical diet for getting Copilot's value without torching your quota.

Jakub Jirák · 8 min

Headroom: a compression layer between your agent and the model

Tool outputs, logs, and RAG chunks are mostly filler. Headroom compresses them before they hit the model — 60–95% fewer tokens, accuracy preserved.

Jakub Jirák · 8 min

Caveman: why use many token when few token do trick

A skill that makes your agent talk like a caveman — drop filler, keep substance. ~65% fewer output tokens, and the accuracy often goes up, not down.

Jakub Jirák · 6 min

Ponytail: the lazy senior dev inside your agent

He looks at your fifty lines, says nothing, replaces them with one. Ponytail forces the laziest solution that works — 80–94% less code, 47–77% cheaper.

Jakub Jirák · 7 min

Stacking it all: ultra token savings at the same quality

Caching, routing, compression, terse prose, lazy code. Wire all of them together and a real agent bill drops by an order of magnitude — without giving up output quality.

Jakub Jirák · 10 min

Vibe coding, honestly: what changes when the agent writes the code

Strip the hype and 'vibe coding' is a real workflow shift with a real set of new failure modes. What actually changes, what doesn't, and why the harness beats the model.

Jakub Jirák · 9 min

Sandboxing the agent: letting AI run code without losing the building

An agent that can run a command can run the wrong command. Isolation, least privilege, and approval gates are the line between a teammate and an incident.

Jakub Jirák · 9 min

Is a subscription the wrong business model for AI coding tools?

Flat-rate pricing assumes a human-sized appetite for compute. Agents don't have one. Why usage is eating subscriptions — and what pricing survives.

Jakub Jirák · 8 min

Observability for agents: you can't operate what you can't see

A coding agent in production is a nondeterministic, multi-step, tool-calling system. Traces, token accounting, and eval dashboards are how you keep it honest.

Jakub Jirák · 9 min

Governing skills at scale: progressive disclosure and software as memory

Skills turn a general agent into a specialist. But a folder of prompts per developer is chaos. Central management, progressive disclosure, and institutional memory.

Jakub Jirák · 9 min

Long-running autonomous agents: letting it work while you sleep

The frontier of agentic coding isn't a smarter chat — it's an agent you can trust to grind unattended for an hour. Budgets, checkpoints, and knowing when to walk away.

Jakub Jirák · 9 min

Export controls and the geopolitics of your AI coding stack

The model behind your agent is also a geopolitical artifact. Export rules, open weights, and why where a model comes from is now an architecture decision.

Jakub Jirák · 8 min

Knowledge graphs vs vector RAG: when relationships beat similarity

Vector search finds chunks that look like your query. Some questions need chunks that are connected to each other. A practical comparison — and the hybrid that wins.

Jakub Jirák · 9 min

Using AI to learn faster, not just to type faster

The biggest gain from these tools isn't the code they write — it's how fast they get you to competence in something you didn't understand yesterday. If you let them.

Jakub Jirák · 8 min

Advanced agent architecture: context is the scarce resource

Past the basics, every hard agent problem is a context problem. Compaction, context editing, memory tiers, sub-agent isolation, and keeping intermediate results out of the window.

Jakub Jirák · 11 min

Local-first, last-mile-paid: the model cascade that runs mostly free

Do the bulk of the work on a free local model; escalate to Haiku, then Sonnet, then Opus only at the last mile where it's actually needed. The architecture and the triggers.

Jakub Jirák · 11 min

GLM-5.2 shipped without benchmarks — and that's the story

Z.ai released GLM-5.2 the day after the US forced Anthropic to pull Fable 5 globally. A reaction: missing benchmarks are not good news, but the withdrawal is the lesson.

Jakub Jirák · 8 min