ai-learning-bits

https://ai.jakubjirak.com/
A professional blog on AI coding tools — Claude Code, GitHub Copilot, Codex, Gemini — agent architecture, local models, cost/token savings, and a sortable LLM benchmark.
en
Sun, 21 Jun 2026 12:00:00 +0000

Claude Code, Copilot, Codex, Gemini: picking your pair-programmer in 2026
https://ai.jakubjirak.com/p/ai-coding-tools-2026
Sun, 21 Jun 2026 12:00:00 +0000
landscape
Four agents now sit between you and your editor. They are not interchangeable. A field guide to what each is actually good at — and where the seams show.

Claude Code: agentic coding from the terminal
https://ai.jakubjirak.com/p/claude-code
Sat, 20 Jun 2026 12:00:00 +0000
agents
A planning loop, multi-file edits, and your test suite as the oracle. What the terminal-native agent gets right, and how to drive it.

GitHub Copilot in 2026: from autocomplete to background agent
https://ai.jakubjirak.com/p/github-copilot
Fri, 19 Jun 2026 12:00:00 +0000
copilot
Ghost-text was the gateway drug. The interesting Copilot now is the one that opens pull requests while you're at lunch.

Codex and GPT-5: OpenAI's autonomous coding stack
https://ai.jakubjirak.com/p/codex-gpt5
Thu, 18 Jun 2026 12:00:00 +0000
openai
A CLI and a cloud agent tuned for long, unattended runs in a sandbox. What 'let it grind' actually buys you.

Gemini for developers: a million tokens of context in practice
https://ai.jakubjirak.com/p/gemini
Wed, 17 Jun 2026 12:00:00 +0000
google
The 1M-token window isn't a bigger version of the same tool. It changes what 'give it the codebase' means — and what breaks when you do.

AI agent architectures that don't fall over
https://ai.jakubjirak.com/p/agent-architecture
Tue, 16 Jun 2026 12:00:00 +0000
architecture
Context, tools, memory, and evals — the boring scaffolding that decides whether your agent is a product or a demo.

Running capable code models locally: Ollama, llama.cpp, vLLM
https://ai.jakubjirak.com/p/local-models
Mon, 15 Jun 2026 12:00:00 +0000
local
When the code can't leave the building, or you just want zero marginal cost. What's realistic on a laptop, a workstation, and a server in 2026.

What hardware actually runs these models — decently
https://ai.jakubjirak.com/p/hardware-for-local-llms
Sun, 14 Jun 2026 12:00:00 +0000
hardware
VRAM is the gate, quantization is the key, and Apple's unified memory quietly changed the math. A buyer's guide by model size, not by hype.

GLM-5.2 shipped without benchmarks — and that's the story
https://ai.jakubjirak.com/p/glm-5-2-no-benchmarks
Sun, 14 Jun 2026 12:00:00 +0000
analysis
Z.ai released GLM-5.2 the day after the US forced Anthropic to pull Fable 5 globally. A reaction: no-data is not good news, but the withdrawal is the lesson.

Apple Silicon, MLX, and Core ML for on-device LLMs
https://ai.jakubjirak.com/p/apple-mlx-coreml
Sat, 13 Jun 2026 12:00:00 +0000
apple
Unified memory made the Mac a serious local-inference box. MLX and Core ML are the two ways to actually use it — and they're for different jobs.

RAG that actually retrieves the right thing
https://ai.jakubjirak.com/p/rag-that-retrieves
Fri, 12 Jun 2026 12:00:00 +0000
rag
Most RAG systems fail at retrieval, not generation. The fixes are unglamorous: chunk with intent, rerank, and evaluate the retriever on its own.

Agentic architectures: the four topologies and where they break
https://ai.jakubjirak.com/p/agentic-architecture-patterns
Thu, 11 Jun 2026 12:00:00 +0000
agents
Single agent, orchestrator-worker, evaluator loop, multi-agent. Most teams reach for the most complex one first. Here's when each earns its keep.

The architecture that cuts 99% of your LLM bill
https://ai.jakubjirak.com/p/99-percent-cost-architecture
Wed, 10 Jun 2026 12:00:00 +0000
cost
Not one trick — five multiplicative levers. Cache, route, batch, compress, and shape output, and an order-of-magnitude bill becomes a rounding error.

Stop burning tokens in GitHub Copilot
https://ai.jakubjirak.com/p/copilot-token-diet
Tue, 09 Jun 2026 12:00:00 +0000
copilot
Premium requests, model pickers, and a chat that hoards context. A practical diet for getting Copilot's value without torching your quota.

Headroom: a compression layer between your agent and the model
https://ai.jakubjirak.com/p/headroom
Mon, 08 Jun 2026 12:00:00 +0000
tooling
Tool outputs, logs, and RAG chunks are mostly filler. Headroom compresses them before they hit the model — 60–95% fewer tokens, accuracy preserved.

Caveman: why use many token when few token do trick
https://ai.jakubjirak.com/p/caveman
Sun, 07 Jun 2026 12:00:00 +0000
tooling
A skill that makes your agent talk like a caveman — drop filler, keep substance. ~65% fewer output tokens, and the accuracy often goes up, not down.

Ponytail: the lazy senior dev inside your agent
https://ai.jakubjirak.com/p/ponytail
Sat, 06 Jun 2026 12:00:00 +0000
tooling
He looks at your fifty lines, says nothing, replaces them with one. Ponytail forces the laziest solution that works — 80–94% less code, 47–77% cheaper.

Stacking it all: ultra token savings at the same quality
https://ai.jakubjirak.com/p/ultra-token-savings
Fri, 05 Jun 2026 12:00:00 +0000
savings
Caching, routing, compression, terse prose, lazy code. Wire all of them together and a real agent bill drops by an order of magnitude — without giving up output quality.

Vibe coding, honestly: what changes when the agent writes the code
https://ai.jakubjirak.com/p/vibe-coding-honestly
Thu, 04 Jun 2026 12:00:00 +0000
vibecoding
Strip the hype and 'vibe coding' is a real workflow shift with a real set of new failure modes. What actually changes, what doesn't, and why the harness beats the model.

Sandboxing the agent: letting AI run code without losing the building
https://ai.jakubjirak.com/p/sandboxing-coding-agents
Wed, 03 Jun 2026 12:00:00 +0000
security
An agent that can run a command can run the wrong command. Isolation, least privilege, and approval gates are the line between a teammate and an incident.

Is a subscription the wrong business model for AI coding tools?
https://ai.jakubjirak.com/p/subscription-wrong-for-ai
Tue, 02 Jun 2026 12:00:00 +0000
economics
Flat-rate pricing assumes a human-sized appetite for compute. Agents don't have one. Why usage is eating subscriptions — and what pricing survives.

Observability for agents: you can't operate what you can't see
https://ai.jakubjirak.com/p/agent-observability
Mon, 01 Jun 2026 12:00:00 +0000
observability
A coding agent in production is a nondeterministic, multi-step, tool-calling system. Traces, token accounting, and eval dashboards are how you keep it honest.

Governing skills at scale: progressive disclosure and software as memory
https://ai.jakubjirak.com/p/governing-skills-at-scale
Sun, 31 May 2026 12:00:00 +0000
skills
Skills turn a general agent into a specialist. But a folder of prompts per developer is chaos. Central management, progressive disclosure, and institutional memory.

Long-running autonomous agents: letting it work while you sleep
https://ai.jakubjirak.com/p/long-running-autonomous-agents
Sat, 30 May 2026 12:00:00 +0000
autonomy
The frontier of agentic coding isn't a smarter chat — it's an agent you can trust to grind unattended for an hour. Budgets, checkpoints, and knowing when to walk away.

Export controls and the geopolitics of your AI coding stack
https://ai.jakubjirak.com/p/ai-export-controls
Fri, 29 May 2026 12:00:00 +0000
policy
The model behind your agent is also a geopolitical artifact. Export rules, open weights, and why where a model comes from is now an architecture decision.

Knowledge graphs vs vector RAG: when relationships beat similarity
https://ai.jakubjirak.com/p/knowledge-graphs-vs-rag
Thu, 28 May 2026 12:00:00 +0000
rag
Vector search finds chunks that look like your query. Some questions need chunks that are connected to each other. A practical comparison — and the hybrid that wins.

Using AI to learn faster, not just to type faster
https://ai.jakubjirak.com/p/ai-for-learning
Wed, 27 May 2026 12:00:00 +0000
workflow
The biggest gain from these tools isn't the code they write — it's how fast they get you to competence in something you didn't understand yesterday. If you let them.

Advanced agent architecture: context is the scarce resource
https://ai.jakubjirak.com/p/advanced-agent-architecture
Tue, 26 May 2026 12:00:00 +0000
architecture
Past the basics, every hard agent problem is a context problem. Compaction, context editing, memory tiers, sub-agent isolation, and keeping intermediate results out of the window.

Local-first, last-mile-paid: the model cascade that runs mostly free
https://ai.jakubjirak.com/p/local-first-cascade
Mon, 25 May 2026 12:00:00 +0000
cost
Do the bulk of the work on a free local model; escalate to Haiku, then Sonnet, then Opus only at the last mile where it's actually needed. The architecture and the triggers.