Claude Code: agentic coding from the terminal

$ Jakub Jirák · Jun 20 · 9 min

Claude Code's central idea is almost contrarian: the best place for a coding agent isn't a chat window bolted onto your IDE — it's the terminal, next to the tools that already define your project. git, your test runner, your linter, your build. The agent doesn't replace them; it drives them.

The loop

Under the hood it's an agentic loop with a small, sharp toolset: read files, search the tree, edit, run shell commands, observe output. The model plans a change, makes it, runs your tests, reads the failures, and tries again. You sit at the approval boundary for anything with side effects.

# the shape of a session
$ claude "migrate the auth module off the deprecated TokenStore API"
# -> greps for call sites, proposes a plan, edits across 7 files,
#    runs `pytest tests/auth`, reads 2 failures, fixes, re-runs -> green

What makes this work is the test suite as the oracle. The agent isn't guessing whether its change is correct; it's running the same check you would. Repos with real tests get dramatically better results — the feedback signal is the product.
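The edit, run, read, retry cycle above can be sketched in a few lines of Python. This is a toy model of the idea, not how Claude Code is implemented: `iterate_until_green`, `propose_and_apply`, and `check` are hypothetical names, and the model's edit step is reduced to a callback that receives the previous failure output.

```python
import subprocess
from typing import Callable, Sequence


def run_tests(command: Sequence[str]) -> tuple[bool, str]:
    """Run the project's test command; return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def iterate_until_green(
    propose_and_apply: Callable[[str], None],
    check: Callable[[], tuple[bool, str]],
    max_turns: int = 5,
) -> bool:
    """Edit, run the oracle, feed failures back; stop on green or at the cap."""
    feedback = ""  # the first attempt has no failure output to learn from
    for _ in range(max_turns):
        propose_and_apply(feedback)  # hypothetical stand-in for the model's edit step
        passed, output = check()     # the same check a human would run
        if passed:
            return True
        feedback = output            # failures become context for the next attempt
    return False
```

Wired to a real suite, the call would look something like `iterate_until_green(edit_step, lambda: run_tests(["pytest", "tests/auth"]))`, where `edit_step` is whatever applies the next change. The point of the shape: without `check`, the loop has nothing to converge on.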

What it's genuinely good at

Cross-file refactors. "Rename this concept everywhere and update the call sites" is the home turf. It holds the whole change in its head better than a human doing find-and-replace.
Greenfield-from-spec. Hand it a clear spec and a test harness and it will iterate to passing. Vague spec, no tests — it flails, same as a junior would.
Living in your toolchain. Because it shells out, it uses your formatter, your CI commands, your scripts. No translation layer.

The agent is only as good as the loop you close around it. Tests are not a nice-to-have here — they are the substrate that turns guessing into engineering.

How to actually drive it

Front-load context. A CLAUDE.md at the repo root — conventions, how to run tests, what not to touch — is the single highest-leverage thing you can write. The agent reads it like an onboarding doc.
Keep diffs reviewable. Ask for a plan first, approve it, then let it execute in steps. A reviewed 80-line diff beats a magic 600-line one.
Let it run the tests itself. Don't paste failures back manually; point it at the command and let it close the loop.
Use subagents for fan-out. Searching a big codebase or exploring several approaches in parallel is exactly what spawning helpers is for — keep the main thread's context clean.
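For the first point, here is a rough sketch of what a starter CLAUDE.md might hold, scaffolded by a small Python helper. The three section names mirror the list above (conventions, how to run tests, what not to touch); every entry inside them, including the `migrations/` and `vendor/` paths, is a placeholder assumption to be replaced with your repo's reality, and `render_claude_md` / `write_claude_md` are hypothetical helper names.

```python
from pathlib import Path

# Placeholder content only: swap in your project's actual rules and commands.
SECTIONS = {
    "Conventions": [
        "Run the repo's formatter before committing.",
        "Prefer small, focused changes over sweeping rewrites.",
    ],
    "Running tests": [
        "`pytest tests/auth` for a fast, scoped run",
        "`pytest` for the full suite",
    ],
    "Do not touch": [
        "`migrations/` (hypothetical generated directory)",
        "`vendor/` (hypothetical third-party code)",
    ],
}


def render_claude_md(sections: dict[str, list[str]]) -> str:
    """Render the sections as a short markdown onboarding doc."""
    lines = ["# Project notes for the agent", ""]
    for title, items in sections.items():
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_claude_md(repo_root: Path, sections: dict[str, list[str]] = SECTIONS) -> Path:
    """Write CLAUDE.md at the repo root and return its path."""
    target = repo_root / "CLAUDE.md"
    target.write_text(render_claude_md(sections), encoding="utf-8")
    return target
```

In practice you would write the file by hand; the helper only shows how little structure it takes. Three short sections already answer the questions a new contributor asks on day one.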

Where the seams show

It is not magic. It will confidently walk into a wall if your project's "truth" lives somewhere it can't see — an undocumented staging step, a flaky test it learns to route around, an API whose real behavior contradicts its docstring. The failure mode is plausible wrong code, which is more dangerous than obviously-broken code. The fix is the same as for a human contributor: better docs, better tests, smaller steps.

Cost and control

Every action is token-metered, so a sprawling task has a real per-run cost. The discipline that keeps it cheap is the same that keeps it correct: tight scope, good context, small diffs. Don't add a budget dashboard before a task actually hurts — a --max-turns ceiling and a scoped prompt cover most of it.
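A minimal sketch of putting that ceiling on a scripted run, in Python. It assumes the installed CLI supports a non-interactive print mode via `-p` alongside the `--max-turns` flag mentioned above; check `claude --help` on your version before relying on either. `build_command`, `run_capped`, and the timeout value are hypothetical choices for illustration.

```python
import subprocess


def build_command(prompt: str, max_turns: int) -> list[str]:
    """Build the argv for one capped, non-interactive run."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    # Assumption: -p runs non-interactively and --max-turns bounds the agentic turns.
    return ["claude", "-p", "--max-turns", str(max_turns), prompt]


def run_capped(
    prompt: str, max_turns: int = 10, timeout_s: float = 900.0
) -> subprocess.CompletedProcess:
    """Run a scoped task with a turn ceiling and a wall-clock timeout."""
    return subprocess.run(
        build_command(prompt, max_turns),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # second guard in case a single turn hangs
    )
```

The turn cap bounds how far a run can sprawl; the scoped prompt decides whether it needs to sprawl at all.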

If you only take one thing: Claude Code rewards repos that were already disciplined. Tests, docs, and conventions aren't prerequisites the marketing forgot to mention — they're the exact levers that turn the agent from a toy into a teammate. The landscape post puts it next to the others; the benchmark puts a number on it.

#claude-code #agents #anthropic