There’s a line from Boris Cherny, the head of Claude Code, that has been stuck in the back of my mind for the past week. Paraphrasing: he doesn’t prompt the AI anymore. He writes loops, and the loops do the prompting. His job is to write the loops.
When the person who built the coding agent stops using it the way the rest of us do, that’s worth some brain cycles. So I sat with it.
Then I did the least glamorous possible thing with the idea, which is exactly why I’m writing this down.
What a loop actually is
Addy Osmani, a Google engineer with close ties to Gemini AI, has the best definition so far: a loop is a recursive goal. You define a purpose, and the AI iterates until it’s complete. You stop being the person who prompts the agent and become the person who designs the system that does the prompting for you.
For about two years, working with a coding agent meant one thing: you wrote a good prompt, shared enough context, read what came back, typed the next thing. The agent was a tool and you were holding it the entire time, one turn after another, the way a sculptor holds a chisel.
A loop hands that holding to a small system that finds the work, hands it out, checks it, writes down what’s done, and decides the next thing. That system pokes the agent so you don’t have to. Your job now is to create the sculptor or, to be more precise, a small army of sculptors.
Finding a test use case
I shipped 9 apps in the last 4 months, bringing my total in the App Store to ten. They cross-promote each other, which means every single one is supposed to have a “By the same dev” section in Settings, linking to the others. That’s the perfect candidate for a loop: a tiny problem spanning multiple codebases.
To make things a bit more interesting, I decided to add something on top of that: a convention. Each app displays a version string in Settings that reads 1.4.2 (28) — semantic version, then the build number in parentheses. When you juggle 6-7 apps every day you kinda lose track of which build does what, so I wanted to make sure all apps display this accurately.
So, with these 2 simple things, I decided to write my first loop.
What I built
In total, the loop I built to solve those 2 problems is fifty lines of Python. It’s just a vanilla Python script that reads my app folders, grabs the necessary files (meaning the Settings.swift or SettingsView.swift), then sends them to a model with my two rules in plain English (the prompt), and prints a pass/fail verdict per app.
The most important part is this one:
    for app in APPS:
        state = collect_state(app)   # read the files that matter
        result = audit(app, state)   # ask the model to judge them
        print("✅" if result["ready"] else "❌", app)
It’s not very complicated: it has a folder map to which I can add more apps, my conditions in plain English (have the “By the same dev” section, and make sure the version is there in the specific format), and one small loop that feeds each app to the model and prints what came back.
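For readers who want to see the shape of the two helpers, here is a minimal sketch. It is not the actual script: the folder paths, the `RULES` wording, `build_prompt`, and the `ask_model` parameter are all assumptions made for illustration. `ask_model` stands in for whatever model client you use, so the sketch does not depend on any particular API.

```python
import json
from pathlib import Path

# Hypothetical folder map (app name -> project folder); add more apps here.
APPS = {
    "TaskApp": Path("~/code/TaskApp").expanduser(),
    "NotesApp": Path("~/code/NotesApp").expanduser(),
}

# The two file names the audit cares about.
SETTINGS_FILES = ("Settings.swift", "SettingsView.swift")

# The two conditions in plain English (illustrative wording).
RULES = """\
1. Settings has a "By the same dev" section linking to the other apps.
2. Settings shows the version as 1.4.2 (28): semantic version, then the build number in parentheses.
"""


def collect_state(app, apps=APPS):
    """Return {relative path: file text} for every Settings file under the app folder."""
    folder = apps[app]
    state = {}
    for name in SETTINGS_FILES:
        for path in sorted(folder.rglob(name)):
            state[str(path.relative_to(folder))] = path.read_text(encoding="utf-8")
    return state


def build_prompt(app, state):
    """Combine the plain-English rules and the collected files into one prompt."""
    files = "\n\n".join(f"--- {name} ---\n{text}" for name, text in state.items())
    return (
        f"App: {app}\n\nRules:\n{RULES}\nFiles:\n{files}\n\n"
        'Answer with JSON only: {"ready": true or false, "notes": "..."}'
    )


def audit(app, state, ask_model):
    """ask_model is a placeholder: any callable that takes a prompt and returns JSON text."""
    return json.loads(ask_model(build_prompt(app, state)))
```

Passing the model call in as a parameter also makes the loop easy to dry-run with a fake model (for example `lambda prompt: '{"ready": true, "notes": ""}'`) before spending any tokens.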
The five building blocks of a loop (and where mine sits)
But a real loop has way more than that. There are many definitions, because there are a lot of moving parts, but in very simple terms, it’s something like this:
- Automations: something fires on a schedule and does the discovery for you, so you’re not the one going around checking. This is the heartbeat. It’s what makes a loop an actual loop instead of one run you did once.
- Worktrees: separate checkouts so two agents working in parallel don’t overwrite each other’s files. The moment you run more than one agent, this is the thing that stops parallel from turning into chaos. In my case this one wasn’t needed: my loop was purely reporting, with no code written.
- Skills: your project knowledge written down once (a SKILL.md file) so the agent reads it every run instead of guessing. Without it, the loop re-derives your whole project from zero every cycle.
- Connectors: the loop reaching your real tools, such as the issue tracker, the database, Slack, or the App Store Connect API. The difference between an agent that says “here’s the fix” and a loop that opens the PR itself.
- Sub-agents: one agent has the idea, a different one checks it. Having models check each other’s work is usually more effective than a single agent reviewing its own output.
And the sixth thing: memory. The model forgets everything between runs, so the memory has to be on disk.
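To make "memory on disk" concrete, here is a minimal sketch of what it could look like for an audit loop like mine, which currently has none. The JSON layout and the `load_memory` and `remember` helpers are hypothetical; the only real idea is that each run reads the previous verdicts from a file and writes the new ones back.

```python
import json
from pathlib import Path


def load_memory(path):
    """Return the verdicts saved by earlier runs, or an empty dict on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def remember(path, app, result):
    """Store the latest verdict for one app and write the whole memory back to disk."""
    memory = load_memory(path)
    memory[app] = result
    path.write_text(json.dumps(memory, indent=2, sort_keys=True), encoding="utf-8")
    return memory
```

With something like this in place, the next run can compare the new verdict against the stored one and report only what changed.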
My audit loop uses exactly one and a half of those five. It’s a hand-run automation (block 1, minus the schedule) with my conditions standing in for a skill (block 3, barely). No worktrees, no connectors, no sub-agents, no persistent memory.
But it was a very good learning experience.
What it caught
I ran the script across all my apps and most passed clean. The genuine findings were small and exactly the kind of thing I’d never catch by eye.
One of my oldest apps — the task manager that’s been around longest — has been shipping its Settings screen with the marketing version only. No build number. It reads 1.4.2 where every other app reads 1.4.2 (28). A one-line gap sitting there through who knows how many releases, because I’d long since stopped looking at that screen.
A real, useful catch. Also the least dramatic finding imaginable, which is precisely why a loop found it and I didn’t.
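The version convention itself is simple enough to check without a model once you have the rendered string in hand. The sketch below is an assumption about how such a check could look, not part of my script: it validates a displayed string like `1.4.2 (28)`, not Swift source code, and the names `VERSION_DISPLAY` and `is_full_version` are made up for illustration.

```python
import re

# Semantic version, one space, then the build number in parentheses.
VERSION_DISPLAY = re.compile(r"\d+\.\d+\.\d+ \(\d+\)")


def is_full_version(text):
    """True only when the whole string matches the 'x.y.z (build)' convention."""
    return VERSION_DISPLAY.fullmatch(text.strip()) is not None
```

Here `is_full_version("1.4.2 (28)")` is true and `is_full_version("1.4.2")` is false, which is exactly the gap the old task manager had.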
What it got wrong, and why that’s the actual lesson
Most “I built an AI loop!” posts skip this part, so it’s the part I want to dwell on. The loop lied to me several times, and every lie taught me something.
It complained about missing data I never sent. An app reported “no Settings screen, can’t verify anything.” There was a Settings screen. But my file collector was pointed at the wrong path and silently returned nothing, and the model dutifully reported on the nothing it received. A loop is only as honest as the state you feed it, and a script that reads zero files fails quietly — no error, just a confident wrong answer downstream. The fix wasn’t smarter prompting; it was printing the size of what I was about to send, so I could see when I was feeding it just empty air.
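A minimal sketch of that guard, assuming the collected state is a dict of file name to file text. The size printout is the fix I described; raising an error on an empty payload goes one step further and is my own addition here, as are the helper names.

```python
def describe_payload(app, state):
    """Summarize what is about to be sent: file count and total characters."""
    total = sum(len(text) for text in state.values())
    return f"{app}: {len(state)} file(s), {total} chars"


def require_state(app, state):
    """Print the payload size and refuse to continue when there is nothing to audit."""
    print(describe_payload(app, state))
    if not state or all(not text.strip() for text in state.values()):
        raise ValueError(f"{app}: collected no file content; check the folder path")
    return state
```

Calling `require_state(app, collect_state(app))` before the model call turns a silent wrong answer into a loud, early failure.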
It flagged good code as broken. Several apps got dinged for “won’t compile, this property has no body.” None of it was true. I was truncating each file to save tokens, and the version code I cared about lived at the bottom — so I was decapitating the exact thing I was asking about, then watching the model correctly report that the fragment didn’t compile. Lesson: the thing under audit gets read whole. You truncate the supporting evidence, never the thing you’re judging.
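One way to encode that rule is sketched below, under the assumption that the state is a dict of file name to text and that you know which files are the ones being judged. The `build_payload` name, the `audited` parameter, and the 2000-character default are hypothetical.

```python
def build_payload(state, audited, max_support_chars=2000):
    """Send audited files whole; truncate only the supporting files."""
    payload = {}
    for name, text in state.items():
        if name in audited:
            payload[name] = text  # the thing under audit is never cut
        elif len(text) > max_support_chars:
            # Mark the cut so the model knows the fragment is incomplete.
            payload[name] = text[:max_support_chars] + "\n// [truncated]"
        else:
            payload[name] = text
    return payload
```

The explicit truncation marker matters too: it gives the model a chance to say "this file was cut" instead of "this file doesn’t compile."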
It gatekept on wording. One app titles its cross-promo section “From the maker of…” instead of “By the same dev.” The loop failed it. The section was fine — I’d written my rule too literally and the model obeyed too literally. Fix: tell it to judge intent, not phrasing.
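As an illustration of the difference, here are two ways the same rule could be phrased in the prompt. Both strings are hypothetical wording written for this example, not the text from my script.

```python
# Too literal: the model will fail any section with a different title.
LITERAL_RULE = 'Settings must contain a section titled "By the same dev".'

# Intent-based: describes what the section is for and allows title variations.
INTENT_RULE = (
    "Settings must contain a section that cross-promotes the developer's other apps. "
    'The title may vary ("By the same dev", "From the maker of...", and similar); '
    "judge what the section does, not its exact wording."
)
```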
Not one of those was the model being dumb. Every single one was me handing it bad inputs or instructions, and the model faithfully reporting on exactly what I gave it. That’s the skill of loop engineering, and it isn’t prompting. It’s building inputs you can trust and knowing where the loop will be confidently wrong.
The senior-dev instinct, by the way, is the right one. Thirty-five years of shipping software taught me to distrust silent failures, and the reflex that says “this is just a for loop with an API call in it” is correct. That’s the feature, not the disappointment. The interesting part was never the loop — it was deciding what to check and learning where to distrust the answer.
Three things the loop still won’t do for you
Osmani names these better than I could, and they get sharper, not easier, as the loop gets better:
Verification is still on you. A loop running unattended is also a loop making mistakes unattended. “Done” is a claim, not a proof. The whole reason you eventually split the checker sub-agent from the maker is to make the loop’s “it’s done” mean something — and even then your job is to ship code you confirmed works.
Your understanding rots if you let it. The faster the loop ships code you didn’t write, the bigger the gap between what exists and what you actually grasp. A smooth loop just grows that gap faster unless you read what it made.
The comfortable posture is the dangerous one. When the loop runs itself, it’s tempting to stop having an opinion and take whatever it hands back. Designing the loop is the cure when you do it with judgment and the accelerant when you do it to avoid thinking. Same action, opposite result. The loop doesn’t know the difference. You do.
That’s exactly why mine is deliberately read-only. It reports; it doesn’t touch a file. The worst it can do is have a wrong opinion in my console. The moment I let it edit code, every one of those false positives above becomes a wrecked Settings screen.
What’s next
The obvious next step, which I’m resisting on purpose: let the loop propose the fix, not just report the gap. That’s where a checker becomes an agent — and it’s where blocks four and five (a connector to act, a sub-agent to verify) finally earn their place. I’ll get there, but only behind a hard approval gate, because the false positives above are a perfect catalogue of what goes wrong unattended.
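A minimal sketch of what such an approval gate could look like, assuming the proposal is plain text and the actual edit is a function you pass in. Nothing here exists in my script yet; `apply_with_approval`, `apply_fix`, and `ask` are hypothetical names.

```python
def apply_with_approval(app, proposal, apply_fix, ask=input):
    """Show the proposed change and apply it only after an explicit 'yes'."""
    print(f"[{app}] proposed change:\n{proposal}")
    answer = ask("Apply this change? Type 'yes' to confirm: ")
    if answer.strip().lower() != "yes":
        return False  # anything other than 'yes' leaves the code untouched
    apply_fix(proposal)
    return True
```

The default is refusal: an empty answer, a typo, or an unattended run with no one at the keyboard all leave the files alone.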
So I’m not uninstalling my IDE. I’m not running ten agents in parallel while I sleep. I took one repetitive chore, wrapped it in the smallest possible loop, and kept a human at the end of it. That’s the rung I’m on, and I’d argue it’s the right rung for most working developers right now — not because the higher rungs aren’t real, but because the discipline you build here is what makes them safe later.
Just start small, understand the implications, and iterate. Build one sculptor first and make it carve a great statue. Then you can think about building a full army of sculptors.
dragos@dragosroua.com (Dragos Roua)
