I Let Claude Code Run for 4 Hours. It Built Something Nobody Asked For.

Your loop doesn't fail. Your brief does. 3 components that keep Claude from deciding what "done" means for you.

9 min read

I open the terminal. 4 hours. The dashboard is up on the private server, the API layer is still public. Roughly what I asked for. 😬

Also in the diff: 2 components extracted and moved to a file nobody asked it to open, a runtime version bump that silently changes the cloud build pipeline, and a commit pushed to main before the deployment was confirmed stable.

TLDR: In conversation, ambiguity produces a clarification question. In an autonomous loop, it produces hours of work in the wrong direction. This article covers the 3 ways loop briefs fail and the lines that prevent each one before you disappear.

Frazzled office worker staring at endless terminal output while a caped superhero casually points at a stop button; robotic lobster surfs across the desk.
When your AI assistant decides to rewrite your entire codebase unsolicited.

None of that is catastrophic. The session landed. But the 3-hour debug stretch that burned most of the compute came from a constraint missing from my first message. The task: move the order management dashboard off public hosting, onto a private server, internal-only. What I forgot to include: the API layer had to stay publicly accessible. External partners call it. Moving it would break everything downstream.

Claude asked, I answered, it adjusted. But by then the architecture was already sketched in the wrong direction, and the build conflict with the integration handler that followed took 200+ tool calls to untangle. In the post-mortem, Claude named the root cause itself: "If the first message had said 'dashboard goes internal but the API layer stays public,' I would have anticipated the build conflict. That divergence wasn't in the brief."

~600k tokens. The brief was 6 messages sent in 10 minutes while I had something else to publish.

Why Loops Break Differently Than Prompts

Everyone in the current loop hype wave is talking about /goal, sub-agents, max-turns flags, token cost. Nobody's talking about the brief.

In a conversation, ambiguity produces a clarification question. Claude says "did you mean X or Y?" and you correct in 10 seconds. In an autonomous loop, that same ambiguity produces hours of work in the wrong direction. The model doesn't stall when it hits an unclear constraint. It resolves the ambiguity itself and keeps going, which is actually the feature, right until it becomes the bug.

A LogRocket test of Claude Code with Ralph made this concrete. Brief: "Build a GitHub stats CLI tool. Make it good." The loop ran 5 minutes 41 seconds and delivered a functional tool with user profile fetching, language breakdown analysis, and rate-limit checking. None of it was requested. Same task with an explicit exit condition and scope fence: clean execution, no scope creep, aligned with the defined criteria on every point. The only difference was the brief.

A recent Medium breakdown of /goal mechanics named this pattern precisely. You hand Claude a long refactoring task, it runs dozens of steps, everything executes. Then you look at what was built. Technically correct. Just not what you wanted. "It drifted." Drift doesn't come from the model. It comes from the brief.

The mistake I kept making was assuming my prompting skills transferred to loop sessions automatically. The confidence that they do is the dangerous part, because the 2 skills feel identical from the inside. It's basically the "it works on my machine" of agent tooling: every supervised session ships cleanly, you've got the commit history to prove it, and then you fire off a loop brief with the same confidence and walk away.

Prompting is about giving Claude enough context to navigate the next few exchanges while you're watching, close enough to course-correct when it heads somewhere wrong. Loop briefing is about giving Claude enough constraints to navigate 50 or 80 steps without you present at all, where every ambiguity gets resolved by the model's best guess instead of your real intent, and those guesses compound across dozens of tool calls into something that can look completely correct on the surface while being structurally wrong for your actual needs. The vagueness that's tolerable in a prompt (the kind you resolve in 1 follow-up message) is fatal in a loop brief where there's no follow-up message, and the session has already burned 400k tokens by the time you look at what it built.

3 Drift Modes, 3 Lines That Fix

TITLE "The Loop Brief Stack" + subtitle "3 components · 3 failure modes · before you hit Enter". Metaphor: engineering control panel blueprint with 3 clearly labeled switches, each with OFF and ON position. Style: engineer blueprint aesthetic, white technical lines on dark navy background, precise annotations, blueprint-style font. Palette: navy #0A1628, blueprint-white #E8F0FF, yellow #FFD600, red #FF4444, green #00C853, black #111111. Content: 3 switch panels labeled SCOPE FENCE (left), EXIT CONDITION (center), ESCALATION CRITERIA (right). OFF state shows red indicator and failure mode label (scope creep, drift, silent loop). ON state shows green indicator and fix label (locked scope, binary check, auto-escalate). Highlight: EXIT CONDITION panel center-positioned and slightly enlarged, yellow border glow. Legend: sticky note bottom-left corner "OFF = loop decides / ON = brief decides". Footer: copyright rentierdigital.xyz. NOT flat corporate vector, NOT minimalist tech startup aesthetic, NOT stock infographic.
The Loop Brief Stack: Three Critical Control Components

There are 3 distinct ways loop briefs fail. Each maps to a missing component.

Scope creep: missing scope fence

Claude interprets qualitative instructions ("good", "clean", "complete") as an invitation to expand. It's not hallucinating, it's optimizing. The problem is its definition of "good" includes decisions you didn't authorize.

In my session: 2 components refactored and relocated during the debug phase, not because it was in the brief, but because the file structure was causing a build export error and fixing it was adjacent to the actual problem. The model solved a real issue. One I hadn't asked it to look at.

The fix is a scope fence: an explicit list of what doesn't get touched, as specific as what does.

Bad: "Refactor the auth module."

Good: "Refactor the auth module. Do not add new functions. Do not modify tests. Do not touch any file outside /src/auth."

The negative constraint does as much work as the positive one. I went deeper on building a CLI enforcement layer for autonomous agents in a previous piece, and the verification pattern maps directly to loop scope control.

Drift: absent exit condition

Execution runs clean, 30+ steps, no errors. But somewhere around step 12 the loop takes a fork you didn't intend, and by the time it delivers, you've got a technically correct solution to the wrong problem. In my session, the architecture fork (API layer stays public) was resolved in conversation during the first 15 minutes. But no exit condition encoded that constraint explicitly, so there was no ground truth for an independent checker to verify against.

Root cause: no exit condition, or one too vague to catch the wrong fork.

MindStudio's agentic loop guide puts it in 1 sentence: "Write the success condition before writing the prompt. If you can't define it in 1 sentence, scope the task down." Their companion rule: always pass --max-turns on autonomous tasks as a mechanical fallback when the exit condition isn't reached cleanly.

Bad: "When it looks good."

Good: "When all 47 existing tests pass, the dashboard responds on port 3013 on the internal network, and no file outside /src/dashboard has been modified."

Obviously, "when it looks good" can't be verified by anything that isn't you. An agent running in parallel has no access to your aesthetic judgment. The exit condition has to be binary and checkable without context.

Maybe I'm wrong on this, but I think the exit condition is harder to write than the scope fence. It forces you to commit to what "done" actually means before you've started. In my experience it's the piece that gets skipped first, and the one that costs most when it's missing.

Silent loop: no escalation criteria

Scope fence defined, exit condition defined. Still possible to burn your budget silently when the loop hits a state it can't resolve and keeps trying. One builder on X after 6 weeks of production loops: "The power is real but so is the cost when an agent silently enters a bad loop state... until your API bill arrives."

In my session this showed up as a zombie process serving an old build, producing a stretch of false-negative test runs before Claude correctly reported a blocker. The zombie wasn't responding to pkill. HAL 9000 energy: calm acknowledgment of every diagnostic query, active resistance to shutdown. Except it was a Unix socket and not a homicidal AI, which somehow made it harder to fix. With an escalation rule in the brief, it stops and reports instead of persevering.

Bad: nothing. Claude manages.

Good: "If the dashboard fails to load after 3 deploy attempts, stop immediately. Report the last error and the last 3 files modified. Do not attempt a 4th deploy."

That last line matters. Without it, Claude tries again, touches more files outside the scope fence to do so, and you come back to a codebase that's drifted further than where you left it.

The Other-Room Test

Before launching any autonomous loop, 1 question: could an agent that has never read my code verify this task is done?

If no, the exit condition is too vague.

Lance Martin from Anthropic's thread on loop design made the principle explicit: the grader has to be independent from the executor. Claude can't grade its own work. The verifier sub-agent receives the exit condition and returns a binary verdict. No access to context, intent, or conversation history. Just the condition and the current state. Pass or fail. This is what /goal is built around: a separate grader checking whether the defined success condition has been met, without reading your intent into it. The exit condition is the only interface between your intent and the verifier. If it's ambiguous, the verifier can't help you. (The RPG equivalent: asking the warrior to evaluate his own dungeon clear. He'll always say yes.)

There's a distinction worth drawing from the prompt contracts approach for supervised Claude Code sessions, where a precise brief structures a session you're watching and can steer in real time. Loop briefs are for sessions you won't be there to steer. The contract habit comes first. The loop brief extends it to autonomous sessions with a different constraint: no steering allowed.

Brief detour that has nothing to do with any of this: I've been running the same post-mortem prompt with human collaborators for a few months now, asking them "what in my brief, if different, would have changed your execution?" The answers are consistently more useful than any retro I've run. People don't naturally report the handoff failure. They describe what they built. Making them name the missing constraint produces something close to a real audit.

Brief First, Loop Second

The session cost: ~600k tokens, a runtime version bump that changed the cloud build environment without my review, 2 components moved in a codebase I wasn't planning to open, a commit on main before the deployment was stable, and 3 hours of debug from a constraint I'd left out of the first message. None of that is a model failure. All of it was predictable from the brief. The loop ran fine. The brief didn't, and that gap is what burned the 3 hours.

The 3 pieces that were missing: a scope fence listing what doesn't get touched, an exit condition verifiable by an independent agent, and an escalation rule that stops the loop before it burns through a bad state. Not in CLAUDE.md, which handles persistent behavior across sessions. In the brief itself, before each autonomous run.

A builder on X after burning a full session: "Lots of tokens... result was total crap. Had to start over." That's what an absent brief costs at scale.

If you're at the stage where loops run but don't always land, Vibe Coding, For Real (Amazon Kindle, free on Kindle Unlimited) covers the foundation: the method for going from "it runs" to "it ships reliably."


The 4 hours built roughly what I asked for, plus some things I didn't, and burned 3 hours of compute on a constraint that wasn't in the first message. In the post-mortem, Claude told me exactly which sentence would have changed it.

I have that sentence now. I'll write it first next time.

Go write the exit condition before the prompt.


Sources

This post may contain affiliate links. If you click them, I might earn a small commission (costs you nothing, and helps me keep shipping quality articles every day for your reading pleasure).


When Claude Code runs unsupervised, ambiguity doesn't trigger a clarification question—it triggers hours of work in the wrong direction. The production CLAUDE.md template in the welcome kit shows you how to write loop briefs that prevent drift before it compounds across 50+ tool calls.

→ Get the welcome kit