Compound engineering with Claude Code: chaining agents that ship

  • Compound engineering with Claude Code turns each completed work cycle into permanent agent memory, so every new session starts with stronger context than the last.
  • The method rests on a four-step loop: Plan, Work, Review, Compound, documented by Every's engineering team in 2025.
  • CLAUDE.md is the compounding layer (each cycle writes project knowledge back into the file Claude Code loads automatically at session start).
  • Gains plateau when teams skip the Compound step under deadline pressure; the improvement accumulates only when every cycle ends with a CLAUDE.md commit.

Two engineers. Throughput of fifteen. That is the number Every's team published in July 2025 to describe what compound engineering with Claude Code produces in practice.

The method is not a prompt trick or a context hack. It is a writing discipline: after every session, the agent's findings go back into CLAUDE.md, so the next session starts with a stronger foundation than the one before. This article walks through the four-step loop, defines what belongs in the agent memory layer, and covers which multi-agent patterns hold up heading into 2026.

What compound engineering with Claude Code actually means

The term was coined by Every's engineering team in 2025. The analogy is financial: just as compound interest accrues on accumulated gains as well as on the original principal, compound engineering with Claude Code builds each session on the knowledge deposited by the previous ones.

Without that feedback loop, Claude Code sessions are stateless from a project-knowledge standpoint. The agent re-asks the same questions. Rediscovers the same edge cases. Relearns the same conventions. Flat returns per session.

With compounding, the agent memory layer grows. Every's 2025 write-up reported teams dropping from roughly eight clarification rounds per feature to one or two after four compound cycles. The mechanism is direct: anything the agent had to ask about, or got wrong, goes into CLAUDE.md before the session closes. The next agent session reads that file at startup and does not repeat the same mistakes.

This iterative AI workflow works with any model that reads a persistent context file, but Claude Code's CLAUDE.md convention is the natural host. The file loads automatically, lives in git alongside the code, and is writable by the agent during the Compound step.

The four-step loop: Plan, Work, Review, Compound

Plan: writing a spec the agent can execute without clarification

The Plan step produces a machine-readable task description: named files to create or modify, acceptance criteria stated as observable behaviors, known constraints, and the commands or sub-agents to invoke.

Vague input ("add a caching layer") generates clarifying questions. Precise input ("create lib/cache.ts exporting get<T>(key: string): Promise<T | null> backed by Redis with a five-minute TTL, without touching existing callers in api/routes.ts") does not. Ten minutes spent on the spec saves forty minutes of back-and-forth during Work.
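One way to make "precise input" checkable is to treat the spec as data. The sketch below is illustrative only: the TaskSpec shape, its field names, the findSpecGaps helper, and the "npm test" command are assumptions made for this example, not a format defined by Claude Code or by Every.

```typescript
// Hypothetical shape for a Plan-step spec. Field names are illustrative,
// not a format defined by Claude Code or by Every.
interface TaskSpec {
  title: string;
  files: string[];              // files to create or modify
  acceptanceCriteria: string[]; // observable behaviors
  constraints: string[];        // known limits the agent must respect
  commands: string[];           // commands or sub-agents to invoke
}

// Lists the gaps that are likely to trigger clarifying questions.
function findSpecGaps(spec: TaskSpec): string[] {
  const gaps: string[] = [];
  if (spec.files.length === 0) gaps.push("no files named");
  if (spec.acceptanceCriteria.length === 0) gaps.push("no acceptance criteria");
  if (spec.commands.length === 0) gaps.push("no commands or sub-agents listed");
  return gaps;
}

const vagueSpec: TaskSpec = {
  title: "add a caching layer",
  files: [],
  acceptanceCriteria: [],
  constraints: [],
  commands: [],
};

const preciseSpec: TaskSpec = {
  title: "Add a Redis-backed cache",
  files: ["lib/cache.ts"],
  acceptanceCriteria: [
    "get(key) resolves to null when the key is missing",
    "entries expire after a five-minute TTL",
  ],
  constraints: ["do not touch existing callers in api/routes.ts"],
  commands: ["npm test"], // assumed test command
};

const vagueGaps = findSpecGaps(vagueSpec);     // three gaps
const preciseGaps = findSpecGaps(preciseSpec); // no gaps
```

A check like this does not judge whether the criteria are good. It only catches specs that leave out a whole category, which is where most clarifying questions start.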

During Work, the agent executes against the spec. Claude Code handles file edits, terminal commands, and test runs. After Work, a dedicated Review invocation examines the diff against the original spec, focused on one dimension at a time: correctness, security, or performance. A narrow review context produces more actionable findings than a broad sweep across the whole codebase.
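The one-dimension-at-a-time review can be sketched as a small prompt builder. This is a minimal illustration under assumptions: buildReviewPrompts and the prompt wording are hypothetical, and how each prompt is sent to Claude Code is left out.

```typescript
type ReviewDimension = "correctness" | "security" | "performance";

const REVIEW_DIMENSIONS: ReviewDimension[] = ["correctness", "security", "performance"];

interface ReviewRequest {
  dimension: ReviewDimension;
  prompt: string;
}

// Builds one narrowly scoped review prompt per dimension, each carrying
// only the spec and the diff rather than the whole codebase.
function buildReviewPrompts(spec: string, diff: string): ReviewRequest[] {
  return REVIEW_DIMENSIONS.map((dimension) => ({
    dimension,
    prompt: [
      `Review the diff below for ${dimension} only.`,
      "Compare it against the spec and ignore every other concern.",
      "",
      "SPEC:",
      spec,
      "",
      "DIFF:",
      diff,
    ].join("\n"),
  }));
}

const reviewPrompts = buildReviewPrompts(
  "create lib/cache.ts with a five-minute TTL",
  "+ export async function get(key: string) { /* ... */ }",
);
```

Each request would go to its own Review invocation, so a security finding never competes for attention with a performance note.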

Compound: treating CLAUDE.md edits as first-class commits

After every cycle, Review findings go into CLAUDE.md: failure modes hit during Work, conventions confirmed, architectural constraints discovered. These edits ship in the same commit as the feature code, reviewed and approved like any diff.
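The write-back itself can be a pure text transformation, which keeps the resulting diff easy to review. The sketch below assumes a CLAUDE.md organized under three Markdown headings. Those headings, the Finding type, and the compound function are hypothetical conventions chosen for the example.

```typescript
type FindingKind = "failure-mode" | "convention" | "constraint";

interface Finding {
  kind: FindingKind;
  text: string;
}

// Hypothetical section headings. Any stable layout would work.
const SECTION_HEADINGS: Record<FindingKind, string> = {
  "failure-mode": "## Known failure modes",
  convention: "## Conventions",
  constraint: "## Architectural constraints",
};

// Returns the CLAUDE.md text with each new finding added under its section.
// Findings already present as a bullet are skipped, so the step is idempotent.
function compound(claudeMd: string, findings: Finding[]): string {
  const lines = claudeMd.split("\n");
  for (const finding of findings) {
    const bullet = `- ${finding.text}`;
    if (lines.indexOf(bullet) !== -1) continue; // already recorded
    const heading = SECTION_HEADINGS[finding.kind];
    const headingIndex = lines.indexOf(heading);
    if (headingIndex === -1) {
      lines.push("", heading, bullet); // section missing: create it at the end
    } else {
      lines.splice(headingIndex + 1, 0, bullet); // newest entry first
    }
  }
  return lines.join("\n");
}

const existingClaudeMd = [
  "# Project notes",
  "",
  "## Architectural constraints",
  "- Never call the payments API from a background job; queue via Redis",
].join("\n");

const newFindings: Finding[] = [
  { kind: "constraint", text: "Never call the payments API from a background job; queue via Redis" },
  { kind: "failure-mode", text: "Full typecheck takes 90 seconds; use incremental mode against changed files only" },
];

const updatedOnce = compound(existingClaudeMd, newFindings);
const updatedTwice = compound(updatedOnce, newFindings); // no further change
```

Because the function returns text instead of writing a file, the result can be committed alongside the feature code and reviewed like any other change.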

That discipline is what makes the self-improving dev loop durable rather than a one-session experiment. The official compound engineering GitHub plugin, released in late 2025, enforces the Plan, Work, Review, Compound sequence as a prompt-template chain. It removes the risk of skipping the last step when shipping pressure builds.

CLAUDE.md as the compounding layer

The agent memory layer divides into two categories: what should survive context resets and what should not.

What belongs in CLAUDE.md:

  • Project-specific commands with exact syntax, tool rules, and environment setup steps.
  • Architectural constraints that are non-obvious from reading the code ("never call the payments API from a background job; queue via Redis").
  • Known failure modes and workarounds ("running full typecheck takes 90 seconds; use incremental mode against changed files only").
  • Test and deploy conventions that differ from framework defaults.

What stays ephemeral, in the session prompt rather than in CLAUDE.md: task-specific instructions, branch names, ticket IDs, anything that will be wrong next sprint.
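A rough filter can catch ephemeral content before it lands in CLAUDE.md. The patterns below are assumptions chosen for illustration (a Jira-style ticket ID, a few common branch prefixes, a sprint number). They are heuristics and will produce false positives, for example on a string such as "UTF-8".

```typescript
// Heuristic patterns for content that will likely be wrong next sprint.
// All three patterns are illustrative assumptions, not an official rule set.
const EPHEMERAL_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "ticket ID", pattern: /\b[A-Z]{2,}-\d+\b/ },
  { label: "branch name", pattern: /\b(feature|fix|hotfix)\/[\w-]+/ },
  { label: "sprint reference", pattern: /\bsprint\s+\d+\b/i },
];

// Returns the reasons an entry looks ephemeral. An empty list means it
// looks durable enough for CLAUDE.md.
function ephemeralReasons(entry: string): string[] {
  return EPHEMERAL_PATTERNS
    .filter((candidate) => candidate.pattern.test(entry))
    .map((candidate) => candidate.label);
}

const durableEntry = ephemeralReasons(
  "never call the payments API from a background job; queue via Redis",
);
const ephemeralEntry = ephemeralReasons(
  "work on feature/cache-layer for PROJ-123 this sprint",
);
```

Flagged entries belong in the session prompt. The filter is a review aid, not a gate, so a human still decides what stays.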

Will Larson's January 2026 analysis on lethain.com observed that the CLAUDE.md and AGENTS.md pattern was being absorbed into Claude Code's default tooling as a first-class feature, moving it from a community convention to a documented standard. That trajectory means the behavior is increasingly stable and supported.

File size is a practical ceiling. A CLAUDE.md that exceeds a few thousand tokens competes with the task prompt for the model's attention. Audit the file every two or three cycles: remove stale entries, consolidate overlapping rules, and reference large architecture documents with @path/to/ARCHITECTURE.md rather than inlining them.
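The audit can be partly automated. The sketch below assumes roughly four characters per token, which is a common rule of thumb for English text and not a real tokenizer, and uses the 3,000-token budget suggested in the FAQ. The auditClaudeMd function and AuditReport type are hypothetical.

```typescript
// Rough heuristic: about four characters per token for English text.
// This is an assumption for illustration, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface AuditReport {
  estimatedTokens: number;
  overBudget: boolean;
  duplicateLines: string[]; // non-empty lines that appear more than once
}

// Flags a file that exceeds the token budget and lists exact duplicate lines.
function auditClaudeMd(content: string, tokenBudget: number): AuditReport {
  const seen: string[] = [];
  const duplicateLines: string[] = [];
  for (const rawLine of content.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (seen.indexOf(line) === -1) {
      seen.push(line);
    } else if (duplicateLines.indexOf(line) === -1) {
      duplicateLines.push(line);
    }
  }
  const estimatedTokens = estimateTokens(content);
  return {
    estimatedTokens,
    overBudget: estimatedTokens > tokenBudget,
    duplicateLines,
  };
}

const sampleFile = [
  "## Known failure modes",
  "- Full typecheck takes 90 seconds; use incremental mode",
  "- Full typecheck takes 90 seconds; use incremental mode",
].join("\n");

const auditReport = auditClaudeMd(sampleFile, 3000);
```

Exact-match detection only finds copy-paste duplicates. Overlapping or contradictory rules phrased differently still need a human read.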

Chaining agents across tasks: patterns that hold in 2026

Claude supports a 200k token context window. In practice, reliable instruction-following degrades above roughly 40,000 to 80,000 tokens based on multiple 2025 benchmarks. Chained AI agents that each hold a focused slice of context (one for planning, one for implementation, one for review) outperform a single agent trying to hold the entire problem space.

Structured handoffs reduce coordination cost. Pass JSON between agents rather than prose summaries. A review agent that returns { "issues": [...], "blockers": [...] } is easier to parse in the next step than a paragraph of natural language. Claude Code's tool-use output format supports this directly.
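On the receiving side, a structured handoff can be validated before the next agent acts on it. This sketch assumes that both issues and blockers are arrays of strings, which the article's example does not specify. ReviewHandoff and parseReviewHandoff are hypothetical names.

```typescript
// Assumed handoff shape: both fields are arrays of plain strings.
interface ReviewHandoff {
  issues: string[];
  blockers: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parses a review agent's output. Returns null instead of throwing when the
// output is not valid JSON or does not match the expected shape.
function parseReviewHandoff(raw: string): ReviewHandoff | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as { [key: string]: unknown };
  const issues = record["issues"];
  const blockers = record["blockers"];
  if (!isStringArray(issues) || !isStringArray(blockers)) return null;
  return { issues, blockers };
}

const validHandoff = parseReviewHandoff(
  '{ "issues": ["TTL is hardcoded"], "blockers": [] }',
);
const proseHandoff = parseReviewHandoff("Looks mostly fine, but the TTL is hardcoded.");
```

A null result is itself a useful signal: the review agent drifted back to prose, and the chain should stop or retry before the malformed output reaches the next step.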

Define explicit stop conditions in the Plan step: how many iterations the automated coding cycle runs before escalating to human review. Three cycles on a single feature is a reasonable ceiling before coordination overhead outweighs the compound gain.
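A stop condition is easiest to enforce when it lives in the loop itself, not in the prompt. The sketch below is deliberately synchronous and stubs out the agent: runCycle stands in for one Work plus Review pass and is a hypothetical callback. A real integration would likely be asynchronous.

```typescript
interface LoopResult {
  cyclesRun: number;
  escalated: boolean; // true when a human should take over
}

// Runs up to maxCycles automated cycles. runCycle is a stand-in for one
// Work plus Review pass and returns true when Review reports no blockers.
function runWithStopCondition(
  runCycle: (attempt: number) => boolean,
  maxCycles: number,
): LoopResult {
  for (let attempt = 1; attempt <= maxCycles; attempt++) {
    if (runCycle(attempt)) {
      return { cyclesRun: attempt, escalated: false };
    }
  }
  return { cyclesRun: maxCycles, escalated: true };
}

// Stubbed cycles for illustration: one that passes on the second attempt,
// one that never passes.
const passesOnSecond = runWithStopCondition((attempt) => attempt === 2, 3);
const neverPasses = runWithStopCondition(() => false, 3);
```

With the ceiling of three from the text, the second call ends with escalated set to true, which is the cue to hand the feature to a human and tighten the Plan step.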

Where chained agents break down: tasks that require judgment calls the agent cannot resolve from CLAUDE.md context. These surface as repeated clarifying questions or as silent wrong assumptions that pass undetected through Review. The fix is a more precise Plan step, not more iterations.

Measuring the compound effect: numbers and honest caveats

The concrete data points available as of 2026:

Every reported in July 2025 (via uncharted.co) that two engineers matched the feature throughput of a 15-person team after an extended period of compound engineering with Claude Code, with consistent Compound-step discipline.

Every's 2025 detailed write-up documented teams dropping from roughly eight clarification rounds per feature to one or two after four compound cycles.

Community self-reports from 2025 discussions cite 30 to 50 percent reduction in back-and-forth prompt rounds after ten compound cycles. These figures are self-reported and uncontrolled; treat them as directional.

Honest caveats: gains are additive, not exponential. The improvement per cycle diminishes once CLAUDE.md stabilizes. One skipped Compound step is recoverable. A sustained pattern of skipping (which typically happens under deadline pressure) erodes accumulated context over weeks. Every's throughput claim also reflects many variables beyond the methodology: codebase design, tooling quality, and engineer capability.

What you can measure directly: clarification prompts per task, session length, and rework commits. All three decline measurably after five to ten compound cycles in a mature codebase.
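Tracking the three metrics needs nothing more than a per-task record and an average. The sketch below uses made-up placeholder values to show the arithmetic. They are not measurements, and the TaskRecord shape and helper names are hypothetical.

```typescript
interface TaskRecord {
  clarificationPrompts: number;
  sessionMinutes: number;
  reworkCommits: number;
}

function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Averages each metric across a set of tasks.
function summarize(tasks: TaskRecord[]): TaskRecord {
  return {
    clarificationPrompts: mean(tasks.map((task) => task.clarificationPrompts)),
    sessionMinutes: mean(tasks.map((task) => task.sessionMinutes)),
    reworkCommits: mean(tasks.map((task) => task.reworkCommits)),
  };
}

// Percentage drop from before to after. Returns 0 when there is no baseline.
function percentReduction(before: number, after: number): number {
  if (before === 0) return 0;
  return ((before - after) / before) * 100;
}

// Placeholder values for illustration only, not measured data.
const baselineTasks: TaskRecord[] = [
  { clarificationPrompts: 8, sessionMinutes: 60, reworkCommits: 3 },
  { clarificationPrompts: 8, sessionMinutes: 50, reworkCommits: 1 },
];
const laterTasks: TaskRecord[] = [
  { clarificationPrompts: 2, sessionMinutes: 30, reworkCommits: 1 },
  { clarificationPrompts: 2, sessionMinutes: 20, reworkCommits: 1 },
];

const baseline = summarize(baselineTasks);
const later = summarize(laterTasks);
const clarificationDrop = percentReduction(
  baseline.clarificationPrompts,
  later.clarificationPrompts,
); // (8 - 2) / 8 * 100 = 75
```

Recording the baseline before the first compound cycle matters more than the tooling. Without it, there is nothing to compare against.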

The Claude Code documentation covers CLAUDE.md loading behavior and the agent memory model in full. The Claude Code GitHub repository tracks releases of the official tooling.

FAQ

Does compound engineering work with models other than Claude?

The loop works with any model that reads a persistent context file at session start: GPT-4o with a system-prompt file, Gemini with a context document, or Cursor with .cursorrules. Claude Code's advantage is that CLAUDE.md loading is automatic, versioned in git, and documented as a first-class feature. This reduces the setup friction for teams that want the pattern without custom scaffolding.

How long before CLAUDE.md becomes a maintenance burden?

Typically after fifteen to twenty compound cycles. Set a recurring audit to remove rules that no longer apply, consolidate duplicates, and move large reference sections to external files linked with @. Keep the active file under 3,000 tokens. A file that grows without pruning starts producing contradictions the agent resolves arbitrarily rather than correctly. This progressively degrades the quality of the loop.

Is the official plugin required?

No. The compound engineering GitHub plugin, first released in late 2025, automates the loop structure and reduces the risk of skipping the Compound step when time is short. The methodology works without it through a manual checklist in the Plan step and a consistent CLAUDE.md commit after every cycle. Teams with strong commit discipline often skip the plugin entirely and see comparable results.

Key takeaways

Compound engineering with Claude Code is a writing practice as much as an AI practice. The agent improves only as fast as you write back to its context. Treat every Compound step as a commit, not a note.

Measure clarification prompts per task before and after four cycles: the reduction will be visible within a week. Skip the Compound step once and you defer the gain. Skip it under every deadline and you erase it.