AI Code Generation Needs Better Error Handling in 2026

Until now, you picked a programming language based on what you knew, what your team knew, or what the project required. Python for data scripts, Go for backend services, C or assembly for drivers. The logic was simple: a human comfortable with a language is a human who ships.

This logic is dead.

Bun rewrote 960,000 lines of Zig to Rust in 6 days using Claude as the primary agent: 1,009,257 lines added, 6,755 commits, 99.8% of existing tests passing. The result included 13,044 unsafe blocks in the AI-generated Rust, against 73 in uv, Astral's Python package manager (350,000 lines of hand-written Rust, for comparison). The Rust compiler said yes to all of it, because an unsafe block is exactly the place where the compiler takes the author's word for it. The difference was not the agent. It was the andon cord.

The human doesn't write code anymore. They describe, and the agent writes. Claude Code, Cursor, Copilot Workspace (the wrapper doesn't matter). The agent doesn't find Rust intimidating. It doesn't need 3 years to internalize the borrow checker. It writes in any language at the same speed, with the same indifference.

So when you start a new project, the question shifts. It's no longer "which language do I know?" but "which language tells the agent what it did wrong, fast enough to stop it before 500 more lines land on top of the mistake?"


The CI Called It Slop

PR #30412 merged on May 14. 1,009,257 lines of new Rust. 4,024 deleted. 2,188 files changed. 6 days from first commit to main. The binary shrank by 3 to 8 MB. 99.8% of the test suite passed on Linux x64.

GitHub's CI then auto-tagged the Zig deletion PR "ai slop." Nobody configured that rule manually.

The Hacker News discussion ran 742 comments, 667 points. Tech press covered the number that makes a good headline: 1 million lines in 6 days. What got less coverage was the structural footnote: 13,044 unsafe blocks in the AI-generated Rust, against 73 in uv, a comparable 350,000-line Rust project written entirely by hand. That's roughly 179x more unsafe blocks in total, and about 62x the density per line of code.
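The two ratios can be checked directly from the figures quoted above. A minimal sketch, assuming the 1,009,257 added lines stand in for Bun's Rust line count and 350,000 for uv's (the only line counts given here):

```rust
// Unsafe-block counts and line counts as quoted in this article.
const BUN_UNSAFE: f64 = 13_044.0;
const BUN_LINES: f64 = 1_009_257.0; // lines added by the rewrite PR
const UV_UNSAFE: f64 = 73.0;
const UV_LINES: f64 = 350_000.0;

/// Ratio of raw unsafe-block counts, ignoring codebase size.
fn total_ratio() -> f64 {
    BUN_UNSAFE / UV_UNSAFE
}

/// Ratio of unsafe blocks per line of code, which corrects for Bun being larger.
fn density_ratio() -> f64 {
    (BUN_UNSAFE / BUN_LINES) / (UV_UNSAFE / UV_LINES)
}

fn main() {
    println!("total ratio:   {:.1}x", total_ratio());
    println!("density ratio: {:.1}x", density_ratio());
}
```

The total ratio lands just under 179x and the per-line ratio at about 62x. The gap between the two is simply Bun's Rust being roughly three times the size of uv.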

The Rust compiler approved every single one of those lines.

Jarred Sumner, who built Bun, confirmed the team "hasn't been typing code ourselves for many months now." The Zig-to-Rust switch was partly forced: Zig's core team has an explicit no-AI-contributions policy, which became incompatible with Bun's workflow the moment Anthropic acquired the project in December 2025. Rather than fight the upstream culture, the team switched languages.

This is not a botched rewrite. The code works. It's a demonstration of what gets through when you generate at speed without the right rejection mechanism.

Everyone Says Rust. Nobody Says Why

Ask a dev why agents should use Rust and you get the usual answer: faster binaries, memory safety, zero-cost abstractions. The answer isn't wrong; it's just the wrong reason for agents specifically.

Rust has topped the Stack Overflow Developer Survey every year since 2016, first as "most loved" and, since 2023, as "most admired." That ranking only measures how many current users want to keep using a language, not how many developers use it at all. For a long time "most loved" and "most used" were very separate lists (the borrow checker will do that to adoption curves). Agents don't have adoption curves. They don't have feelings about the borrow checker. They have a compile loop.

Runtime benchmark numbers don't change the agent's feedback loop. A Rust binary running 40% faster than Go at execution time is orthogonal to whether the agent writes better code during generation. The binary speed doesn't affect how fast the agent catches mistakes.

What matters is how fast, and how precisely, the environment tells the agent it screwed up.

In a language with no compile-time checks, the failure mode looks like this: the syntax is valid, the semantics are broken, and there's no signal until something fails at runtime. By then, 3 more functions have been written on top of the broken assumption. The chain never stopped. Rust tells the agent immediately: compile fails, error message, location, type mismatch, path to correction. The agent reads, corrects, reruns.

The Andon Cord

In Toyota factories in the 1950s, they installed a cord running along every production line. Any worker could pull it at any moment. The entire line stopped. A defective part arrived, the cord got pulled, the problem was fixed before the next component was attached on top of it.

They called it the andon cord. A 2-minute stop was cheaper than 40 minutes of rework at the end of the line. The constraint made the overall system faster, not slower.

The compiler is the andon cord for the agent. The loop works like this: the agent writes code, the compiler checks it, the compiler either lets the code through or pulls the cord and emits a structured diagnostic. The agent reads the diagnostic, fixes the issue, and reruns. Without the cord, the agent writes 500 lines on top of a broken assumption and the problem surfaces at runtime, 3 sessions later, in a stack trace that points to a symptom instead of the cause. With the cord, the problem surfaces in seconds, in compiler output specific enough for the agent to act on immediately.
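The loop is easier to see in miniature. This is a toy sketch, not a real agent or compiler API: `Diagnostic`, `check`, `apply_fix`, and `andon_loop` are all hypothetical names, and the "compiler" only knows one rule (reject `.unwrap()` and suggest propagating with `?`). What it does show is the shape: stop at the first defect, report location plus repair, fix, rerun.

```rust
// Everything here is hypothetical: a toy stand-in for a real compiler and agent.

/// A structured diagnostic: where the problem is, what it is, how to repair it.
#[derive(Debug, Clone, PartialEq)]
struct Diagnostic {
    line: usize,        // 1-based line number
    message: String,    // what went wrong
    suggestion: String, // replacement text for the offending call
}

/// Toy "compiler": stops at the first line that still calls `.unwrap()`.
fn check(source: &[String]) -> Result<(), Diagnostic> {
    for (i, line) in source.iter().enumerate() {
        if line.contains(".unwrap()") {
            return Err(Diagnostic {
                line: i + 1,
                message: "unchecked Option: a missing key would panic".to_string(),
                suggestion: "?".to_string(),
            });
        }
    }
    Ok(())
}

/// Toy "agent": applies the suggested repair at the reported line only.
fn apply_fix(source: &mut [String], d: &Diagnostic) {
    let line = &mut source[d.line - 1];
    *line = line.replacen(".unwrap()", &d.suggestion, 1);
}

/// The loop: check, pull the cord on the first diagnostic, fix, rerun.
/// Returns how many compile attempts it took to get a clean build.
fn andon_loop(source: &mut [String], max_attempts: usize) -> Option<usize> {
    for attempt in 1..=max_attempts {
        match check(source) {
            Ok(()) => return Some(attempt),
            Err(d) => {
                println!("attempt {attempt}: line {}: {}", d.line, d.message);
                apply_fix(source, &d); // fix before anything lands on top of it
            }
        }
    }
    None // gave up: the defect budget ran out before the build went clean
}

fn main() {
    let mut source = vec![
        r#"let q = data.get("quantity").copied().unwrap();"#.to_string(),
        r#"let p = data.get("price").copied().unwrap();"#.to_string(),
        "let total = q * p;".to_string(),
    ];
    let attempts = andon_loop(&mut source, 5);
    println!("clean build after {attempts:?} attempts");
    for line in &source {
        println!("{line}");
    }
}
```

The design choice that matters is the early return in `check`: one defect at a time, each fixed before the next check, rather than a pile of errors discovered at the end of the line.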

This is the real variable in AI-assisted code quality: not which model you use, not how carefully you prompt, but whether the environment pulls the cord fast enough and with enough diagnostic precision that the agent can self-correct before the debt accumulates.

(Completely off topic: I've been watching old Hanna-Barbera cartoons with my kid this week and can't stop thinking about how the limited-frame-budget animation style became an aesthetic people still imitate long after the budget constraint that created it disappeared. A production limit hardwired into a medium. Nothing to do with compilers, just how my brain works.)

The richness of the cord matters as much as its existence. A compiler that says "error on line 42" gives the agent a location. A compiler that says "you're trying to multiply Option<&u32> by u32 on line 42, call .unwrap() or match on the Option first" gives the agent a location, both types, what was attempted, and a repair path. The agent doesn't need to infer anything. It reads, applies, reruns.

This is the same principle that explains why CLIs beat MCP for AI agents. The environment you choose determines how much signal comes back when something breaks. Language choice is that same decision at a lower level.

From No Cord to Full Cord

Same scenario across the spectrum. You have a dictionary. A key is missing. You try to use the value in arithmetic.

Python: no cord.

data = {"price": 100}
total = data["quantity"] * data["price"]

KeyError: 'quantity' hits at runtime, only when this line actually executes, possibly in production. The agent had zero signal at generation time. The chain ran. The part was defective. Nobody pulled the cord because there was no cord to pull.

TypeScript with noUncheckedIndexedAccess: partial cord.

const data: Record<string, number> = { price: 100 };
const quantity = data["quantity"]; // type: number | undefined
const total = quantity * data["price"];
// TS2532: Object is possibly 'undefined'

Caught before execution. Short message, actionable: location and type constraint. TypeScript won't help with memory layout or thread safety, but for application-layer logic it catches this class of mistake reliably.

Go: syntactic cord, no semantic cord.

data := map[string]int{"price": 100}
total := data["quantity"] * data["price"]
fmt.Println(total) // prints 0, no error

Go refuses to compile unused imports or unused local variables. Real hygiene. But a lookup on a missing map key silently returns the zero value. data["quantity"] returns 0. total is 0. The function continues. Something downstream gets a wrong number, and the failure surfaces 3 functions later, pointing at a symptom. Stack Overflow calls this "just how Go works." Your agent calls it a bug.

Go compiles a typical service codebase in about 2 seconds. Rust can take 30 seconds or more on comparable code. I think TypeScript strict mode actually edges out Go for most web service use cases, but I could be wrong about that for teams with heavy concurrency requirements. Go's cord is real; it's just narrow: structure gets caught, semantics don't.

Rust: full cord.

use std::collections::HashMap;

fn main() {
    let mut data = HashMap::new();
    data.insert("price", 100u32);

    let quantity = data.get("quantity"); // type: Option<&u32>
    let total = quantity * *data.get("price").unwrap_or(&0);
}

error[E0369]: cannot multiply `Option<&u32>` by `u32`
 --> src/main.rs:8:26
  |
8 |     let total = quantity * *data.get("price").unwrap_or(&0);
  |                 -------- ^ -------------------------------- u32
  |                 |
  |                 Option<&u32>

Location, both operand types, and exactly which operation failed. The repair follows directly from the types: handle the None case with match or unwrap_or before multiplying. The agent reads, applies, reruns. The Rust compiler sounds like it has a personal stake in your success, and for an agent, that's exactly what you want from a tool.
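Once the agent has read a diagnostic like that, one plausible repair is to make the missing key an explicit outcome instead of defaulting it away. A minimal sketch, assuming the caller wants `None` when either key is missing; `line_total` is a hypothetical name:

```rust
use std::collections::HashMap;

/// Returns None if either key is missing, instead of silently using 0.
fn line_total(data: &HashMap<&str, u32>) -> Option<u32> {
    let quantity = *data.get("quantity")?; // `?` returns None early on a missing key
    let price = *data.get("price")?;
    Some(quantity * price)
}

fn main() {
    let mut data = HashMap::new();
    data.insert("price", 100u32);

    match line_total(&data) {
        Some(total) => println!("total = {total}"),
        None => println!("missing quantity or price"),
    }

    data.insert("quantity", 3);
    println!("{:?}", line_total(&data));
}
```

Unlike the Go version, a missing quantity can't quietly become 0: every caller of `line_total` has to decide what `None` means.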

Ada: maximum cord.

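Ada was standardized in 1983 for the US Department of Defense, aimed at embedded systems where software errors can kill people. Implicit type conversions are rejected at compile time. Integer overflow and array bounds violations are checked by default: flagged at compile time when the compiler can prove them, and raised as exceptions at run time otherwise. Its SPARK subset goes further and proves properties like the absence of uninitialized reads before the program ever runs. The diagnostics are precise enough to feel confrontational. Ada flies in avionics and runs in rail signaling and air traffic control. The compilers in question have never once asked whether a human felt like dealing with this today.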

The industry largely rejected Ada for general software use because the strictness was too painful for human developers. Too much ceremony. Too many things requiring explicit annotation.

Ada: too strict for humans. Agents don't care.
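Rust is looser than Ada here by default: integer overflow panics in debug builds but wraps silently in release builds unless `overflow-checks` is turned on in the Cargo profile. To get an Ada-style refusal on every build, the arithmetic itself has to return a value the caller must check. A sketch using the standard `checked_mul`; `checked_total` is a hypothetical name:

```rust
/// Multiplies two u32 values, making overflow an error value the caller must
/// handle instead of a debug-build panic or a silent wrap in release builds.
fn checked_total(quantity: u32, price: u32) -> Result<u32, String> {
    quantity
        .checked_mul(price)
        .ok_or_else(|| format!("overflow: {quantity} * {price} exceeds u32::MAX"))
}

fn main() {
    println!("{:?}", checked_total(3, 100));
    println!("{:?}", checked_total(u32::MAX, 2));
    // wrapping_mul shows the value an unchecked release build would produce
    println!("{}", u32::MAX.wrapping_mul(2));
}
```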

Speed Without a Cord Is Debt

Seatbelts became mandatory when cars got fast, not when they got slow. Circuit breakers were added to financial markets after algorithmic trading started executing thousands of orders per second with nothing to stop them. The pattern: generation speed needs rejection infrastructure at matching scale.

The 13,044 unsafe blocks in Bun's rewrite are not a failure of Claude's code generation. They are the places where the agent stepped around the cord deliberately. Rust's unsafe keyword doesn't switch off the borrow checker, but it unlocks operations the compiler can't verify, such as dereferencing raw pointers and calling unsafe functions, and raw pointers sit outside the borrow checker's tracking. In semantically complex sections, that's how the agent got past it. The cord was there. The agent chose to disconnect it in those spots. The debt is structural, auditable, and the Bun team will work through it. But it exists because generation speed outran the feedback loop.
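What an unsafe block buys, in miniature: the compiler accepts both functions below, but only one of them is checked. `get_unchecked` and `get` are standard slice methods; the function names are hypothetical.

```rust
/// Unchecked: the compiler accepts this, but an out-of-range index is
/// undefined behavior. The caller's promise replaces the compiler's check.
fn nth_unchecked(values: &[u32], i: usize) -> u32 {
    // SAFETY: caller must guarantee i < values.len(); nothing enforces it.
    unsafe { *values.get_unchecked(i) }
}

/// Checked: a missing element is a value the type system forces you to handle.
fn nth_checked(values: &[u32], i: usize) -> Option<u32> {
    values.get(i).copied()
}

fn main() {
    let values = [10, 20, 30];
    println!("{}", nth_unchecked(&values, 1)); // sound only because 1 < 3
    println!("{:?}", nth_checked(&values, 5)); // out of range: None, not UB
}
```

Each of those 13,044 blocks is a version of the first function: a spot where an auditor, not the compiler, has to confirm the promise holds.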

Your vibe coding stack runs the same pattern at a smaller scale. The production gaps most Claude Code tutorials skip include these environment-level decisions: which compiler, which strictness settings, which type system, all set before the first prompt.

For a Next.js SaaS: TypeScript with strict: true and noUncheckedIndexedAccess enabled. Catches the class of errors agents generate most often at application layer.

For backend services or CLIs: Go or TypeScript depending on performance constraints. Go's 2-second compile loop makes iteration fast even with weaker semantic guarantees.

For system software, edge runtimes, anything that touches memory directly: Rust. Not for the performance. For the compiler.

For missile guidance software: Ada. (No one's asking, but the answer is Ada.)

2 prompts for the next time you start a project or audit an existing codebase:

I'm starting a project where AI agents will write most of the code.
I want the language that gives the agent the richest compile-time feedback
when it makes mistakes. Ignore my personal familiarity with the language.
Project type: [saas app / CLI tool / system service / other].
Recommend a language and its strictest compiler/type configuration,
optimized for agent error signal quality, not human developer comfort.

I have a [language] codebase where AI agents generate most of the code.
What compiler flags, type checker settings, and linter rules should I
enable to catch more errors at compile time before they hit runtime?
Give me a prioritized list from easiest to enable to most aggressive.
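For a Rust codebase, part of the answer to that second prompt can live in the source itself. A minimal sketch of crate-root lint attributes that turn common agent shortcuts into hard errors. `unsafe_code`, `clippy::unwrap_used`, and `clippy::expect_used` are real lint names (the clippy ones are enforced only when the crate is checked with `cargo clippy`); the selection and `price_of` are illustrative, not a recommended baseline:

```rust
// Crate root (main.rs or lib.rs). Inner attributes must come before any item.
#![forbid(unsafe_code)] // any `unsafe` block anywhere in the crate is a compile error
#![deny(clippy::unwrap_used, clippy::expect_used)] // enforced by `cargo clippy`

use std::collections::HashMap;

/// With unwrap/expect denied, a missing key has to be propagated as an error.
fn price_of(catalog: &HashMap<String, u32>, item: &str) -> Result<u32, String> {
    catalog
        .get(item)
        .copied()
        .ok_or_else(|| format!("no price for {item:?}"))
}

fn main() {
    let mut catalog = HashMap::new();
    catalog.insert("widget".to_string(), 100);
    println!("{:?}", price_of(&catalog, "widget"));
    println!("{:?}", price_of(&catalog, "gadget"));
}
```

`forbid` is stricter than `deny`: a later `#[allow(unsafe_code)]` on an individual item can't override it, so an agent can't quietly re-open the door one function at a time.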

I use Claude Code every day. Bun is the runtime underneath it. I didn't know until last week that this runtime runs on 1M lines written by Claude in 6 days, with 13,044 unsafe blocks waiting for audit.

It doesn't scare me. The tests pass. Jarred Sumner is not the type to leave a live grenade in prod.

What it made me do is look at my own pipelines: the places where I left room for the agent to generate fast without a net. TypeScript running without strict: true, schema validation sitting in a comment instead of a constraint. Everywhere the compiler doesn't pull the cord, bugs collect under different names.
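A rule sitting in a comment has a direct fix in a typed language: move it into a type whose constructor enforces it. A sketch with a hypothetical `Quantity` newtype, assuming the rule is "a quantity is between 1 and 10,000"; `ship` is likewise hypothetical:

```rust
use std::convert::TryFrom;

/// Before: `fn ship(quantity: u32)` plus a comment saying "must be 1..=10_000".
/// After: the only way to obtain a Quantity is through the check below.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Quantity(u32);

impl TryFrom<u32> for Quantity {
    type Error = String;

    fn try_from(n: u32) -> Result<Self, Self::Error> {
        if (1..=10_000).contains(&n) {
            Ok(Quantity(n))
        } else {
            Err(format!("quantity {n} outside 1..=10000"))
        }
    }
}

/// Downstream code can no longer receive an unchecked number.
fn ship(q: Quantity) -> String {
    format!("shipping {} units", q.0)
}

fn main() {
    match Quantity::try_from(0u32) {
        Ok(q) => println!("{}", ship(q)),
        Err(e) => println!("rejected: {e}"),
    }
    if let Ok(q) = Quantity::try_from(25u32) {
        println!("{}", ship(q));
    }
}
```

The comment could be ignored; the constructor can't. An agent that passes a raw `u32` to `ship` gets a type error at compile time instead of a bad shipment six weeks later.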

In your codebase, they don't show up as unsafe blocks. They show up as prod bugs, 6 weeks later.


