The Code Looks Fine. That's the Problem.

I spend less time reviewing code now and more time reviewing the assumptions underneath it. The bugs got cheaper. The decisions got more expensive.

Since AI became part of how everyone writes code, I've had a nagging feeling that I'm wasting my time in code review.

Not because the code is perfect. Because the obvious mistakes have become rare.

The code usually looks fine. The functions are organized. The naming is reasonable. The tests exist. The implementation is often cleaner than what the person submitting it would have written by hand a year ago. I read it, I don't find the off-by-one, the missing null check, the unhandled error (the things review used to be for), and I sign off with a vague sense that I checked the wrong thing.

Because I did.

AI didn't remove the mistakes. It moved them.

I see fewer bad implementations now. I see more bad decisions.

  • A perfectly reasonable implementation of a questionable architecture.
  • A clean abstraction drawn around the wrong boundary.
  • A well-tested solution to a problem that shouldn't exist.

None of these trip a linter. None of them show up in a diff as something wrong. Every line is defensible on its own. The mistake isn't in any line. It's in the decision the lines are faithfully carrying out. And generated code is exceptionally good at being faithful to a bad premise.

That's the part that took me a while to name. The model doesn't push back on the premise. You hand it a flawed boundary and it will give you a beautiful, tested, well-documented implementation of that flawed boundary. The polish that used to be a signal of care is now free, which means it's no longer a signal of anything.

What I actually review now

The diff is still on my screen, but it's not what I'm reading. I'm spending the time on the layer underneath it:

  • Assumptions: what does this code believe about the world, and is any of it true?
  • Architecture: is this the right shape, or just a clean version of the wrong one?
  • Ownership boundaries: who is responsible for this data, this failure, this decision?
  • Operational impact: what does this do to the system at 3am, under load, when something upstream is already broken?
  • Recovery paths: when this fails, how do we get back to a known good state?
  • Failure modes: what are all the ways this is quietly wrong even when every test passes?

Almost none of that is visible in the code. It lives in the gap between what was built and what should have been built, and that gap is exactly where AI doesn't help you, because it never had an opinion about what should be built in the first place.

Why this is the expensive trade

Here's the asymmetry that makes this worth caring about:

Fixing code is usually easy. Fixing a system built on the wrong idea is expensive.

A bad implementation is an afternoon. You see it, you fix it, you move on. A bad decision compounds. By the time it's obvious, there's data shaped around it, other services depending on it, a migration standing between you and the correction. The cost of a wrong decision doesn't sit still while you decide whether to fix it. It accrues interest.

AI dramatically lowered the cost of producing implementations. It did nothing to lower the cost of a wrong decision. If anything, it raised it, because now the wrong decision arrives wearing the clothes of a right one: tested, formatted, plausible, fast. It's easier than ever to build the wrong thing well, and harder than ever to notice from the diff alone.

This is "amplifies, doesn't absolve," seen from the reviewer's chair

I've written before that AI is a tool like every abstraction before it: it makes strong engineers faster and weak engineers more visible. It amplifies. It does not absolve.

Code review is where you actually feel that. The tool amplified everyone's ability to produce code, which means the bottleneck moved to the one thing it can't produce: the judgment about whether the code should exist in that form at all. The reviewer who's still hunting for syntax errors is auditing the part the machine already got right. The reviewer who's interrogating the assumptions is doing the part the machine can't do, and that part didn't get smaller. It got relatively larger, because everything around it got cheap.

So the job didn't disappear. It moved up the stack. Less time on "is this code correct," more time on "is this the right thing, built the right way, that we can operate and recover when it breaks."

AI didn't eliminate engineering judgment. It made it the bottleneck, and code review is where you find out whether anyone in the room has it.

The one-line version

AI made the obvious mistakes rare and the expensive ones invisible. Stop reviewing whether the code is correct and start reviewing whether the decision was, because the line you can fix in an afternoon was never the one that was going to hurt you.