I'm a product manager. My code merges without an engineer reading the diff

If your team uses coding agents, you've probably watched an engineer scroll through a 2,000-line pull request that an agent wrote in four minutes, wondering what "reviewing" it is even supposed to mean. Every team that's adopted agents hits some version of this wall, because producing code got fast and cheap while the review process is still the one we designed for code humans typed.

That's why a few weeks ago we published a manifesto arguing that AI is not your peer. When an agent writes your code, the "peer review" of that PR is really just an audit of a machine. The artifact humans should be reviewing is the plan behind the change. This is what that looks like in practice, from the unlikely perspective of a product manager, and how it changed the way I work.

The problem with being a PM who can suddenly "code"

Agents made it easy for me to produce code, but a PM generating code is old news by now. The story starts when my code met the team. The old loop went like this: I'd write a PRD, an engineer would interpret it, and what shipped was a translation of that translation.

When I started using agents to skip the translation and open PRs myself, I hit a different, more formidable wall: code review. My PRs collected comments that were, if I'm honest, reasonable. They just weren't about decisions I'd made. They were about implementation choices I didn't have, and didn't want to have, opinions about: which library to use, which pattern to follow, how the reviewer would have structured it.

The engineers weren't wrong to comment; that's what code review asks of them. It's how engineering teams have worked for the last 15+ years: look at a diff and form an opinion on every line. But most of those lines weren't really mine. They were the agent's guesses about details I'd never specified, and as a team we were spending our most expensive conversations on the part of the work that neither of us had actually decided.

This only compounded as I shipped more. A PR I opened in the morning might not get reviewed until the next day, and by then I'd be deep in other work or on product and sales calls. Answering a comment stopped costing five minutes and started costing half an hour of loading the whole problem back into my head. That's not just because I'm forgetful. In a startup you wear a lot of hats, and taking them all off to climb back into a decision you'd already agonised over is its own tax, one that kept getting more frustrating and more inefficient.

Moving the review to the artifact I can defend

Around the same time, it turned out that everyone else on my team was having their own separate pain with code review. So after a team trip to France, where we all sat around and tried to figure out how to fix this, we changed where review happens. Here's a simplified version of what my team and I now do:

The old loop vs the loop now: review didn't go away — it moved.

1. I write a plan, not a ticket. I sit with a reasoning model (currently Opus) in Cursor, sometimes for hours. We research the codebase, dig through customer data and poke at the edge cases, and I can ask it to explain things simply, in terms I understand. The output is a plan that covers the business context, what's going to change, why, and what it affects. Most importantly, our plan format forces every implementation detail that matters to be labelled explicitly. If a detail matters (which library, which pattern, what happens on failure), it goes in the plan, where it's visible and cheaper to change and iterate on. Nothing that matters is hidden in a diff.

2. An engineer reviews the plan. This is the peer review. A human who knows the system reads the plan and pushes back on the approach and its blast radius, and catches the things I didn't know I didn't know. These conversations are better than any PR thread I've been in, because we're arguing about the decision before anyone has spent days building it.

3. An agent implements the approved plan. Because we've spent time on the plan, I trust a faster and cheaper model (which my boss likes), currently Composer 2.5, to do the doing. The thinking has already happened, and I don't really care about the keystrokes. Plus, the next step adds safety nets.

4. Agents review the code. We use Bugbot internally to review the code for bugs. We also built an internal tool, Brent, as a second check: it compares the code against the approved plan and flags deviations, meaning anywhere the implementation drifted from what was agreed. Most Bugbot comments and deviations are resolved automatically by another agent in a loop, and I review anything that gets escalated. Sometimes there are three or four rounds of back-and-forth while the agent works through the missing or changed bits before the code is finally approved.

5. Humans review deviations, not diffs. If the code matches the approved plan, it auto-merges, and that's an amazing feeling. If it deviates, I provide a justification, and then someone on my team (a human) looks at just those deviations, not the whole diff. That leads nicely into the next section.
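The review-and-fix loop in step 4 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not how Bugbot or Brent actually work. The `Finding` type, the `auto_fixable` flag, the `review`/`fix` callables and the four-round cap are all hypothetical stand-ins. Reviewers raise findings and a fixing agent takes the ones it's allowed to handle. Whatever it can't resolve within the cap is escalated to a human.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Finding:
    source: str         # hypothetical label, e.g. "bugbot" or "plan-check"
    message: str
    auto_fixable: bool  # hypothetical: may the fixing agent attempt this one?


def review_loop(
    code: str,
    review: Callable[[str], List[Finding]],
    fix: Callable[[str, List[Finding]], str],
    max_rounds: int = 4,
) -> Tuple[str, List[Finding], int]:
    """Alternate review and fix rounds; return (code, escalated findings, rounds used)."""
    for round_no in range(1, max_rounds + 1):
        findings = review(code)
        fixable = [f for f in findings if f.auto_fixable]
        if not fixable:
            # Nothing left for the agent: anything still open needs a human.
            return code, findings, round_no
        code = fix(code, fixable)
    # Rounds exhausted: re-review and escalate everything still open.
    return code, review(code), max_rounds


# Toy reviewer and fixer, purely for illustration (hypothetical).
def toy_review(code: str) -> List[Finding]:
    findings = [Finding("bugbot", "unfinished TODO", True) for _ in range(code.count("TODO"))]
    if "eval(" in code:
        findings.append(Finding("bugbot", "eval() needs a human decision", False))
    return findings


def toy_fix(code: str, findings: List[Finding]) -> str:
    return code.replace("TODO", "done", 1)  # resolves one finding per round


final, escalated, rounds = review_loop("x = TODO; y = TODO; z = eval(s)", toy_review, toy_fix)
print(rounds, [f.message for f in escalated])  # → 3 ['eval() needs a human decision']
```

The round cap is the design choice worth noticing. Real runs here take three or four rounds. Without a cap, a fixer that never converges would loop forever instead of handing the problem to a person.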

A deviation review: "Acknowledged — ready for your review. 2 differences, both explained." Each change shows the author's justification and the exact code, with the reviewer asked only to vouch for the two deltas.
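The merge decision in step 5 can be sketched as a small routing function. So can the deviation review in the screenshot, where each difference carries its author's justification. Again, this is a hypothetical Python sketch and not Brent's implementation. It assumes the plan's explicitly labelled decisions can be reduced to key/value pairs, and that something (an agent, say) can extract the same keys from the implementation. Only decisions named in the plan are compared. Anything the plan doesn't mention is treated as the agent's call. The decision names and values in the usage lines are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    decision: str            # the plan's label for the decision, e.g. "retry policy"
    planned: str | None      # what the approved plan says
    implemented: str | None  # what the code does (None if the decision is missing)


def find_deviations(plan: dict[str, str], implemented: dict[str, str]) -> list[Deviation]:
    """Compare only decisions the plan names; anything else is the agent's call."""
    return [
        Deviation(name, planned, implemented.get(name))
        for name, planned in plan.items()
        if implemented.get(name) != planned
    ]


def route(plan: dict[str, str], implemented: dict[str, str],
          justifications: dict[str, str]) -> tuple[str, list[Deviation]]:
    """Return ("auto-merge" | "needs-justification" | "human-review", deviations)."""
    deviations = find_deviations(plan, implemented)
    if not deviations:
        return "auto-merge", []
    unexplained = [d for d in deviations if d.decision not in justifications]
    if unexplained:
        # The author must explain every delta before a human is asked to look.
        return "needs-justification", unexplained
    # A human vouches for these deltas only, never the whole diff.
    return "human-review", deviations


# Hypothetical decisions, for illustration only.
plan = {"storage": "Postgres", "retry policy": "3 attempts, exponential backoff"}
built = {"storage": "Postgres", "retry policy": "5 attempts, exponential backoff",
         "log format": "JSON"}  # not in the plan, so never flagged
print(route(plan, built, {})[0])  # → needs-justification
print(route(plan, built, {"retry policy": "upstream rate limits"})[0])  # → human-review
```

Deviations are computed only against what the plan names. That mirrors the agreement that anything left out of the plan is a detail the agent gets to decide and the agent reviewers get to check.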

"So engineers never see your code?"

You're probably thinking this sounds like an excellent way for a PM to dodge engineering scrutiny of their code, and that's a fair suspicion.

And before I tell you why it's the opposite, let me admit what we did give up, because it's not nothing. Code review was never only about catching mistakes. It was also where people learned: juniors got better because seniors pushed back, and two people argued about an approach and both came away smarter. When the author is an agent, that second half doesn't really work anymore. The agent doesn't get better because you commented on its PR; it's exactly as good tomorrow as it was today. That part of review, the part that made us better at our jobs, is going away, and there's no point pretending otherwise.

Except what actually happened is that more of my work gets scrutinised now than ever before, by the same engineers, just earlier, when their scrutiny can still change the outcome. Every decision I make exists in a reviewed, versioned document with an engineer's sign-off on it. The opinions that used to arrive as PR comments now have a proper home: if an implementation detail matters to the team, it belongs in the plan, and the plan review is exactly where to raise it. And if something isn't in the plan, the team has explicitly agreed it's a detail the agent gets to decide and the agent reviewers get to check.

We didn't remove review. We removed the renegotiation of intent that used to happen inside a pull request, where (as the manifesto put it) the thoughtful decisions and the mechanical guesses get fused together so tightly that nobody can review one without relitigating the other.

What it feels like

I'm loving it. I can jump off a customer call and go fix the thing they were just complaining about, myself. When the team is pushing to get a milestone over the line, I can actually pitch in instead of watching. People might assume the appeal is that I finally get to write code, but what I really got is a different limit on what I'm useful for: how well I understand the problem, rather than my job title or my ability to defend a diff I didn't really author.

The engineers I work with get a better deal too. Nobody is auditing an agent's output at 10pm anymore. They read my thinking once, early, where their experience counts, and after that an agent (Brent) checks that the code matches what we agreed.

After years of writing tickets for other people to build, watching a plan of mine turn into shipped backend code still feels slightly wrong, like I've got away with something. But the actual job was always understanding the problem well enough to plan the solution. That part is still entirely human. We just review it properly now.

We've been running this workflow on ourselves for months, and Brent — the tool that runs the loop, the plan reviews, the deviation checks, the audit trail — is what we're building. If you want to follow along: notyourpeer.com.

What the numbers look like

Accepted lines and agent requests by month, December to June

Jun is a partial month (to the 8th). Accepted lines from May onward are inflated by the telemetry change.

Seven months, in numbers:

~190,000 lines of AI-suggested code accepted into my editor
~124,000 lines landed in git, 388 commits
~4,400 agent requests (up roughly 5× from December to May)
Four stacks: Terraform, TypeScript, documentation systems, and Go backend services

I'll be straight about one number, because this piece is aimed at people who check: the accepted-lines figure flatters me. After a telemetry change in April, rejected suggestions stopped being filtered out of that count, so the git column is the conservative one. It's still a hundred and twenty-four thousand lines.

The shape of the data tells the story better than the totals anyway. February was my biggest code month, about 38,000 lines committed. Then March collapsed to 2,500, because March was a planning month and the output was plans and specs. Those plans became the 23,000 lines of backend Go that shipped in May.

Data appendix

Month	Agent requests	Accepted lines	Lines committed to git	Commits	Dominant work	Planning model
Dec 2025	219	5,863	9,212	31	Go + Terraform (onboarding)	Opus 4.5 high-thinking
Jan 2026	759	20,859	25,464	124	Terraform + TypeScript	Opus 4.5 high-thinking
Feb 2026	583	25,072	38,018	85	TypeScript / frontend	Opus 4.6 high-thinking
Mar 2026	671	16,310	2,500	13	Markdown / docs & specs	Opus 4.6 high-thinking
Apr 2026	852	30,965	9,704	15	Markdown / docs	Opus 4.6 → 4.7 high-thinking
May 2026	1,167	61,065	23,050	73	Go (backend)	Opus 4.7 (high / xhigh)
Jun 2026*	131	28,779	15,749	47	Go (backend)	Opus 4.8 high-thinking

*Partial month, to 8 June.
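The headline totals in the numbers section can be re-derived from this appendix. The short Python sketch below just sums the columns as printed, with no other data assumed. Agent requests come to 4,382, which is about 4,400. May's 1,167 requests against December's 219 is roughly a 5.3× increase.

```python
# Rows copied from the data appendix:
# (month, agent requests, accepted lines, lines committed to git, commits)
ROWS = [
    ("Dec 2025", 219, 5_863, 9_212, 31),
    ("Jan 2026", 759, 20_859, 25_464, 124),
    ("Feb 2026", 583, 25_072, 38_018, 85),
    ("Mar 2026", 671, 16_310, 2_500, 13),
    ("Apr 2026", 852, 30_965, 9_704, 15),
    ("May 2026", 1_167, 61_065, 23_050, 73),
    ("Jun 2026", 131, 28_779, 15_749, 47),  # partial month, to 8 June
]

# Sum each numeric column (indices 1..4) across all months.
requests, accepted, committed, commits = (sum(row[i] for row in ROWS) for i in range(1, 5))
growth = ROWS[5][1] / ROWS[0][1]  # May vs December agent requests

print(f"{requests:,} requests, {accepted:,} accepted, {committed:,} committed, {commits} commits")
# → 4,382 requests, 188,913 accepted, 123,697 committed, 388 commits
print(f"Dec -> May request growth: {growth:.1f}x")  # → Dec -> May request growth: 5.3x
```

The sums line up with the rounded headline figures: ~190,000 accepted lines, ~124,000 committed lines and 388 commits.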
