There are two questions at the core of any software project. The first is asked before a line of code is written: what do I want this to do? The second is asked when it's done: did it build the way I intended?

Everything between those questions — scaffolding, boilerplate, test generation, pull request review, dependency updates, documentation, routine debugging — is the automatable middle. Not "will be automatable someday." Already is. The evidence is in the adoption numbers: 85% of developers now report generating at least some of their code with AI. 64% of companies generate a majority of their code that way.[1]

The interesting question isn't whether this has happened. It has. The interesting question is what it changes.

The 25–35% problem

Bain ran an analysis of what the development lifecycle actually looks like in practice. Writing and testing code — the part AI is genuinely good at — accounts for 25–35% of it.[2] The rest is requirements, design, review, deployment, maintenance, incident response, and the countless meetings and alignment work that surround all of those.

If you speed up 30% of a process by 50%, you've made the whole process about 15% faster. That's useful. It's not transformative. And it moves the bottleneck somewhere harder to see.
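The arithmetic here is Amdahl's law. A quick sketch, using the same figures as above (30% of the lifecycle accelerated, a 2x speedup at that stage, i.e. 50% less time spent on it):

```python
def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of a process is accelerated."""
    return 1 / ((1 - fraction_accelerated) + fraction_accelerated / local_speedup)

# Speed up 30% of the lifecycle by 2x (halve the time that stage takes):
s = overall_speedup(0.30, 2.0)
print(f"{(1 - 1 / s) * 100:.0f}% less total time")  # 15% less total time
```

The unaccelerated 70% dominates: even an infinite speedup on the coding stage would cap the total saving at 30%.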

25–35% — share of the lifecycle that is writing and testing code
10% — actual measured productivity gain across 40 companies
50% — AI code that passes automated tests but is rejected by maintainers
19% — how much slower experienced OSS developers were on real codebases with AI tools

Sources: Bain 2025 / DevPlan; Abi Noda, 40 companies Nov 2024–Feb 2026; METR / The Decoder, March 2026; METR / Stanford DEL, March 2026

The 10% figure comes from a clean analysis of 40 companies over 15 months tracking whether teams shipped more pull requests as AI adoption increased. The number is real. It's just not the one that gets into the headlines.

The contradiction that proves the thesis

Google ran a randomised controlled trial with 96 engineers. AI features cut task completion time by about 21%.[3] That's the number that gets cited. The number that's less cited: METR ran a similar study with experienced open-source developers on their own codebases. With access to frontier AI tools, those developers took about 19% longer.[4]

They also predicted they'd be 24% faster. They believed, as the study ran, that they were 20% faster. They were wrong about the direction.

The explanation isn't that AI is bad at coding. It's that what AI helps with is strongly context-dependent. Junior engineers on unfamiliar greenfield tasks: faster. Experienced engineers on complex, constraint-laden, context-rich existing systems: not faster, and sometimes slower. The tool is good at the parts that weren't the hard parts.

Fred Brooks put the essential insight in "No Silver Bullet" in 1987. He distinguished between accidental complexity — the difficulty we introduce through our tools, processes, and representations — and essential complexity: the irreducible difficulty of the problem itself. His argument was that no technology would ever give a 10x productivity gain, because accidental complexity is only a fraction of the work.

That argument holds. AI is the most powerful accidental complexity eliminator ever built. It can generate the boilerplate, write the tests, produce the documentation. The hardest part of building software, as Brooks wrote, is "deciding precisely what to build." That hasn't changed.

Roads and speed

Speed is a function of the road as much as the vehicle. You can go 200mph in a car. Whether you should depends entirely on whether the road was built for it.

AI-assisted development is the vehicle. Governance — explicit standards, reviewable decisions, clear architectural principles — is the road. Carnegie Mellon published a large-scale analysis of GitHub projects in March 2026 showing that initial speed gains from AI coding tools often disappeared due to downstream quality problems.[5] The code shipped. The problems arrived later.

The subprime liability problem: AI-generated codebases without human engineering judgment behind them have been described as "subprime liabilities" — they compile, they pass tests, they look fine. The risk is invisible until it isn't. Speed without governance doesn't reduce risk; it concentrates it downstream.

This is why governance as code matters — not as bureaucracy, but as road engineering. Explicit rules about what good looks like, applied before anything ships, mean the speed gain from AI doesn't come with a debt that comes due later. The Gatekeep patterns on this site work this way: they run before code ships, not after.
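A minimal sketch of what "explicit rules applied before anything ships" can look like in practice. The rule names and thresholds below are illustrative inventions, not the actual Gatekeep patterns:

```python
# A pre-merge governance gate: each rule is an explicit, reviewable check
# that runs before code ships. Rule names and thresholds are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # takes change metadata, returns pass/fail

RULES = [
    Rule("has_tests", lambda change: change["tests_added"] > 0),
    Rule("small_diff", lambda change: change["lines_changed"] <= 400),
    Rule("human_reviewed", lambda change: change["approvals"] >= 1),
]

def gate(change: dict) -> list[str]:
    """Return names of rules the change fails; an empty list means it ships."""
    return [r.name for r in RULES if not r.check(change)]

failures = gate({"tests_added": 0, "lines_changed": 1200, "approvals": 1})
print(failures)  # ['has_tests', 'small_diff']
```

The point is the shape, not the specific rules: what "good" means is written down, versioned, and evaluated before merge rather than discovered in production.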

[Diagram: unrouted AI task execution (every task sent to an expensive model, ~$1.20 average) versus routed execution (classified by complexity first, ~$0.08 average); 70% of tasks route to the cheapest capable model.]
Real routing data from the Trinity stack. Deterministic classification before model selection. The work is the same, the cost isn't.
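The same idea, deterministic classification before model selection, can be sketched in a few lines. The tier names, per-task costs, and the classification heuristic below are assumptions for illustration, not the Trinity stack's actual values:

```python
# Deterministic routing: classify a task before choosing a model.
# Tiers, prices, and the heuristic are illustrative assumptions only.
TIERS = {
    "small": 0.02,   # assumed per-task cost of the cheapest capable model
    "medium": 0.15,
    "large": 1.20,
}

def classify(task: dict) -> str:
    """Route by declared complexity signals; same input, same answer, every time."""
    if task["files_touched"] <= 2 and not task["cross_service"]:
        return "small"
    if task["files_touched"] <= 10:
        return "medium"
    return "large"

def average_cost(tasks: list[dict]) -> float:
    return sum(TIERS[classify(t)] for t in tasks) / len(tasks)

tasks = (
    [{"files_touched": 1, "cross_service": False}] * 7   # routine work
    + [{"files_touched": 5, "cross_service": True}] * 2  # mid complexity
    + [{"files_touched": 20, "cross_service": True}]     # genuinely hard
)
print(f"unrouted: ${TIERS['large']:.2f}  routed: ${average_cost(tasks):.2f}")
```

Because the classifier is a plain function rather than a model call, routing decisions are cheap, auditable, and reproducible.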

The quality gap

METR published a study in March 2026 showing that roughly half of AI solutions that pass SWE-bench automated tests would be rejected by actual project maintainers.[6] The failure modes are specific: coding style, repository standards, complex project logic. AI passes syntax. It doesn't understand the unwritten rules of a codebase — the conventions that never made it into documentation because they felt obvious to the people who built it.

That gap is exactly where human judgment lives. Not in the code that follows the documented pattern. In the code that navigates the undocumented constraint.

The implication isn't that AI-generated code is unusable; half of it clears maintainer review. The implication is that the review layer, the human who understands the context, becomes more important, not less. You're reviewing more code, so the quality bar for that review needs to be higher.

[Diagram: "What do I want this to do?" → the automatable middle (scaffolding · tests · boilerplate · docs · review; 25–35% of the lifecycle) → "Did it build how I intended?"]

Where the bottleneck moved

As execution compresses from weeks to minutes, the constraint migrates from the ability to produce to the quality of thought. The intent layer, what should exist and why, is now where the hard work happens.

[Diagram: where development time goes, before and after. Before: 35% on code and test. After: code and test compresses to 10%; intent and design expand to 55% of the lifecycle.]
The automatable middle compresses. Intent and design absorb the freed capacity. The constraint moves upstream.

This has a name: spec-driven development. It's not new as a concept. It becomes unavoidable as an approach when the cost of building wrong drops to near zero. If you can rebuild in hours, vague requirements become expensive not because of the implementation cost, but because of the direction cost.

There's a direct application to how AI coding agents work. They're very good at executing a well-specified problem. They're unreliable at inferring a poorly-specified one. The quality of the output is bounded by the quality of the input. That was always true. It just didn't matter as much when implementation was the expensive part.
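Spec-driven development in miniature: intent captured as executable acceptance checks before any implementation exists. The `slugify` spec below is a hypothetical example, not a real ticketyboo artifact:

```python
# Intent first: the spec exists before the implementation and judges it.
# Both the spec and the function are hypothetical illustrations.
SPEC = {
    "turns spaces into hyphens": lambda f: f("hello world") == "hello-world",
    "lowercases input":          lambda f: f("Hello") == "hello",
    "strips punctuation":        lambda f: f("a, b!") == "a-b",
}

def slugify(text: str) -> str:
    """An implementation (human- or agent-written) judged against the spec."""
    cleaned = "".join(c for c in text.lower() if c.isalnum() or c == " ")
    return "-".join(cleaned.split())

failed = [name for name, check in SPEC.items() if not check(slugify)]
print(failed or "spec satisfied")
```

Whether a human or an agent writes `slugify` stops mattering; what matters is that the spec was precise enough to decide whether the output is right.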

This is what the ticketyboo.dev build demonstrated — not as a showcase, just as a data point. 43 repositories, built substantially by agents in 70 days. The governance layer arrived on day 2, before most of the implementation. Not because governance is fun, but because without it, high-velocity AI-assisted development produces a very consistent mess very quickly. The patterns that worked were the ones where intent was made explicit before execution began.

What actually changes

The automatable middle disappearing doesn't reduce craft. It relocates it. The people who matter most in an AI-native team are those who ask the right questions at either end of the pipeline — not those who can produce the most code in the middle of it.

Concretely: intent quality becomes the primary leverage point. A vague requirement produces vague software at 10x the speed. Architectural judgment — the shape of the system, the data model, the failure modes — is still irreducibly human work. And the scarcity of genuinely critical thinking doesn't reduce as AI gets better at the routine. If anything, it increases, because the routine is no longer where time gets spent.

The Anthropic study on AI pair programming found something interesting: AI boosted expert developers' speed, but hurt novice comprehension. The experts had the mental model already. They used AI to execute faster against it. The novices used AI to skip building the mental model. In the short term, they shipped code. In the medium term, they couldn't maintain it.

The implication isn't that novices shouldn't use AI. It's that the model — the understanding of what you're building and why — still has to be built. AI can execute against it. AI can't substitute for it.

The direction cost: If you can go faster, a wrong direction becomes more expensive. Speed raises the value of getting the starting question right; it doesn't lower it. The organisations that will benefit most from AI-assisted development are the ones that treat intent quality as a first-class engineering concern, not an afterthought.

None of this is pessimism about AI in software development. The 10% productivity gain is real and compounding. The question is where to invest the saved attention. The answer, based on the evidence, is earlier in the process — not later.

References

  1. "AI coding tools are doubling output, with code quality holding up" — Business Insider / GitClear, March 2026.
  2. "The AI advantage: how to put the artificial intelligence revolution to work" — Bain & Company / DevPlan, 2025.
  3. "AI speeds up coding tasks, DORA study finds" — Google DORA / Morning Overview, 2025.
  4. "Measuring the impact of AI coding tools on developer productivity" — METR / Stanford DEL, March 2026.
  5. "Speed at what cost? AI coding tools downstream quality analysis" — Carnegie Mellon, March 2026.
  6. "Half of AI-generated code that passes SWE-bench rejected by maintainers" — METR, March 2026.
