July 12, 2025

The Productivity Mirage: When AI Slows Down Experienced Developers

A new study from METR takes a hard look at how modern AI tools actually affect productivity for seasoned open-source developers. Contrary to industry hopes, the results are sobering: when given access to state-of-the-art tooling (Cursor Pro with Claude 3.5/3.7 Sonnet), experienced developers became 19% slower, not faster, at completing real coding tasks.

This post breaks down the study’s design, results, and implications. If you think AI is a guaranteed productivity boost, think again.

What Makes This Study Different

Most productivity studies rely on synthetic tasks, vague metrics, or junior developers. This one does not. Researchers recruited 16 veteran open-source contributors (each with 5+ years of experience in their codebase) and tracked 246 real issues across mature GitHub repositories (averaging 1.1 million lines of code).

Every task was randomly assigned to either allow or disallow AI usage. The outcome measure? Actual time to completion: no self-reports, no guessing, no gaming.
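
To make the design concrete, here is a minimal sketch of how a per-task slowdown (or speedup) could be estimated from data like this. The records, field names, and numbers are hypothetical, and the study's own statistical analysis is more sophisticated; this only illustrates the idea of comparing completion times across randomly assigned tasks.

    import math

    # Hypothetical per-task records: (ai_allowed, hours_to_complete).
    # These values are made up for illustration; they are not the study's data.
    tasks = [
        (True, 3.1), (True, 2.4), (True, 4.0), (True, 1.9),
        (False, 2.5), (False, 2.1), (False, 3.2), (False, 1.6),
    ]

    def mean_log_hours(records, ai_flag):
        """Average log completion time for tasks with the given AI assignment."""
        logs = [math.log(hours) for allowed, hours in records if allowed == ai_flag]
        return sum(logs) / len(logs)

    # Ratio of geometric-mean completion times, AI-allowed vs. AI-disallowed.
    # A ratio above 1.0 means the AI-allowed tasks took longer on average.
    ratio = math.exp(mean_log_hours(tasks, True) - mean_log_hours(tasks, False))
    print(f"AI-allowed tasks took {ratio:.2f}x as long as AI-disallowed tasks")

Working in log time (a geometric mean) keeps one unusually long task from dominating the comparison, which matters when issues range from quick fixes to multi-hour investigations.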

Why AI Slowed Experts Down

Before the study, developers predicted AI would speed them up by 24%. Afterward, they still believed it had helped (20% estimated speedup). But the hard data told a different story: AI made them 19% slower.
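
To see how large that gap is in concrete terms, here is a quick worked illustration using a hypothetical two-hour task. The baseline is invented; only the percentages come from the study, and the 19% figure is read as a 1.19x completion-time multiplier, which is one natural reading of the headline number.

    # Hypothetical task that takes 2 hours without AI; only the percentages
    # below come from the study, the baseline is an assumption.
    baseline_hours = 2.0

    forecast  = baseline_hours * (1 - 0.24)  # developers' prediction: ~1.5 hours
    perceived = baseline_hours * (1 - 0.20)  # their belief afterward: ~1.6 hours
    measured  = baseline_hours * (1 + 0.19)  # what the data showed:   ~2.4 hours

    print(f"forecast {forecast:.1f}h, perceived {perceived:.1f}h, measured {measured:.1f}h")

The gap between the perceived 1.6 hours and the measured 2.4 hours is the mirage in the title: extra work that felt like a saving.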

Here’s why:

  • Overconfidence: Developers assumed AI would help, and continued using it even when it slowed them down.
  • High Expertise: When devs already know the repo inside-out, AI has little to add and plenty to mislead.
  • Low Reliability: Fewer than 44% of AI generations were accepted; many required cleanup or were discarded.
  • Missing Context: AI struggled with tacit knowledge, like where things “should” go or subtle conventions.
  • Friction and Latency: Prompting, waiting, reviewing, and re-editing all consumed time without clear benefit (a rough cost sketch follows this list).
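
Here is a back-of-envelope sketch of how that friction compounds with a low acceptance rate. The 44% figure comes from the study; every per-step time below is an invented assumption.

    # Estimated overhead per *accepted* AI suggestion. Only the acceptance rate
    # comes from the study; the per-step minutes are invented assumptions.
    acceptance_rate = 0.44   # fewer than 44% of AI generations were accepted
    prompt_minutes  = 2.0    # assumed: writing a prompt and waiting for output
    review_minutes  = 3.0    # assumed: reading and evaluating the suggestion
    cleanup_minutes = 4.0    # assumed: fixing up an accepted suggestion

    # Expected number of generations before one is accepted (geometric model).
    generations_per_accept = 1 / acceptance_rate

    overhead = generations_per_accept * (prompt_minutes + review_minutes) + cleanup_minutes
    print(f"~{overhead:.0f} minutes of overhead per accepted suggestion")

Under these invented numbers, every suggestion a developer actually keeps carries roughly a quarter of an hour of prompting, waiting, reviewing, and cleanup.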

How Developers Actually Used AI

Screen recordings revealed that AI shifted how developers spent their time. Instead of coding, they were:

  • Prompting the model
  • Waiting on responses
  • Reviewing suggestions
  • Fixing or reverting bad output

Time spent actively writing code dropped, while time spent idle or managing the AI loop rose. For high-performing devs, the tradeoff just didn't pay off.

What This Means for Teams and Tools

The key lesson? AI is not always a force multiplier. In some cases, it’s a tax. For teams working in large, complex codebases with deeply experienced engineers, AI adoption may need to be selective and strategic, not blind.

Better prompting, agent frameworks, and repo-specific tuning may help. But as it stands, even frontier tools don’t consistently accelerate expert work.

Important Caveats

The study focused on a narrow setting: experienced open-source devs in familiar codebases. In greenfield projects, new teams, or junior roles, AI may still shine. The authors also note that smarter scaffolding, higher-reliability models, or more tokens per inference could change the picture.

Still, this study sends a strong signal: when it comes to real-world productivity, appearances can be deceiving. Just because AI looks helpful doesn’t mean it is.

Read the full paper here