Many developers say AI coding assistants make them more productive, but a recent study measuring their output found no significant gains.
According to a study from Uplevel, a company that provides insights from coding and collaboration data, GitHub Copilot also introduced 41 per cent more bugs.
The study measured pull request (PR) cycle time, or the time to merge code into a repository, and PR throughput, the number of pull requests merged. It found no significant improvements for developers using Copilot.
Uplevel, using data generated by its customers, compared the output of about 800 developers over three months of using GitHub Copilot against their output in the three months before adoption.
In addition to measuring productivity, the Uplevel study looked at factors in developer burnout and found that GitHub Copilot hasn’t helped. The amount of working time spent outside of standard hours decreased for both the test group using the coding tool and the control group, but it decreased more for the developers who weren’t using Copilot.
Uplevel product manager and data analyst Matt Hoffman said the company’s study was driven by curiosity over claims of significant productivity gains as AI coding assistants become ubiquitous. A GitHub survey published in August found that 97 per cent of software engineers, developers, and programmers reported using AI coding assistants.
“We’ve seen different studies of people saying, ‘This is helpful for our productivity.’ We’ve also seen some people saying, ‘You know what? I’m having to be more of a [code] reviewer,’” he said.
The Uplevel team also went into its study expecting to see some productivity gains, Hoffman said.
“Our team hypothesised that PR cycle time would decrease,” Hoffman said. “We thought that they would be able to write more code, and we thought that defect rate might go down because you’re using these gen AI tools to help you review your code before you even get it out there.”
Hoffman acknowledged there may be other ways to measure developer productivity than PR cycle time and PR throughput, but Uplevel sees those metrics as a solid measure of developer output.
Uplevel isn’t suggesting that organisations stop using coding assistants, noting that the tools are advancing rapidly.
“We heard that people are ending up being more reviewers for this code than in the past, and you might have some false faith that the code is doing what you expect it to,” Hoffman added. “You just have to keep a close eye on what is being generated; does it do what you expect it to do?”