I hope it continues to improve
Posted Jun 27, 2025 6:44 UTC (Fri) by wtarreau (subscriber, #51152)
Parent article: Supporting kernel development with large language models
I think that only those who have dealt with huge patch reviews know how unreliable we become after reading a few hundred patches. Your attention declines; sometimes you respond yes or no mechanically, then suddenly realize you did it without thinking, just because the patch looked like a previously selected one. When I was maintaining the extended 3.10 kernel, I had to review around 6000 patches over a weekend, and I was already picking from the previous stable branch (i.e. they had already been considered for stable by someone more knowledgeable; I was not reviewing mainline). It was a super difficult task. Sometimes I had to stop to eat something, walk around, or listen to the radio for a moment before coming back to the task. Tasks like this push humans to their limits, and that's precisely where such tools can be great: not only do they save you from suffering, they can help you be better at what you're doing.
For haproxy, we maintain stable branches and have to periodically review mainline patches to decide whether or not they are suitable for backporting. The task is the same, and we figured out over the years that the sole reason for not doing a stable release for a long time was the huge number of patches that had to be reviewed. I have developed a preselection and review process in the same spirit as autosel (Sasha and I exchanged a bit about our respective projects a while ago); similarly, it gives me a yes/no/uncertain/wait verdict with a short justification for each choice. It has become extremely helpful, to the point that I occasionally prefer to turn to that page to understand the purpose of a series, because it gives me concise summaries and impact evaluations. Obviously it is sometimes wrong, which is why I can re-adjust the verdict myself. But it turned what used to be a day of really painful work into a task of tens of minutes, with much higher accuracy than before. While I had hoped that about 2/3 of the selected patches would be good, I was amazed to see that the overlap between my manual choices and the bot's is around 98-99%! It's exactly like having someone else do that job for you, where you just have to glance over the result and check that nothing obvious is missing. And thanks to this, we can now emit stable releases more often.

Also, in terms of energy, since some are wondering (and it's a legitimate question), our bot consumes roughly 30s of CPU per patch. For our recent 3.2 release, this meant 10 hours of CPU over 6 months (or 20 hours a year). It runs on a Ryzen 5800, no GPU involved; we all consume far more than this compiling, and others do playing games. It may even save energy overall by reducing the amount of debugging and bisecting needed for stable releases!
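To make the verdict idea concrete, here is a minimal sketch of what handling such per-patch verdicts could look like. This is purely illustrative; the line format, verdict names, and function names are my assumptions, not the actual haproxy tooling:

```python
# Hypothetical sketch: parse per-patch verdicts (yes/no/uncertain/wait
# plus a short justification) and measure how often the bot's verdicts
# agree with the maintainer's final decisions -- the "overlap" figure.

VERDICTS = {"yes", "no", "uncertain", "wait"}

def parse_verdict(line: str) -> tuple[str, str, str]:
    """Parse a line like 'abc123 yes: fixes a crash in the mux'.

    Returns (commit_id, verdict, justification). The format is an
    assumption made for this example.
    """
    commit, rest = line.split(maxsplit=1)
    verdict, _, justification = rest.partition(":")
    verdict = verdict.strip().lower()
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return commit, verdict, justification.strip()

def overlap(bot: dict[str, str], human: dict[str, str]) -> float:
    """Fraction of commonly reviewed patches where both agree."""
    common = bot.keys() & human.keys()
    if not common:
        return 0.0
    agree = sum(1 for c in common if bot[c] == human[c])
    return agree / len(common)
```

The point of keeping the verdict machine-readable is that the human stays in the loop: the maintainer can overwrite any single verdict, and the overlap metric shows after the fact how trustworthy the preselection was.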
It's important to make people understand that these tools are assistants that never get bored and for which you don't need any form of compassion. The human must have the final say, of course, but seeing the mechanical, boring job mostly done by the bot is super helpful, and it significantly improves both the quality of the work and the conditions for those in charge of it. That is, working on a stable release is no longer such a punishment.
Keep up the good work Sasha!