False positive identification rate
Posted Apr 1, 2026 18:00 UTC (Wed) by Paf (subscriber, #91811)
In reply to: False positive identification rate by iabervon
Parent article: The role of LLMs in patch review
I suggest reading the previous articles on Sashiko or checking out their pages on it. It was extensively validated against patches that later received a Fixes: tag, that is, patches that were merged and then later fixed. It caught about 50% of the bugs in that testing. Those are bugs that were not caught in human review and were important enough to receive a fix later.
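To make that kind of validation concrete, here is a rough sketch of how a detection rate against Fixes:-tagged commits could be computed. This is not Sashiko's actual tooling; the function names and the idea of passing in a set of flagged commits are my own assumptions for illustration. The only thing taken from real practice is the kernel convention of a `Fixes: <sha> ("subject")` trailer with at least 12 characters of the SHA.

```python
import re

# Kernel convention: 'Fixes: <at least 12 hex chars of SHA> ("subject")'.
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{12,40})\b",
                      re.IGNORECASE | re.MULTILINE)


def fixed_commits(messages):
    """Collect the commits named in Fixes: trailers (hypothetical helper).

    `messages` is an iterable of commit message strings from later fix
    commits. SHAs are normalized to 12 lowercase hex characters.
    """
    found = set()
    for msg in messages:
        for sha in FIXES_RE.findall(msg):
            found.add(sha.lower()[:12])
    return found


def detection_rate(buggy, flagged):
    """Fraction of known-buggy commits that the reviewer flagged (recall).

    `buggy` and `flagged` are sets of 12-character abbreviated SHAs.
    """
    if not buggy:
        return 0.0
    return len(buggy & flagged) / len(buggy)
```

Note that a number computed this way is a recall figure: it says how many known bugs were caught, and by itself says nothing about how many of the tool's other reports were false positives.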
From my perspective as a user, Google's most valuable contribution here is not the harness but the results-validated process. I could most likely vibe code up a (less good, but workable) harness in a few days. The most valuable part is the specific, detailed sequence of multi-stage review with specific prompts, together with the concrete testing against real known bugs.
