No disclosure for LLM-generated patch?
Posted Jun 27, 2025 6:55 UTC (Fri) by drago01 (subscriber, #50715)
In reply to: No disclosure for LLM-generated patch? by lucaswerkmeister
Parent article: Supporting kernel development with large language models
People tend to overreact; an LLM is just a tool.
No one discloses which IDE or editor was used, or whether autocomplete was enabled, etc.
At the end of the day, what matters is whether the patch is correct or not, which is what reviews are for. The tools used to write it are not that relevant.
Posted Jun 27, 2025 9:11 UTC (Fri) by Funcan (subscriber, #44209)
I vaguely remember some LLM providers including legal waivers for copyright, where they take on the liability, but I can't find one for e.g. Copilot right now.
Posted Jun 27, 2025 10:51 UTC (Fri) by mb (subscriber, #50428)
Just don't copy and then you are safe. Learning is not copying.
If you as a human learn from proprietary code and then write Open Source code with that knowledge, it's not copying unless you actually copy code sections. The same goes for LLMs. If it produces a copy, then it copied. Otherwise it didn't.
Posted Jun 27, 2025 11:47 UTC (Fri) by laarmen (subscriber, #63948)
Posted Jun 27, 2025 12:57 UTC (Fri) by mb (subscriber, #50428)
It's in no way required to avoid copyright problems.
And you can also use that concept with LLMs, if you want: just feed the output from one LLM into the input of another LLM, and you basically get the same thing as with two human clean-room teams.
Posted Jul 1, 2025 9:51 UTC (Tue) by cyphar (subscriber, #110703)
You could just as easily argue that LLMs produce something equivalent to a generative collage of all of their training data, which (given the current case law on programs and copyright) would mean that the copyright status of the training data transfers to the collage. You would thus need to argue for a fair-use exemption for the output, and your example would not pass muster there.
That is not the only issue at play here, though. To submit code to Linux you need to agree to the DCO, which the commit author did with their Signed-off-by line. But none of the sections of the DCO can be applied to LLM-produced code, so the Signed-off-by is invalid regardless of the legal questions about copyright and LLM-generated code.
Posted Jun 27, 2025 16:57 UTC (Fri) by geofft (subscriber, #59789)
https://blogs.microsoft.com/on-the-issues/2023/09/07/copi...
"Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products."
See also https://learn.microsoft.com/en-us/legal/cognitive-service... . The exact legal text seems to be the "Customer Copyright Commitment" section of https://www.microsoft.com/licensing/terms/product/ForOnli...
Posted Jun 27, 2025 10:45 UTC (Fri) by excors (subscriber, #95769)
One problem is that reviewers typically assume the patch was submitted in good faith, and look for the kinds of errors that good-faith humans typically make (which the reviewer has learned through many years of experience, debugging their own code and other people's code).
If e.g. Jia Tan started submitting patches to your project, you wouldn't say "I know he deliberately introduced a subtle backdoor into OpenSSH and he's probably a front for a national intelligence service, but he also submitted plenty of genuinely useful patches while building up trust, so let's welcome him and just review all his patches carefully before accepting them". You'd understand that your review process is not infallible and he's going to try to sneak something past it, with malicious patches that look as non-suspicious as possible, so it's not worth the risk and you would simply ban him. Linux banned a whole university for a clumsy version of that: https://lwn.net/Articles/853717/. The source of a patch _does_ matter.
Similarly, LLMs generate code with errors that are not what a good-faith human would typically make, so they're not the kind of errors that reviewers are looking out for. A human isn't going to hallucinate a whole API and write a professional-looking well-documented patch that calls it, but an LLM will eagerly do so. In the best case, it'll waste reviewers' time as they try to figure out what the nonsense means. In the worst case there will be more subtle inhuman bugs that get missed because nobody is thinking to look for them.
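To make that concrete, consider a hypothetical sketch. The API it calls is invented for this illustration and exists nowhere in the kernel (so the snippet will not even compile); the point is how plausible it looks to a reviewer scanning for ordinary human mistakes:

    #include <linux/init.h>
    #include <linux/printk.h>

    /*
     * Professional-looking, well-commented, and entirely fictitious:
     * tracefs_register_event_filter() is a hallucinated API invented for
     * this sketch, and my_filter_fn is an assumed callback.  A reviewer
     * rarely thinks to verify that a clean, idiomatic call site refers
     * to a function that actually exists.
     */
    static int __init my_trace_init(void)
    {
            int ret;

            ret = tracefs_register_event_filter("my_subsys", my_filter_fn);
            if (ret)
                    pr_err("my_subsys: failed to register event filter: %d\n", ret);

            return ret;
    }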
At the same time, the explicit goal of generating code with LLMs is to make developers more productive at writing patches, meaning there will be more patches to review and reviewers will be under even more pressure. And in the long term there will be fewer new reviewers, because none of the junior developers who outsourced their understanding of the code to an LLM will be learning enough to take on that role. I think writing code is already the easiest and most enjoyable part of software development, so it seems like the worst part to be trying to automate away.
Posted Jun 27, 2025 13:38 UTC (Fri) by drago01 (subscriber, #50715)
If you don't trust that person and don't review their submissions in detail, the problem is unrelated to whether an LLM has been used or not.
Posted Jun 27, 2025 15:10 UTC (Fri) by SLi (subscriber, #53131)
Posted Jun 27, 2025 14:02 UTC (Fri) by martinfick (subscriber, #4455)
Posted Jun 29, 2025 15:30 UTC (Sun) by nevets (subscriber, #11875)
I was even going to ask Sasha if this came from some new tool; I think I should have. And yes, it would have been nice if Sasha had mentioned that it was completely created by an LLM, because I would have taken a deeper look at it. It appears (from comments below) that it does indeed have a slight bug, which I would have caught if I had known this was 100% generated, as I would not have trusted the patch as much as I did thinking Sasha did the work.
Posted Jun 29, 2025 23:31 UTC (Sun) by sashal (✭ supporter ✭, #81842)
For that matter, the reason I felt comfortable sending this patch out is that I know hashtable.h.
Maybe we should have a tag for tool-generated patches, but OTOH we have had patches generated by checkpatch.pl and Coccinelle for over a decade, so why start now?
Is it an issue with the patch? Sure.
Am I surprised that LWN comments are bikeshedding over a lost __read_mostly? Not really...
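For readers without the kernel context, here is a minimal sketch of how a conversion to hashtable.h can silently drop such an annotation; all identifiers are invented for illustration and are not taken from the actual patch:

    #include <linux/cache.h>        /* __read_mostly */
    #include <linux/hashtable.h>    /* DEFINE_HASHTABLE(), hash_init() */
    #include <linux/init.h>

    #define EVENT_HASH_BITS 8

    /*
     * Before: open-coded buckets, marked __read_mostly because the table
     * is written rarely but read on hot paths.
     */
    static struct hlist_head event_hash[1 << EVENT_HASH_BITS] __read_mostly;

    /*
     * After a mechanical conversion: DEFINE_HASHTABLE() expands to
     * "struct hlist_head name[1 << bits] = { ... }" with its own
     * initializer, leaving no natural place to put the attribute, so
     * __read_mostly quietly disappears:
     */
    static DEFINE_HASHTABLE(event_hash2, EVENT_HASH_BITS);

    /*
     * Keeping the annotation means declaring the table instead and
     * initializing it at runtime:
     */
    static DECLARE_HASHTABLE(event_hash3, EVENT_HASH_BITS) __read_mostly;

    static int __init event_hash_setup(void)
    {
            hash_init(event_hash3);
            return 0;
    }

Whether the annotation matters here is debatable (hence the "red herring" below), but it is exactly the kind of detail a mechanical conversion drops and that a reviewer primed by a disclosure tag would check.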
Posted Jun 30, 2025 2:26 UTC (Mon) by nevets (subscriber, #11875)
The missing "__read_mostly" is a red herring; the real issue is transparency. We should not be submitting AI-generated patches without explicitly stating how they were generated. As I mentioned, if I had known it was 100% a script, I might have been a bit more critical of the patch. I shouldn't be finding this out by reading LWN articles.