
No disclosure for LLM-generated patch?

Posted Jun 27, 2025 10:51 UTC (Fri) by mb (subscriber, #50428)
In reply to: No disclosure for LLM-generated patch? by Funcan
Parent article: Supporting kernel development with large language models

LLMs don't usually copy, though.

If you as a human learn from proprietary code and then write open-source code with that knowledge, it's not copying unless you actually reproduce sections of that code. The same goes for LLMs. If it produces a copy, then it copied. Otherwise it didn't.



No disclosure for LLM-generated patch?

Posted Jun 27, 2025 11:47 UTC (Fri) by laarmen (subscriber, #63948)

This is not as simple as you make it out to be, at least in the eyes of some people. That's why you have clean-room rules such as https://gitlab.winehq.org/wine/wine/-/wikis/Clean-Room-Gu...

No disclosure for LLM-generated patch?

Posted Jun 27, 2025 12:57 UTC (Fri) by mb (subscriber, #50428)

Clean-room is a *tool* to make accidental/unintentional copying less likely.

It's in no way required to avoid copyright problems.
Just don't copy, and you are safe.
Learning is not copying.

And you can also use that concept with LLMs, if you want.
Just feed the output from one LLM into the input of another LLM and you basically get the same thing as with two human clean-room teams.
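A minimal Python sketch of that two-stage idea, assuming hypothetical analyst and implementer callables that wrap whatever LLM API is actually in use:

    from typing import Callable

    def clean_room_generate(reference_code: str,
                            analyst: Callable[[str], str],
                            implementer: Callable[[str], str]) -> str:
        # Stage 1: the "analysis team" model only describes behavior;
        # it is instructed not to quote the original code.
        spec = analyst("Describe what this code does, without quoting it:\n"
                       + reference_code)
        # Stage 2: the "implementation team" model writes new code from
        # the specification alone and never sees the original.
        return implementer("Write an implementation matching this spec:\n"
                           + spec)

As with human clean-room teams, only the specification crosses the boundary between the two stages.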

No disclosure for LLM-generated patch?

Posted Jul 1, 2025 9:51 UTC (Tue) by cyphar (subscriber, #110703)

Copyright law isn't this simple. For one, it is established law that only humans can create copyrightable works, and the copyright status of a program's output derives from the copyright of its input (if that input is sufficiently creative). Copyright in its current form is built entirely on "human supremacy" in the capacity for artistic expression, so pointing to a human doing something equivalent is not (in the current legal framework) actually legally equivalent. Maybe that will change in the future, but that is the current case law in the US AFAIK (and probably in most other countries).

You could just as easily argue that LLMs produce something equivalent to a generative collage of all of their training data, which (given the current case law on programs and copyright) would mean that the copyright status of the training data transfers to the collage. You would thus need to argue for a fair use exemption for the output, and your example would not pass muster.

However, this is not the only issue at play here: to submit code to Linux you need to certify the DCO, which the commit author did with their Signed-off-by line. Yet none of the DCO's clauses can apply to LLM-produced code, so the Signed-off-by is invalid regardless of the legal questions about copyright and LLM code.
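For reference, the sign-off in question is the trailer that git commit -s appends to the commit message; the name and address here are hypothetical:

    Signed-off-by: Jane Developer <jane@example.org>

By adding it, the author certifies one of the DCO's clauses, e.g. that the contribution "was created in whole or in part by me" or is based on prior work under a suitable open-source license.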

