No disclosure for LLM-generated patch?

Posted Jun 27, 2025 9:11 UTC (Fri) by Funcan (subscriber, #44209)
In reply to: No disclosure for LLM-generated patch? by drago01
Parent article: Supporting kernel development with large language models

Is there a need for it? Given the legal uncertainty around the copyright status of LLM output (see Disney's massive lawsuit, for example), I'd say 'yes', and that it might be legally similar to copying code from a proprietary-licensed kernel into Linux and passing it off as your own work.

I vaguely remember that some LLM providers include legal waivers for copyright where they take on the liability, but I can't find one for e.g. Copilot right now.

No disclosure for LLM-generated patch?

Posted Jun 27, 2025 10:51 UTC (Fri) by mb (subscriber, #50428) [Link] (3 responses)

LLMs don't usually copy, though.

If you as a human learn from proprietary code and then write Open Source with that knowledge, it's not copying unless you actually copy code sections. The same goes for LLMs. If an LLM produces a copy, then it copied. Otherwise it didn't.

No disclosure for LLM-generated patch?

Posted Jun 27, 2025 11:47 UTC (Fri) by laarmen (subscriber, #63948) [Link] (1 response)

This is not as simple as you make it out to be, at least in the eyes of some people. That's why you have clean-room rules such as https://gitlab.winehq.org/wine/wine/-/wikis/Clean-Room-Gu...

No disclosure for LLM-generated patch?

Posted Jun 27, 2025 12:57 UTC (Fri) by mb (subscriber, #50428) [Link]

Clean-room is a *tool* to make accidental/unintentional copying less likely.

It's in no way required to avoid copyright problems.
Just don't copy and then you are safe.
Learning is not copying.

And you can also use that concept with LLMs, if you want.
Just feed the output from one LLM into the input of another LLM and you basically get the same thing as with two human clean-room teams.

No disclosure for LLM-generated patch?

Posted Jul 1, 2025 9:51 UTC (Tue) by cyphar (subscriber, #110703) [Link]

Copyright law isn't this simple. For one, it is established law that only humans can create copyrightable works, and the copyright of the output of programs is based on the copyright of the input (if the input is sufficiently creative). Copyright in its current form is entirely based on "human supremacy" when it comes to the capability of artistic expression, so an LLM doing something is not (in the current legal framework) legally equivalent to a human doing the same thing. Maybe that will change in the future, but that is the current case law in the US AFAIK (and probably in most other countries).

You could just as easily argue that LLMs produce something equivalent to a generative collage of all of their training data, which (given the current case law on programs and copyright) would mean that the copyright status of the training data would be transferred to the collage. You would thus need to make an argument for a fair use exemption for the output, and your example would not pass muster there.

This is not the only issue at play here, though: to submit code to Linux you need to sign the DCO, which the commit author did with their Signed-off-by line. However, none of the sections of the DCO can be applied to LLM-produced code, so the Signed-off-by is invalid regardless of the legal questions about copyright and LLM code.

No disclosure for LLM-generated patch?

Posted Jun 27, 2025 16:57 UTC (Fri) by geofft (subscriber, #59789) [Link]

> I vaguely remember that some LLM providers include legal waivers for copyright where they take on the liability, but I can't find one for e.g. Copilot right now.

https://blogs.microsoft.com/on-the-issues/2023/09/07/copi...

"Specifically, if a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters we have built into our products."

See also https://learn.microsoft.com/en-us/legal/cognitive-service... . The exact legal text seems to be the "Customer Copyright Commitment" section of https://www.microsoft.com/licensing/terms/product/ForOnli...