
Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 16:53 UTC (Mon) by mb (subscriber, #50428)
In reply to: Parts of Debian dismiss AI-contributions policy by bluca
Parent article: Debian dismisses AI-contributions policy

> Data mining on publicly available datasets is not performed under whatever license the dataset had,
> but under the copyright exception granted by the law, which trumps any license you might attach to it.

Ok. Got it now. So

> the fact that something that had been there in the input is no longer there in the output after a processing step.

is true after all.
The input was copyright protected and the special exception made it non-copyright-protected because of reasons.
And for whatever strange reason that only applies to AI algorithms, because the EU says so.


Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 17:15 UTC (Mon) by farnz (subscriber, #17727)

> the fact that something that had been there in the input is no longer there in the output after a processing step.
>
> is true after all. The input was copyright protected and the special exception made it non-copyright-protected because of reasons. And for whatever strange reason that only applies to AI algorithms, because the EU says so.

No, this is also false.

Copyright law says that there are certain actions I am capable of taking, such as making a literal copy or a "derived work" (a non-literal copy), which the law prohibits unless I have permission from the copyright holder. There are other actions that copyright law allows, such as reading your text, or (in the EU) feeding that text as input to an algorithm; they may be banned by other laws, but copyright law says that these actions are completely legal.

The GPL says that the copyright holder gives you permission to do certain acts that copyright law prohibits as long as you comply with certain terms. If I fail to comply with those terms, then the GPL does not give me permission, and I now have a copyright issue to face up to.

The law says nothing about the copyright protection on the output of the LLM; it is entirely plausible that an LLM will output something that's a derived work of the input as far as copyright law is concerned, and if that's the case, then the output of the LLM infringes. Determining if the output infringes on a given input is done by a comparison process between the input and the output - and this applies regardless of what the algorithm that generated the output is.

Further, this continues to apply even if the LLM itself is not a derived work of the input data; it might be fine to send you the LLM, but not to send you the result of giving the LLM certain prompts as input, since the result of those prompts is derived from some or all of the input in such a way that you can't get permission to distribute the resulting work.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 17:15 UTC (Mon) by bluca (subscriber, #118303)

> The input was copyright protected and the special exception made it non-copyright-protected because of reasons.

No, because what you are stubbornly refusing to understand, despite it having been explained many times, is:

> Now, again as it has already been explained, whether the output of a prompt is copyrightable, and whether it's a derived work of existing copyrighted material, is an entirely separate question that depends on many things, but crucially, not on which tool happened to have been used to write it out.

This is a legal matter, not a programming one. The same paradigms used to understand software cannot be used to try and understand legal issues.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 17:33 UTC (Mon) by mb (subscriber, #50428)

> The same paradigms used to understand software cannot be used to try and understand legal issues.

Yes. That is the main problem. It does not have to make logical sense for it to be "correct" under law.

> stubbornly

I am just applying logical reasoning. The logical chain obviously is not correctly implemented. Which is often the case in law, of course. Just like the logical reasoning chain breaks if the information goes through a human brain. And that's Ok.

I'm just saying that the people here claiming things like "it's *obvious* that LLMs are like this and that w.r.t. Copyright" are plain wrong. Nothing is obvious in this context. It's partly counter-logical and defined with contradicting assumptions.

But that's Ok, as long as a majority agrees that it's fine.
But that doesn't mean I personally have to agree. Copyright is a train wreck and it's only getting worse and worse.

Parts of Debian dismiss AI-contributions policy

Posted May 14, 2024 5:11 UTC (Tue) by NYKevin (subscriber, #129325)

I hesitate to wade into a thread that looks like it has long since become unproductive, but at this point, I think it might be helpful to remember that copyright is a "color" in the sense described at https://ansuz.sooke.bc.ca/entry/23.

Unfortunately, while that article is very well-written and generally illuminates the right way to think about verbatim copying, it can be unintentionally misleading when we're talking about derivative works. The "colors" involved in verbatim copying are relatively straightforward - either X is a copy of Y, or it is not, and this is purely a matter of how you created X. But when we get to derivative works, there are really two* separate components that need to be considered:

- Access (a "color" of the bits, describing whether the defendant could have looked at the alleged original).
- Similarity (a function of the bits, and not a color).

The problem is, if you've been following copyright law for some time, you might be used to working in exclusively one mode of analysis at a time (i.e. either the "bits have color" mode of analysis or the "bits are colorless" mode of analysis). But access is a colored property, and similarity is a colorless property. You need to be prepared to combine both modalities, or at least to perform each of them sequentially, in order to reason correctly about derivative works. You cannot insist that "it must be one or the other," because as a matter of law, it's both.

* Technically, there is also a third component, originality, but that only matters if you want to copyright the derivative work, which is an entirely different discussion. That one is also a "color", which depends on how much human creativity has gone into the work.
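
For readers who think in code, the access/similarity split above can be sketched roughly as follows. This is only an illustration: the `Work` type, the `accessed` provenance field, the use of `difflib` as a stand-in for "similarity", and the 0.8 threshold are all assumptions made for the example, not anything copyright law defines. A court's similarity analysis is not a string ratio.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Work:
    text: str  # the bits themselves
    # Hypothetical provenance record: identifiers of works the author could
    # have seen. This is the "color"; it cannot be recovered from `text`.
    accessed: frozenset = frozenset()


def similarity(a: Work, b: Work) -> float:
    """Colorless: computed from the bits alone, ignoring provenance."""
    return SequenceMatcher(None, a.text, b.text).ratio()


def had_access(candidate: Work, original_id: str) -> bool:
    """Colored: depends only on how the candidate came to exist."""
    return original_id in candidate.accessed


def possibly_derivative(candidate: Work, original: Work,
                        original_id: str, threshold: float = 0.8) -> bool:
    # Both tests must pass; neither one alone settles the question.
    return (had_access(candidate, original_id)
            and similarity(candidate, original) >= threshold)


original = Work("the quick brown fox jumps over the lazy dog")
copied = Work(original.text, accessed=frozenset({"fox"}))
independent = Work(original.text)  # identical bits, but no access
unrelated = Work("xyz", accessed=frozenset({"fox"}))  # access, different bits
```

Here `copied` and `independent` have exactly the same bits and so the same similarity score, yet only `copied` passes `possibly_derivative(..., original, "fox")`; `unrelated` has the access "color" but fails on similarity. Neither mode of analysis gives the answer on its own.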

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 22:18 UTC (Mon) by mirabilos (subscriber, #84359)

> The input was copyright protected and the special exception made it non-copyright-protected because of reasons.

No, wrong.

They’re copyright-protected, but *analysing* copyright-protected works for text and data mining is an action permitted without the permission of the rights holders.

See my other post in this subthread. This limitation of copyright protection does not extend to doing *anything* with the output of such models.

