
Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 22:00 UTC (Mon) by mirabilos (subscriber, #84359)
In reply to: Parts of Debian dismiss AI-contributions policy by bluca
Parent article: Debian dismisses AI-contributions policy

Funny that you should mention data mining: in preparation for a copyright and licence workshop I’m holding at $dayjob, I re-read that paragraph as well.

Text and data mining is opt-out, and the opt-out must be machine-readable. But this limitation of copyright only applies to automated analyses of works to obtain information about patterns, trends and correlations.

(I grant that creating the model itself probably falls under this clause.)

But not only must the copies of works made for text and data mining be deleted as soon as they are no longer necessary for that purpose (which these models clearly don’t do, given how it’s possible to extract “training data” by the millions), the exception also does not allow you to reproduce the output of such models.
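
(As an aside, memorization of this kind is mechanically testable. Here is a minimal sketch in Python, assuming a local GPT-2 checkpoint via Hugging Face transformers; the file name, prefix length, and token counts are illustrative assumptions, not taken from any actual case.)

# Sketch: probe a model for verbatim memorization of a known text.
# Assumptions: a local GPT-2 via Hugging Face transformers; the input
# file and the 50-token prefix are illustrative, not from a real case.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

passage = open("known_work.txt").read()        # hypothetical source text
ids = tok(passage, return_tensors="pt").input_ids
prefix, truth = ids[:, :50], ids[0, 50:150]    # prompt vs. real continuation

# Greedy decoding; long verbatim matches indicate memorized training data.
out = model.generate(prefix, max_new_tokens=100, do_sample=False)
matches = (out[0, 50:150] == truth).sum().item()
print(f"{matches}/100 continuation tokens reproduced verbatim")

(A high match count on text the model was trained on is exactly the kind of extraction described above.)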

Text and data mining is, after all, only permissible to obtain “information about, in particular, patterns, trends and correlations”, not to produce outputs the way genAI does.

Therefore, the limitation of copyright does NOT apply to LLM output, and so the normal copyright rules apply (i.e. the output is mechanically combined from its inputs, whose licences continue to hold).



Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 22:44 UTC (Mon) by bluca (subscriber, #118303) [Link] (2 responses)

> But not only must the copies of works made for text and data mining be deleted as soon as they are no longer necessary for that purpose (which these models clearly don’t do, given how it’s possible to extract “training data” by the millions),

It doesn't say that they must be deleted; it says:

> Reproductions and extractions made pursuant to paragraph 1 may be retained for as long as is necessary for the purposes of text and data mining.

Not quite the same thing. I don't know whether it's true that verbatim copies of training data are actually stored as you imply, as I am not an ML expert; it would seem strange and pointless, but I don't really know. But even assuming that were true, if that's required to make the LLM work, then the regulation clearly allows for it.
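
(For a sense of scale, a back-of-envelope comparison; every number below is an illustrative assumption, not a figure from this thread. The weights come out orders of magnitude smaller than the corpus, so storing the whole training set verbatim would indeed be implausible, even though partial memorization of frequently repeated texts is possible.)

# Back-of-envelope: can model weights store the training set verbatim?
# All numbers are illustrative assumptions.
params = 70e9              # a 70B-parameter model
bytes_per_param = 2        # fp16 weights
tokens = 2e12              # ~2 trillion training tokens
bytes_per_token = 4        # roughly 4 bytes of text per token

model_gb = params * bytes_per_param / 1e9    # ~140 GB of weights
corpus_gb = tokens * bytes_per_token / 1e9   # ~8,000 GB of raw text
print(f"corpus is ~{corpus_gb / model_gb:.0f}x larger than the weights")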

> the exception also does not allow you to reproduce the output of such models.

Every LLM producer treats such instances as bugs to be fixed. And they are really hard to reproduce, judging from how contrived and tortured the sequence of prompts needs to be to make that actually happen. The NYT basically had to copy and paste portions of their own articles into the prompt to make ChatGPT spit them back, as shown in their litigation against OpenAI.

> Therefore, the limitation of copyright does NOT apply to LLM output, and so the normal copyright rules apply (i.e. the output is mechanically combined from its inputs, whose licences continue to hold).

And yet, the NYT decided to sue in the US, where the law is murky and based on case-by-case fair-use decisions, rather than in the EU, where they have an office and where, according to you, it would have been a slam dunk. Could it be that you are wrong? It's very easy to test: why don't you sue one of the companies that publish an LLM and see what happens?

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 23:16 UTC (Mon) by mirabilos (subscriber, #84359) [Link] (1 response)

Yes, they treat those as bugs PRECISELY because they want to get away with illegal copyright laundering.

Doesn’t change the fact that it is possible, and frequently enough to conclude that models contain sufficient amounts of their input works for the output to be a mechanically produced derivative of them.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 23:57 UTC (Mon) by bluca (subscriber, #118303) [Link]

> Yes, they treat those as bugs PRECISELY because they want to get away with illegal copyright laundering.

They are treated as bugs because they are bugs, and the contrived and absurd steps needed to reproduce them don't really prove anything.

> Doesn’t change the fact that it is possible, and frequently enough to conclude that models contain sufficient amounts of their input works for the output to be a mechanically produced derivative of them.

Illiterate FUD. Go to court and prove that, if you really believe that.

