Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 12:26 UTC (Mon) by anselm (subscriber, #2796)
In reply to: Parts of Debian dismiss AI-contributions policy by farnz
Parent article: Debian dismisses AI-contributions policy

> the EU is silent on what copyright implications there are to the output of that algorithm

Elsewhere in copyright law, the general premise is that copyright is only available for works which are the “personal mental creation” of a human being. Speciesism aside, something that comes out of an LLM is obviously not the personal mental creation of anyone, and that seems to take care of that, even without the EU pronouncing on it in the context of training AI models.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 13:26 UTC (Mon) by kleptog (subscriber, #1183)

> Elsewhere in copyright law, the general premise is that copyright is only available for works which are the “personal mental creation” of a human being. Speciesism aside, something that comes out of an LLM is obviously not the personal mental creation of anyone, and that seems to take care of that, even without the EU pronouncing on it in the context of training AI models.

LLMs are prompted, they don't produce output out of thin air. Therefore the output is the creation of the human that triggered the prompt. Now whether that person was pressing buttons on a device that sent network packets to a server that processed all those keystrokes into a block of text to be sent to an LLM in the cloud is irrelevant. Somewhere along the way a human decided to invoke the LLM and controlled which input to send to it and what to do with the output. That human being is responsible for respecting copyright. Whether the output is copyrightable depends mostly on how original the prompt is.

The idea that LLM output cannot be copyrighted is silly. That would be like claiming that documents produced by a human typing into LibreOffice cannot be "the personal mental creation of anyone". LLMs, like LibreOffice, are tools, nothing more. There's a human at the keyboard who is responsible. Sure, most of the output of an LLM isn't going to be original enough to be copyrightable, but that's quite different from saying *all* output from LLMs is not copyrightable.

As with legal things in general, it depends.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 13:54 UTC (Mon) by mb (subscriber, #50428)

>LLMs are prompted, they don't produce output out of thin air.
>Therefore the output is the creation of the human that triggered the prompt.

OK, so if I type wget at my shell prompt to download some copyrighted music, does that make me the creator?

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 14:18 UTC (Mon) by farnz (subscriber, #17727)

You are the creator of that copy, and insofar as there is anything copyrightable in creating that copy, you own that copyright.

However, that copy is (in most cases) either an exact copy of an existing work, or a derived work of an existing work; if it's an exact copy, then there is nothing copyrightable in the creation of the copy, so you own nothing.

If it's a derived work, then you own copyright in the final work thanks to the creative expression you put in to create the copy, but doing things with that work infringes the copyright in the original work unless you have appropriate permission from the copyright holder on the original work, or a suitable exception in copyright law.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 21:51 UTC (Mon) by mirabilos (subscriber, #84359)

😹😹😹

> LLMs are prompted, they don't produce output out of thin air. Therefore the output is the creation of the human that triggered the prompt.

This is ridiculous. The “prompt” is merely a tiny parametrisation of a query that extracts from the huge database of (copyrighted) works.

Do read the links I listed in https://lwn.net/Comments/973578/

> The idea that LLM output cannot be copyrighted is silly.

😹😹😹😹😹😹😹

You’re silly.

This is literally enshrined into copyright law. For example:

> Werke im Sinne dieses Gesetzes sind nur persönliche geistige Schöpfungen.

“Works as defined by this [copyright] law are only personal intellectual creations [i.e. ones that pass the threshold of originality].” (UrhG §2(2))

Wikipedia explains the “personal” part of this following general jurisprudence:

> Persönliches Schaffen: setzt „ein Handlungsergebnis, das durch den gestaltenden, formprägenden Einfluß eines Menschen geschaffen wurde“ voraus. Maschinelle Produktionen oder von Tieren erzeugte Gegenstände und Darbietungen erfüllen dieses Kriterium nicht. Der Schaffungsprozeß ist Realakt und bedarf nicht der Geschäftsfähigkeit des Schaffenden.

“[Personal creation] requires ‘a result of action created through the shaping, form-giving influence of a human being’. Machine productions, or objects and performances produced by animals, do not meet this criterion. The act of creation is a factual act and does not require the creator to have legal capacity.” (<https://de.wikipedia.org/wiki/Urheberrecht_(Deutschland)#Schutzgegenstand_des_Urheberrechts:_Das_Werk>)

So, yes, LLM output as such cannot be copyrighted (as a new work/edition).

And to create an adaptation of LLM output, the human doing so must not only invest significant *creativity* (not just effort / sweat of the brow!) to pass the threshold of originality, but must also have permission from the holders of the copyright (exploitation rights, to be precise) in the original works (and, under droit d’auteur, the work may not be distorted, so the authors have a say even if they do not hold the exploitation rights).

This has gone on for a while

Posted May 13, 2024 22:24 UTC (Mon) by corbet (editor, #1)

While this discussion can be seen as on-topic for LWN, I would also point out that we are not copyright lawyers, and that there may not be a lot of value in continuing to go around in circles here. Perhaps it's time to wind it down?

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 13:40 UTC (Mon) by farnz (subscriber, #17727)

That's looking at the other end of it: the question here is not whether an LLM's output can be copyrighted, but whether an LLM's output can infringe someone else's copyright. And the general stance elsewhere in copyright law is that the tooling used is irrelevant to whether or not a given tool's output infringes copyright in that tool's inputs. It might, it might not, but that depends on the details of the inputs and outputs (and importantly, not on the tool in question).

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 14:55 UTC (Mon) by Wol (subscriber, #4433)

I think there might also be a problem here caused by "parts of the verb".

The output of an LLM cannot be copyrightED. That is, there is no original creative contribution BY THE LLM worthy of copyright.

But the output of an LLM can be COPYRIGHT. No "ed" on the end of it. The mere fact of feeding stuff through an LLM does not automatically cancel any pre-existing copyright.

Again, we get back to the human analogy. There is no restriction on humans CONSUMING copyrighted works. European law explicitly extends that to LLMs CONSUMING copyrighted works.

And just as a human can regurgitate a copyrighted work in its entirety (Mozart is famous for doing this), so can an LLM. And both of these are blatant infringements if you don't have permission, although copyright was in its infancy when Mozart did it, so I have no idea what the reality on the ground was back then ...

Cheers,
Wol

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 13:52 UTC (Mon) by mb (subscriber, #50428)

>LLM is obviously not the personal mental creation of anyone

Well, that is not obvious at all.

Because the inputs were mental creations.
At which point did the data lose the "mental creation" status while traveling through the algorithm?
Will processing the input with 'sed' also remove it, because the output is completely processed by a program, not a human being?
What level of processing do we need for the "mental creation" status to be lost? How many chained 'sed's do we need?

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 21:39 UTC (Mon) by mirabilos (subscriber, #84359)

Chained sed isn’t going to solve it.

Even “mechanical” transformation by humans does not create a work (as defined by UrhG, i.e. copyright). It has to have some creativity.

Until then, it’s a transformation of the original work(s) and therefore bound by the (combined) terms and conditions of the original work(s).

If you have a copyrighted thing, you can print it out, scan it, compress it as JPEG, store it into a database… it’s still just a transformation of the original work, and you can retrieve a sufficiently substantial part of the original work from it.

The article where someone reimplemented a (slightly older version of) ChatGPT in a 498-line PostgreSQL query showed exactly, and in an easily understandable way, how this is just lossy compression/decompression: https://explainextended.com/2023/12/31/happy-new-year-15/

There are now feasible attacks that extract “training data” from production models at large scale, e.g.: https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html

This is sufficient to prove that these “models” are just databases with lossily compressed, but easily enough accessible, copies of the original, possibly (probably!) copyrighted, works.

Another thing I would like to point out is the relative weight. For a work which I offer to the public under a permissive licence, attribution is basically the only remuneration I can ever get. This means that failure to attribute carries much more weight than it does for differently licenced or unlicenced stuff.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 21:55 UTC (Mon) by bluca (subscriber, #118303)

> This is sufficient to prove that these “models” are just databases with lossily compressed, but easily enough accessible, copies of the original, possibly (probably!) copyrighted, works.

While the AI bandwagon greatly exaggerates the capability of LLMs, let's not fall into the opposite trap. ChatGPT et al. are toys; real applications like Copilot are very much not "just databases". A database is not going to provide you with autocomplete based on the current, local context open in your IDE. A database is not going to provide an accurate summary of the meeting that just finished, with action items and all that.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 22:20 UTC (Mon) by mirabilos (subscriber, #84359)

Oh, it totally is. Please *do* read the explainextended article: it shows exactly how the context is what parametrises the search query.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 22:44 UTC (Mon) by bluca (subscriber, #118303)

No, it totally isn't, because it's not about reproducing existing things, which is the only thing a database query can do.

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 23:14 UTC (Mon) by mirabilos (subscriber, #84359)

Just read that.

Consider a database in which things are stored lossily compressed and interleaved (yet still retrievable).

Parts of Debian dismiss AI-contributions policy

Posted May 13, 2024 23:58 UTC (Mon) by bluca (subscriber, #118303)

A database query doesn't work differently depending on local context. You very clearly have never used any of this, besides playing with toys like chatgpt, and it shows.

Parts of Debian dismiss AI-contributions policy

Posted May 14, 2024 0:28 UTC (Tue) by mirabilos (subscriber, #84359)

Just read the fucking explainextended article, which CLEARLY explains all this, or go back to breaking unsuspecting people’s nōn-systemd systems, or whatever.

I don’t have the nerve to even try and communicate with systemd apologists who don’t even do the most basic research themselves WHEN POINTED TO IT M̲U̲L̲T̲I̲P̲L̲E̲ ̲T̲I̲M̲E̲S̲.

Second try

Posted May 14, 2024 1:26 UTC (Tue) by corbet (editor, #1)

OK, I'll state it more clearly: it's time to bring this thread to a halt, it's not getting anywhere.

That means all participants should stop, not just the one I'm responding to here.

Thank you.