
Creator, or proof reader ?

Posted May 12, 2024 20:10 UTC (Sun) by Wol (subscriber, #4433)
In reply to: Creator, or proof reader ? by mb
Parent article: Debian dismisses AI-contributions policy

I think the obvious difference is that you can write a decompiler to retrieve the original source. Or to put it mathematically, your "compilation with extra steps" or obfuscator does not falsify the basic "2 * 12 = 18 + 6" relationship between source and output.

Can you write an anti-LLM that, given the LLM's output, would reverse it back to the original question?

Cheers,
Wol



Creator, or proof reader ?

Posted May 12, 2024 20:36 UTC (Sun) by mb (subscriber, #50428) [Link] (5 responses)

>I think the obvious difference is that you can write a decompiler to retrieve the original source.
> Or to put it mathematically, your "compilation with extra steps" or obfuscator does not falsify the basic "2 * 12 = 18 + 6"

No. That's not possible.

You can't reverse 18+6 into 2*12, because it could also have been 4*6 or anything else that fits the equation. There is an endless number of possibilities.
It's not a 1:1 relation.
Of course my hypothetical obfuscator also would not produce a 1:1 relation between input and output. It's pretty easy to do that.
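(The many-to-one point is easy to see in a few lines of Python; this sketch is my own illustration, not something from the thread.)

```python
# Many distinct source expressions evaluate to the same output value,
# so the source -> output mapping cannot be uniquely reversed.
sources = ["2 * 12", "4 * 6", "18 + 6", "30 - 6"]
outputs = {expr: eval(expr) for expr in sources}

# Every expression collapses to the same value, 24 ...
assert all(value == 24 for value in outputs.values())

# ... so given only the output 24, the original source is ambiguous:
candidates = [expr for expr, value in outputs.items() if value == 24]
print(candidates)  # all four expressions fit the equation
```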

So, is the output still a derived work of the input? If so, why is an LLM different?

Creator, or proof reader ?

Posted May 12, 2024 21:01 UTC (Sun) by gfernandes (subscriber, #119910) [Link] (4 responses)

You're sort of missing the point here.

Who uses an obfuscator? The producer of the works, because said producer wants an extra layer/hurdle to protect *their* copyright of their original works.

Who uses an LLM? Obviously *not _just_* the producer of the LLM. And *because* of this, the LLM is fundamentally different as far as copyright goes.

The user can cause the LLM to leak copyrighted training material that the _producer_ of the LLM did not license!

This is impossible in the context of an obfuscator.

In fact there is an ongoing case which might bring legal clarity here - NYT v OpenAI.

Creator, or proof reader ?

Posted May 13, 2024 5:58 UTC (Mon) by mb (subscriber, #50428) [Link] (3 responses)

> Who uses an obfuscator? The producer of the works

Nope. I use it on foreign copyrighted work to get public domain work out of it. LLM-style.

Creator, or proof reader ?

Posted May 13, 2024 6:17 UTC (Mon) by gfernandes (subscriber, #119910) [Link] (2 responses)

Then that's already a copyright violation unless you have the permission of the copyright holder to reissue copyrighted material as public domain works.

Creator, or proof reader ?

Posted May 13, 2024 8:56 UTC (Mon) by mb (subscriber, #50428) [Link] (1 response)

I agree.
So, why is it different, if I process the input data with an LLM algorithm instead of with my algorithm?

Creator, or proof reader ?

Posted May 13, 2024 9:51 UTC (Mon) by farnz (subscriber, #17727) [Link]

It's not different - the output of an LLM may be a derived work of the original. It may also be a non-literal copy, or a transformative work, or even unrelated to the input data.

There's a lot of "AI bros" who would like you to believe that using an LLM automatically results in the output not being a derived work of the input, but this is completely untested in law; the current smart money suggests that "generative AI" output (LLMs, diffusion probabilistic models, whatever) will be treated the same way as human output - it's not automatically a derived work just because you used an LLM, but it could be, and it's on the human operator to ensure that copyright is respected.

It's basically the same story as a printer in that respect; if the input to the printer results in a copyright infringement on the output, then no amount of technical discussion about how I didn't supply the printer with a copyrighted work, I supplied it with a PostScript program to calculate π and instructions on which digits of π to interpret as a bitmap will get me out of trouble. Same currently applies to LLMs; if I get a derived work as output, that's my problem to deal with.

This, BTW, is why "AI bros" would like to see the outputs of LLMs deemed as "non-infringing"; it's going to hurt their business model if "using an AI to generate output" is treated, in law, as equivalent to "using a printer to run a PostScript program", since then their customers have to do all the legal analysis to work out if a given output from a prompt has resulted in a derived work of the training set or not.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds