Creator, or proof reader ?
Posted May 12, 2024 20:10 UTC (Sun) by Wol (subscriber, #4433)
In reply to: Creator, or proof reader ? by mb
Parent article: Debian dismisses AI-contributions policy
Can you write an anti-LLM that, given the LLM's output, would reverse it back to the original question?
Cheers,
Wol
Posted May 12, 2024 20:36 UTC (Sun) by mb (subscriber, #50428) [Link] (5 responses)
No. That's not possible.
You can't reverse 18+6 into 2*12, because it could also have been 4*6 or anything else that fits the equation. There is an endless number of possibilities.
So, is the output still a derived work of the input? If so, why is an LLM different?
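The non-reversibility point above can be sketched in a few lines of Python (an illustrative toy, not anything from the thread): many distinct expressions evaluate to the same value, so the value alone cannot be inverted back to any one expression.

```python
# Many distinct inputs collapse to the same output, so the mapping
# value -> expression cannot be inverted: given only "24", the
# original expression is unrecoverable.
candidates = ["2 * 12", "18 + 6", "4 * 6", "30 - 6", "48 / 2"]
results = {expr: eval(expr) for expr in candidates}  # eval on fixed literals only

assert all(v == 24 for v in results.values())
# Five different "questions", one identical "answer".
print(sorted(results))
```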
Posted May 12, 2024 21:01 UTC (Sun) by gfernandes (subscriber, #119910) [Link] (4 responses)
Who uses an obfuscator? The producer of the work, because said producer wants an extra layer/hurdle to protect *their* copyright in their original works.
Who uses an LLM? Obviously *not _just_* the producer of the LLM. And *because* of this, the LLM is fundamentally different as far as copyright goes.
The user can cause the LLM to leak copyrighted training material that the _producer_ of the LLM did not license!
This is impossible in the context of an obfuscator.
In fact there is an ongoing case which might bring legal clarity here - NYT v OpenAI.
Posted May 13, 2024 5:58 UTC (Mon) by mb (subscriber, #50428) [Link] (3 responses)
Nope. I use it on other people's copyrighted work to get public domain work out of it. LLM-style.
Posted May 13, 2024 6:17 UTC (Mon) by gfernandes (subscriber, #119910) [Link] (2 responses)
Posted May 13, 2024 8:56 UTC (Mon) by mb (subscriber, #50428) [Link] (1 response)
Posted May 13, 2024 9:51 UTC (Mon) by farnz (subscriber, #17727) [Link]
It's not different - the output of an LLM may be a derived work of the original. It may also be a non-literal copy, or a transformative work, or even unrelated to the input data.
There's a lot of "AI bros" who would like you to believe that using an LLM automatically results in the output not being a derived work of the input, but this is completely untested in law; the current smart money suggests that "generative AI" output (LLMs, diffusion probabilistic models, whatever) will be treated the same way as human output - it's not automatically a derived work just because you used an LLM, but it could be, and it's on the human operator to ensure that copyright is respected.
It's basically the same story as a printer in that respect; if the input to the printer results in a copyright infringement on the output, then no amount of technical discussion about how I didn't supply the printer with a copyrighted work, I supplied it with a PostScript program to calculate π and instructions on which digits of π to interpret as a bitmap will get me out of trouble. Same currently applies to LLMs; if I get a derived work as output, that's my problem to deal with.
This, BTW, is why "AI bros" would like to see the outputs of LLMs deemed as "non-infringing"; it's going to hurt their business model if "using an AI to generate output" is treated, in law, as equivalent to "using a printer to run a PostScript program", since then their customers have to do all the legal analysis to work out if a given output from a prompt has resulted in a derived work of the training set or not.
Creator, or proof reader ?
> Or to put it mathematically, your "compilation with extra steps" or obfuscator does not falsify the basic "2 * 12 = 18 + 6"
It's not a 1:1 relation.
Of course my hypothetical obfuscator also would not produce a 1:1 relation between input and output. It's pretty easy to do that.
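A minimal sketch of such a non-1:1 obfuscator (a hypothetical toy renamer, not any real tool): each seed replaces identifiers with randomly chosen tokens, so the same input maps to many possible outputs and the original names are not recoverable from the output alone.

```python
import random
import re

def obfuscate(src: str, seed: int) -> str:
    """Toy obfuscator: replace every identifier with a random token.
    Different seeds yield different outputs for the same input, so the
    input -> output relation is one-to-many, not 1:1."""
    rng = random.Random(seed)
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = "v%06d" % rng.randrange(10**6)
        return mapping[name]

    return re.sub(r"\b[a-z_][a-z0-9_]*\b", rename, src)

print(obfuscate("total = price + tax", seed=1))
print(obfuscate("total = price + tax", seed=2))  # another seed renames differently (with overwhelming probability)
```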
So, why is it different, if I process the input data with an LLM algorithm instead of with my algorithm?