Creator, or proof reader ?
Posted May 11, 2024 4:16 UTC (Sat)
by drago01 (subscriber, #50715)
In reply to: Creator, or proof reader ? by Wol
Parent article: Debian dismisses AI-contributions policy
OK, in general: technology comes along, and either it's useful and stays and gets improved, or it isn't and goes away.
Banning is not really a solution.
Posted May 11, 2024 11:10 UTC (Sat)
by josh (subscriber, #17465)
[Link] (22 responses)
Posted May 11, 2024 13:51 UTC (Sat)
by Paf (subscriber, #91811)
[Link] (21 responses)
Posted May 11, 2024 14:07 UTC (Sat)
by josh (subscriber, #17465)
[Link] (20 responses)
If AI-generated text is *not* copyrightable, then AI becomes a means of laundering copyright: Open Source and/or proprietary code goes in, public domain code comes out. If the law or jurisimprudence of some major jurisdictions decide to allow that, that's a disaster for Open Source licensing.
Posted May 12, 2024 1:23 UTC (Sun)
by Paf (subscriber, #91811)
[Link] (4 responses)
Posted May 12, 2024 7:55 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (3 responses)
If I get permission (or use a Public Domain work) to set a piece of music in, let's say, lilypond, I can quite legally slap my copyright on it, and forbid people from copying.
Okay, the notes themselves are not copyright - I can't stop people taking my (legally acquired) copy and re-typesetting it, but I can stop them sticking it in a photocopier.
One of the major points (and quite often a major problem) of copyright is that I only have to make minor changes to a work, and suddenly the whole work is covered by my copyright. A tactic often used by publishers to try and prevent people copying Public Domain works.
Cheers,
Wol
Posted May 12, 2024 9:05 UTC (Sun)
by mb (subscriber, #50428)
[Link] (1 responses)
> make minor changes to a work, and suddenly the whole work is covered by my copyright.

Huh? Is that really how (I suppose) US Copyright works?
I make a one-liner change to the Linux kernel and then I have Copyright on the whole kernel?
I doubt it.
Posted May 12, 2024 10:29 UTC (Sun)
by kleptog (subscriber, #1183)
[Link]
The key to understanding this is that copyright covers "works". So if you take the kernel source, make some modifications and publish a tarball, you own the copyright on the tarball ("the work"). That doesn't mean that you own the copyright to every line of code inside that tarball. Someone could download your tarball, delete your modifications, add different ones, and create a new tarball, and now their tarball has nothing to do with yours.
Just cloning a repo doesn't create a new work though, because there's no creativity involved.
In fact, one of the features of open source is that the copyright status of a lot of code is somewhat unclear, but it doesn't actually matter, because open-source licences mean you don't actually need to care. If you make a single-line patch, does that constitute a "work" that's copyrightable? If you work together with someone else on a patch, can you meaningfully distinguish your copyrighted code from your coauthor's, or from the code you modified?
Copyright law has the concepts of joint ownership and collective works, but it doesn't really have a good handle on open-source development.
Posted May 14, 2024 1:12 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link]
So, in this specific example, the bar is a bit higher, but yeah, the point stands.
Posted May 12, 2024 14:44 UTC (Sun)
by drago01 (subscriber, #50715)
[Link] (14 responses)
If an LLM does that, why would it be? Same goes for any other content, as long as it does not generate copies of the original.
Posted May 12, 2024 15:15 UTC (Sun)
by mb (subscriber, #50428)
[Link] (12 responses)
> Same goes for any other content, as long as it does not generate copies of the original.

So, it is also Ok to use a non-AI code obfuscator to remove Copyright, as long as the output does not look like the input anymore?
Posted May 12, 2024 16:04 UTC (Sun)
by bluca (subscriber, #118303)
[Link] (11 responses)
Posted May 12, 2024 16:18 UTC (Sun)
by mb (subscriber, #50428)
[Link] (10 responses)
What amount of sparkle dust is needed for a computer program that takes $A and produces $B out of $A not to be considered "compilation with extra steps"?
LLMs are computer programs that produce an output for given inputs. There is no magic involved. It's a mapping of inputs+state => output.
How many additional input parameters ("when", "where", "purpose", etc...) to the algorithm are needed to cross the magic barrier?
Why is that different from my obfuscator, that produces an output for given inputs? Why can't it cross the magic barrier, without being called LLM?
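The "mapping of inputs+state => output" claim is easy to illustrate with a toy sketch (this is a hypothetical example, not any real obfuscator or mb's actual tool): a deterministic program that renames every identifier in its input.

```python
import re

def toy_obfuscate(source: str) -> str:
    """A toy 'obfuscator': deterministically rename every identifier.

    Like any program (LLMs included), it is just a mapping from
    input to output; the transformation itself says nothing about
    whether the result is a derived work.
    """
    names = {}

    def rename(match):
        word = match.group(0)
        # First-seen order decides the replacement name, so the
        # same input always produces the same output.
        if word not in names:
            names[word] = f"v{len(names)}"
        return names[word]

    return re.sub(r"\b[a-z_][a-z0-9_]*\b", rename, source)

print(toy_obfuscate("total = price * count"))  # v0 = v1 * v2
```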
Posted May 12, 2024 18:10 UTC (Sun)
by kleptog (subscriber, #1183)
[Link] (2 responses)
The law is run by humans not computers so this question is irrelevant. All that matters is: does the tool produce an output that somehow affects the market of an existing copyrighted work? How it is done is not relevant.
So an obfuscator doesn't remove copyright because removing copyright isn't a thing. Either the output is market-substitutable for some copyrighted work, or it isn't.
LLMs do not spontaneously produce output, they are prompted. If an LLM reproduces a copyrighted work, then that is the responsibility of the person who made the prompt. It's fairly obvious that LLMs do not reproduce copyrighted works in normal usage, so you can't argue that LLMs have a fundamental problem with copyright.
(I guess you could create an LLM that, without a prompt, reproduced the entire works of Shakespeare. You could argue such an LLM would violate Shakespeare's copyright, if he had any. That's not a thing with the LLMs currently available, though. In fact, they're going to quite some effort to ensure LLMs do not reproduce entire works, because that is an inefficient use of resources (i.e. money); they don't care about copyright per se.)
Posted May 12, 2024 19:33 UTC (Sun)
by mb (subscriber, #50428)
[Link] (1 responses)
That is not obvious at all.
By that same reasoning my code obfuscator would be Ok to use.
The output is obviously not a copy of the input. You can compare it and it looks completely different.
But the output of the obfuscator obviously is a derived work of the input. Right?
And I don't see why this would be different for an LLM.
Or does using a more complex mixing algorithm suddenly make it not a derived work of the input?
What amount of token stirring is needed?
Posted May 13, 2024 7:30 UTC (Mon)
by kleptog (subscriber, #1183)
[Link]
> That is not obvious at all.
Have you actually used one?
> But the output of the obfuscator obviously is a derived work of the input. Right?
Not at all. "Derived work" is a legal term not a technical one. Running a copyrighted work through an algorithm does not necessarily create a derived work. In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work (the underlying work). If you hash a copyrighted file, the resulting hash is not a derived work simply because it's lost everything that is interesting about the original work.
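The hash example can be made concrete with the standard library (a minimal sketch of the point, not part of the original comment):

```python
import hashlib

# A SHA-256 digest of a text is a fixed-size fingerprint: none of the
# expressive content of the original survives in it, which is why a
# hash is plainly not a derived work of what was hashed.
text = b"To be, or not to be, that is the question"
digest = hashlib.sha256(text).hexdigest()
print(len(digest))  # 64 hex characters, regardless of input length
```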
If your obfuscator has a corresponding deobfuscator that can return the original retaining the major copyrightable elements, then there may be no copyright on the obfuscated file, but as soon as you deobfuscate it, the copyright returns.
Honestly, this feels like "What colour are your bits?"[1] all over again. Are you aware of that article? Statements like this:
> Or does using a more complex mixing algorithm suddenly make it not a derived work of the input? What amount of token stirring is needed?
seem to indicate you are not.
Posted May 12, 2024 20:10 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (6 responses)
Can you write an anti-LLM, that given the LLM's output, would reverse it back to the original question?
Cheers,
Wol
Posted May 12, 2024 20:36 UTC (Sun)
by mb (subscriber, #50428)
[Link] (5 responses)
No. That's not possible.
You can't reverse 18+6 into 2*12, because it could also have been 4*6 or anything else that fits the equation. There is an endless number of possibilities.

> Or to put it mathematically, your "compilation with extra steps" or obfuscator does not falsify the basic "2 * 12 = 18 + 6"

It's not a 1:1 relation.
Of course my hypothetical obfuscator also would not produce a 1:1 relation between input and output. It's pretty easy to do that.
So, is the output still a derived work of the input? If so, why is an LLM different?
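The arithmetic point, that the mapping is many-to-one and therefore has no inverse, can be shown in a couple of lines (a trivial sketch of the argument, not original thread content):

```python
# Many distinct expressions collapse to the same value, so given only
# the output 24 there is no way to recover which input produced it.
candidates = ["2 * 12", "18 + 6", "4 * 6", "30 - 6"]
values = {expr: eval(expr) for expr in candidates}
assert len(set(values.values())) == 1  # all four evaluate to 24
print(values)
```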
Posted May 12, 2024 21:01 UTC (Sun)
by gfernandes (subscriber, #119910)
[Link] (4 responses)
Who uses an obfuscator? The producer of the works, because said producer wants an extra layer/hurdle to protect *their* copyright of their original works.
Who uses an LLM? Obviously *not _just_* the producer of the LLM. And *because* of this, the LLM is fundamentally different as far as copyright goes.
The user can cause the LLM to leak copyrighted training material that the _producer_ of the LLM did not license!
This is impossible in the context of an obfuscator.
In fact there is an ongoing case which might bring legal clarity here - NYT v OpenAI.
Posted May 13, 2024 5:58 UTC (Mon)
by mb (subscriber, #50428)
[Link] (3 responses)
Nope. I use it on foreign copyrighted work to get public domain work out of it. LLM-style.
So, why is it different if I process the input data with an LLM algorithm instead of with my algorithm?
Posted May 13, 2024 6:17 UTC (Mon)
by gfernandes (subscriber, #119910)
[Link] (2 responses)
Posted May 13, 2024 8:56 UTC (Mon)
by mb (subscriber, #50428)
[Link] (1 responses)
Posted May 13, 2024 9:51 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
It's not different - the output of an LLM may be a derived work of the original. It may also be a non-literal copy, or a transformative work, or even unrelated to the input data.
There's a lot of "AI bros" who would like you to believe that using an LLM automatically results in the output not being a derived work of the input, but this is completely untested in law; the current smart money suggests that "generative AI" output (LLMs, diffusion probabilistic models, whatever) will be treated the same way as human output - it's not automatically a derived work just because you used an LLM, but it could be, and it's on the human operator to ensure that copyright is respected.
It's basically the same story as a printer in that respect; if the input to the printer results in a copyright infringement on the output, then no amount of technical discussion about how I didn't supply the printer with a copyrighted work, I supplied it with a PostScript program to calculate π and instructions on which digits of π to interpret as a bitmap will get me out of trouble. Same currently applies to LLMs; if I get a derived work as output, that's my problem to deal with.
This, BTW, is why "AI bros" would like to see the outputs of LLMs deemed as "non-infringing"; it's going to hurt their business model if "using an AI to generate output" is treated, in law, as equivalent to "using a printer to run a PostScript program", since then their customers have to do all the legal analysis to work out if a given output from a prompt has resulted in a derived work of the training set or not.
Posted May 12, 2024 18:06 UTC (Sun)
by farnz (subscriber, #17727)
[Link]
The question you're reaching towards is "at what point is the LLM's output a derived work of the input, and at what point is it a transformative work?".
This is an open question; it is definitely true that you can get LLMs to output things that, if a human wrote them, would clearly be derived works of the inputs (and smart money says that courts will find that "I used an LLM" doesn't get you out of having a derived work here). Then there's a hard area, where something written by a human would also be a derived work, but proving this is hard (and this is where LLMs get scary, since they make it very quick to rework things such that no transformative step has taken place, and yet it's not clear that this is a derived work, where humans have to spend some time on it).
And then we get into the easy case again, where the output is clearly transformative of the set of inputs, and therefore not a copyright infringement.
Posted May 14, 2024 2:31 UTC (Tue)
by viro (subscriber, #7872)
[Link]
As far as I'm concerned, anyone caught at using that deserves the same treatment as somebody who engages in any other form of post-truth - "not to be trusted ever after in any circumstances".