Debian dismisses AI-contributions policy
Posted May 10, 2024 21:45 UTC (Fri) by kleptog (subscriber, #1183)
Parent article: Debian dismisses AI-contributions policy
Ok, but then we should also ban contributions by any users that have ever read/watched copyrighted material without the copyright holder's consent. You don't know how they might be contaminated. /s
In general I'm against having rules that you cannot enforce in any meaningful way and that are therefore open to abuse. Apparently the term for this is "void for vagueness" (I asked ChatGPT). The best you can do is some kind of statement saying that Debian developers are responsible for their work and that LLMs should be used with caution. But if you can't enforce it, then it's basically useless to say so.
Posted May 11, 2024 11:50 UTC (Sat) by josh (subscriber, #17465)
AI, on the other hand, should be properly considered a derivative work of the training material. The alternative would be to permit AI to perform laundering of copyright violations.
Posted May 11, 2024 12:09 UTC (Sat) by bluca (subscriber, #118303)
Posted May 11, 2024 13:20 UTC (Sat) by josh (subscriber, #17465)
Posted May 11, 2024 15:13 UTC (Sat) by Wol (subscriber, #4433)
And that is a *complete* misunderstanding. I'm in Europe, and if an AI chucked my code *out*, I don't see any problem suing (apart from the lack of money, of course ...)
Feeding stuff INTO an LLM is perfectly legal - YOU DON'T NEED TO GET PERMISSION. So as a copyright holder I can't complain, unless they ignored my opt-out.
But the law says nothing about my works losing copyright status, or copyright not applying, or anything of that sort. My work *inside* the LLM is still my work, still my copyright. And if the LLM regurgitates it substantially intact, it's still my work, still my copyright, and anybody *using* it does NOT have my permission, nor the law's, to do that.
Cheers,
Wol
Posted May 11, 2024 15:23 UTC (Sat) by mb (subscriber, #50428)
The output should be considered a derived work of all training data.
Most of the time the outputs are different enough from the training data that it's hard to say which part of the input data influenced the output to what degree. But at the same time it's clear that only the training data (plus the prompt) can influence the output.
You can do the same thing manually. Take Open Source code and manually obfuscate it until nobody can prove its origin. It does not lose its Copyright status by doing that, though. And it's a lot of work - by far not click-of-a-button - so it's not done very often.
LLMs - apparently with some unicorn sparkle magic - remove copyright by doing the same thing. How does that make sense?
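(To make the obfuscation comparison concrete, here is a minimal invented C++ sketch - both functions and all names are hypothetical, not taken from any real project:)

    #include <cstddef>
    #include <cstdint>

    // Hypothetical "original" under an Open Source license:
    // a simple shift-and-xor checksum over a buffer.
    std::uint32_t checksum(const unsigned char *buf, std::size_t len) {
        std::uint32_t sum = 0;
        for (std::size_t i = 0; i < len; ++i)
            sum = (sum << 1) ^ buf[i];
        return sum;
    }

    // Manually "obfuscated" copy: identifiers renamed, loop restructured.
    // It computes exactly the same result; the rewrite hides the origin
    // but does not stop it being a copy of the function above.
    std::uint32_t z9(const unsigned char *p, std::size_t n) {
        std::uint32_t a = 0;
        while (n--)
            a = (a << 1) ^ *p++;
        return a;
    }

No diff tool would match the two, yet the second remains a mechanical derivative of the first - which is mb's point about obfuscation not removing copyright.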
Posted May 13, 2024 7:07 UTC (Mon) by gfernandes (subscriber, #119910)
Clearly not true.
Posted May 11, 2024 16:55 UTC (Sat) by bluca (subscriber, #118303)
Posted May 11, 2024 17:17 UTC (Sat) by mb (subscriber, #50428)
Yes, in general I agree.
But we must be cautious that the holes are not equipped with check valves [1]. The holes shall benefit free work as well, not only proprietary work.
If the LLMs learnt from large proprietary code bases, too, then I would actually be happy with the status quo. But currently the flow of code is basically only from open source into proprietary.
When do we see the LLM trained on all of Microsoft's internal code? When do we use that to improve Wine? That wouldn't be a problem, right? No material is transferred through the LLM, after all. Right?
Posted May 11, 2024 17:24 UTC (Sat) by bluca (subscriber, #118303)
> If the LLMs learnt from large proprietary code bases, too, then I would actually be happy with the status quo.
> But currently the flow of code is basically only from open source into proprietary.

The end goal is that there's no proprietary software, just software. We don't get there by making copyright even more draconian than it is now, and it's already really bad.

> When do we see the LLM trained on all of Microsoft's internal code?

As a wild guess, probably when it gets out of the atrociously horrendous internal git forge it lives in right now and into GitHub. Which is not anytime soon, or likely ever, sadly, because it would cost an arm and a leg and throw most of the engineering org into utter chaos. One can only wish.
Posted May 11, 2024 17:38 UTC (Sat) by mb (subscriber, #50428)
Yes. But we also don't get there by circumventing all Open Source licenses and installing a check valve in the direction of proprietary software.
The end goal of having "just software" actually means that everything is Open Source. (The other option would be to kill Open Source.)
Not only the end goal is important, but also the intermediate steps.

> We don't get there by making copyright even more draconian than it is now

Which I currently don't see as a realistic possibility.
Posted May 11, 2024 17:41 UTC (Sat) by bluca (subscriber, #118303)
You'll be delighted to know that's not how any of this works, then - it's just autocomplete with some extra powers and bells and whistles; it doesn't circumvent anything.
Posted May 11, 2024 21:22 UTC (Sat) by flussence (guest, #85566)
Posted May 12, 2024 11:39 UTC (Sun) by bluca (subscriber, #118303)
* (when only fully trusted workloads are executed, no malware allowed, pinky swear)
Posted May 13, 2024 10:16 UTC (Mon) by mathstuf (subscriber, #69389)
> As a wild guess, probably when it gets out of the atrociously horrendous internal git forge it lives in right now and into GitHub. Which is not anytime soon, or likely ever, sadly, because it would cost an arm and a leg and throw most of the engineering org into utter chaos. One can only wish.

Alas, this is kind of my baseline for believing Microsoft's stance that Copilot doesn't affect copyright: eat the same damn cake you're forcing down everyone else's throats (IIRC you work at Microsoft, but the "you" here is aimed at Microsoft PR/lawyers, not at bluca specifically).
If Copilot really can only be trained on code accessible over the GitHub API and not raw git repos, that seems a bit short-sighted, no?
Posted May 12, 2024 10:48 UTC (Sun) by MarcB (subscriber, #101804)
Not necessarily. However, should you lose - or should your lawyers consider this a likely outcome - you can use your huge amount of money to strike a licensing deal with the plaintiff. Even if you overpay, you still win, because you now have exclusive access.
See the Google/Reddit deal.
Posted May 12, 2024 11:38 UTC (Sun) by bluca (subscriber, #118303)
Posted May 12, 2024 11:55 UTC (Sun) by josh (subscriber, #17465)
When copyright ceases to exist, all software will be free to copy, modify, and redistribute. Until then, AI training should have to respect Open Source licenses just like everyone else does.
Posted May 12, 2024 12:00 UTC (Sun) by bluca (subscriber, #118303)
Posted May 12, 2024 19:55 UTC (Sun) by Wol (subscriber, #4433)
I'm not a fan of copyright, but "never" is just as bad as "for ever".
The US got it approximately right with its "fourteen years". The majority of any value, for any work, is extracted in the first 10 years or so. Beyond that, most works are pretty worthless.
So let's make it a round 15: all works are automatically protected for 15 years from publication, but if you want to avail yourself of that protection you must put the date on the work. After that, if the work still has value, you (as a real-person author, or the "heir alive at publication") can renew the copyright on a register for successive 15-year intervals (PLURAL) for a smallish fee. A work published in 2024, say, would lapse in 2039, 2054, or later, depending on how many times it was renewed.
And for people like Disney, Marvel, etc etc, you can trademark your work to keep control of your valuable universe if you wish.
So this will achieve the US aim of "encouraging works into the Public Domain" and works won't "rot in copyright" because people won't pay to extend it.
Cheers,
Wol
Posted May 12, 2024 20:05 UTC (Sun) by Wol (subscriber, #4433)
And this is the whole point of Berne. Different countries, different rules (same basic framework), but works have to be protected identically regardless of nationality. Which comes as a result of the American abuse of copyright pre-1984-ish. One only has to look at the AT&T/BSD suit, where AT&T removed copyright notices and effectively placed OTHER PEOPLE'S COPYRIGHTED WORK into the Public Domain.
Going back, there's Disney's Fantasia, where they used European works and completely ignored copyright. Go back even further to Gilbert & Sullivan's "The Pirates of Penzance", which they premiered in New York and had to go to extreme lengths to keep others from copyrighting, which would otherwise have prevented them from performing their own work in the US.
THERE IS NO SPECIAL EXCEPTION FOR AI. Not even in Europe. As a person, you are free to study copyrighted works for your own education. European law makes it explicit that "educating" an AI is the same as educating a person. Presumably the same rules apply to an AI regurgitating what it's learnt as apply to a human: if the output is indistinguishable from (or clearly based on) the input, then it's a copyright violation. The problem is, be it human or AI, what do you mean by "clearly based on"?
Cheers,
Wol
Posted Jun 10, 2024 15:42 UTC (Mon) by nye (subscriber, #51576)
I'm practically speechless that you would lie so brazenly as to say this in the same thread as espousing approximately the most maximalist possible interpretation of copyright.
Honestly, it's threads like this one that remind me of how ashamed I am that I once considered myself part of the FOSS community. It's just... awful. Everyone here is awful. Every time I read LWN I come away thinking just a little bit less of humanity. Today, you've played your part.
Whoa there
Posted Jun 10, 2024 16:34 UTC (Mon) by corbet (editor, #1)
This seems ... extreme. I don't doubt that different people can have different ideas of what "copyright maximalist" means — there are different axes on which things can be maximized. Disagreeing on that does not justify calling somebody a liar and attacking them in this way, methinks.
Posted May 13, 2024 9:50 UTC (Mon) by LtWorf (subscriber, #124958)
Posted May 11, 2024 13:56 UTC (Sat) by Paf (subscriber, #91811)
> AI, on the other hand, should be properly considered a derivative work of the training material. The alternative would be to permit AI to perform laundering of copyright violations.

I would like to understand better why this is. Plenty of things in my brain are in fact covered by copyright, and I could likely violate quite a bit of copyright from memory. Instead, it's entirely about how much of the input material is present in the output.
If we’re just saying “humans are different”, it would be nice to understand *why* in detail and if anything non human could ever clear those hurdles. I get the distinct sense a lot of these arguments actually boil down to “humans are special and nothing else is like a human, because humans are special”
Posted May 11, 2024 14:38 UTC (Sat) by willy (subscriber, #9762)
Posted May 12, 2024 1:29 UTC (Sun) by Paf (subscriber, #91811)
Posted May 13, 2024 1:38 UTC (Mon) by raven667 (subscriber, #5198)
Sci-fi is also a fictional scenario that swaps people for aliens or AI or whatever to be able to talk about power dynamics and relationships without existing bias creeping in, but that doesn't mean that LLMs are "alive" or "moral agents" in any way; they are nowhere near complex and complete enough for that to be a consideration. People see faces in the side of a mountain or a piece of toast, and in the same way perceive the output of LLMs, mistaking cogent-sounding statistical probability for intelligence. There is no there there, because while an LLM might in some small way approximate thought, it's thoroughly lobotomized, with no concept of concepts.
Posted May 11, 2024 20:21 UTC (Sat) by flussence (guest, #85566)
Are you saying there's a threshold of "AI-ness" whereby, once it is crossed, someone caught distributing a 1TB torrent of Disney DVD rips and RIAA MP3s, encrypted with a one-time pad output from a key derivation function with a trivially guessable input, would see the torrent file itself arrested instead? Does a training set built by stealing the work of others have legal personhood now? Do the colour of the bits and the intent of the deed no longer matter to a court if the proponent of the technology is sufficiently high on their own farts?
Posted May 12, 2024 1:28 UTC (Sun) by Paf (subscriber, #91811)
Posted May 13, 2024 10:54 UTC (Mon) by LtWorf (subscriber, #124958)
Posted May 13, 2024 15:40 UTC (Mon) by atnot (subscriber, #124910)
LLMs don't have that; they just try to predict what the answer would be on StackOverflow. Including, apparently, much to my delight, "closed as duplicate". If you try using them for actually writing code, it very quickly becomes clear they have no actual understanding of the language beyond stochastically regurgitating online tutorials[1]. They falter as soon as you ask for something that isn't a minor variation of a common question or something that has been uploaded to GitHub thousands of times.
If we are to call both of these things "learning", we do have to acknowledge that they are drastically different meanings of the term.
[1] And no, answers to naive queries about how X works do not prove it "understands" X, merely that the training data contains enough instances of this question being answered to be memorizable. Which for a language like C is going to be a lot. Consider e.g. that an overwhelming majority of universities in the world have at least one C course.
Posted May 13, 2024 15:44 UTC (Mon) by bluca (subscriber, #118303)
That's really not true for the normal use case, which is fancy autocomplete. It doesn't just regurgitate online tutorials or StackOverflow; it provides autocompletion based on the body of work you are currently working on, which is why it's so useful as a tool. The process is the same stochastic parroting, mind you; of course language models don't really learn anything in the sense of gaining an "understanding" of something in the human sense.
Posted May 13, 2024 20:39 UTC (Mon) by rschroev (subscriber, #4164)
Have you tried something like Copilot? I've been trying it out a bit over the last three weeks (somewhat grudgingly). One of the things that became clear quite soon is that it does not just get its code from StackOverflow and GitHub and the like; it clearly tries to adapt to the body of code I'm working on (it certainly doesn't always get it right, but that's a different story).

An example, to make things more concrete. Let's say I have a struct with about a dozen members, and a list of key-value pairs, where the keys are the same as the names of the struct members, and I want to assign the values to the struct members. I'll start writing something like:

    for (auto &kv: kv_pairs) {
        if (kv.first == "name")
            mystruct.name = kv.second;
        // ...
    }

It then doesn't take long before Copilot starts autocompleting with the remaining struct members, offering me the exact code I was trying to write, even when I'm pretty sure the names I'm using are unique and not present in publicly accessible sources.

I'm not commenting on the usefulness of all this; I'm just showing that what it does is not just applying StackOverflow and GitHub to my code. We should probably remember that LLMs are not all alike. It's very well possible that e.g. ChatGPT would have a worse "understanding" (for lack of a better word) of my code, and would rely much more on what it learned before from public sources.
Posted May 11, 2024 21:57 UTC (Sat) by kleptog (subscriber, #1183)
No, the alternative is to consider copyright infringement based on how much something resembles a copyrighted work. Whether the copyrighted work was part of the training set is not relevant to this determination. This is pretty much the EU directive position.
It's pretty much the same idea that allows search engines to process all the information on the internet without having to ask the copyright holders, but they can't just reproduce those pages for users.
Posted May 12, 2024 0:39 UTC (Sun) by NYKevin (subscriber, #129325)
Incidentally, this is also the US position, although the rules for the training process itself remain somewhat opaque (unlike in the EU).
Posted May 12, 2024 1:25 UTC (Sun) by Paf (subscriber, #91811)
*Why* is human output copyrightable and AI output not? Can you explain this, and give reasons? You've been stating it, but not giving reasons.
Posted May 12, 2024 2:13 UTC (Sun) by NYKevin (subscriber, #129325)
Posted May 14, 2024 1:20 UTC (Tue) by mirabilos (subscriber, #84359)
Robotically made things, or things made by animals, are not works that reflect personal creativity.