Class action against GitHub Copilot
Readers outside of the US may not be entirely familiar with the concept of a class-action lawsuit as practiced here. It is a way to seek compensation for a wrong perpetrated against a large number of people without clogging the courts with separate suits from each. The plaintiffs are grouped into a "class", with a small number of "lead plaintiffs" and the inevitable lawyers to represent the class as a whole. Should such a suit prevail, it will typically result in some sort of compensation to be paid to anybody who can demonstrate that they are a member of the class.
Class-action lawsuits have been used to, for example, get compensation for victims of asbestos exposure; they can be used to address massive malfeasance involving a lot of people. In recent decades, though, the class-action lawsuit seems to have become mostly a vehicle for extorting money from a business for the enrichment of lawyers. It is not an uncommon experience in the US to receive a mailing stating that the recipient may be a member of a class in a suit they have never heard of and that, by documenting their status, they can receive a $5 coupon in compensation for the harm that was done to them.
Compensation for the lawyers involved, instead, tends to run into the millions of dollars. Not all class-action lawsuits are abusive in this way, but it happens often enough that it has become second nature to look at a new class-action with a jaundiced eye.
The complaint
The complaint was filed on behalf of two unnamed lead plaintiffs against GitHub, Microsoft, and a multitude of companies associated with OpenAI (which is partially owned by Microsoft and participated in the development of Copilot). It explains at great length how Copilot has been trained on free software, and that it can be made to emit clearly recognizable fragments of that software without any of the associated attribution or licensing information. A few examples are given, showing where the emitted software came from, with some asides on the (poor) quality of the resulting code.
Distribution of any software must, of course, be done in compliance with the licenses under which that software is released. Even the most permissive of free-software licenses do not normally allow the removal of copyright or attribution information. Thus, the complaint argues, the distribution of software by Copilot, which does not include this information, is in violation of that software's licenses and not, as GitHub seems to claim, a case of fair use. Whether fair use applies to Copilot may well be one of the key turning points in this case.
The members of the class of people who have allegedly been harmed by this activity are defined as:
All persons or entities domiciled in the United States that, (1) owned an interest in at least one US copyright in any work; (2) offered that work under one of GitHub’s Suggested Licenses; and (3) stored Licensed Materials in any public GitHub repositories at any time between January 1, 2015 and the present (the “Class Period”).
It is, as would be expected, a US-focused effort; if there is harm against copyright owners elsewhere in the world, it will have to be addressed in different courts. This wording would seem to exclude developers who have never themselves placed code on GitHub, but whose code has been put there by others — a frequent occurrence.
The list of charges against the defendants is impressive in its length and scope:
- Violation of the Digital Millennium Copyright Act, brought about by the removal of copyright information from the code spit out by Copilot.
- Breach of contract: the violation of the free-software licenses themselves. The failure to live up to the terms of a license is normally seen as a copyright violation rather than a contract issue, but they have thrown in the contract allegation as well.
- Tortious interference in a contractual relationship; this is essentially a claim that GitHub is using free software to compete against its creators and has thus done them harm.
- Fraud: GitHub users, it is claimed, were induced to put their software on GitHub by the promises made in GitHub's terms of service, which are said to be violated by the distribution of that software through Copilot.
- False designation of origin — not saying where the software Copilot "creates" actually comes from.
- Unjust enrichment: profiting by removing licensing information from free software.
- Unfair competition: essentially a restatement of many of the other charges in a different light.
- Breach of contract (again): the contracts in question this time are GitHub's terms of service and privacy policy.
- Violation of the California Consumer Privacy Act: a claim that the plaintiffs' personal identifying information has been used and disclosed by GitHub. Exactly which information has been abused in this way is not entirely clear.
- Negligent handling of personal data: another claim related to the disclosure of personal information.
- Conspiracy: because there are multiple companies involved, their having worked together on Copilot is said to be a conspiracy.
So what is this lawsuit asking in compensation for all of these wrongs? It starts with a request for an injunction to force Copilot to include the relevant licensing and attribution information with the code it emits. From there, the requests go straight to money, with attorney's fees being at the top of the list. After that, there are nine separate requests for both statutory and punitive damages. And just in case anybody thinks that the lawyers are thinking too small:
Plaintiffs estimate that statutory damages for Defendants’ direct violations of DMCA Section 1202 alone will exceed $9,000,000,000. That figure represents minimum statutory damages ($2,500) incurred three times for each of the 1.2 million Copilot users Microsoft reported in June 2022.
It seems fair to say that a lot of damage is being alleged here.
Some thoughts
The vacuuming of a massive amount of free software into the proprietary Copilot system has created a fair amount of discomfort in the community. It does, in a way, seem like a violation of the spirit of what we are trying to do. Whether it is a violation of the licenses involved is not immediately obvious, though. Human programmers will be influenced by the code they have seen through their lives and may well re-create, unintentionally, something they have seen before. Perhaps an AI-based system should be forgiven for doing the same.
Additionally, there could be an argument to be made that the code emitted by Copilot doesn't reach the point of copyright violation. The complaint spends a lot of time on the ability to get Copilot, with the right prompt, to reproduce a JavaScript function called isEven() — which does exactly what one might expect — from a Creative-Commons-licensed textbook. It is not clear that a slow and buggy implementation of isEven() contains enough creative expression to merit copyright protection, though.
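For those who have not seen it, the function at issue is a tiny recursive helper; a minimal sketch of what such a slow and buggy implementation looks like (a reconstruction of the familiar textbook recursion exercise, not a quotation from the complaint or from Copilot's output) might be:

```typescript
// Sketch only: a naive recursive even/odd test of the sort described above.
// It is slow (linear recursion) and buggy (it never terminates for negative
// or non-integer inputs), which is the point being made in the article text.
function isEven(n: number): boolean {
    if (n === 0) return true;   // zero is even
    if (n === 1) return false;  // one is odd
    return isEven(n - 2);       // recurse toward a base case
}
```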
That said, there are almost certainly ways to get more complex — and useful — output from Copilot that might be considered to be a copyright violation. There are a lot of interesting questions that need to be answered regarding the intersection of copyright and machine-learning systems that go far beyond free software. Systems that produce images or prose, for example, may be subject to many of the same concerns. It would be good for everybody involved if some sort of consensus could emerge on how copyright should apply to such systems.
A class-action lawsuit is probably not the place to build that consensus. Lawsuits are risky affairs at best, and the chances of nonsensical or actively harmful rulings from any given court are not small. Judges tend to be smart people, but that does not mean that they are equipped to understand the issues at hand here. This suit could end up doing harm to the cause of free software overall.
The request for massive damages raises its own red flags. As the Software Freedom Conservancy noted in its response to the lawsuit, a core component of the ethical enforcement of free-software licenses is to avoid the pursuit of financial gain. The purpose of an enforcement action should be to obtain compliance with the licenses, not to generate multi-billion-dollar payouts. But such a payout appears to be an explicit goal of this action. Should it succeed, there can be no doubt that many more lawyers will quickly jump into that fray. That, in turn, could scare many people (and companies) away from free software entirely.
Bear in mind that most of these suits end up being settled before going to court. Often, that settlement involves a payment from the defendant without any admission of wrongdoing; the company is simply paying to make the suit go away. Should that happen here, the result will be a demonstration that money can be extracted from companies in this way without any sort of resolution of the underlying issues — perhaps a worst-case scenario.
Copilot does raise some interesting copyright-related questions, and it may well be, in the end, a violation of our licenses. Machine-learning systems do not appear to be going away anytime soon, so it will be necessary to come to some conclusions about how those systems interact with existing legal structures. Perhaps this class-action suit will be a step in that direction, but it is hard to be optimistic that it will produce any helpful outcomes. Perhaps, at least, GitHub users will receive a coupon they can use to buy a new mouse or something.
Posted Nov 10, 2022 15:56 UTC (Thu)
by q_q_p_p (guest, #131113)
[Link] (33 responses)
Posted Nov 10, 2022 17:28 UTC (Thu)
by zdzichu (subscriber, #17118)
[Link] (26 responses)
Posted Nov 10, 2022 17:44 UTC (Thu)
by q_q_p_p (guest, #131113)
[Link]
Posted Nov 10, 2022 17:53 UTC (Thu)
by fenncruz (subscriber, #81417)
[Link] (24 responses)
Posted Nov 10, 2022 18:50 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (23 responses)
Posted Nov 12, 2022 23:02 UTC (Sat)
by apoelstra (subscriber, #75205)
[Link] (22 responses)
This is clearly not in the spirit of the GPL but it's unclear to me whether it matches the letter.
Posted Nov 12, 2022 23:29 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (21 responses)
Demanding that code - licenced under a non-GPL licence - be distributed "in the spirit of the GPL" is doing a major dis-service to the authors of that code!
What is this obsession with the GPL?! Who gave you the right to dictate to me what licence my code should be licenced under!
Cheers,
Posted Nov 13, 2022 8:53 UTC (Sun)
by milesrout (subscriber, #126894)
[Link] (20 responses)
But that's not the demand. The demand is that code derived from GPL-licensed code be distributed under the GPL. That's what the GPL requires.
> What is this obsession with the GPL?! Who gave you the right to dictate to me what licence my code should be licenced under!
Nobody is telling you what licence your code should be distributed under, unless your code is a derivative work of GPL-licensed code in which case YOU have told yourself to distribute it under the GPL by making it such.
Posted Nov 13, 2022 12:59 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (19 responses)
Cheers,
Posted Nov 13, 2022 14:11 UTC (Sun)
by amacater (subscriber, #790)
[Link]
Posted Nov 13, 2022 17:54 UTC (Sun)
by q_q_p_p (guest, #131113)
[Link] (7 responses)
You don't have to agree with me and instead make your own top comment - which license should Copilot use?
Posted Nov 13, 2022 18:16 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (6 responses)
> If instead Copilot would produce only GPL code, this would make FOSS advocates happy and corporate (including open source) shills unhappy. That's win-win situation for me.
Is that GPL2? GPL3? GPL 2 or 3? GPL2+? GPL3+?
Oh - and BSD, MIT, licenses like that are FLOSS. I'm pretty sure their advocates would be LESS than happy with Copilot laundering their code into GPL! If Copilot produces only GPL code, that's a lose-lose situation for a LOT of people. People who are fans of FLOSS ...
Please. Just stop trolling. Just because you're a GPL fanatic doesn't mean other FLOSS people agree with you that the GPL is a "good thing (tm)". GPL3 is a disaster ...
Cheers,
Posted Nov 13, 2022 19:16 UTC (Sun)
by q_q_p_p (guest, #131113)
[Link] (2 responses)
AGPL-3.0-or-later ideally, GPL-3.0-or-later realistically.
> Oh - and BSD, MIT, licenses like that are FLOSS. I'm pretty sure their advocates would be LESS than happy with Copilot laundering their code into GPL!
"(including open source)" was referring to them.
>Just because you're a GPL fanatic doesn't mean other FLOSS people agree with you that the GPL is a "good thing (tm)". GPL3 is a disaster ...
Imagine, I'm also a "people" and I don't think GPL3 to be a disaster.
Posted Nov 13, 2022 21:01 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
How about complying with the actual text of the GPL?
> If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all.
I express no opinion about whether the output of Copilot is actually a derivative work of its training materials. But *IF* it is, then it needs to comply with multiple incompatible copyleft licenses, so therefore it cannot be distributed at all. The GPL says this in black and white.
Posted Nov 13, 2022 21:20 UTC (Sun)
by q_q_p_p (guest, #131113)
[Link]
That's also an acceptable solution to me.
Posted Nov 13, 2022 20:37 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link] (1 responses)
Posted Nov 13, 2022 21:49 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
It's well known that Linus likes the *practicality* of the GPL2, and is very *unhappy* with the *spirit* of the GPL, which is why Linux has never been licenced 2+, and which is why that move is unlikely ever to be considered.
People apparently use BSD because they want their code to spread. If that code gets incorporated into GPL projects, those projects are *hindering* the spread of the code by adding extra restrictions. Hopefully, the GPL code points back at the original BSD, but if the GPL version out-evolves the BSD one then the wants of the original developers have clearly been trampled on.
What's LEGAL is not always MORAL. The best approach is respect - for the code, for the authors, for people in general.
Cheers,
Posted Nov 14, 2022 9:41 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
Then they shouldn't have picked a license that allows to do it?
Posted Nov 14, 2022 9:24 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link] (9 responses)
AFAIK It's only GPL and AGPL.
The authors of the MIT licensed file might not like the GPL but they did not restrict this kind of use.
It is however true that care should be taken to not mix code with incompatible licenses, so it's not so easy as "just license everything under AGPL".
By the way people say AGPL because it's the most restrictive of the famous ones.
Posted Nov 14, 2022 14:29 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (8 responses)
The file is only *distributable* under the GPL, but you cannot apply any licence to code you did not write. The licence(s) that apply to the file are the licences the authors/owners applied. Any recipient can (if they know the history) copy the BSD function from that file, and distribute it under BSD.
Cheers,
Posted Nov 14, 2022 21:07 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link] (7 responses)
I know it's not my code but it seems I'm free to relicense MIT to GPL, and of course license the following mod
Posted Nov 14, 2022 21:50 UTC (Mon)
by sfeam (subscriber, #2841)
[Link] (2 responses)
Posted Nov 15, 2022 9:44 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (1 responses)
I disagree with your analysis of Copilot, by analogy to a human.
Letting a human read text is not, in and of itself, an infringing activity. Nor is the resulting brain state in the human, even if it includes literal copies of text code they read. But the output of a human can itself be infringing, if instead of using my training to inform what I do, I regurgitate memorised chunks of text.
I expect the same principles to apply to Copilot and similar systems; training Copilot is not infringing. The resulting model is not infringing in and of itself. The output from the system can, however, be infringing, and the degree to which it is a legal problem depends on the degree of infringement, and the extent to which the system disguises the origins of the code (in terms of contributory infringement, if I tell you that I'm showing you sample code from a given source, and you copy it, that's a different case to if I give you code that I do not attribute).
Posted Nov 17, 2022 10:25 UTC (Thu)
by NRArnot (subscriber, #3033)
[Link]
for obj in object_list: obj.do_stuff() is surely fair, however the AI arrived at it.
for sd5obj in sd5_get_blue_meanies(): sd5obj.frobnicate_from_sd4( ) is surely a verbatim copy of somebody's identifiable code, and should at the very least be attributed.
Posted Nov 14, 2022 22:17 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (3 responses)
You clearly didn't read the thread you pointed at.
The MIT licence allows SUBlicencing, ie applying other terms on top. According to your thread, there is no such thing as relicencing, certainly I've never seen that in any legal document.
And as I understand English, RElicencing means throwing out the old licence, and replacing it with a new one. No open source licence I've come across allows any such thing. As I understand the meaning of the word "relicencing", in practice this has precious little difference from handing over the copyright.
And that's also why/how you can extract the original code under the original licence. If it had been RElicenced, you wouldn't be able to get the original licence back.
Cheers,
Posted Nov 15, 2022 23:46 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
* With both the GPL and the MIT license, A automatically extends a license to each person who obtains a copy of the software, by whatever means. In our example, this means that A gives a license to B and to C.
Posted Nov 16, 2022 5:50 UTC (Wed)
by unilynx (guest, #114305)
[Link] (1 responses)
The license is a promise not to sue for copyright infringement providing you follow certain requirements.
C has the same rights as B in your case no matter what B said, as B has no standing to sue for copyright infringement as he does not have any copyright.
(Unless A specifically provided the product to B with the understanding that the MIT license would only be valid to B)
Posted Nov 17, 2022 19:48 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Nov 10, 2022 21:10 UTC (Thu)
by developer122 (guest, #152928)
[Link] (4 responses)
There's a good reason that OpenZFS is distributed as an nvidia-style module despite being copyleft.
Posted Nov 10, 2022 21:28 UTC (Thu)
by q_q_p_p (guest, #131113)
[Link] (3 responses)
Posted Nov 11, 2022 0:43 UTC (Fri)
by pabs (subscriber, #43278)
[Link] (2 responses)
Posted Nov 17, 2022 10:34 UTC (Thu)
by NRArnot (subscriber, #3033)
[Link] (1 responses)
Posted Nov 27, 2022 16:19 UTC (Sun)
by flussence (guest, #85566)
[Link]
Posted Nov 17, 2022 13:04 UTC (Thu)
by esemwy (guest, #83963)
[Link]
Posted Nov 10, 2022 16:26 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (75 responses)
The demand for money makes it even more obvious this is a malicious effort by some copyright trolls, pushing for a maximalist interpretation of the law, which would be bad news for everybody but their lawyers and wallets.
Posted Nov 10, 2022 17:14 UTC (Thu)
by Rigrig (subscriber, #105346)
[Link] (31 responses)
Training an AI should be fine, but when it faithfully reproduces input it gets tricky. Regardless of whether it was produced by AI or a human, that sure smells like copyright violation to me.
Posted Nov 10, 2022 19:11 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (30 responses)
What this is really used for in the real world is to take care of boilerplate and such.
Posted Nov 10, 2022 21:05 UTC (Thu)
by ballombe (subscriber, #9523)
[Link] (27 responses)
Posted Nov 10, 2022 21:17 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (26 responses)
Posted Nov 11, 2022 9:51 UTC (Fri)
by gspr (guest, #91542)
[Link] (11 responses)
Posted Nov 11, 2022 10:24 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (10 responses)
Copyright maximalism is bad for us. The only reason this gains any traction is because it's done by Microsoft, if Copilot had been built by Mozilla reactions would be quite different, and that's just sad and short-sighted.
Posted Nov 11, 2022 11:11 UTC (Fri)
by gspr (guest, #91542)
[Link] (9 responses)
Posted Nov 11, 2022 11:47 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (8 responses)
Posted Nov 11, 2022 12:14 UTC (Fri)
by gspr (guest, #91542)
[Link] (7 responses)
That's the question at the heart of the conundrum, and it's a very complex one. Sure, we can probably agree that Copilot does not contain bit-for-bit copies of copyrighted material. But that's not the bar. Distributing a lossily compressed copy of a copyrighted image without permission can still be infringement. On the other hand, distributing the average value of all the pixels in said image certainly is not. The spectrum in-between is where it gets hard, and this is (in my opinion) probably where Copilot and similar fall.
Posted Nov 11, 2022 15:32 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
But if somebody else then takes my work and publishes it in a newspaper, that's not a legal document. Me putting it in a legal document did not strip copyright, it just gave ME immunity. The publisher can still get done for it.
Cheers,
Posted Nov 11, 2022 19:55 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (5 responses)
Posted Nov 12, 2022 15:24 UTC (Sat)
by farnz (subscriber, #17727)
[Link] (4 responses)
It can be defined as something that distributes copies of something in exactly the same way as a human engineer can be defined as someone who distributes copies of something.
If I have, in my notebooks, details of how to do something in kernel-dialect C, and I read a snippet of code from those details then adapt it to the codebase I'm working on, then I've distributed a copy of the snippet in my notebook. If the snippet in my notebook is not protected by copyright, then this is not an issue; if it is, then I've potentially infringed copyright by copying out that snippet and adapting it.
The same applies to Copilot - its model takes the place of the engineer's notebooks and knowledge of what they can find in their notebooks, and its output is potentially infringing in exactly the same way as a human engineer's output is potentially infringing, complete with fun questions around "non-literal copying".
Posted Nov 17, 2022 13:16 UTC (Thu)
by esemwy (guest, #83963)
[Link] (3 responses)
Posted Nov 17, 2022 14:08 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (2 responses)
You're getting into the philosophy of what it means to be human, and missing my point at the same time.
My point is simply that if a human can infringe while doing the same thing that Copilot does, then it's absurd to say that Copilot cannot infringe because it's an AI - rather, it's reasonable to say that Copilot's ability to infringe copyright is bounded on the lower end by the degree to which a human doing the same thing can infringe copyright.
I've also provided a sketch of how a human can infringe copyright, which I can expand upon if it's not clear; unless you can demonstrate that Copilot is incapable of doing what the human does to infringe, however, you can't then claim that Copilot can't infringe where a human can.
Posted Nov 17, 2022 15:57 UTC (Thu)
by esemwy (guest, #83963)
[Link] (1 responses)
Posted Nov 17, 2022 17:10 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
I'm not following your reasoning here - in what way should Copilot be permitted to do something that would make Microsoft liable for the infringement if an employee did it?
Remember that I'm setting a lower bound - "if Copilot was just a communication interface to a human being at Microsoft who looked at the context sent to and then responded with a code snippet, would Microsoft be liable?". My claim is that if Microsoft would be liable in this variant on Copilot, then Microsoft are also liable if Copilot is, in fact, an "AI" based around machine learning, but that this is a one-way inference - if Microsoft would not be liable if Copilot was a comms channel, this doesn't tell you anything about whether Microsoft are liable if Copilot is actually an AI.
To summarize: my reasoning is that "an AI did it, not a human" should never be a get-out clause - it can increase your liability beyond that you'd face if a human did it, but it can never decrease it.
Posted Nov 11, 2022 11:06 UTC (Fri)
by paulj (subscriber, #341)
[Link] (10 responses)
Posted Nov 11, 2022 13:11 UTC (Fri)
by rahulsundaram (subscriber, #21946)
[Link] (9 responses)
Bluca works for MS but is not involved with GitHub directly as I understand it. He has said before he doesn't think it is important to add such notes but I think the repeated level of participation in topics like these warrants one. It is a complex issue and it was clear from the beginning this is all going to end up in court(s). It may very well end with rulings that affect the future of such tools and even copyright in general. If you are going to come in strongly on one side (even if it happens to be coincidentally favorable to your employer which I can completely accept it is), other folks might want to take that into consideration when evaluating your opinion.
Posted Nov 11, 2022 14:30 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (4 responses)
Posted Nov 11, 2022 14:50 UTC (Fri)
by paulj (subscriber, #341)
[Link]
Posted Nov 12, 2022 20:16 UTC (Sat)
by k8to (guest, #15413)
[Link]
Posted Nov 18, 2022 1:20 UTC (Fri)
by jschrod (subscriber, #1646)
[Link] (1 responses)
You are currently posting 20% of the comments on this article, without disclosing your affiliation.
I.e., you are a MS shill.
*plonk*
PS: I know a lot of folks who work at MS Research, and I'm grateful for the great work they are doing there. It is also obvious that there are some folks at MS who are doing very good work on the Linux kernel. But they are always open about their affiliation, even when commenting on articles. And lwn.net is not some obscure Web site with an obscure community -- we are here at the heart of the Linux community, which takes such issues seriously and discusses them with an open visor.
PPS: For the record: I'm an owner of a company that is a MS partner, but my company has nothing to do with the Linux side of MS's business.
Posted Dec 7, 2022 11:06 UTC (Wed)
by nye (subscriber, #51576)
[Link]
This kind of ad hominem trolling has no place in LWN. Corbet, please for the love of god can we start seeing some temporary bans for repeat trolls like this?
Posted Nov 11, 2022 17:52 UTC (Fri)
by paulj (subscriber, #341)
[Link] (3 responses)
It's almost impossible for such associations not to colour one's thinking at least a little. The enthusiasm shown for the debate by bluca is clear anyway.
Posted Nov 11, 2022 19:47 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (2 responses)
Posted Nov 14, 2022 10:29 UTC (Mon)
by paulj (subscriber, #341)
[Link]
Nor did I "doxx" you. You acknowledged your employment with MS here on LWN before.
Posted Nov 14, 2022 10:43 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
OK, here are some facts: at the moment of writing,
your comments amount to 22% of the comments on this article, and
there are 40 usernames that have commented, so you are clearly overrepresented.
Posted Nov 11, 2022 19:03 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (2 responses)
Nice to know.
> Do you?
My issue is that there is no verifiable claim about the size of the AI model.
Github made everyone nervous by changing the TOS. They pay the price now.
Posted Nov 11, 2022 19:34 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (1 responses)
Besides, some folks are reimplementing their own server + model, using the same client interface, and yes it's an actual AI model, not an index: https://github.com/moyix/fauxpilot
Posted Nov 13, 2022 11:49 UTC (Sun)
by ballombe (subscriber, #9523)
[Link]
There would be two stages: first locate the relevant code snippet in the database, and then the
The second step is something that AI are well suited to do and nobody is claiming it is violating copyright, except
The whole concept of using function names to infer their implementations requires some kind of storage, from purely information-theoretic considerations, if only to conserve Kolmogorov complexity.
> Besides, some folks are reimplementing their own server + model, using the same client interface, and yes it's an actual AI model, not an index: https://github.com/moyix/fauxpilot
So even if copilot is shut down, you can go about your work by using fauxpilot ? Good!
Posted Dec 16, 2022 7:45 UTC (Fri)
by ssmith32 (subscriber, #72404)
[Link] (1 responses)
Posted Dec 18, 2022 2:25 UTC (Sun)
by anselm (subscriber, #2796)
[Link]
That would depend on the exact circumstances. In many jurisdictions, code must exhibit a certain minimal degree of creativity to be eligible for copyright. If the boilerplate code in question is a very obvious or indeed the only sensible way of achieving a certain result in the programming language (and just a hassle to type out), then it may not fall under copyright because it is not sufficiently creative to warrant protection.
In such cases the main advantage of GitHub Copilot is probably that it is able to regurgitate the boilerplate with adapted variable names etc. But if all you're interested in is saving yourself some typing for boilerplate code that you use often, many programming editors have their own facilities to do this in a way that is a lot simpler and safer and doesn't involve referring to a ginormous proprietary search engine with a complete disregard of the legalities and etiquette of sharing code.
Posted Nov 10, 2022 17:15 UTC (Thu)
by MarcB (subscriber, #101804)
[Link] (28 responses)
Why is this "as it should be"? Obviously, an AI system must not be allowed to perform any "copyright washing". Otherwise copyleft licenses would be completely undermined and any leaked, proprietary source code could be "freed" of its license.
The existing exemptions are for the purpose of the mining itself. The final result of this is then subject to a separate check. A research paper, or statistics or some abstract summary would obviously be allowed, but in this case, the output can be the literal input (minus copyright and license information). It is absolutely not clear if this is legal under any jurisdiction.
Posted Nov 10, 2022 19:03 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (27 responses)
Because that's drivel. It is not how this works in the real world, it's completely fabricated clickbait.
Posted Nov 10, 2022 20:41 UTC (Thu)
by MarcB (subscriber, #101804)
[Link] (12 responses)
There are examples of this happening, so it is obviously not fabricated. It might not be an issue for the users of Copilot, because most likely the risk of developers manually copying misattributed/unattributed code from the internet is much higher, but it certainly is an issue for Microsoft.
Even if the code generated by Copilot is not a verbatim copy of the input, it is clear that an automated transformation is not enough to free code from its original copyright. The questions then would be how it could be shown that the AI did create the output "on its own", and who carries the burden of this proof (the plaintiff would obviously be unable to do so, because they cannot access the model).
In any case, my main point was that the directive's exemptions are insufficient to declare such a lawsuit nonsensical in the EU. The directive uses the following definition:
> "text and data mining" means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations
Does this cover the output of source code? Maybe, but not obviously.
Posted Nov 10, 2022 21:02 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
I think it's clear - if the plaintiff can show that the Copilot code is identical to their own, and the defendant (Copilot) had access to their code, then it's up to Copilot to prove it's not a copy.
There's also the question of "who has access to the evidence" - if you possess evidence (or should possess evidence) and fail to produce it, you cannot challenge your opponents claims over it.
So yes it is a *major* headache for Microsoft.
Oh - and as for the guy who thought "everything should be licenced GPL" - there is ABSOLUTELY NO WAY Microsoft will do that. Just ask AT&T what happened when they stuck copyright notices on Unix ...
Cheers,
Posted Nov 10, 2022 21:16 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (9 responses)
Of course it's fabricated, complainers go out of their way to get the tool to spit out what they were looking for and then go "ah-ha!", for clickbait effect, as if it meant something. Just like using one VHS with a copied movie does not mean that the VHS company is responsible for movie piracy. Or just like if google returns a search result with a torrent link for a music track it doesn't mean google is responsible for music piracy, and so on.
> In any case, my main point was that the directives exemptions are insufficient to declare such a lawsuit nonsensical in the EU. The directive uses the following definition:
Of course it covers it, that's exactly what copilot is used for: fills in patterns (boilerplate). Have you ever actually used it?
Posted Nov 11, 2022 12:22 UTC (Fri)
by gspr (guest, #91542)
[Link] (8 responses)
does not imply
> it's fabricated
or
> for clickbait effect
> Just like using one VHS with a copied movie does not mean that the VHS company is responsible for movie piracy.
If playing back a new blank VHS tape in a particular way resulted in a blurry copy of said movie, then yeah, perhaps it would.
> Or just like if google returns a search result with a torrent link for a music track it doesn't mean google is responsible for music piracy, and so on.
I don't see how this is even comparable.
> Of course it covers it, that's exactly what copilot is used for: fills in patterns (boilerplate). Have you every actually used it?
I'm not sure it matters what it's used for by you and your peers, if it comes with an out-of-the-box ability to also do the other things. Again: this is *not* the same as "a disk drive can be used for piracy" – the difference is that Copilot already (possibly, that's the debate) contains within it the necessary information to produce the infringing material.
Posted Nov 11, 2022 13:57 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (2 responses)
To choose an example at one extreme, A&M Records, Inc. v. Napster, Inc. established that while there were non-infringing uses of Napster, Napster's awareness that there were infringing uses of their technology product was enough to establish liability.
And it's worth noting in this context that Napster on its own was not infringing copyright - to infringe copyright, you needed two Napster users to actively make a decision to infringe: one to make the content available, and one to request a copy of infringing content. In other words, one user had to prompt Napster to spit out what they were looking for, and even then it wouldn't do that unless another user had unlawfully supplied that content to their local copy of Napster. In contrast, if Copilot's output infringes, it only needs the prompting user to make it infringe - which doesn't bode well for Microsoft if the court determines that Copilot's output is an infringement.
Posted Nov 11, 2022 14:39 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (1 responses)
Posted Nov 11, 2022 15:07 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
That's a misrepresentation both of the Napster case (where the court deemed that the user's right to ingest copyrighted materials into the system was irrelevant), and of the EU Copyright Directive, which merely says that ingesting publicly available material into your system is not copyright infringement on its own, and that the fact of such ingestion does not make the model infringing. This does not preclude a finding of infringement by the model or its output - it simply means that to prove infringement you can't rely on the training data including your copyrighted material, but instead have to show that the output is infringing.
Posted Nov 11, 2022 14:04 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (4 responses)
So you think that the sale of knives, hammers, screwdrivers etc should be banned? Because they come with an out-of-the-box ability to be used for murder. Come to that, maybe banning cars would be a very good idea, along with electricity, because they're big killers.
It's not the USE that matters. All tools have the *ability* to be mis-used, sometimes seriously. Ban cameras - they take porn pictures. But if the PRIMARY use is ABuse, that's when the law should step in. Everything else has to rely on the courts and common sense.
In the UK, carrying offensive weapons in public is illegal. Yet many of my friends - quite legally - carry very sharp knives. Because they're "tools of the trade" for chef'ing.
Cheers,
Posted Nov 11, 2022 14:19 UTC (Fri)
by gspr (guest, #91542)
[Link] (3 responses)
A pen won't reproduce a copyrighted text without a human inputting missing data, even though it of course can he used to reproduce such a text with human assistance. Copilot, on the other hand, can (maybe!)
Posted Nov 11, 2022 14:33 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (2 responses)
Posted Nov 11, 2022 14:38 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
Your "hence" does not follow from your first statement.
The law says that the act of ingestion does not itself infringe copyright, nor does the fact of ingestion make the model infringe copyright automatically. It does not, however, say that the model is not subject to the original licence if it is found to be infringing copyright, nor does it say that the output of the model is not contributory infringement.
Posted Nov 11, 2022 14:42 UTC (Fri)
by gspr (guest, #91542)
[Link]
Yeah. But it's not allowed to *reproduce* that copyrighted material in a way incompatible with the original license. On one extreme, ingesting the material to produce, say, the parity of all the bits involved, is clearly not "reproduction" - and so is OK. On the other extreme, ingesting it and storing it perfectly in internal storage and spitting it back out on demand, clearly is "reproduction" - and surely not OK.
As I see it, the whole debate is about where between those extremes Copilot falls.
I'm not claiming to have the right answer. In fact, I don't even think I have _an_ answer. But I object to your sweeping statements about this seemingly being an easy and clear case.
Posted Nov 14, 2022 9:28 UTC (Mon)
by geert (subscriber, #98403)
[Link]
"patterns, trends, and correlations". For code, that would be reporting e.g. that 37% of all code that needs to sort something resort to quicksort, instead of reproducing a perfect copy of the source code of your newly-developed sorting algorithm released under the GPL.
Yeah, the "is not limited to" might be considered a loophole, but I guess anything that doesn't follow the spirit would be tossed out...
Posted Nov 11, 2022 17:42 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (12 responses)
Posted Nov 11, 2022 19:36 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (11 responses)
Posted Nov 11, 2022 20:07 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (10 responses)
Posted Nov 11, 2022 20:44 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Please note that it's not a question of copyright.
Posted Nov 11, 2022 21:39 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (8 responses)
Posted Nov 12, 2022 3:12 UTC (Sat)
by pabs (subscriber, #43278)
[Link] (7 responses)
Posted Nov 12, 2022 11:29 UTC (Sat)
by bluca (subscriber, #118303)
[Link] (6 responses)
Posted Nov 12, 2022 15:39 UTC (Sat)
by farnz (subscriber, #17727)
[Link] (5 responses)
It would have been wise for Microsoft to train Copilot against their crown jewels (Office and Windows) for two reasons:
Posted Nov 12, 2022 18:31 UTC (Sat)
by bluca (subscriber, #118303)
[Link] (3 responses)
Posted Nov 12, 2022 20:33 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
How the hell do you think anyone would get a citation for that? Are you saying that Microsoft doesn't have useful Win32 API usage to train on for Windows developers? Or are you saying that even Microsoft doesn't use it well enough to bother training anything on it?
Posted Nov 12, 2022 22:00 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Nov 13, 2022 16:56 UTC (Sun)
by farnz (subscriber, #17727)
[Link]
For 1, it's not about the naysayers, it's about what you can say in court to convince a judge (or jury in some US civil cases) that the naysayers are overreacting. The statement "we trained this against our crown jewels, the Windows and Office codebases, because we are completely certain that its output cannot contain enough of our original code to infringe copyright" is a very convincing statement to a judge or jury - and even if the court finds that Copilot engages in contributory infringement of people's copyright (having seen a demo of it doing so), the court is likely to be lenient on Microsoft as a result - the fact of having trained it against their core business codebases is helpful evidence that any infringement by Copilot's output is unintentional and something Microsoft would fix, because it puts their core business at risk.
And for 2, which part do you want a citation on? That Office and Windows are a big Win32 codebase written by good developers? That people still write code for Win32? That there's boilerplate in Win32 that would be simplified with an AI assistant helping you write the code?
Posted Nov 12, 2022 22:28 UTC (Sat)
by anselm (subscriber, #2796)
[Link]
Posted Nov 14, 2022 10:44 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
Posted Nov 10, 2022 20:04 UTC (Thu)
by lkundrak (subscriber, #43452)
[Link] (8 responses)
You're welcome.
Posted Nov 11, 2022 10:25 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (7 responses)
Posted Nov 11, 2022 11:22 UTC (Fri)
by gspr (guest, #91542)
[Link] (6 responses)
Posted Nov 11, 2022 11:48 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (5 responses)
Posted Nov 11, 2022 12:16 UTC (Fri)
by gspr (guest, #91542)
[Link] (4 responses)
Posted Nov 11, 2022 14:28 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (3 responses)
Posted Nov 11, 2022 14:59 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link] (2 responses)
Posted Nov 11, 2022 16:30 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (1 responses)
Posted Nov 11, 2022 16:32 UTC (Fri)
by corbet (editor, #1)
[Link]
Posted Nov 12, 2022 21:52 UTC (Sat)
by vetse (subscriber, #143022)
[Link] (3 responses)
Posted Nov 14, 2022 10:53 UTC (Mon)
by kleptog (subscriber, #1183)
[Link] (2 responses)
So the likely end result was that big companies with lots of money get to make new models on lots of data, while start-ups, researchers and students who are working on the next generation of technologies in this area are stymied by possible lawsuits. This was deemed undesirable.
The chosen solution is to allow model training on any publicly available data for research and training purposes. And organisations that publish online can opt-out (in a machine readable fashion) from being used in machine learning. It doesn't say anything about the copyright status of the output of the models.
Of course, it's only a directive, so you're relying on the member states to properly implement this. But it's better than it was.
Ref: https://eur-lex.europa.eu/eli/dir/2019/790/oj
Posted Nov 18, 2022 15:44 UTC (Fri)
by nim-nim (subscriber, #34454)
[Link] (1 responses)
It is *sooo* refreshing to live in a legal system where past mistakes are not set in stone.
Posted Nov 18, 2022 19:50 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
Common Law is - certainly in its origin - just people asking judges to settle disputes. It just solidifies to stone as in "this is what seems right".
And then if it seems appropriate Parliament can come along, pass Statute Law, and toss the whole Common Law structure into the bin.
Although if the Judges think it unfair they can gut the Statute - it does happen ...
Cheers,
Posted Nov 14, 2022 10:23 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
> I am so glad to live in Europe, where the legislation is way ahead of the US on this matter and makes clear that such a lawsuit is absolutely bogus and nonsensical.
I disagree.
The thing that was ruled OK was not generating code, and certainly not verbatim copy-paste. This is different and requires a separate ruling.
> The demand for money makes it even more obvious this is a malicious effort
There has to be a demand for money, or they would be saying there was no harm done and Microsoft should continue to do whatever it's doing. But the fact that there is a great number of people complaining about this online tells us that they are feeling wronged… and a court might decide just how wronged they all are.
Posted Nov 10, 2022 16:50 UTC (Thu)
by bfields (subscriber, #19510)
[Link] (8 responses)
I don't love this framing.
It seems to me there are two potential goals here: compensating victims and making sure crime doesn't pay. In cases where the damage was a small amount multiplied by a large number of people, the latter may be more important.
I mean, if someone scams me out of $5, it's rarely going to be worth my while to pursue them for the $5, but if someone else wants to, I'd happily throw that $5 into a pot of money for them to use. And when I've gotten one of those mailings, that's been my attitude--"hey, doesn't sound like it's worth even figuring out whether this applies to me, but it's probably good someone went after them."
In a case where something costs a lot of people a little money each, what should we do? No doubt the system could be more efficient, but I don't see how you're going to completely avoid the expense of sorting out exactly what happened and who's at fault, all to a reasonable degree of certainty--something that could inherently need a lot of legwork by a lot of people, some with pretty specialized skills. And sometimes, in the end, you'll fail to make the case, so you'll have to build that into your costs too.
We could let that kind of thing slide. I think that could have long-term consequences we wouldn't like.
We could make it a government function and pay for it with our taxes. Obviously, that's already what we do in some cases.
Or we can have a system like this that allows law firms to take on the risk and expense in exchange for taking a cut when they win.
I honestly don't know what's best, but I don't think the latter is obviously "abusive" just because the compensation to individual victims is sometimes trivial.
Anyway, sorry, that's all a bit of a derail, as I'm not really convinced of the merits of this particular case. (Maybe I haven't thought it through.)
Posted Nov 10, 2022 17:55 UTC (Thu)
by hkario (subscriber, #94864)
[Link] (5 responses)
Posted Nov 10, 2022 20:04 UTC (Thu)
by bfields (subscriber, #19510)
[Link] (4 responses)
That's OK, it doesn't necessarily have to be a threat to the corporation's existence to be useful, it just has to raise the expected cost of the undesirable behavior sufficiently.
Posted Nov 11, 2022 11:20 UTC (Fri)
by hkario (subscriber, #94864)
[Link] (3 responses)
Posted Nov 11, 2022 12:16 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (2 responses)
Unfortunately, we are completely useless at punishing the secretary for the misdeeds of the company. If the Secretary knows they can (and WILL) be fined or imprisoned personally for company misdeeds, any company that starts "playing dirty" will soon find itself unable to recruit a secretary, and subject to intense scrutiny or being broken up.
Cheers,
Posted Nov 11, 2022 15:12 UTC (Fri)
by KJ7RRV (subscriber, #153595)
[Link] (1 responses)
Aren't most types of companies limited liability entities?
Posted Nov 11, 2022 15:27 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
I think I know what you're getting at - the Board of Directors are (allegedly) protected against being responsible for things going wrong on their watch. But if nobody can be held legally responsible for the company breaking the law (say for example ignoring health and safety, and people getting killed), then things can turn nasty, and they regularly do. Usually on a smaller scale than that, admittedly.
But the post of Company Secretary (a *mandatory* post - which is why you'll find that most organisations technically are not allowed to function without a Secretary) is mandated by law to be the official record keeper and legal advisor. As such they can be held personally liable for any wrongdoing on their watch they should have known about, or were told about. And (for companies over a certain size, the last figure I knew was £3M) they have to be legally qualified in some way.
If we started to make company secretaries realise this was actually a serious role, with serious liabilities, the standard of corporate governance would probably rise pretty quickly!
Cheers,
Posted Nov 10, 2022 18:57 UTC (Thu)
by unBrice (subscriber, #72229)
[Link]
Neither do I, but I thought you may be interested in hearing about a third alternative to state-owned agencies and predatory law firms. In France, class actions are restricted to non-profit organizations that have to go through a vetting process (e.g. there are 15 such vetted non-profits defending customers). Additionally they are only allowed to sue for specific offenses (including discrimination and infringements of consumer protection laws). I suspect the way damage compensation works is also very different, but I am not knowledgeable on that.
Posted Nov 18, 2022 15:53 UTC (Fri)
by nim-nim (subscriber, #34454)
[Link]
It would be more honest to scrap the compensation altogether but lawmakers are reluctant to admit they don’t trust themselves to prosecute big money fairly.
Posted Nov 10, 2022 16:56 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
They're excluding all the foreign contributors to github ... they're excluding all the contributors who didn't upload their own code ... surely they should exclude all the alleged violators (aka Copilot users) who are not subject to US law?
Smiles :-)
Cheers,
Posted Dec 10, 2022 23:53 UTC (Sat)
by sammythesnake (guest, #17693)
[Link]
(Based on my attempt to read the minds of the plaintiffs and with no statement of whether I'd agree)
Posted Nov 10, 2022 17:47 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
It must be emphasized that, at least for the purposes of this lawsuit, that doesn't matter. Courts will look at the specific examples you put before them. If you put a bad example before the court, you're (maybe) going to get a bad outcome. Nothing in the US copyright law really allows you to make this sort of "well, there's infringement in here somewhere" argument, except perhaps for contributory copyright infringement (i.e. "what they sued Napster over"). But you probably can't win that one either, because of the Betamax decision (Copilot has substantial non-infringing uses, since some of its output will not be similar to any of its training examples - and if you don't have substantial similarity, you don't have infringement under US law).
That said, the isEven example looks awfully concerning to me, and even if this particular case goes nowhere, Microsoft and GitHub need to get a lot better at eliminating such outputs.
Posted Nov 10, 2022 18:09 UTC (Thu)
by flussence (guest, #85566)
[Link] (2 responses)
Posted Nov 10, 2022 21:35 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (1 responses)
Posted Nov 13, 2022 9:01 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link]
In other words: The court is not going to make some sweeping statement about AI code. The court is going to ignore the AI, look at the examples given, and decide those examples are infringing. Then the court is going to write a clear warning that the ruling might not generalize to other AI code. We're going to end up with a massive gray area and years of legal uncertainty. Various tech companies will then add it to their long list of "parts of US copyright law that we want Congress to fix."
Posted Nov 10, 2022 18:29 UTC (Thu)
by magnus (subscriber, #34778)
[Link] (2 responses)
Posted Nov 10, 2022 18:45 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
I don't know as much about the text models. You can't download any of those (yet), but there might be a blog post or something where one of these companies brags about the size of their model.
Posted Nov 16, 2022 6:20 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
> you would have to compress each image file down to a few bytes
"Mona Lisa" is just a few bytes and is enough to reproduce the original :)
> but there might be a blog post or something where one of these companies brags about the size of their model.
I don't know the details, but I hear it's in the range of billions of neurons.
Posted Nov 10, 2022 18:35 UTC (Thu)
by nickodell (subscriber, #125165)
[Link] (17 responses)
At the same time, I don't think this is a stupid lawsuit. I don't think it particularly has much to do with free software, either. Free software developers are just the first group ornery enough to file a lawsuit about it. The central issue is, "Is it fair use to train an AI on copyrighted data?" No US court has answered this question. This lack of clarity causes unequal treatment.
For example, consider how large and small copyright holders get treated. DALLE-2 is trained on images from the internet, regardless of copyright status. OpenAI announced that they had a deal to license all of Shutterstock's catalogue of stock photos. Why did they do that, if OpenAI had the legal right to train an AI on those photos either way? Because Shutterstock has lawyers, and if OpenAI went to court and lost, it would set a terrible precedent for them. This is how it will go, in general. Big copyright holders, with a willingness to fund a lawsuit for years, will get compensated. Small copyright holders will get nothing. If there were clarity, these two groups would get treated equally.
In addition to creators getting shafted, this lack of clarity has a chilling effect on AI research too. No business would make GPT-3 essential to their business if it turned out, years down the line, that using GPT-3 obligated them to pay licensing to everybody who had ever written something on the Internet. Until this is solved, investment in AI will be slowed down.
Some commenters point out that the EU allows text/data mining for the purpose of AI. This is true, but that exception applies only for non-commercial use, which Copilot is clearly not.
Posted Nov 10, 2022 19:07 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (3 responses)
Wrong. The exception applies to _everybody_. Non-commercial uses have additional provisions, such as not being obliged to offer an explicit opt-out (a-la robots.txt). But the copyright exception is exactly the same.
Posted Nov 11, 2022 15:15 UTC (Fri)
by KJ7RRV (subscriber, #153595)
[Link] (2 responses)
Posted Nov 11, 2022 19:59 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
Posted Nov 17, 2022 19:52 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Nov 11, 2022 8:30 UTC (Fri)
by nilsmeyer (guest, #122604)
[Link] (12 responses)
Do they have "Froot" in them? In Germany Almond Milk can't be sold as milk because it doesn't come from a mammal - apparently this would confuse customers or so the judges thought. However there are no punitive damages and the result of the lawsuit is to change the name, not shake down the competition for money.
I'm not sure which system is better. For example, over here, if a company suffers a data breach due to negligence the government can fine them and keep the money, but the people affected won't see a cent of the money. If they sue they have to prove the monetary damages, and you don't get compensated for the time lost talking to lawyers or sitting in court.
Posted Nov 11, 2022 12:04 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
Fortunately, when they tried to ban Plum Duff, sanity prevailed. The word "plum" (in this context) does not refer to a fruit. I really hope they don't decide to rename mincemeat!
Cheers,
Posted Nov 11, 2022 15:07 UTC (Fri)
by nickodell (subscriber, #125165)
[Link]
Incidentally, there is also a push by dairy farmers to get the FDA to define milk as cow's milk. https://www.wired.com/story/the-fda-may-nix-the-word-milk...
Posted Nov 14, 2022 12:06 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link] (9 responses)
That's because vegans keep killing their children and then say "well I gave milk"
https://www.bbc.com/news/world-europe-40274493
https://www.newsweek.com/parents-convicted-feeding-baby-v...
Posted Nov 14, 2022 14:50 UTC (Mon)
by kleptog (subscriber, #1183)
[Link] (2 responses)
Makes no difference though. Feeding infants exclusively cow's milk is also bad. There's a reason baby formula exists.
You can't fix stupid though. We live in an age where almost anything you'd want to know is at your fingertips, and stupid things still happen.
Posted Nov 14, 2022 22:24 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (1 responses)
Picking a medical rather than social example, why do people suffer from Sickle Cell? In most of the world, it's a pure handicap. But in much of the tropics, where malaria is endemic, people who suffer from mild Sickle Cell have a distinct survival advantage.
And in a world where "do something" usually trumps "do nothing", confirmation bias means you do that something a lot faster ...
Cheers,
Posted Nov 16, 2022 21:29 UTC (Wed)
by LtWorf (subscriber, #124958)
[Link]
Killing children isn't clever. Stop it please.
Posted Nov 15, 2022 10:39 UTC (Tue)
by paulj (subscriber, #341)
[Link] (5 responses)
Baby mammals need mammal milk, ideally from a healthy, well-fed mother of their own species.
Posted Nov 15, 2022 15:47 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (4 responses)
They also self-diagnosed the baby as lactose-intolerant. Stupid thing to do, but ...
If you think someone else is being stupid, how can you be sure it's not your own ignorance or stupidity that leads you to that conclusion? The parents' actions make perfect sense inside their own world and belief system. Sounds to me like they somehow slipped through the ante-natal safety net ...
Cheers,
Posted Nov 16, 2022 10:44 UTC (Wed)
by paulj (subscriber, #341)
[Link] (3 responses)
Posted Nov 16, 2022 14:10 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (2 responses)
(I meant to say post-natal, but either way that is the sort of thing the mother-to-be should have been taught.)
Cheers,
Posted Nov 17, 2022 10:23 UTC (Thu)
by paulj (subscriber, #341)
[Link] (1 responses)
Posted Nov 17, 2022 14:16 UTC (Thu)
by corbet (editor, #1)
[Link]
Posted Nov 11, 2022 9:50 UTC (Fri)
by vegard (subscriber, #52330)
[Link] (12 responses)
They should have trained different models corresponding to different levels of license compatibility; that way, you could ensure that each model only produces code that can be reasonably said to fall under a specific license.
I would also argue that such specific models are just a different representation of what is fundamentally the same code. If you encoded the whole Linux kernel source code in a PNG (losslessly), that would not fundamentally change its license or how it can be used.
(The above is all my personal views/opinion.)
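As a concrete illustration of the "different representation" point, here is a minimal sketch, purely illustrative and not anything Copilot actually does, assuming the Pillow imaging library and a hypothetical kernel_file.c; it losslessly round-trips source code through a PNG, and the bytes that come back out are identical to the bytes that went in:

# Illustrative only: losslessly encode a source file into a PNG and back.
# Requires the Pillow library (pip install Pillow).
from PIL import Image

def source_to_png(source: bytes, path: str) -> None:
    # Prepend an 8-byte length header so the padding can be stripped later.
    data = len(source).to_bytes(8, "big") + source
    width = 1024
    height = -(-len(data) // width)            # ceiling division
    data = data.ljust(width * height, b"\0")   # pad to a full rectangle
    img = Image.frombytes("L", (width, height), data)  # 8-bit grayscale pixels
    img.save(path)                             # PNG compression is lossless

def png_to_source(path: str) -> bytes:
    raw = Image.open(path).tobytes()
    length = int.from_bytes(raw[:8], "big")
    return raw[8:8 + length]

if __name__ == "__main__":
    original = open("kernel_file.c", "rb").read()    # any source file (hypothetical name)
    source_to_png(original, "encoded.png")
    assert png_to_source("encoded.png") == original  # same bytes, same license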
Posted Nov 11, 2022 10:09 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (11 responses)
Because the original license is irrelevant: the model is not built under the sources' licenses, but under specific exceptions in copyright law that allow text and data mining - see the recent EU directive on copyright. In the US it is murkier because there is no corresponding exception yet, so it is all done under the fair-use exception, which is a pain as it has to be defended in court every time. Hopefully the legislators over there will catch up soon and fix this.
Posted Nov 11, 2022 10:17 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (5 responses)
Posted Nov 11, 2022 10:49 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (4 responses)
I think you're reading the exception in EU law over-broadly, and it doesn't say quite what you're claiming.
The exception for text and data mining says that you are not infringing copyright solely by virtue of feeding material into your machine learning system, and that the resulting model is not itself automatically a derived work of the inputs. It does not say that the output of your machine learning system cannot infringe copyright. As far as I can tell, the intent behind this exception is to allow your system to engage in the sorts of copying that we already say are outside the scope of copyright when a human does it - for example, having read about io_uring, a human might build their next system around a submission and completion queue pair, and this is not a copy in the sense of copyright law.
This means that a court could legitimately rule that a given output from the system is sufficiently a copy to be a copyright infringement by the system, and that the use of the system is thus contributory infringement whenever it produces a literal copy of a protected part of its input.
This, in turn, would bring use of systems like GitHub Copilot into line with employing a human to do the same job: if, as a result of a prompt, I write out precisely the code that a previous employer used (complete with comments - whether I copied it from a notebook, or kept a copy of a past employer's source code), then that is copyright infringement. If, on the other hand, I write code that's similar in structure simply because there are only a few ways to loop over all items in a container, that's not copyright infringement.
Assuming the US courts apply this sort of reasoning, then the question before them is whether a human writing the same code with the same prompting would be infringing copyright or not - if you substitute "a Microsoft employee wrote this for a Microsoft customer to use" for "GitHub Copilot wrote this for a Microsoft customer to use", do you still see infringement or not?
Posted Nov 11, 2022 12:10 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
So the exception doesn't cover you for using the result ...
Cheers,
Posted Nov 11, 2022 14:49 UTC (Fri)
by bluca (subscriber, #118303)
[Link] (2 responses)
Exactly, and there are many commentators that completely miss this, and assume the opposite, hence the need to clarify it.
> It does not say that the output of your machine learning system cannot infringe copyright.
Sure, but it implies that it does not automatically do so either. Then the onus is on the complainers to show, first of all, that these artificially induced snippets are copyrightable in the first place, and after that that intentionally inducing the tool to reproduce them, which requires knowing in advance what they look like and what keywords and surrounding setting to prepare in order to achieve that result, means that it is the tool that is at fault rather than the user.
Posted Nov 11, 2022 15:25 UTC (Fri)
by KJ7RRV (subscriber, #153595)
[Link]
"> _____ _____ _____ _____ and _____ _____ _____ _____ you _____ _____ _____ _____ solely _____ _____ _____ _____ material _____ _____ _____ _____ system, _____ _____ _____ _____ model _____ _____ _____ _____ a _____ _____ _____ _____ inputs. _____ _____ _____ _____ many _____ _____ _____ _____ this, _____ _____ _____ _____ hence _____ _____ _____ _____ it. _____ _____ _____ _____ say _____ _____ _____ _____ your _____ _____ _____ _____ infringe _____ _____ _____ _____ implies _____ _____ _____ _____ automatically _____ _____ _____ _____ the _____ _____ _____ _____ complainers _____ _____ _____ _____ of _____ _____ _____ _____ snippets _____ _____ _____ _____ first _____ _____ _____ _____ to _____ _____ _____ _____ the _____ _____ _____ _____ which _____ _____ _____ _____ what _____ _____ _____ _____ what _____ _____ _____ _____ to _____ _____ _____ _____ achieve _____ _____ _____ _____ it's _____ _____ _____ _____ at _____ _____ _____ _____ user."
Posted Nov 11, 2022 15:37 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
You're not actually clarifying, unfortunately. The effect of the EU Copyright Directive is not to say that the model and its training process are guaranteed not to infringe copyright; rather, it's to say that the mere fact that copyrighted material was used as input to the training process does not imply that the training process or resulting model infringe, in and of itself.
And you're asking more of the complainers than EU law does. Under EU law, the complainers first have to show that there is a copyrightable interest in the output (which you do get right), and after that, they only have to show that the tool's output infringes that copyright. In particular, the tool is at fault if the material is infringing and it produces it from a non-infringing prompt - even if the prompt has to be carefully crafted to cause the tool to produce the infringing material.
As an example, let's use the prompt:
float rsqrt(float number) {
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
This is not infringing in most jurisdictions - there's nothing in there that is copyrightable at this point, as all the names are either descriptive, or completely non-descript. If a machine learning model then goes on to output the Quake III Arena implementation of Q_rsqrt from this prompt, complete with the comments (including the commented out "this can be removed" line), then there's infringement by the tool, and if it can be demonstrated that the only place the tool got the code from was its training set, the tool provider is likely to be found to be a contributory infringer.
It doesn't matter that I've set the tool up with a troublesome prompt here (that's the first 5 lines of Q_rsqrt, just renamed to rsqrt); I haven't infringed, and thus the infringement is a result of the tool's training data being copied verbatim into its output.
This is, FWIW, exactly the same test that would apply if I gave that prompt to a human who'd seen Quake III Arena's source code, and they infringed by copying the Quake III Arena implementation - I would not be able to prove infringement just because the human had seen the original source, but I would be able to do so if, given the prompt, they produced a literal copy of Q_rsqrt.
Posted Nov 11, 2022 11:09 UTC (Fri)
by anselm (subscriber, #2796)
[Link] (1 responses)
What the EU says about data mining is irrelevant to this case because Github as an entity is not based in the EU.
What is more pertinent is that the terms and conditions of GitHub stipulate that if you upload stuff to GitHub, you license GitHub to use said stuff to “improve the service”, including indexing etc. Since Copilot is basically a fancy search engine for GitHub-hosted code, it would not be unreasonable for Microsoft's lawyers to argue that their use of code already on GitHub to train Copilot is covered by the site's existing T&Cs, so they don't even need to make a fair-use argument. This would be completely independent of the licenses in the various projects on GitHub, which govern the use and disposition of GitHub-hosted code by third parties.
Having said that, the question of whether Copilot's output can infringe on the copyright of its inputs is a separate (and difficult) issue, which should probably be investigated in the wider context of ML applications that, e.g., paint pictures or generate prose. It is obvious that much of the code Copilot deals with is boilerplate which is not sufficiently original to qualify for copyright protection in the first place, but then again, the fact that Copilot can be coaxed into producing swathes of demonstrably copyrighted code without the correct attribution or license grant should not be overlooked, either.
(Personally I think that there are more efficient methods to deal with boilerplate, and the amount of due diligence required on the part of Copilot users to ensure that any nontrivial stuff Copilot may regurgitate is not violating any copyrights, let alone fit for purpose, negates the advantage of using Copilot to begin with, but your mileage may vary.)
Posted Nov 11, 2022 14:54 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
I live and use it in the EU, so it is very relevant to me ;-) Also the EU is pretty much the world's regulatory superpower, so what it says on this matters a lot.
Posted Nov 11, 2022 14:56 UTC (Fri)
by kleptog (subscriber, #1183)
[Link] (2 responses)
Indeed, it's concerning for me that this is a case that potentially could set wide-reaching precedent. I know judges creating law is a feature of Common Law systems, but it feels like some of the issues here should be decided by the legislature, not the judiciary.
Posted Nov 11, 2022 15:14 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
Getting into politics I know, but it's why I feel the gutting of the House of Lords about 20 years ago was a travesty of good government. The UK's effective Supreme Court was a (pretty ineffective) branch of the legislature. And as such it did a damn good job of making sure the law actually worked.
Career politicians as a whole seem pretty malevolent - by design or accident - and having a decent body of people who wielded power by accident and no design of their own seemed a very good counterbalance. For pretty much a century the House of Lords did a damn good job of making sure legislation was fair, just, and worked. (For some definition of "just", I know. The Lords are people of their time, with the prejudices of their time, but are far less likely to be swayed by populist demagogues.)
The problem the US suffers in particular at the moment, and we're going down the same route, is we seem to have "government by barrels of ink" ...
Cheers.
Posted Nov 11, 2022 19:39 UTC (Fri)
by bluca (subscriber, #118303)
[Link]
Posted Nov 11, 2022 11:27 UTC (Fri)
by ceplm (subscriber, #41334)
[Link] (2 responses)
That is quite an uncool thing to say. Yes, of course, payments to the class members are usually just a pittance, but that is not the point of class-action lawsuits. The main point of class actions is on the other side of the balance sheet: what the defendant has to pay for the lost suit. Class-action lawsuits are a far more powerful tool for product liability and consumer protection than any government action (compared with Europe, for example). Do you remember the Ford Pinto, the Chevrolet Corvair, and other cars that resulted in class-action lawsuits (and a long list of other industries changed for the better by class-action lawsuits could follow)? Yeah, nobody would dare to make such cars any more, because of those.
Concerning the payments made to the lawyers involved in class actions: do you know how much these litigations usually cost? Do you know how many law firms have gone bankrupt because of class-action lawsuits?
I don't want this thread to fall into political discussion (so I probably won't follow up on any replies to this comment), and I readily admit that the system is often abused, but I would like it to stand here that this is one of the strongest tools "normal people" have against corporate interests, and you would be sorry to lose it. We, who don't have it (I live in Europe), are now struggling to create something similar (https://ec.europa.eu/commission/presscorner/detail/en/sta...).
Posted Nov 11, 2022 12:19 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
Are you sure? In the UK we have Trading Standards, whose bite is pretty bad. The trouble is that, like so many such quangos, they are seen by the Treasury as a dead cost, and as such are badly hamstrung by inadequate budgets. If their budget bore any relationship to their value, it would rapidly rise ...
Cheers,
Posted Nov 12, 2022 8:10 UTC (Sat)
by ceplm (subscriber, #41334)
[Link]
Posted Nov 14, 2022 9:20 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link] (1 responses)
Since when? I thought it had always been perfectly fine to sell free software, since its inception.
Posted Nov 14, 2022 9:59 UTC (Mon)
by excors (subscriber, #95769)
[Link]
I don't think it's saying you shouldn't try to make money from developing or distributing free software, it's saying you shouldn't try to make money from free software licence enforcement. The goal of enforcement should be compliance, and financial penalties are merely a mechanism for encouraging compliance and for funding further enforcement.
Posted Nov 17, 2022 10:41 UTC (Thu)
by anton (subscriber, #25547)
[Link]
Yes, it would be better if Copilot was free software itself, but that's a different issue.
Posted Nov 17, 2022 13:25 UTC (Thu)
by esemwy (guest, #83963)
[Link]
I mean, after all, it’s not copying anything….
If Copilot instead produced only GPL code, that would make FOSS advocates happy and corporate (including open-source) shills unhappy. That's a win-win situation for me.
Also, again: you don't have to agree with me; instead, make your own top-level comment - which license should Copilot use? How about proprietary with a Commercial Use Only clause?
> If conditions are imposed on you (whether by court order, agreement or
> otherwise) that contradict the conditions of this License, they do not
> excuse you from the conditions of this License. If you cannot convey a
> covered work so as to satisfy simultaneously your obligations under this
> License and any other pertinent obligations, then as a consequence you may
> not convey it at all. For example, if you agree to terms that obligate you
> to collect a royalty for further conveying from those to whom you convey
> the Program, the only way you could satisfy both those terms and this
> License would be to refrain entirely from conveying the Program.
That very analysis you link to states the opposite: it is not possible to relicense code to GPL. It is possible to include MIT code with GPL code, but this does not have the effect of changing the MIT license to something else. The original copyright, and the original permission to distribute under MIT terms (not GPL) continues to exist.
Whether any of that applies to the Copilot case is another matter. I am inclined to think that if the operation of Copilot is deemed to be permissible because it constitutes the training of an automated system, then its output is neither a compilation nor an original creative work and therefore is not subject to copyright at all. Thus the question of license becomes moot unless and until the output is re-worked as part of a larger whole that is copyrightable and could be licensed (or challenged) at that time.
* With the MIT license, B may optionally give a different license to C (or, in principle, the same license, but that would be kind of pointless). This is called a "sublicense." It does not extinguish the original license, so C can decide which license to comply with. However, as a general rule in most jurisdictions, the sublicense cannot be broader than the rights that B already enjoys under the MIT license in the first place - you can't give what you don't have.
* In the context where the sublicense is the GPL, C is still required to comply with the MIT license's formalities. Obviously, B is also required to comply with those formalities. In practice, this is usually done simply by copying the copyright and license into a comment at the top of the file, but there are other ways of complying. If you're distributing binaries, you might need to take additional measures to ensure a copy of the license is still visible somewhere.
* Under the GPL, sublicensing is forbidden - C can only get the license directly from A. A and C never need to communicate in order to do this. The license, as mentioned above, springs into being automatically as soon as C has a copy of the software, regardless of how A or B may feel about that. I'm pretty sure the FSF did this to prevent B from placing "additional restrictions" on the sublicense.
* In principle, we could imagine a license which does not have this automatic property, where all rights have to be conveyed by explicit sublicensing (sort of the inverse of how the GPL works). This is inconvenient, so nobody does it in the FOSS world, but I imagine it's more common in environments where permissions are more closely guarded.
* We could also imagine a license which terminates as soon as you accept a sublicense from somebody else. This would make sublicensing have the effect of extinguishing the original license, and allow intermediaries to restrict downstream users' rights via clickwrap licenses. Again, this is unheard of in the FOSS world, but to my understanding it is not unlawful. I suppose you could call that operation "relicensing," but in practice, I don't think this is really a thing that anyone does.
Take the isEven() function from the complaint: if anyone writes an isEven(), chances are it looks like a lot of other isEven() functions out there. But this one is exactly the same as a textbook example, which uses recursion for every input except 0 or 1. It even includes the test code from the book, including the // -> ?? exercise comment.
Which is also what a lot of people are worried about: that this will be used to blatantly violate copyrights by claiming "It was written by an AI, which means it's the product of fair use."
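For the curious, the general shape of such a textbook recursive isEven() is roughly the following; this is a Python rendition of the generic exercise, not the verbatim code from the complaint or from the book:

# A deliberately naive, textbook-style recursive parity check.
# Recursion bottoms out at 0 (even) and 1 (odd); everything else is
# reduced by two until it reaches one of those base cases.
def is_even(n: int) -> bool:
    if n == 0:
        return True
    if n == 1:
        return False
    return is_even(n - 2)

print(is_even(50))   # -> True
print(is_even(75))   # -> False
# is_even(-1) would never reach a base case and recurse until the stack blows up.

The point is not that such code is valuable; it is that a comment-for-comment reproduction of one particular rendering of it is easy to recognize as copied rather than independently rewritten.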
My smartphone camera can also *potentially* output copyright-infringing pictures. My mp3 player *potentially* plays copyrighted songs. And so on - these are tools, and their main intended and common use is what matters; for alleged open-source supporters to side with copyright maximalists and trolls looking for a quick payday is missing the point so much that it's not even fun anymore.
> *plonk*
No, that is what I am asking.
For all we know, it could be petabytes in size. The model could just return a table of indices into a gigantic array of strings.
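To make that hypothetical concrete, here is a minimal sketch, purely illustrative and with no claim that Copilot works this way, of a "model" that is nothing more than an index into stored training strings; whatever you call the storage, its output is a verbatim copy of its input:

# A strawman "model": training memorizes snippets verbatim, and inference
# merely looks one up again. Its size is roughly the size of its inputs.
class LookupModel:
    def __init__(self) -> None:
        self.snippets: list[str] = []

    def train(self, corpus: list[str]) -> None:
        # "Training" is nothing more than storing the inputs as-is.
        self.snippets.extend(corpus)

    def complete(self, prompt: str) -> str:
        # "Inference" returns the first stored snippet matching the prompt.
        for snippet in self.snippets:
            if snippet.startswith(prompt):
                return snippet
        return ""

model = LookupModel()
model.train(["float rsqrt(float number) { ... }", "def is_even(n): ..."])
print(model.complete("float rsqrt"))   # emits the memorized snippet verbatim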
AI would post-process the snippet to adapt it to the context.
in so far as it is obfuscating the first stage.
Using someone else's boilerplate code is still a copyright violation.
"(2) ‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations"
"(2) ‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations"
> Does this cover the output of source code? Maybe, but not obviously.
2) [citation needed]
Speaking of civility, I think that this branch of the conversation has gone far enough - and beyond. Can we retire it here please?
1. because the total amount is hardly large for a typical corporation that's a target of a class action (a $5 fine for doing something is not a fine, it's just the cost of doing something).
2. getting $5 in "compensation" is a passive-aggressive corporate version of a non-apology: "sorry if you feel offended"
Have you ever read the blog Lowering the Bar? He writes about the really stupid ones. My favorite is the multiple class action suits alleging that Froot Loops have no fruit in them.
...and all of this has been pretty far off-topic for some time now. Perhaps we could conclude this sub-thread here?
The text and data mining exceptions work in a similar way, but instead of getting permission from the author, permission is given by the law, which trumps any original license, and the author's wishes too. The only right you have is to ask commercial entities for an opt-out. Non-profits don't even have to provide an opt-out. The only restriction AI builders have is that the source must be publicly available (i.e., if someone breaks into a computer system and steals sources or data, the result is not allowed to be used for this or any other purpose).
The vacuuming of a massive amount of free software into the proprietary Copilot system has created a fair amount of discomfort in the community. It does, in a way, seem like a violation of the spirit of what we are trying to do.
Actually, what Copilot does would be very much in at least my view of the spirit of free software, if it conformed with the license conditions (i.e., their model of GPLv3-compatible software would spit out code under GPLv3, their model of CDDL-compatible software would spit out code under CDDL, their model of permissive code would spit out code under a permissive license, etc.). I am publishing my software as free software because I want others to be able to use the four freedoms, including freedom 1: To study how the code works, and change it to make it do what they wish. Copilot could be a helpful tool for that.