Class action against GitHub Copilot
Posted Nov 10, 2022 20:41 UTC (Thu) by MarcB (guest, #101804)
In reply to: Class action against GitHub Copilot by bluca
Parent article: Class action against GitHub Copilot
There are examples of it happening, so it is obviously not fabricated. It might not be an issue for the users of Copilot, because most likely the risk of developers manually copying misattributed/unattributed code from the internet is much higher, but it certainly is an issue for Microsoft.
Even if the code generated by Copilot is not a verbatim copy of the input, it is clear that an automated transformation is not enough to free code from its original copyright. The questions would then be how it could be shown that the AI did create the output "on its own", and who carries the burden of this proof (the plaintiff would obviously be unable to do so, because they cannot access the model).
In any case, my main point was that the directive's exemptions are insufficient to declare such a lawsuit nonsensical in the EU. The directive uses the following definition:
"(2) ‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations"
Does this cover the output of source code? Maybe, but not obviously.
Posted Nov 10, 2022 21:02 UTC (Thu)
by Wol (subscriber, #4433)
I think it's clear - if the plaintiff can show that the Copilot code is identical to their own, and the defendant (Copilot) had access to their code, then it's up to Copilot to prove it's not a copy.
There's also the question of "who has access to the evidence" - if you possess evidence (or should possess evidence) and fail to produce it, you cannot challenge your opponent's claims over it.
So yes it is a *major* headache for Microsoft.
Oh - and as for the guy who thought "everything should be licenced GPL" - there is ABSOLUTELY NO WAY Microsoft will do that. Just ask AT&T what happened when they stuck copyright notices on Unix ...
Cheers,
Posted Nov 10, 2022 21:16 UTC (Thu)
by bluca (subscriber, #118303)
Of course it's fabricated, complainers go out of their way to get the tool to spit out what they were looking for and then go "ah-ha!", for clickbait effect, as if it meant something. Just like using one VHS with a copied movie does not mean that the VHS company is responsible for movie piracy. Or just like if google returns a search result with a torrent link for a music track it doesn't mean google is responsible for music piracy, and so on.
> In any case, my main point was that the directives exemptions are insufficient to declare such a lawsuit nonsensical in the EU. The directive uses the following definition:
Of course it covers it, that's exactly what copilot is used for: filling in patterns (boilerplate). Have you ever actually used it?
Posted Nov 11, 2022 12:22 UTC (Fri)
by gspr (guest, #91542)
does not imply
> it's fabricated
or
> for clickbait effect
> Just like using one VHS with a copied movie does not mean that the VHS company is responsible for movie piracy.
If playing back a new blank VHS tape in a particular way resulted in a blurry copy of said movie, then yeah, perhaps it would.
> Or just like if google returns a search result with a torrent link for a music track it doesn't mean google is responsible for music piracy, and so on.
I don't see how this is even comparable.
> Of course it covers it, that's exactly what copilot is used for: filling in patterns (boilerplate). Have you ever actually used it?
I'm not sure it matters what it's used for by you and your peers, if it comes with an out-of-the-box ability to also do the other things. Again: this is *not* the same as "a disk drive can be used for piracy" – the difference is that Copilot already (possibly, that's the debate) contains within it the necessary information to produce the infringing material.
Posted Nov 11, 2022 13:57 UTC (Fri)
by farnz (subscriber, #17727)
To choose an example at one extreme, A&M Records, Inc. v. Napster, Inc. established that while there were non-infringing uses of Napster, Napster's awareness that there were infringing uses of their technology product was enough to establish liability.
And it's worth noting in this context that Napster on its own was not infringing copyright - to infringe copyright, you needed two Napster users to actively make a decision to infringe: one to make the content available, and one to request a copy of infringing content. In other words, one user had to prompt Napster to spit out what they were looking for, and even then it wouldn't do that unless another user had unlawfully supplied that content to their local copy of Napster. In contrast, if Copilot's output infringes, it only needs the prompting user to make it infringe - which doesn't bode well for Microsoft if the court determines that Copilot's output is an infringement.
Posted Nov 11, 2022 14:39 UTC (Fri)
by bluca (subscriber, #118303)
Posted Nov 11, 2022 15:07 UTC (Fri)
by farnz (subscriber, #17727)
That's a misrepresentation both of the Napster case (where the court deemed that the user's right to ingest copyrighted materials into the system was irrelevant), and of the EU Copyright Directive, which merely says that ingesting publicly available material into your system is not copyright infringement on its own, and that the fact of such ingestion does not make the model infringing. This does not preclude a finding of infringement by the model or its output - it simply means that to prove infringement you can't rely on the training data including your copyrighted material, but instead have to show that the output is infringing.
Posted Nov 11, 2022 14:04 UTC (Fri)
by Wol (subscriber, #4433)
So you think that the sale of knives, hammers, screwdrivers etc should be banned? Because they come with an out-of-the-box ability to be used for murder. Come to that, maybe banning cars would be a very good idea, along with electricity, because they're big killers.
It's not the USE that matters. All tools have the *ability* to be misused, sometimes seriously. Ban cameras - they take porn pictures. But if the PRIMARY use is ABuse, that's when the law should step in. Everything else has to rely on the courts and common sense.
In the UK, carrying offensive weapons in public is illegal. Yet many of my friends - quite legally - carry very sharp knives. Because they're "tools of the trade" for chef'ing.
Cheers,
Posted Nov 11, 2022 14:19 UTC (Fri)
by gspr (guest, #91542)
A pen won't reproduce a copyrighted text without a human inputting missing data, even though it of course can be used to reproduce such a text with human assistance. Copilot, on the other hand, can (maybe!).
Posted Nov 11, 2022 14:33 UTC (Fri)
by bluca (subscriber, #118303)
Posted Nov 11, 2022 14:38 UTC (Fri)
by farnz (subscriber, #17727)
Your "hence" does not follow from your first statement.
The law says that the act of ingestion does not itself infringe copyright, nor does the fact of ingestion make the model infringe copyright automatically. It does not, however, say that the model is not subject to the original licence if it is found to be infringing copyright, nor does it say that the output of the model is not contributory infringement.
Posted Nov 11, 2022 14:42 UTC (Fri)
by gspr (guest, #91542)
Yeah. But it's not allowed to *reproduce* that copyrighted material in a way incompatible with the original license. On one extreme, ingesting the material to produce, say, the parity of all the bits involved, is clearly not "reproduction" - and so is OK. On the other extreme, ingesting it and storing it perfectly in internal storage and spitting it back out on demand, clearly is "reproduction" - and surely not OK.
As I see it, the whole debate is about where between those extremes Copilot falls.
I'm not claiming to have the right answer. In fact, I don't even think I have _an_ answer. But I object to your sweeping statements about this seemingly being an easy and clear case.
Posted Nov 14, 2022 9:28 UTC (Mon)
by geert (subscriber, #98403)
"patterns, trends, and correlations". For code, that would be reporting e.g. that 37% of all code that needs to sort something resorts to quicksort, instead of reproducing a perfect copy of the source code of your newly-developed sorting algorithm released under the GPL.
Yeah, the "is not limited to" might be considered a loophole, but I guess anything that doesn't follow the spirit would be tossed out...