
Class action against GitHub Copilot

Posted Nov 10, 2022 20:41 UTC (Thu) by MarcB (guest, #101804)
In reply to: Class action against GitHub Copilot by bluca
Parent article: Class action against GitHub Copilot

> Because that's drivel. It is not how this works in the real world, it's completely fabricated clickbait.

There are examples of this happening, so it is obviously not fabricated. It might not be an issue for the users of Copilot, because the risk of developers manually copying misattributed or unattributed code from the internet is most likely much higher, but it certainly is an issue for Microsoft.

Even if the code generated by Copilot is not a verbatim copy of the input, it is clear that an automated transformation is not enough to free code from its original copyright. The questions then would be how it could be shown that the AI created the output "on its own", and who carries the burden of proof (the plaintiff would obviously be unable to do so, because they cannot access the model).

In any case, my main point was that the directive's exemptions are insufficient to declare such a lawsuit nonsensical in the EU. The directive uses the following definition:
"(2) ‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations"

Does this cover the output of source code? Maybe, but not obviously.



Class action against GitHub Copilot

Posted Nov 10, 2022 21:02 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Even if the code generated by Copilot is not a verbatim copy of the input, it is clear, that an automated transformation is not enough to free code from its original copyright. The questions then would be, how it could be shown that the AI did create the output "on its own" and who carries the burden of this proof (the plaintiff would obviously unable to do so, because they cannot access the model).

I think it's clear - if the plaintiff can show that the Copilot code is identical to their own, and the defendant (Copilot) had access to their code, then it's up to Copilot to prove it's not a copy.

There's also the question of "who has access to the evidence" - if you possess evidence (or should possess evidence) and fail to produce it, you cannot challenge your opponent's claims about it.

So yes it is a *major* headache for Microsoft.

Oh - and as for the guy who thought "everything should be licensed GPL" - there is ABSOLUTELY NO WAY Microsoft will do that. Just ask AT&T what happened when they stuck copyright notices on Unix ...

Cheers,
Wol

Class action against GitHub Copilot

Posted Nov 10, 2022 21:16 UTC (Thu) by bluca (subscriber, #118303) [Link] (9 responses)

> There are examples of to happening, so it is obviously not fabricated.

Of course it's fabricated, complainers go out of their way to get the tool to spit out what they were looking for and then go "ah-ha!", for clickbait effect, as if it meant something. Just like using one VHS with a copied movie does not mean that the VHS company is responsible for movie piracy. Or just like if google returns a search result with a torrent link for a music track it doesn't mean google is responsible for music piracy, and so on.

> In any case, my main point was that the directives exemptions are insufficient to declare such a lawsuit nonsensical in the EU. The directive uses the following definition:
> "(2) ‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations"
> Does this cover the output of source code? Maybe, but not obviously.

Of course it covers it, that's exactly what copilot is used for: fills in patterns (boilerplate). Have you ever actually used it?

Class action against GitHub Copilot

Posted Nov 11, 2022 12:22 UTC (Fri) by gspr (guest, #91542) [Link] (8 responses)

> complainers go out of their way to get the tool to spit out what they were looking for and then go "ah-ha!"

does not imply

> it's fabricated

or

> for clickbait effect

> Just like using one VHS with a copied movie does not mean that the VHS company is responsible for movie piracy.

If playing back a new blank VHS tape in a particular way resulted in a blurry copy of said movie, then yeah, perhaps it would.

> Or just like if google returns a search result with a torrent link for a music track it doesn't mean google is responsible for music piracy, and so on.

I don't see how this is even comparable.

> Of course it covers it, that's exactly what copilot is used for: fills in patterns (boilerplate). Have you ever actually used it?

I'm not sure it matters what it's used for by you and your peers, if it comes with an out-of-the-box ability to also do the other things. Again: this is *not* the same as "a disk drive can be used for piracy" – the difference is that Copilot already (possibly, that's the debate) contains within it the necessary information to produce the infringing material.

Class action against GitHub Copilot

Posted Nov 11, 2022 13:57 UTC (Fri) by farnz (subscriber, #17727) [Link] (2 responses)

To choose an example at one extreme, A&M Records, Inc. v. Napster, Inc. established that while there were non-infringing uses of Napster, Napster's awareness that there were infringing uses of their technology product was enough to establish liability.

And it's worth noting in this context that Napster on its own was not infringing copyright - to infringe copyright, you needed two Napster users to actively make a decision to infringe: one to make the content available, and one to request a copy of infringing content. In other words, one user had to prompt Napster to spit out what they were looking for, and even then it wouldn't do that unless another user had unlawfully supplied that content to their local copy of Napster. In contrast, if Copilot's output infringes, it only needs the prompting user to make it infringe - which doesn't bode well for Microsoft if the court determines that Copilot's output is an infringement.

Class action against GitHub Copilot

Posted Nov 11, 2022 14:39 UTC (Fri) by bluca (subscriber, #118303) [Link] (1 responses)

Napster and its users did not have a right to ingest copyrighted materials. AI developers have a right, by law (see EU Copyright Directive), to take any source material and use it to build a model, as long as it is publicly available.

Class action against GitHub Copilot

Posted Nov 11, 2022 15:07 UTC (Fri) by farnz (subscriber, #17727) [Link]

That's a misrepresentation both of the Napster case (where the court deemed that the user's right to ingest copyrighted materials into the system was irrelevant), and of the EU Copyright Directive, which merely says that ingesting publicly available material into your system is not copyright infringement on its own, and that the fact of such ingestion does not make the model infringing. This does not preclude a finding of infringement by the model or its output - it simply means that to prove infringement you can't rely on the training data including your copyrighted material, but instead have to show that the output is infringing.

Class action against GitHub Copilot

Posted Nov 11, 2022 14:04 UTC (Fri) by Wol (subscriber, #4433) [Link] (4 responses)

> I'm not sure it matters what it's used for by you and your peers, if it comes with an out-of-the-box ability to also do the other things.

So you think that the sale of knives, hammers, screwdrivers etc should be banned? Because they come with an out-of-the-box ability to be used for murder. Come to that, maybe banning cars would be a very good idea, along with electricity, because they're big killers.

It's not the USE that matters. All tools have the *ability* to be mis-used, sometimes seriously. Ban cameras - they take porn pictures. But if the PRIMARY use is ABuse, that's when the law should step in. Everything else has to rely on the courts and common sense.

In the UK, carrying offensive weapons in public is illegal. Yet many of my friends - quite legally - carry very sharp knives. Because they're "tools of the trade" for chef'ing.

Cheers,
Wol

Class action against GitHub Copilot

Posted Nov 11, 2022 14:19 UTC (Fri) by gspr (guest, #91542) [Link] (3 responses)

Sorry, my phrasing was bad. I did not mean to refer to what actions can be taken with the thing. What I'm trying to convey is that Copilot (perhaps!) contains (some representation of) the copyrighted material, and can *therefore* be used to reproduce the material.

A pen won't reproduce a copyrighted text without a human inputting the missing data, even though it of course can be used to reproduce such a text with human assistance. Copilot, on the other hand, can (maybe!).

Class action against GitHub Copilot

Posted Nov 11, 2022 14:33 UTC (Fri) by bluca (subscriber, #118303) [Link] (2 responses)

It is *allowed* to ingest copyrighted materials for the models, by law. Hence it is not subject to the original license, among other things.

Class action against GitHub Copilot

Posted Nov 11, 2022 14:38 UTC (Fri) by farnz (subscriber, #17727) [Link]

Your "hence" does not follow from your first statement.

The law says that the act of ingestion does not itself infringe copyright, nor does the fact of ingestion automatically make the model infringe copyright. It does not, however, say that the model is not subject to the original licence if it is found to be infringing copyright, nor does it say that the output of the model is not contributory infringement.

Class action against GitHub Copilot

Posted Nov 11, 2022 14:42 UTC (Fri) by gspr (guest, #91542) [Link]

> It is *allowed* to ingest copyrighted materials for the models, by law. Hence it is not subject to the original license, among other things.

Yeah. But it's not allowed to *reproduce* that copyrighted material in a way incompatible with the original license. On one extreme, ingesting the material to produce, say, the parity of all the bits involved, is clearly not "reproduction" - and so is OK. On the other extreme, ingesting it and storing it perfectly in internal storage and spitting it back out on demand, clearly is "reproduction" - and surely not OK.

As I see it, the whole debate is about where between those extremes Copilot falls.

I'm not claiming to have the right answer. In fact, I don't even think I have _an_ answer. But I object to your sweeping statements about this seemingly being an easy and clear case.
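The two extremes above can be sketched in code. This is a purely illustrative toy, not a claim about how Copilot actually works: one "model" keeps only a single parity bit of the ingested work (the original is unrecoverable, so no reproduction is possible), the other stores the work verbatim and plays it back on demand.

```python
def parity_model(text: str) -> int:
    """Ingest text, keep only the XOR parity of all its bits."""
    parity = 0
    for byte in text.encode("utf-8"):
        parity ^= bin(byte).count("1") & 1
    return parity  # a single bit; the original cannot be recovered from it


class VerbatimModel:
    """Ingest text by storing it exactly; 'generation' is pure playback."""

    def __init__(self):
        self.store = []

    def ingest(self, text: str):
        self.store.append(text)

    def generate(self) -> str:
        return "\n".join(self.store)  # reproduces the ingested works exactly


work = "def quicksort(xs): ..."
assert parity_model(work) in (0, 1)  # only one bit survives ingestion
vm = VerbatimModel()
vm.ingest(work)
assert vm.generate() == work         # a perfect copy comes back out
```

The legal question in the thread is, in effect, where between `parity_model` and `VerbatimModel` a trained model sits.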

Class action against GitHub Copilot

Posted Nov 14, 2022 9:28 UTC (Mon) by geert (subscriber, #98403) [Link]

> [...] aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations"

"patterns, trends, and correlations". For code, that would mean reporting, e.g., that 37% of all code that needs to sort something resorts to quicksort, instead of reproducing a perfect copy of the source code of your newly-developed sorting algorithm released under the GPL.

Yeah, the "is not limited to" might be considered a loophole, but I guess anything that doesn't follow the spirit would be tossed out...
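That kind of "text and data mining" output can be sketched as follows. The corpus and the resulting percentage here are made up for illustration; the point is that the pass emits an aggregate statistic, not any of the ingested code.

```python
# A toy corpus of sorting-related snippets (hypothetical, for illustration).
corpus = [
    "def sort(xs): return quicksort(xs)",
    "items.sort()  # stdlib timsort",
    "result = quicksort(data)",
]

# Mine a pattern: what share of the snippets use quicksort?
uses_quicksort = sum("quicksort" in snippet for snippet in corpus)
share = uses_quicksort / len(corpus)
print(f"{share:.0%} of sorting snippets use quicksort")  # prints "67% ..."
```

Only `share` leaves the pipeline; nothing in the corpus can be reconstructed from it, which is what distinguishes this from reproducing the source.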


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds