Class action against GitHub Copilot
Posted Nov 11, 2022 10:49 UTC (Fri) by farnz (subscriber, #17727)
In reply to: Class action against GitHub Copilot by bluca
Parent article: Class action against GitHub Copilot
I think you're reading the exception in EU law over-broadly, and it doesn't say quite what you're claiming.
The exception for text and data mining says that you are not infringing copyright solely by virtue of feeding material into your machine learning system, and that the resulting model is not itself automatically a derived work of the inputs. It does not say that the output of your machine learning system cannot infringe copyright. As far as I can tell, the intent behind this exception is to allow your system to engage in the sorts of copying that we already say are outside the scope of copyright when a human does it - for example, having read about io_uring, a human might build their next system around a submission and completion queue pair, and this is not a copy in the sense of copyright law.
This means that a court could legitimately rule that a given output from the system is close enough to a copy to be a copyright infringement by the system, and that the use of the system is thus contributory infringement whenever it produces a literal copy of a protected part of its input.
This, in turn, would bring use of systems like GitHub Copilot into line with employing a human to do the same job: if, as a result of a prompt, I write out precisely the code that a previous employer used (complete with comments - whether I copied it from a notebook, or kept a copy of a past employer's source code), then that is copyright infringement. If, on the other hand, I write code that's similar in structure simply because there are only a few ways to loop over all items in a container, that's not copyright infringement.
Assuming the US courts apply this sort of reasoning, then the question before them is whether a human writing the same code with the same prompting would be infringing copyright or not - if you substitute "a Microsoft employee wrote this for a Microsoft customer to use" for "GitHub Copilot wrote this for a Microsoft customer to use", do you still see infringement or not?
Posted Nov 11, 2022 12:10 UTC (Fri)
by Wol (subscriber, #4433)
So the exception doesn't cover you for using the result ...
Cheers,
Wol
Posted Nov 11, 2022 14:49 UTC (Fri)
by bluca (subscriber, #118303)
Exactly, and there are many commentators who completely miss this and assume the opposite, hence the need to clarify it.
> It does not say that the output of your machine learning system cannot infringe copyright.
Sure, but it implies that it does not automatically do so either. Then the onus is on the complainers to show, first of all, that these artificially induced snippets are copyrightable in the first place. After that, they must show that it's the tool that is at fault rather than the user, even though intentionally inducing the tool to reproduce the snippets requires knowing in advance what they look like and what keywords and surrounding setting to prepare in order to achieve that result.
Posted Nov 11, 2022 15:25 UTC (Fri)
by KJ7RRV (subscriber, #153595)
"> _____ _____ _____ _____ and _____ _____ _____ _____ you _____ _____ _____ _____ solely _____ _____ _____ _____ material _____ _____ _____ _____ system, _____ _____ _____ _____ model _____ _____ _____ _____ a _____ _____ _____ _____ inputs. _____ _____ _____ _____ many _____ _____ _____ _____ this, _____ _____ _____ _____ hence _____ _____ _____ _____ it. _____ _____ _____ _____ say _____ _____ _____ _____ your _____ _____ _____ _____ infringe _____ _____ _____ _____ implies _____ _____ _____ _____ automatically _____ _____ _____ _____ the _____ _____ _____ _____ complainers _____ _____ _____ _____ of _____ _____ _____ _____ snippets _____ _____ _____ _____ first _____ _____ _____ _____ to _____ _____ _____ _____ the _____ _____ _____ _____ which _____ _____ _____ _____ what _____ _____ _____ _____ what _____ _____ _____ _____ to _____ _____ _____ _____ achieve _____ _____ _____ _____ it's _____ _____ _____ _____ at _____ _____ _____ _____ user."
Posted Nov 11, 2022 15:37 UTC (Fri)
by farnz (subscriber, #17727)
You're not actually clarifying, unfortunately. The effect of the EU Copyright Directive is not to say that the model and its training process are guaranteed not to infringe copyright; rather, it's to say that the mere fact that copyrighted material was used as input to the training process does not imply that the training process or resulting model infringe, in and of itself.
And you're asking more of the complainers than EU law does. Under EU law, the complainers first have to show that there is a copyrightable interest in the output (which you do get right), and after that, they only have to show that the tool's output infringes that copyright. In particular, the tool is at fault if the material is infringing and it produces it from a non-infringing prompt - even if the prompt has to be carefully crafted to cause the tool to produce the infringing material.
As an example, let's use the prompt:
This is not infringing in most jurisdictions - there's nothing in there that is copyrightable at this point, as all the names are either descriptive or completely nondescript. If a machine learning model then goes on to output the Quake III Arena implementation of Q_rsqrt from this prompt, complete with the comments (including the commented-out "this can be removed" line), then there's infringement by the tool, and if it can be demonstrated that the only place the tool got the code from was its training set, the tool provider is likely to be found to be a contributory infringer.
It doesn't matter that I've set the tool up with a troublesome prompt here (that's the first 5 lines of Q_rsqrt, just renamed to rsqrt); I haven't infringed, and thus the infringement is a result of the tool's training data being copied verbatim into its output.
This is, FWIW, exactly the same test that would apply if I gave that prompt to a human who'd seen Quake III Arena's source code, and they infringed by copying the Quake III Arena implementation - I would not be able to prove infringement just because the human had seen the original source, but I would be able to do so if, given the prompt, they produced a literal copy of Q_rsqrt.
float rsqrt(float number) {
    long i;
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y = number;