DeVault: GitHub Copilot and open source laundering
Posted Jun 24, 2022 1:13 UTC (Fri) by NYKevin (subscriber, #129325)
In reply to: DeVault: GitHub Copilot and open source laundering by Wol
Parent article: DeVault: GitHub Copilot and open source laundering
> If your licence doesn't make it clear, then the Judge will almost certainly side against you on the basis that "copyright law is ambiguous".
That is egregiously wrong. Ambiguous law does not automatically favor the defendant in any jurisdiction I've ever heard of. At best, the defendant might be able to raise the defense of "innocent infringement" in some jurisdictions. But under US law, that does not relieve the defendant of liability, it merely reduces the monetary amount of their damages, which can still be quite substantial if many copies were made. Also, a valid copyright notice often defeats or greatly weakens this defense (see e.g. 17 USC 401(d)), depending on jurisdiction.
Seriously, if anyone in this thread is contemplating acting on this suggestion, I would strongly urge that person to consult an attorney who specializes in copyright law. This is not how the law works at all. You cannot go before a judge and say "the law was ambiguous so I just did it anyway," and expect to automatically win.
> But if your licence does make it clear, the Judge needs a reason to say "your licence is invalid" (and he'd rather not).
You have it backwards. Either the defendant is arguing that no license is required (and so it doesn't matter whether the license is valid or invalid), or the defendant is arguing that the license is valid and its actions fall within the scope of the license. Arguing that the license is invalid is something the plaintiff might do in order to defeat the latter defense; it never makes sense for the defendant to raise such an argument.
Posted Jun 24, 2022 2:30 UTC (Fri) by Wol (subscriber, #4433)
And the legislature will not examine YOUR code, and decide YOUR case ... It  is the Judge who *decides* whether it is a derivative work or not. 
> That is egregiously wrong. Ambiguous law does not automatically favor the defendant in any jurisdiction I've ever heard of. 
Who was talking about the *law*? I was talking about the *licence*. 
If the licence makes it absolutely clear that the licensor considers ML to create derivative code, then the licensee cannot claim an innocent mistake. The licensee MUST claim that a licence is not required and copyright does not apply. 
At the end of the day, it's down to the Judge to *apply* the law. And if it is clear to the Judge that the defendant "knew or should have known" that they were acting against the wishes of the licensor, then there is no defence of estoppel, or "innocent infringement", or "but I thought it was okay". 
And, faced with the choice of siding with the plaintiff and saying to the defendant "you knew the plaintiff did not permit that", or CREATING NEW LAW by explicitly defining ML into the Public Domain or whatever, which do you think a Judge is going to choose?
At the end of the day, putting this stuff into your licence does not change the law. But it makes it a damn sight more likely that the Judge is going to side with your interpretation of the law. 
Cheers,
Wol
     
    
Posted Jun 24, 2022 2:55 UTC (Fri) by NYKevin (subscriber, #129325)
That is exactly my point. The judge will make this decision, based on the facts of the case and what the legislature wrote in the statute. Not based on the license. The license has zero to do with what is or is not a derivative work. 
     
    
Posted Jun 24, 2022 8:04 UTC (Fri) by Wol (subscriber, #4433)
If the Judge has to decide whether ML is a derivative work or not (and create new law in the process!!!), then if the licensor made it clear that he considered it DID make a derivative work, the Judge will be inclined to side with the licensor. 
If the license says "I consider this to be a derivative work", the Judge will not want to create new law by disagreeing - you're effectively twisting the Judge's arm. How far he lets you twist it is down to him :-) 
Remember PJ - Judges try to upset the apple-cart as little as possible. If you give the Judge an out, he will take it ... 
Cheers,
Wol
     
    
Posted Jun 24, 2022 9:15 UTC (Fri) by kleptog (subscriber, #1183)
In particular, the EU Copyright Directive states that Text and Data Mining for the purpose of research and education is permitted. You can write whatever you like in your license; it has no effect. Now, GitHub is making a commercial product here, so they don't get to claim a broad exemption. So it comes down to the individual member states to regulate as they see fit.
Which basically makes the conclusion: It depends. 
     
    
Posted Jun 24, 2022 11:01 UTC (Fri) by bluca (subscriber, #118303)
     
    
Posted Jun 24, 2022 11:03 UTC (Fri) by bluca (subscriber, #118303)
     
Posted Jun 24, 2022 12:20 UTC (Fri) by Wol (subscriber, #4433)
That's easily got round - opt out by default and grant people you DO like a licence. In other words, the default is "mining is permitted", and the law says you have to explicitly change THE DEFAULT if you don't like it. Pretty sensible, imho. 
Cheers,
Wol
     
    
Posted Jun 24, 2022 12:37 UTC (Fri) by bluca (subscriber, #118303)
     
Posted Jun 24, 2022 13:49 UTC (Fri) by edeloget (subscriber, #88392)
The same thing goes for books, photographs and so on. You can train a language model on copyrighted books for research and education. But if you want to do it for the purpose of offering a commercial service, you have to get the proper authorization (this can be costly but it's not out of reach for a company like Microsoft).
I haven't read all the comments below, so maybe I'll state a point that has already been made. The "model is a derivative work" question is interesting, yet I don't think this is the real issue. The problem I see (and I have seen it as a problem since I read the "we can do it" rationale by GitHub) is that I don't believe them: contrary to what they say, the code written by the machine is a derivative work, no matter how hard they try to press on this. Not only can you easily obtain code that is a direct copy-paste of existing code (with comments, if needed :)) but even if you don't, you'll end up with code that 1) has a striking similarity to existing code (having used it for a while, I don't envision Copilot magically imagining new algorithms) and 2) is directly inspired by the input code (Copilot is unable to code a solution to a new problem; for example, it cannot suggest that you use an API it does not already know).
So, in conclusion, I would not go with the "a model created using this code is a derivative work" clause. I would go with an "any code created by a ML model trained with this code is a derivative work" clause, which I find both more logical and more satisfactory. As a consequence, the tool itself can exist, but cannot be used to create anything but free software - as I see it, this would be a win-win situation (although it might be tough to market to software shops :))
     
    
Posted Jun 24, 2022 15:57 UTC (Fri) by bluca (subscriber, #118303)
The EU directive allows TDM for commercial programs too. It adds an opt-out provision for that case. 
     
Posted Jun 24, 2022 16:36 UTC (Fri) by rgmoore (✭ supporter ✭, #75)
       I absolutely agree that any code produced by Copilot that is a verbatim copy of anything from its training corpus would be a copyright violation, unless that piece fell under one of the established limits on copyright, such as purely functional material that has a restricted number of ways it can be expressed or a snippet that's too small to be considered expressive.  Material that's suspiciously similar to something in the training corpus would be at least deeply suspect.  But that would be true whether it comes from Copilot or from a human programmer.  If your code is a copy of someone else's, it's a copyright violation regardless of how it got that way unless it isn't eligible for copyright in the first place.
 The problem with trying to restrict Copilot (and similar programs) is threefold:
 Again, this applies only to training the model.  The output of the model is a different thing and may be a copyright violation even if the model itself isn't.
      
           
     
Posted Jun 24, 2022 10:00 UTC (Fri) by mjg59 (subscriber, #23239)
Why? Do you have examples of this occurring? 
     
    
Posted Jun 24, 2022 12:25 UTC (Fri) by Wol (subscriber, #4433)
PJ was very clear on this - Judges are very reluctant to rock the boat. If "is this a derivative work" is not clear then, given the choice of a NARROW interpretation of the licence that says "the licence denies permission, I'll side with the licence", or a BROAD interpretation that says "all licences like that are invalid", which one are they going to choose? 
Especially when the defendant has "known or should have known" the plaintiff's express wish and ignored it. 
Cheers,
Wol
     
    
Posted Jun 24, 2022 17:45 UTC (Fri) by rgmoore (✭ supporter ✭, #75)
       The point is that isn't how licenses work.  The idea behind a copyright license is that the licensor grants the licensee some rights they would normally be denied by copyright law in exchange for a consideration.  For example, if copyright law would normally deny me the right to use your program to train my ML model, you can write a license that would grant me that right.
But in practice a license can't prevent someone from doing something they would otherwise have the right to do under copyright law. It is possible to write a license that requires the licensee to give up some rights under copyright law as part of the consideration they give for receiving some other rights. But nobody is forced to agree to the license! If they simply refuse to accept the license, they can continue doing anything they normally had the right to do under copyright law. Refusing the licensing terms would deny them whatever rights the license would grant them, but if they weren't intending to do those things it's an empty threat.
      
           
     
    
Posted Jun 24, 2022 20:25 UTC (Fri) by Wol (subscriber, #4433)
But if copyright LAW is not clear on the matter? 
That is what everybody is ignoring - it is down to the Judge to decide what the law IS. If the licence explicitly refuses permission, does the Judge make a NARROW ruling that says the licensor's explicit wishes rule, or a BROAD ruling that all such clauses are invalid?
PJ was quite clear that given a choice between a broad or narrow ruling, the Judge would opt for the narrow ruling every time. 
And I don't know which case it was, but there was a discussion about a pro-software-patent Judge some while back, who ruled "In THIS case, the software is clearly non-patentable. I can't conceive of a scenario where any software is patentable". Note he didn't even attempt to say software isn't patentable. He was pro-patents. But he stated, in a ruling, "I don't think it is possible for software to pass the patentability bar". He made a very narrow ruling, but accepted that the consequences would probably be wide. 
Cheers,
Wol
     
    
Posted Jun 24, 2022 21:21 UTC (Fri) by rgmoore (✭ supporter ✭, #75)
But that applies only if the broad and narrow rulings turn out the same way. In that case, the judge will usually rule on the narrowest possible grounds that result in the outcome they think is right for the case. If the broad and narrow grounds for the ruling have opposite results, the judge has to go based on which one seems to be a better reading of the law and situation, not just on narrow versus broad. More generally, narrow vs broad is something that's more true of low-level judges than of higher-level ones. Even if individual judges make narrow rulings, it's likely that different judges will rule differently. That will create uncertainty and force a higher court to rule on the matter, creating a broader ruling. That's the way these things usually go.
      
           
     
    
Posted Jun 25, 2022 17:05 UTC (Sat) by khim (subscriber, #9252)
Where does that idea come from? A narrow ruling is used precisely to preserve the possibility that a broad ruling (made in a different case by a different judge later) reaches the opposite outcome! Narrow is almost always better. Because, well, it's narrow. It describes the situation more precisely. The only out you have is to argue that the narrow reading is so narrow that it doesn't apply to your case at all. That often happens with patents (the judge is presented with half a dozen patents that could, theoretically, be treated as prior art and eliminate the patent completely, but 9 times out of 10 the judge doesn't do that and only declares that yes, the patent is still valid, just not applicable to your case). I would say it's the way these things usually don't go. 99% of the time the decision doesn't reach courts high enough to decide anything definitively. Usually it takes dozens of cases and decades of litigation for that to happen.
     
Posted Jun 24, 2022 19:30 UTC (Fri) by mjg59 (subscriber, #23239)
     
    
Posted Jun 24, 2022 20:43 UTC (Fri) by Wol (subscriber, #4433)
And that is exactly the argument in front of the Judge. *IS* it a derived work? And if the law is unclear, and the licensor is explicit that he considers it a derived work, then the only safe option for the Judge is to rule that it IS a derived work and let the legislators sort it out. 
And this is why you can NOT "defer to the law" in this argument. The question at issue is not "is this a derivative work according to the law?", but "what is the law?". THAT is the argument in front of the Judge. 
The legislators can choose to let the genie out the bottle. The Judge will not be happy about letting the genie out the bottle off his own bat. 
Cheers,
Wol
     
    
Posted Jun 24, 2022 21:13 UTC (Fri) by mjg59 (subscriber, #23239)
     
    
Posted Jul 1, 2022 8:32 UTC (Fri) by nim-nim (subscriber, #34454)
Despite years of commercial pretense to the contrary, professionals know “smart” systems are no smarter than the human who coded them.
     
Posted Jun 24, 2022 18:23 UTC (Fri) by NYKevin (subscriber, #129325)
Nope, that's not how the law works. Stating it over and over again does not make it true. 
For the purposes of determining whether X is a derivative work of Y, the judge looks at X, Y (its contents, not its license), and the copyright statute. Nothing else. 
     
    
Posted Jun 24, 2022 18:28 UTC (Fri) by NYKevin (subscriber, #129325)
     
Posted Jun 24, 2022 20:50 UTC (Fri) by Wol (subscriber, #4433)
And if that's not enough for him to make up his mind? 
That *SHOULD* be all that's needed. But if that IS all that's needed, why can't we all make our own minds up? Surely it's obvious? Why do we need Judges? It can't be THAT hard ... ? 
Cheers,
Wol
     
    
Posted Jun 24, 2022 21:59 UTC (Fri) by NYKevin (subscriber, #129325)
     
Posted Jun 27, 2022 14:18 UTC (Mon) by anselm (subscriber, #2796)
I think it would still be of some interest whether X resulted from Y through “cp Y X” or through a query to Copilot whose model was trained on Y. In the first case, X is pretty clearly a derived work of Y. In the second case, Microsoft, at least, would probably like to claim it isn't.
 
     