The supposed decline of copyleft
At DebConf17, John Sullivan, the executive director of the FSF, gave a talk on the supposed decline in the use of copyleft licenses in free-software projects. In his presentation, Sullivan questioned the notion that permissive licenses, like the BSD or MIT licenses, are gaining ground at the expense of the traditionally dominant copyleft licenses from the FSF. While there does seem to be a rise in the use of permissive licenses, in general, there are several possible explanations for the phenomenon.
When the rumor mill starts
Sullivan gave a recent example of the claim of the decline of copyleft in an article on Opensource.com by Jono Bacon from February 2017 that showed a histogram of license usage between 2010 and 2017 (seen below).
From that, Bacon elaborates possible reasons for the apparent decline of the GPL. The graphic used in the article was actually generated by Stephen O'Grady in a January article, The State Of Open Source Licensing, which said:
Sullivan, however, argued that the methodology used to create both articles was problematic. Neither contains original research: the graphs actually come from the Black Duck Software "KnowledgeBase" data, which was partly created from the old Ohloh web site now known as Open Hub.
To show one problem with the data, Sullivan mentioned two free-software projects, GNU Bash and GNU Emacs, that had been showcased on the front page of Ohloh.net in 2012. On the site, Bash was (and still is) listed as GPLv2+, whereas it changed to GPLv3 in 2011. He also claimed that "Emacs was listed as licensed under GPLv3-only, which is a license Emacs has never had in its history", although I wasn't able to verify that information from the Internet Archive. Basically, according to Sullivan, "the two projects featured on the front page of a site that was using [the Black Duck] data set were wrong". This, in turn, seriously brings into question the quality of the data:
Reproducible observations are necessary to the establishment of solid theories in science. Sullivan didn't try to contact Black Duck to get access to the database, because he assumed (rightly, as it turned out) that he would need to "pay for the data under terms that forbid you to share that information with anybody else". So I wrote Black Duck myself to confirm this information. In an email interview, Patrick Carey from Black Duck confirmed its data set is proprietary. He believes, however, that through a "combination of human and automated techniques", Black Duck is "highly confident at the accuracy and completeness of the data in the KnowledgeBase". He did point out, however, that "the way we track the data may not necessarily be optimal for answering the question on license use trend" as "that would entail examination of new open source projects coming into existence each year and the licenses used by them".
In other words, even according to Black Duck, its database may not be useful to establish the conclusions drawn by those articles. Carey did agree with those conclusions intuitively, however, saying that "there seems to be a shift toward Apache and MIT licenses in new projects, though I don't have data to back that up". He suggested that "an effective way to answer the trend question would be to analyze the new projects on GitHub over the last 5-10 years." Carey also suggested that "GitHub has become so dominant over the recent years that just looking at projects on GitHub would give you a reasonable sampling from which to draw conclusions".
Indeed, GitHub published a report in 2015 that also seems to confirm MIT's popularity (45%), surpassing copyleft licenses (24%). The data is, however, not without its own limitations. For example, in the above graph going back to the inception of GitHub in 2008, we see a rather abnormal spike in 2013, which seems to correlate with the launch of the choosealicense.com site, described by GitHub as "our first pass at making open source licensing on GitHub easier".
In his talk, Sullivan was critical of the initial version of the site, which he described as biased toward permissive licenses. Because the GitHub project creation page links to the site, Sullivan explained that the site's bias could have actually influenced GitHub users' license choices. Following a talk from Sullivan at FOSDEM 2016, GitHub addressed the problem later that year by rewording parts of the front page to be more accurate, but any resulting change in license choice obviously doesn't show in the report produced in 2015 and won't affect choices users have already made. Therefore, there are reasonable doubts about whether GitHub's subset of software projects is actually representative of the larger free-software community.
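Carey's distinction between the licenses of all tracked projects and the licenses of projects created in a given year can be made concrete with a small sketch. The records below are invented for illustration and are not drawn from Black Duck or GitHub data; the function names are hypothetical as well. The sketch only shows why a cumulative tally and a per-year tally of new projects can tell different stories.

```python
from collections import Counter, defaultdict

# Hypothetical records: (year the project was created, license family).
# Invented for illustration; NOT taken from Black Duck or GitHub data.
projects = [
    (2010, "copyleft"), (2010, "copyleft"), (2010, "permissive"),
    (2011, "copyleft"), (2011, "permissive"),
    (2012, "permissive"), (2012, "permissive"), (2012, "copyleft"),
]

def new_projects_by_year(records):
    """Count license families among projects created in each year."""
    by_year = defaultdict(Counter)
    for year, license_family in records:
        by_year[year][license_family] += 1
    return dict(by_year)

def cumulative_by_year(records):
    """Count license families among all projects created up to each year."""
    per_year = new_projects_by_year(records)
    running = Counter()
    totals = {}
    for year in sorted(per_year):
        running += per_year[year]
        totals[year] = Counter(running)  # copy, so later years don't alter it
    return totals

flow = new_projects_by_year(projects)
stock = cumulative_by_year(projects)
print("new in 2012:", dict(flow[2012]))
print("all projects up to 2012:", dict(stock[2012]))
```

With these made-up records, permissive licenses lead among the projects created in 2012 while the cumulative totals are tied, so the answer depends on which of the two questions is being asked.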
In search of solid evidence
So it seems we are missing good, reproducible results to confirm or dispel these claims. Sullivan explained that it is a difficult problem, if only in the way you select which projects to analyze: the impact of an MIT-licensed personal wiki will obviously be vastly different from, say, a GPL-licensed C compiler or kernel. We may want to distinguish between active and inactive projects. Then there is the problem of code duplication, both across publication platforms (a project may be published on GitHub and SourceForge, for example) and across projects (code may be copy-pasted between projects). We should also think about how to evaluate the license of a given project: different files in the same code base regularly have different licenses, and often none at all. This is why having a clear, documented, and publicly available data set and methodology is critical. Without this, the assumptions made are not clear and it is unreasonable to draw certain conclusions from the results.
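To illustrate why "the license of a project" is not a single value, here is a minimal sketch that tallies licenses per file. It assumes that files carry `SPDX-License-Identifier:` tags near the top, which is a convention the talk did not mention and which many code bases do not follow; the function name `tally_licenses` and the `NO-LICENSE-TAG` bucket are hypothetical. It is not the method used by any of the studies discussed here.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: files declare their license with an SPDX tag near the top.
# Only the first identifier is captured; compound expressions such as
# "MIT OR Apache-2.0" are deliberately not parsed in this sketch.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")

def tally_licenses(root):
    """Count files under `root` by the first SPDX identifier in their header."""
    tally = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8", errors="replace") as handle:
                head = handle.read(2048)  # look only at the start of the file
        except OSError:
            continue  # unreadable file: skip it rather than guess
        match = SPDX_TAG.search(head)
        tally[match.group(1) if match else "NO-LICENSE-TAG"] += 1
    return tally
```

Calling `tally_licenses("some/source/tree")` returns a `Counter` keyed by identifier. Even on a small tree, the result will often contain several identifiers plus a sizable untagged bucket, which is exactly the ambiguity a documented methodology has to resolve.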
It turns out that some researchers did that kind of open research in 2016 in a paper called "The Debsources Dataset: Two Decades of Free and Open Source Software" [PDF] by Matthieu Caneill, Daniel M. Germán, and Stefano Zacchiroli. The Debsources data set is the complete Debian source code that covers a large history of the Debian project and therefore includes thousands of free-software projects of different origins. According to the paper:
Sullivan argued that the Debsources data set is interesting because of its quality: every package in Debian has been reviewed by multiple humans, including the original packager, but also by the FTP masters to ensure that the distribution can legally redistribute the software. The existence of a package in Debian provides a minimal "proof of use": unmaintained packages get removed from Debian on a regular basis and the mere fact that a piece of software gets packaged in Debian means at least some users found it important enough to work on packaging it. Debian packagers make specific efforts to avoid code duplication between packages in order to ease security maintenance. The data set covers a period longer than Black Duck's or GitHub's, as it goes all the way back to the Hamm 2.0 release in 1998. The data and how to reproduce it are freely available under a CC BY-SA 4.0 license.
Sullivan presented the above graph from the research paper that showed the evolution of software license use in the Debian archive. Whereas previous graphs showed statistics in percentages, this one showed actual absolute numbers, where we can't actually distinguish a decline in copyleft licenses. To quote the paper again:
Indeed, looking at the graph, at most we see a rise of the Apache and MIT licenses and no decline of the GPL per se, although its adoption does seem to slow down in recent years. We should also mention the possibility that Debian's data set has the opposite bias: toward GPL software. The Debian project is culturally quite different from the GitHub community and even the larger free-software ecosystem, naturally, which could explain the disparity in the results. We can only hope a similar analysis can be performed on the much larger Software Heritage data set eventually, which may give more representative results. The paper acknowledges this problem:
The Debsources research also shares methodology limitations with Black Duck: while Debian packages are reviewed before uploading and we can rely on the copyright information provided by Debian maintainers, the research also relies on automated tools (specifically FOSSology) to retrieve license information.
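The difference between graphs of percentages and graphs of absolute numbers is simple arithmetic, sketched below. The package counts are invented for illustration and are not taken from the Debsources paper; the point is only that a license family can grow in absolute terms while its share of the total shrinks.

```python
# Hypothetical package counts per license family for two releases.
# These numbers are invented; they are NOT from the Debsources paper.
counts = {
    "older release": {"copyleft": 600, "permissive": 200},
    "newer release": {"copyleft": 900, "permissive": 900},
}

def shares(release_counts):
    """Convert absolute counts to fractions of the release total."""
    total = sum(release_counts.values())
    return {name: n / total for name, n in release_counts.items()}

old = counts["older release"]
new = counts["newer release"]

# Absolute change in copyleft packages between the two releases.
copyleft_growth = new["copyleft"] - old["copyleft"]
# Change in copyleft's share of all packages over the same period.
share_change = shares(new)["copyleft"] - shares(old)["copyleft"]

print(f"copyleft absolute change: {copyleft_growth:+d} packages")
print(f"copyleft share change: {share_change:+.0%}")
```

In this made-up case copyleft gains 300 packages while its share drops from 75% to 50%, so a percentage graph would show a "decline" that an absolute graph would not.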
Sullivan also warned against "ascribing reason to numbers": people may have different reasons for choosing a particular license. Developers may choose the MIT license because it has fewer words, for compatibility reasons, or simply because "their lawyers told them to". It may not imply an actual deliberate philosophical or ideological choice.
Finally, he brought up the theory that the rise of non-copyleft licenses isn't necessarily to the detriment of the GPL. He explained that, even if there is an actual decline, it may not be much of a problem if there is an overall growth of free software to the detriment of proprietary software. He reminded the audience that non-copyleft licenses are still free software, according to the FSF and the Debian Free Software Guidelines, so their rise is still a positive outcome. Even if the GPL is a better tool to accomplish the goal of a free-software world, we can all acknowledge that the conversion of proprietary software to more permissive (and certainly simpler) licenses is definitely heading in the right direction.
[I would like to thank the DebConf organizers for providing meals for me during the conference.]
| Index entries for this article | |
|---|---|
| GuestArticles | Beaupré, Antoine | 
| Conference | DebConf/2017 | 
Posted Aug 23, 2017 17:47 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
       
I suggest augmenting the report by counting the lines of code in the packages as well. It'd be interesting to see how much new code was added under each license. 
     
    
Posted Aug 23, 2017 18:16 UTC (Wed) by compenguy (guest, #25359)
       
In the end, we concluded that blackduck was a million-dollar tool to help you start the investigation, but it could *never* be trusted as to the results of any investigation it did on its own.  It supplied the initial data, which had to all be hand researched and verified. 
We'd even found that it had an entry for some of my employer's proprietary software in their database - listed under an open source license. 
     
    
Posted Aug 23, 2017 19:07 UTC (Wed) by davidstrauss (guest, #85867)
       
     
Posted Aug 24, 2017 9:51 UTC (Thu) by dunlapg (guest, #57764)

> It'd be interesting to see how much new code was added under each license.

Indeed, one of the main arguments for each license is which one encourages more contribution: Does copyleft, since you're legally required to make your changes available anyway (and because you can be sure your competitors will do the same)? Or does a permissive license, since it gives potential contributors more leeway in how they incorporate the code into a product?

On a side note, a few years ago at OSCON it definitely seemed like there was a concerted effort to push permissive licenses in preference to GPL licenses.
      
           
     
    
Posted Aug 25, 2017 0:02 UTC (Fri) by rahvin (guest, #16953)

To effectively derive any kind of meaning (IMO) you'd need not only lines of code, but number of contributors and downloads, activity and such, to try to exclude these one-shot code releases that in all likelihood see no significant use and IMO aren't of any value in making this evaluation. There's been a dramatic increase in these one-shot postings of scripts and other code that would previously have just been posted to pastebin or some forum (along with no copyright statement) and are now posted to GitHub because of the simplicity and ease of use. You include these things in there and it skews the data towards permissive, but no one would license this stuff copyleft simply because there isn't a point: the author has no intention of maintaining anything or making a project out of it. Without excluding these kinds of data you simply can't get an accurate representation.
     
    
Posted Sep 2, 2017 6:35 UTC (Sat) by mcortese (guest, #52099)

May I assume that your comment applies only to half of the article, where GitHub is the source of the data, and not to the Debian part?
     
Posted Aug 23, 2017 20:05 UTC (Wed) by jzb (editor, #7867)
       
I certainly don't see a lot of businesses, which are driving much of open source development at this point, embracing copyleft for new things. (Sadly.)  
 
 
     
    
Posted Aug 23, 2017 21:21 UTC (Wed) by linuxrocks123 (subscriber, #34648)
       
FWIW (not much), here's my shot at such a list: Linux kernel, Mozilla Firefox, Google Chrome, Android Open Source Project, LibreOffice, TeXLive, GCC, LLVM, Python, Ruby, Perl, OpenJDK, VirtualBox, Apache HTTP Server, MySQL, PostgreSQL, VLC, Handbrake, mpv, Eclipse 
     
    
Posted Aug 25, 2017 6:03 UTC (Fri) by smckay (guest, #103253)
       
     
    
Posted Sep 4, 2017 10:21 UTC (Mon) by codehelp (guest, #57016)

> Do .debs carry license metadata?

No, the binary .deb does not, it's in the related source package (as listed in the .dsc). Random .debs outside the Debian archive are not going to have the scrutiny of license metadata which underpins the numbers, so those should be excluded. Thereby, you end up with the union of the "important" set and the set of reliable data approximating a subset of the current Debian main archive.

Posted Sep 4, 2017 10:26 UTC (Mon) by codehelp (guest, #57016)

Bah, need to clarify that. Many .debs in Debian will contain a copyright file in /usr/share/doc/<package-name> but some of those can be symlinks, or provided by a related package, so scanning the .debs isn't trivial. Third party .debs from outside the Debian archive are unlikely to bother at all. The point I should have made is that it is much more useful to scan the source code of the archive than to scan the binaries that get installed.

Posted Aug 24, 2017 22:42 UTC (Thu) by cornelio (guest, #117499)

People will always see what they want to see

I am personally not interested in whether people prefer permissive licenses; I would rather focus on *why* it may be happening. There are fine reasons to go permissive:
1) Developers don't want to deal with lawyers, ever.
2) Users don't want to deal with lawyers, ever.
3) Neither developers nor users want to have the FSF telling them what they can/can't do due to the license.
4) KISS: just compare the complexity of the GPL (any version) against the MIT license.
Add to this that many developers just don't care about enforcing the GPL so ultimately the license doesn't matter. I just don't think copylefting matters anymore, and then those that do care have to consider GPLv2 vs GPLv3, and no, the GPLv3 is a no-go for the Linux kernel.
The licensing rule of thumb seems to be: if you have patents go Apache License, otherwise go MIT. If you are just passing by, contribute under whatever license the author chose.

Posted Sep 1, 2017 0:50 UTC (Fri) by ThinkRob (guest, #64513)

The default there seems to be MIT, but personally I have a suspicion that 70% or so of the "why" behind that is "I dunno, because that's what <insert library> uses?" After all, the JS world does have this teensy problem with cargo culting...

Posted Aug 25, 2017 12:10 UTC (Fri) by flussence (guest, #85566)

Posted Aug 26, 2017 22:39 UTC (Sat) by bkuhn (subscriber, #58642)

One of the items I asked John about during the Q&A after the talk concerned cultural shifts and how we address them. I observe a lot of confirmation bias in looking at decline-of-copyleft stories. A cultural feeling of "copyleft is a hassle and gets me nothing" has risen among (and sorry for a slightly ageist comment) younger developers. My theory is that they didn't live through an era where all the tools they needed to do anything were proprietary, like those of us who started programming in the 1980s did.

In a heavily mixed Free Software / proprietary world, it's easy to think that code will stay free on its own, without any help of its license. Stories about copyleft decline just feed that mindset.

Finally, a lot of what happens in tech culture is what Noam Chomsky talked about in Manufacturing Consent: it's very easy to repeat ideas that simply confirm conventional thinking. Any idea that challenges convention requires too much time for people to digest arguments, so repeating conventional wisdoms like "copyleft is old hat" works.

I say in my talks a lot that copyleft is just a tool, not an end in itself. I then explain at length how the real goal is software freedom, and copyleft is a strong strategy to maximize software freedom. But that line of reasoning takes a lot of time to express. Most LWN readers probably understand it after years of talking about such things, but the average new developer doesn't have time for it: there's a playground of Free Software to enjoy and work with, and if a few things in their stack are proprietary, it's much easier to ignore if the whole stack isn't proprietary. We face the hardest time for software freedom we're ever going to face: software freedom has had enough success to invite co-option, but not enough to become the default desire and mindset yet. We've got work to do, and I thank my colleague John for some in-depth analysis here about these politics of work.

Posted Sep 2, 2017 7:01 UTC (Sat) by mcortese (guest, #52099)

Exactly. Copyleft is the tool, not the goal. So the question is, if its use declines (provided this claim is true) is it because we care less about software freedom, or because there's less need for such a tool? The former would be worrying, but the latter would mean we are now close to the moment we'll say: mission accomplished, unarm the weapons.

Posted Sep 2, 2017 14:28 UTC (Sat) by bkuhn (subscriber, #58642)

> if its use declines (provided this claim is true) is it because we care less about software freedom, or because there's less need for such a tool?

That's a false dichotomy, both in the assumption, and the conclusions. As John's talk pointed out, "decline" could mean a lot of things, and we often don't know which one people mean when they claim there's a decline:

As for the reasons for such, should any decline exist, the potential reasons aren't two, as you suggest, but numerous. In addition to your two, there's also:

> but the latter would mean we are now close to the moment we'll say: mission accomplished, unarm the weapons.

This is the issue I was writing about here, and what the second point above is getting at. There's lots of Free Software being written, but more software than ever in history being created, and therefore there is lots more proprietary software than there once was, too.
     
    
Posted Sep 3, 2017 2:44 UTC (Sun) by pabs (subscriber, #43278)
       
I'm reminded of a certain quote, which I will butcher here :) 
The price of software freedom is eternal vigilance and copyleft enforcement. 
     
    
Posted Sep 3, 2017 23:04 UTC (Sun) by bronson (subscriber, #4806)
       
     
Posted Sep 4, 2017 17:13 UTC (Mon) by federico3 (guest, #101963)
       
Many are massive users of Open Source, contribute relatively little compared to how much they benefit from it, and are quite scared by GPLv3.
Not because of the alleged complexity of the license - which is nothing difficult for a team of lawyers - but because companies don't want reciprocity, unsurprisingly.