LWN.net Weekly Edition for December 3, 2015

A referendum on GPL enforcement

By Jonathan Corbet
December 2, 2015

One of the key provisions of the GNU General Public License (GPL) is that derivative products must also be released under the GPL. A great many companies rigorously follow the terms of the license, while others avoid GPL-licensed software altogether because they are unwilling to follow those terms. Some companies, though, seem to feel that the terms of the GPL do not apply to them, presenting the copyright holder with two alternatives: find a way to get those companies to change their behavior, or allow the terms of the license to be flouted. In recent times, little effort has gone into the first option; depending on the results of an ongoing fundraising campaign, that effort may drop to nearly zero. We would appear to be at a decision point with regard to how (and whether) we would like to see GPL enforcement done within our community.

When software is distributed in ways that violate the GPL, the first order of business is always to open a discussion with the person or company doing the distribution in the hope of effecting a change. Should that discussion fail, though, the only alternative may well be the court system. One has to look long and hard to find examples of the GPL being enforced through legal action, though. The Germany-based gpl-violations.org project has posted some notable successes over the years, but the project has been dormant for some time (it's worth noting, though, that the news page says that enforcement activity should restart in 2016). One hears murmurings about a specific kernel developer launching quiet suits as a revenue-generation activity, but there is no public record of — and little public support for — that work. About the only other group doing GPL enforcement is the Software Freedom Conservancy (SFC), which is based in the US.

The SFC is, of course, supporting the ongoing suit against VMware. Beyond that, the group does a fair amount of quiet enforcement activity that does not end up in court. The SFC has found itself in a tight financial position, though, as the result of a loss of corporate funding. In response, it has launched a fundraising campaign aimed at building a new financial base consisting of individual supporters. Some 750 supporters ($90,000/year) are needed to keep "basic community services" running, and 2,500 ($300,000/year) to support the GPL enforcement operation (beyond the VMware suit, which has separate funding). These are daunting amounts of money to raise, but, as anybody who has run an organization of any size knows, the SFC is not asking for a lot.

Your editor has heard people claim that the SFC's problems are self-made. The aggressive BusyBox enforcement actions of a few years back are widely seen as having scared companies away while bringing about the release of little, if any, interesting source. The use of BusyBox as a lever to force compliance for other projects (such as the kernel) that were not a party to the action was also disturbing to some. SFC president Bradley Kuhn is not as diplomatic an interface to the organization as some might like; even others working in the GPL enforcement area have had significant disagreements with him.

Whatever the reasons may be, the simple fact is that the SFC is in a bit of a lonely position. To an extent, that loneliness may be an inherent part of a GPL enforcer's role. Without a willingness to litigate, GPL enforcement lacks teeth, but a willingness to litigate may necessarily bring with it a reputation for litigiousness.

None of that changes the fact that, for now, only the SFC seems willing to take on this lonely role. Companies have made it clear that they do not wish to take an active role in GPL enforcement; even the companies that are the most enthusiastic code contributors and the most meticulous about observing the GPL in their own activities seem unwilling to work to ensure that others do the same. Perhaps the only significant case of a company asserting the GPL was when IBM raised GPL-violation charges against the SCO Group more than ten years ago; even then, IBM had to come under significant attack itself before employing the GPL in its own defense.

For those who care about the GPL, enforcement is important. It seems safe to say that, if the GPL is not enforced, its provisions will eventually come to have no meaning. Companies that expend the (often considerable) resources to stay in compliance will be at a disadvantage relative to those that don't bother; eventually the list of companies that don't bother will surely grow. A world in which the GPL is not enforced is a world where the GPL loses its force and becomes much like the BSD license in actual effect. If ignoring the provisions of the GPL becomes the norm, we may find ourselves without an effective copyleft license for software.

Some might welcome that development; to them, the GPL is an overly complex holdover from the past that is not necessary in today's world. But it can be argued that the GPL deserves a lot of credit for the success of Linux relative to other free operating systems. Its source-release requirements helped to prevent forks and made it safe for companies to contribute in the knowledge that their competitors could not take undue advantage of their work. A world without the GPL could be a world with more fragmentation — and more proprietary software.

It seems clear that the GPL must be respected if it is to remain a viable license. That said, there may be room for people to differ on how that respect should be ensured. Those who think that the SFC is not going about things in the right way would do well to propose alternatives. There must certainly be some good ideas circulating for other ways to increase GPL compliance.

For those who do appreciate the role the SFC plays in the GPL-enforcement area, this would probably be a good time to think about how that work is funded. It seems safe to say that corporations cannot be counted on to ensure that GPL enforcement happens. The SFC has chosen not to pursue GPL-enforcement lawsuits as a revenue-generation technique, saying, probably rightly, that it would compromise the real goal: bringing companies into compliance. So it is up to the individuals who care enough about this activity to support it going forward.

As Bradley put it in this posting, the current fundraising campaign is a sort of referendum on whether the community likes the work the SFC is doing and wants it to continue. It is possible that the answer is "no," but, regardless of the outcome, this seems like a question that deserves serious consideration; the consequences of the answer, either way, could be felt for years into the future.


Randomness in the web browser

By Nathan Willis
December 2, 2015

Developers today may associate random-number generation most closely with security issues—and understandably so, given what we hear in the news about attempts to undermine the encryption that protects our communication online. But randomness is important in many other tasks, from games to data sampling to scientific simulation. Recently, several developers decided to explore the properties of the JavaScript Math.random() implementations provided in major web browsers, and found most to be wanting.

For context, JavaScript's Math.random() returns a floating-point pseudo-random number from the interval [0,1). It is specifically not aimed at secure or cryptographic usage; for cryptography, there is the RandomSource.getRandomValues() from the W3C Web Crypto API.

V8

On November 19, Mike Malone from Betable, a company that produces software for online gaming, posted a look at the Math.random() implementation in the V8 JavaScript engine (used by Chrome/Chromium and by several server-side projects, like Node.js). Malone noted that Betable used Math.random() to generate API request identifiers, and had calculated that 22-character identifiers would be enough to avoid collisions—at around one collision in six billion requests. But, much to the team's surprise, an employee discovered that the code in question was generating colliding identifiers regularly—and fairly often.
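The reasoning behind an identifier-length estimate like this one is a birthday-problem calculation. The sketch below is not Betable's code. It assumes each of the 22 characters is drawn uniformly from a 64-symbol alphabet (the alphabet size is an assumption; the article does not state it) and uses the standard approximation n ≈ sqrt(2·N·ln(1/(1−p))) for the number of draws at which the collision probability reaches p. The smaller pool is purely hypothetical, included to show how quickly the threshold falls when fewer identifiers are actually reachable.

```python
import math


def draws_for_collision_probability(pool_size: int, p: float = 0.5) -> float:
    """Approximate number of uniform draws from pool_size values at which the
    chance of at least one repeat reaches p (birthday approximation)."""
    return math.sqrt(2.0 * pool_size * math.log(1.0 / (1.0 - p)))


# Assumption: 22 characters, each chosen uniformly from 64 symbols.
ideal_pool = 64 ** 22  # the same as 2**132

# Hypothetical degraded case: only about 2**30 distinct identifiers reachable.
degraded_pool = 2 ** 30

ideal_draws = draws_for_collision_probability(ideal_pool)
degraded_draws = draws_for_collision_probability(degraded_pool)

print(f"ideal pool:    ~{ideal_draws:.3g} identifiers for a 50% collision chance")
print(f"degraded pool: ~{degraded_draws:,.0f} identifiers for a 50% collision chance")
```

The threshold grows only with the square root of the pool size, so any defect that shrinks the set of reachable identifiers brings collisions far closer than intuition suggests.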

As it happened, the culprit was V8's Math.random(). The ECMAScript specification (the official standard defining what is known as JavaScript) offers no guidance as to what algorithm should be used to generate the numbers, nor does it even mandate with what precision the function should return its floats. Thus, each implementer has essentially gone its own direction. The V8 implementation, perhaps like a lot of code, was not thoroughly vetted when it was originally committed, and it used a mediocre pseudo-random number generator (PRNG) called MWC1616. Worse, though, the V8 code included a bug that drastically shortened the cycle length of the number sequence it generated—from 2⁶⁰ to 2¹⁵ (on 64-bit architectures). The difference meant that collisions in Betable's code would happen about once in every 30,000 identifiers.

In short, the V8 code uses two fast sub-generators, each with a cycle length of around 2³⁰, then combines their results to produce its final output. But rather than XORing the output of the two sub-generators (or combining them in some more complex manner), it simply concatenates them. In practice, this produces terrible results, because the recommended method for scaling the floats produced by Math.random() into an integer range involves a floor function, which essentially chops off the low-order bits. That, in turn, means the bits contributed by the second sub-generator get chopped off, too. If the two sub-generators were more thoroughly mixed, this would not be the case.
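The effect of concatenation followed by floor-based scaling can be seen without reproducing V8's generator at all. The sketch below is an illustration, not V8 code: the two 16-bit inputs simply stand in for the outputs of the two sub-generators, and the target range of 64 is an assumed example (a power of two, which makes the effect exact).

```python
import math


def concat_to_unit_float(hi16: int, lo16: int) -> float:
    """Concatenate two 16-bit values into one 32-bit integer, then map to [0, 1)."""
    combined = ((hi16 & 0xFFFF) << 16) | (lo16 & 0xFFFF)
    return combined / 2 ** 32


def scale_to_int(x: float, n: int) -> int:
    """The usual floor-based scaling of a float in [0, 1) to an integer in [0, n)."""
    return math.floor(x * n)


HI = 0xBEEF  # arbitrary fixed output standing in for the first sub-generator

# Sweep the stand-in for the second sub-generator across its full 16-bit range
# and collect every scaled result that comes out.
picks = {scale_to_int(concat_to_unit_float(HI, lo), 64) for lo in range(0x10000)}

print(f"distinct scaled values across all 65536 low halves: {len(picks)}")
```

Every one of the 65,536 possible low halves yields the same scaled value, so the second input contributes nothing to the result. An XOR or other mixing step would let both inputs influence the high-order bits.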

Betable eventually switched over to using a cryptographic PRNG instead, but Malone's post notes that a variety of available algorithms not only produce higher-quality pseudo-random output than the one used in V8 but also run faster. Interestingly enough, the V8 Math.random() code was updated shortly after Malone's post, fixing the MWC1616 implementation to better mix the outputs of the two sub-generators. There has been no public discussion of the patch, though; the project's code-review site records no information as to why the patch in question was committed. Malone offered a comment on how V8's MWC1616 could be further improved, but there has been no reply.

Firefox

Mozilla's Jan de Mooij was among the many who saw Malone's post, and decided to investigate the real-world performance of Firefox's Math.random() for comparison. He discovered that Firefox, too, uses a weak PRNG implementation. On November 30, he posted a follow-up to the original post that included pass/fail results from running the TestU01 randomness test suite against both Firefox and V8.

The algorithm used by Firefox, he said, was a linear congruential generator (LCG) that was "imported from Java decades ago". It fails 12 of the 96 tests in TestU01's medium-sized "Crush" test battery, and one of the ten tests in the "SmallCrush" battery (interestingly enough, the SmallCrush test that Firefox fails is the "BirthdaySpacings" test, which looks for the frequency of collisions: the same weakness discovered by Betable). In both cases, Firefox did perform considerably better than the pre-patch version of Chrome.
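For readers unfamiliar with LCGs, the sketch below shows the general shape of one, using the 48-bit multiplier and increment documented for java.util.Random. Whether Firefox's imported code used exactly these constants and seeding steps is an assumption here; the class and method names are illustrative.

```python
MULTIPLIER = 0x5DEECE66D   # multiplier documented for java.util.Random
INCREMENT = 0xB            # increment documented for java.util.Random
MASK_48 = (1 << 48) - 1    # the generator keeps 48 bits of state


class JavaStyleLCG:
    """Linear congruential generator: state = (a * state + c) mod 2**48."""

    def __init__(self, seed: int) -> None:
        # java.util.Random scrambles the initial seed with the multiplier.
        self.state = (seed ^ MULTIPLIER) & MASK_48

    def next_bits(self, bits: int) -> int:
        """Advance the state and return its top `bits` bits."""
        self.state = (self.state * MULTIPLIER + INCREMENT) & MASK_48
        return self.state >> (48 - bits)

    def next_int32(self) -> int:
        """Return the next 32-bit output as a signed integer."""
        value = self.next_bits(32)
        return value - (1 << 32) if value >= (1 << 31) else value


rng = JavaStyleLCG(0)
first = rng.next_int32()
print(first)
```

Each output is a fixed linear function of the previous state, with only the low 16 bits of that state hidden, which is why generators of this kind tend to fare poorly on statistical batteries.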

De Mooij was also concerned that, although Firefox's PRNG should be able to produce 53 bits of precision (i.e., the default precision of floats as defined in the ECMAScript specification), the implementation returned only a 32-bit result. "This means the result of the RNG is always one of about 4.2 billion different numbers, instead of 9007199 billion (2^53). In other words, it can generate 0.00005% of all numbers an ideal RNG can generate."
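The figures in that quote follow directly from the two bit widths, as this short arithmetic check shows:

```python
values_32 = 2 ** 32   # distinct results available from a 32-bit output
values_53 = 2 ** 53   # distinct results available with 53 bits of precision

fraction_percent = 100 * values_32 / values_53

print(f"{values_32:,} of {values_53:,} possible values")
print(f"coverage: {fraction_percent:.5f}%")
```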

Thus, he set out to replace the PRNG with XorShift128+, which is regarded as both high-quality (passing, for example, TestU01's "BigCrush" battery) and fast. The resulting patch was subsequently accepted, so it should make its way into a Firefox stable release within a few months.
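XorShift128+ is compact enough to sketch in a few lines. The version below uses the shift constants 23, 17, and 26 from one published variant of the algorithm; the constants, the seeding, and the choice to build the float from the low 53 bits are assumptions for illustration and are not taken from the Firefox patch.

```python
MASK_64 = (1 << 64) - 1


class XorShift128Plus:
    """xorshift128+ with two 64-bit state words (shift constants 23, 17, 26)."""

    def __init__(self, seed0: int, seed1: int) -> None:
        if ((seed0 | seed1) & MASK_64) == 0:
            raise ValueError("state must not be all zero")
        self.s0 = seed0 & MASK_64
        self.s1 = seed1 & MASK_64

    def next_u64(self) -> int:
        """Advance the state and return a 64-bit unsigned integer."""
        s1, s0 = self.s0, self.s1
        self.s0 = s0
        s1 ^= (s1 << 23) & MASK_64          # left shift, truncated to 64 bits
        self.s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26)
        return (self.s1 + s0) & MASK_64     # the "+" in xorshift128+

    def next_float(self) -> float:
        """Return a float in [0, 1) built from 53 bits of the output."""
        return (self.next_u64() & ((1 << 53) - 1)) / float(1 << 53)


rng = XorShift128Plus(1, 2)
samples = [rng.next_float() for _ in range(5)]
print(samples)
```

The state update needs only shifts and XORs, and the output is a single addition, which is where the speed comes from.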

The rest

De Mooij did not limit his experiments to V8 and Firefox, however. He also ran the TestU01 test batteries against Safari 9, Internet Explorer (IE) 11, and Microsoft's new Edge browser. This revealed some surprises. For one thing, although Safari and Chrome formerly shared the same codebase (WebKit), they have diverged on Math.random(). Safari was using a different PRNG named GameRand, although it exhibited similarly poor numbers against TestU01. De Mooij filed a bug against WebKit, and on November 30, WebKit was patched to also use XorShift128+ for Math.random().

The tests also suggest that IE 11 was using the same PRNG as Firefox, since it produced essentially the same pass/fail results. Finally, De Mooij noted that the Edge browser seems to be using the same PRNG as IE 11, although so far he has not been able to make Edge run the complete Crush test battery without crashing, so those results should be taken with a grain of salt.

Moving forward, it will be interesting to see whether or not the attention that this story has attracted will persuade the V8 team to switch over to XorShift128+ for its Math.random() PRNG as well. While the newly patched versions of Firefox and Chrome both pass the SmallCrush test battery, Chrome (using MWC1616) still fails several Crush tests. Regardless of what happens, though, the response to this incident has been remarkably fast: in less than two weeks since Malone's first post, three major browser engines have updated their code.

Perhaps that speed stems from the knowledge that, right or wrong, software developers will be tempted to use Math.random() for critical functionality. After all, the mere fact that a separate, cryptographically secure API is available did not prevent Betable from encountering a potentially serious bug in its software. And the LCG that Firefox imported from Java is weak enough that predicting its output is trivial; anyone relying on that code for robust randomness should expect disappointment.

Still, the root cause of these widespread weaknesses found in Math.random() was copying in code without thoroughly vetting it. One hopes that it is better to copy in XorShift128+ than it was to copy in the last PRNGs, but better still would be a renewed emphasis on putting these functions to the test regularly.


Streamlining license compliance with FOSSology 3.0

By Nathan Willis
December 3, 2015

License compliance in big free-software projects is not a simple task. Beyond the basic requirements (such as providing access to source code), compliance can consist of numerous details: figuring out how the licenses on individual components combine in an aggregate work, ensuring that required license texts are reproduced where needed, tracking the names of copyright holders to properly give credit, and so on. Little wonder, then, that compliance management has grown into a sizeable industry in recent years. Perhaps the best-known open-source compliance tool is FOSSology, which released version 3.0 in early November. The update adds new user-interface features intended to make project workflow smoother, and it adds several new functional enhancements.

Broadly speaking, FOSSology follows the same design used by other license-compliance systems. Users upload source code from a project, at which point the program scans the contents of the uploaded files to look for licensing information. The goal is to identify what license applies to every individual file—a task that requires some heuristics when, say, license statements may appear in per-file headers and in directory-wide README files. The end result is an unambiguous understanding of what licenses and copyrights apply to the total codebase; the license requirements can then be met (and copyrights listed) when the source code is distributed. Determining which license applies is a problem that cannot be completely automated, so FOSSology (and similar tools) provide a workflow in which users can examine the hard-to-determine cases and apply a decision. It is also important, in the long run, that users don't have to repeat too much of the process whenever refreshing just one portion of a large codebase.

The FOSSology 3.0 release makes improvements to several facets of this workflow. The web-based user interface has been improved (both to be faster and to provide additional flexibility) and there are some new options for the critical step in the aforementioned process: automatically detecting license information by scanning the uploaded code.

Scanners

One distinguishing feature found in FOSSology is that it supports multiple, pluggable code-scanning engines. Earlier releases supported two scanners, Monk and Nomos. The new release adds support for a third, called Ninka. Monk is a basic full-text scanner that looks for matches against known license text, while Nomos is a regular-expression based scanner that picks out significant phrases that may come from variant wordings of a license.
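The difference between the two approaches can be illustrated with a toy example. Nothing below is taken from Monk or Nomos; the license snippets, patterns, and function names are hypothetical stand-ins chosen only to contrast matching against known text with matching on significant phrases.

```python
import re

# Hypothetical reference text for a full-text matcher (lowercased, single-spaced).
KNOWN_TEXTS = {
    "MIT": "permission is hereby granted, free of charge, to any person obtaining a copy",
}

# Hypothetical phrase patterns for a regular-expression matcher.
PHRASE_PATTERNS = {
    "GPL-2.0-or-later": re.compile(
        r"GNU\s+General\s+Public\s+License.*?version\s+2.*?any\s+later\s+version",
        re.IGNORECASE | re.DOTALL,
    ),
    "MIT": re.compile(r"permission\s+is\s+hereby\s+granted,\s+free\s+of\s+charge",
                      re.IGNORECASE),
}


def normalize(text: str) -> str:
    """Lowercase the text and collapse all runs of whitespace."""
    return " ".join(text.lower().split())


def full_text_scan(source: str) -> set:
    """Report licenses whose reference text appears verbatim after normalizing."""
    flat = normalize(source)
    return {name for name, text in KNOWN_TEXTS.items() if text in flat}


def phrase_scan(source: str) -> set:
    """Report licenses whose significant phrases appear, tolerating variation."""
    return {name for name, pattern in PHRASE_PATTERNS.items() if pattern.search(source)}


sample = """/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */"""

print("full-text:", full_text_scan(sample))
print("phrases:  ", phrase_scan(sample))
```

The phrase matcher tolerates the comment markers and line breaks that sit between the key phrases, while the full-text matcher reports only licenses whose reference wording appears intact.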

Unlike the others, Ninka originates from outside the FOSSology project; it is based on ideas from a 2010 research paper (available at the Ninka site) and attempts to identify licenses based on sentence-level matching. All of the scanners included in FOSSology can be run as standalone utilities, although their main usage is intended to be through batch-scanning jobs that are scheduled and performed automatically, then later reviewed.

In addition to the scanning engines, FOSSology supports user-written filters and heuristics. On that front, the new release adds a new option: whenever the Monk and Nomos scanners automatically detect the same license for a file, a rule can be enabled that automatically accepts the determination and saves it, sparing the human reviewer from manually inspecting that file. Presumably that equates to the user placing a high degree of confidence in Monk and Nomos, although, naturally, human reviewers can make errors just as heuristics can.
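The agreement rule amounts to a small piece of logic. The sketch below is a guess at its shape rather than FOSSology's implementation; the function name and the decision to require exactly one detected license are assumptions.

```python
from typing import Optional, Set


def auto_conclude(monk_result: Set[str], nomos_result: Set[str]) -> Optional[str]:
    """Return a license to accept automatically, or None to leave the file
    for a human reviewer.

    Assumed rule: accept only when both scanners report exactly one license
    and it is the same one.
    """
    if len(monk_result) == 1 and monk_result == nomos_result:
        return next(iter(monk_result))
    return None


print(auto_conclude({"GPL-2.0-only"}, {"GPL-2.0-only"}))   # scanners agree
print(auto_conclude({"GPL-2.0-only"}, {"MIT"}))            # scanners disagree
print(auto_conclude(set(), set()))                         # nothing detected
```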

In any case, it does not appear that anyone expects FOSSology users to switch off manual review entirely; quite a bit of work went into revising the user interface. The release notes highlight a new UI for the license-review and copyright-statement–editing tasks, as well as a new jQuery-based "folder view" that supports sorting, filtering, and viewing extended file attributes. The additional attributes that FOSSology exposes include some rather important data; when files are scanned, there are modules that attempt to pick out other details of significance besides the license, such as authorship and copyright statements. In fact, the 3.0 release adds a new interface for reviewing and editing the copyright-detection results; in addition to updating copyright information (or fixing simple typos), users can now flag files for further review or discussion, adding notes where needed.

Copyright statements were detected in prior releases, too; it is just the editing interface that is new for 3.0. But the new release does add support for detecting an entirely new class of data: customs or export-control information. Many readers will be familiar with the export restrictions that have been imposed on encryption software over the years. For now, encryption seems to be the primary target of the export-control scanner—based on the keywords it looks for—although "avionics" is included as well, and it will flag all instances of "foreign trade" and other such general terms.

Other new features

Among the other new features added in this release, FOSSology now supports the idea of "candidate licenses," which amount to a state in between the licenses currently tracked in the FOSSology instance's database and a completely unrecognized license. The reasoning is that users may want to tag files as having a license that is not yet in the database or perhaps even to create a filter that recognizes the new license. In prior versions of FOSSology, an admin user would have to add each new license to the database before this could happen. By supporting candidate licenses, users processing code uploads can tag files as needed without waiting for the admin, but if the candidate license later turns out to be unneeded, it has not been unnecessarily added to the database.

There are several other workflow additions in a similar vein. For instance, users can now save a license-conclusion decision for a particular file and have that decision tied to the hash of the file. As long as the file's hash remains the same on subsequent uploads, the file will not reappear in the list of files to review. Hopefully, such little additions speed up the process of reviewing uploads, but without running the risk of letting incomplete or inaccurate decisions creep in.
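Tying a decision to a file's hash can be sketched as a lookup keyed by a content digest. The class below is hypothetical and does not reflect FOSSology's schema; in particular, the choice of SHA-256 is an assumption, since the article does not say which hash is used.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Digest of the file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class DecisionStore:
    """Remembers license conclusions by content hash rather than by path."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}

    def record(self, data: bytes, license_name: str) -> None:
        """Save a reviewer's conclusion for this exact file content."""
        self._by_hash[content_hash(data)] = license_name

    def files_needing_review(self, upload: Dict[str, bytes]) -> List[str]:
        """List paths in an upload whose content has no saved conclusion."""
        return sorted(
            path for path, data in upload.items()
            if content_hash(data) not in self._by_hash
        )


store = DecisionStore()
main_c = b"int main(void) { return 0; }\n"
store.record(main_c, "GPL-2.0-only")

# A later upload: main.c is unchanged, util.c is new.
second_upload = {"main.c": main_c, "util.c": b"/* new file */\n"}
to_review = store.files_needing_review(second_upload)
print(to_review)
```

Because the key is the content rather than the path, an unchanged file drops out of the review list even if it has been moved, while any edit to the file brings it back.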

Last but certainly not least, the FOSSology 3.0 release adds some new import and export options. Users can export Software Package Data Exchange (SPDX) 2.0 files that represent the licensing and copyright information for a project's entire codebase. FOSSology can now also import and export data in comma-separated value (CSV) form, which may make it easier to connect with other tools. And it can generate README or COPYING files based on the license that has been determined to apply to a directory or to a project.

Given how intricate and complex license compliance can be, the obvious conclusion is that tracking and updating compliance information is likely to always include a significant time investment. But tools like FOSSology make the process as smooth as possible, and it is encouraging to see that the latest release has found so many areas for improvement. With more scanners implementing different approaches and with more flexibility in how licensing information is processed, perhaps keeping license compliance in order will someday be reduced to a job simple enough that it becomes routine. No doubt many free-software developers would welcome that.


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

Security: Fallout from the Python certificate verification change; New vulnerabilities in chromium, ffmpeg, grub2, kernel, ...
Kernel: Post-init read-only memory; TLS in the kernel; SOCK_DESTROY; A journal for MD/RAID5; Uses for selfie sticks.
Distributions: Upheaval in the Debian Live project; RHEL 7.2, OL 7.2, F21 eol, Ubuntu, ...
Development: What's new in PHP 7; Django 1.9; Introducing sd-event; GIMP at 20 years old; ...
Announcements: SFC fundraiser, FSF giving guide, ELCE videos, Garrett on hacker culture, ...
