
Leading items

A tempest in a toybox

By Jonathan Corbet
February 1, 2012
The eLinux.org web site is currently promoting a project to write a replacement for Busybox under a permissive license. Normally, the writing of more free software is seen as a good thing, but, in this case, there have been complaints about the perceived motivation behind the project. What this discussion shows is that there are some divisions within our community on how our licenses should be enforced - and even what those licenses say.

One could imagine a number of reasons for wanting to rewrite Busybox. Over time, that package has grown to the point that it's not quite the minimal-footprint tool kit that it once was. Android-based systems can certainly benefit from the addition of Busybox, but the Android world tends to be allergic to GPL-licensed software; a non-GPL Busybox might even find a place in the standard Android distribution. But the project page makes another reason abundantly clear:

Busybox is arguably the most litigated piece of GPL software in the world. Unfortunately, it is unclear what the remedy should be when a GPL violation occurs with busybox. Litigants have sometimes requested remedies outside the scope of busybox itself, such as review authority over unrelated products, or right of refusal over non-busybox modules. This causes concern among chip vendors and suppliers.

What seems to be happening in particular is that the primary Busybox litigant - the Software Freedom Conservancy (SFC) - has taken the termination language in GPLv2 to mean that somebody who fails to comply with the license loses all rights to the relevant software, even after they fix their compliance problems. (See Android and the GPLv2 death penalty for more information on this interpretation of the license, which is not universally held). Under this interpretation, they are withholding the restoration of a license to use and distribute Busybox based on conditions that are not directly related to Busybox; among other things, they require compliance for all other free software products shipped by that company, including the Linux kernel.

Thus, according to Matthew Garrett:

The reason to replace Busybox isn't because they don't want to hand over the source to Busybox - it's because Busybox is being used as a proxy to obtain the source code for more interesting GPLed works. People want a Busybox replacement in order to make it easier to infringe the kernel's license.

There is some truth to the notion that, on its own, license enforcement for Busybox is not hugely interesting. Encouraging compliance with the GPL is a good thing, but, beyond that, there is little to be gained by prying the source for a Busybox distribution from a vendor's hands. There just isn't anything interesting being added to Busybox by those vendors. Rob Landley, who was one of the Software Freedom Law Center's plaintiffs (before the enforcement work moved to the Software Freedom Conservancy), once wrote:

From a purely pragmatic perspective: I spent over a year doing busybox license enforcement, and a dozen lawsuits later I'm still unaware of a SINGLE LINE OF CODE added to the busybox repository as a result...

Rob has been working since 2006 on the Toybox project, which happens to be the effort that would someday like to replace Busybox.

So, beyond the generation of bad publicity for a violator and some cash for the litigants, Busybox enforcement on its own could perhaps be said to achieve relatively little for the community. But a vendor that can't be bothered to provide a tarball for an unmodified Busybox distribution is highly unlikely to have its act together with regard to other projects, starting with the kernel. And that vendor's kernel code can often be the key to understanding their hardware and supporting it in free software. So it is not surprising that a group engaging in GPL enforcement would insist on the release of the kernel source as well.

On its face, it does seem surprising that vendors would object to this requirement - unless they overtly wish to get away with GPL infringement. Tim Bird, who is promoting the Busybox replacement project, has stated that this is not the case. Instead, Tim says:

It is NOT the goal of this to help people violate the GPL, but rather to decrease the risk of some nuclear outcome, should a mistake be made somewhere in the supply chain for a product. For example, it is possible for a mistake made by an ODM (like providing the wrong busybox source version) could result in the recall of millions of unrelated products. As it stands, the demands made by the SFC in order to bring a company back into compliance are beyond the value that busybox provides to a company.

Mistakes do happen. Companies are often surprisingly unaware of where their code comes from or what version they may have shipped in a given product. Often, the initial violation comes from upstream suppliers, with the final vendor being entirely unaware that they are shipping GPL-licensed software at all. What is being claimed here is that SFC's demands are causing the consequences of any such mistakes to be more than companies are willing to risk.

What does the SFC require of infringers? The SFC demands, naturally enough, that the requirements of the GPL be met for the version of Busybox shipped by an infringer. There are also demands for an unknown amount of financial compensation, both to the SFC (The SFC's FY2010 IRS filings show just over $200,000 in revenue from legal settlements) and to the Busybox developer (Erik Andersen) that the SFC is representing. Then there are the demands for compliance for all other GPL-licensed products shipped by the vendor, demands that, it is alleged, extend to the source for binary-only kernel modules. The SFC also evidently demands that future products be submitted to them for a compliance audit before being shipped to customers.

Such demands may well be appropriate for habitual GPL infringers; they are, arguably, a heavy penalty for a mistake. Whether the cases filed by the SFC relate to habitual behavior or mistakes is not necessarily clear; there have been plenty of allegations either way. One person's mistake is another person's intentional abuse.

If Busybox is, for whatever reason, especially mistake-prone, then replacing it with a mistake-proof, BSD-licensed version might make sense. Not using the software at all is certainly a way to avoid infringing its license. What is a bit more surprising is that some developers are lamenting the potential loss of Busybox as a lever for the enforcement of conditions on the use of the kernel. There are a couple of concerns here:

  • The use of the GPL death penalty is worrisome, in that it gives any copyright holder extreme power over anybody who can be said to have infringed the license in any trivial way. Even if one is fully in agreement with the SFC's use of the termination clause, there are, beyond doubt, entities out there who would like to use it in ways that are not in the interests of the free software community.

  • One could argue that enforcement of the licenses for other software packages should be left up to the developers who wrote that code. They may have different ideas about how it should be done or even what compliance means. Any developer with copyrights in the kernel (or any other product) is entirely capable of going to the SFC if they want SFC-style enforcement of their rights.

For such a developer to go to the SFC is exactly what Matthew is asking for in his post. Thus far, despite a search for plaintiffs on the SFC's part, that has not happened. Why that might be is not entirely clear. Perhaps kernel developers are not comfortable with how the SFC goes about its business, or perhaps it's something else. It's worth noting that most kernel developers are employed by companies these days, with the result that much of their output is owned by their employers. For whatever reason, companies have shown remarkably little taste for GPL enforcement in any form, so a lot of kernel code is not available to be put into play in an enforcement action.

That last point is, arguably, a real flaw in the corporate-sponsored free software development model - at least, if the viability of copyleft licenses matters. The GPL, like almost any set of rules, will require occasional enforcement if its rules are to be respected; if corporations own the bulk of the code, and they are unwilling to be part of that enforcement effort, respect for the GPL will decrease. One could argue that scenario is exactly what is playing out now; one could also argue that it is causing Busybox, by virtue of being the only project for which active enforcement is happening, to be unfairly highlighted as the bad guy. If GPL enforcement were spread across a broader range of projects, it would be harder for companies to route around - unless, as some fear, that enforcement would drive companies away from GPL-licensed software altogether.

Situations like this one show that there is an increasing amount of frustration building in our community. Some vendors and some developers are clearly unhappy with how some license enforcement is being done, and they are taking action in response. But there is also a lot of anger over the blatant disregard for the requirements of the GPL at many companies; that, too, is inspiring people to act. There are a number of undesirable worst-case outcomes that could result. On one side, GPL infringement could reach a point where projects like the kernel are, for all practical purposes, BSD-licensed. Or GPL enforcement could become so oppressive that vendors flee to code that is truly BSD-licensed. Avoiding these outcomes will almost certainly require finding a way to enforce our licenses that most of us can agree with.

Comments (229 posted)

SCALE 10X: The trickiness of the education market

February 1, 2012

This article was contributed by Nathan Willis

The classroom presents a special challenge to Linux and open source advocates. At first glance it seems like a slam dunk: free software lowers costs, and provides students with unique opportunities to learn. But even as FOSS adoption grows into big business for enterprises, start-ups, and mom-and-pop shops, it continues to be a minority player in public schools and universities. There are outreach efforts fighting the good fight, but progress is slow, and learning how to adapt the message to the needs of educators is far from a solved problem.

Open Source Software In Education (OSSIE) was a dedicated Saturday track at SCALE 10X in Los Angeles. The sessions included talks about FOSS aimed at educators and talks about promoting open source usage in schools. The track was running concurrently with the rest of the conference, which made it difficult to attend every session, but the overlap between two of the talks raised more than enough questions for the open source community — namely, how to adapt outreach strategies for success in the often intransigent education sector.

Elizabeth Krumbach's "Bringing Linux into Public Schools and Community Centers" was an overview of the Partimus project's work in the San Francisco Bay area (and similar efforts), setting up and maintaining computer labs for K-12 students. Sebastian Dziallas's "Undergraduate Education Strategies" was a look at the Red Hat-run Professors' Open Source Summer Experience (POSSE), which is a workshop for college professors interested in bringing open source to the classroom.

Case studies from Partimus

[Elizabeth Krumbach]

Krumbach is both a volunteer and a board member with Partimus, a volunteer-driven nonprofit that accepts donated hardware, outfits it with free software, and provides it to San Francisco area schools. As she explained, Partimus's involvement includes not only the desktop systems used by students, but the network tools and system administration support required to keep the labs running. That frequently means setting up thin clients for the lab machines, plus network-mounted drives and imaging servers to provision or replace clients, and often setting up the infrastructure for the network itself: running Ethernet and power to all of the seats. The software stack is based on Ubuntu, Firefox, and LibreOffice on the client side, plus the OpenLDAP directory service and the DansGuardian filtering system — which fulfills a legal requirement for most schools.

The talk examined three education deployments in depth, and the lessons interested projects could draw from each. The Mount Airy Learning Tree (MALT) is a community center in Philadelphia, and a Partimus-inspired effort by the Ubuntu Pennsylvania team worked with the center to build its first-ever computer lab. The deployment was initially a success, but it did not end well when MALT relocated to a new venue on the other side of the city. The volunteers who had been supporting the lab found it impossible to make the numerous trips required to support the new facility on an ongoing basis, and the new MALT staff were uninterested in the lab. Although community centers are often easier to work with than public schools, Krumbach said, the MALT experience underlines the necessity of having on-the-ground volunteers available, and of having buy-in by the community center staff itself.

The Creative Arts Charter School (CACS) is a San Francisco charter school, meaning that it is publicly funded but can make autonomous decisions apart from the general school district. CACS is one of Partimus's flagship projects, an ongoing relationship that involves both labs and individual installs for various teachers. In the CACS case, Krumbach highlighted that supporting the computers required Partimus volunteers willing to go to the schools and inspect the machines in person. Teachers, being driven by the demands of the fixed academic calendar, rarely call in to report hardware or software failures: they simply work around them.

The ASCEND charter school in Oakland is another Partimus effort, but one with a distinctly different origin story. Robert Litt, a teacher at ASCEND, learned about Linux and open source from an acquaintance, and sought out help himself. Partimus donated a server to the school, but acts more like a technology consultancy, providing help and educational resources, while the labs are run and maintained by Litt. Krumbach used the example as evidence of the value of a local champion: Litt is a forward-thinking, technology-aware teacher in other respects as well; he runs multiple blogs to communicate with and provide assignments to his elementary-age classes.

Schools, grants, and budgets

A successful school deployment is not primarily a technological challenge, Krumbach said: the software is all there, and getting modern hardware donations is relatively easy. Instead, the challenges center around the individuals. She called attention to the "enthusiastic" leadership of Partimus director Christian Einfeldt, who is an effective advocate for the software and motivator of the volunteers. But on-the-ground supporters and strong allies at the schools themselves are vital as well. Finally, she emphasized that "selling" schools on open source software required demonstrating it and providing training classes so that the teachers could gain firsthand experience — not merely enumerating a list of benefits.

The audience in the session included many who either worked in education or who had firsthand experience advocating open source software in the classroom, which at times made for impassioned discussion. The topic that occupied the most time was how to respond when a Linux lab is challenged by the sudden appearance of a grant-funded (or corporate-donated) rival lab built on Windows. Apparently, in those situations it is common for the donation or the grant to stipulate that the new hardware be used only in a particular way — which precludes installing another operating system. Krumbach said that Partimus had encountered such a dilemma, and quoted Einfeldt as saying "it's wrong, but it sort of makes me glad when I walk into that lab and one third of the Windows computers don't boot. And they call us back in when half of them don't boot."

Grants and corporate-sponsored donations relate to another important issue, which is that public schools do not deal with budgets like businesses do. They do have a budget (even a technology budget), Krumbach said, but the mindset is completely different: a school's budget is fixed, it is determined by outsiders, and the school has very little input into the process.

In other words, schools don't deal with income and expenses like businesses do, and thus the "you'll start saving money now" argument common in the small business and enterprise market simply carries no weight. A better strategy is to directly connect open source software to opportunities to do new things: a new course, an optional extra-curricular activity, or a faster and simpler way to teach a particular subject. That approach makes charter schools an especially viable market, she said; anyone interested in promoting open source software would do well to pay attention to when local charter schools are in the planning stages.

The higher-ed gap

[Sebastian Dziallas]

While Partimus is interested in the primary and secondary education market (and generally only at the desktop-user level), Red Hat's POSSE targets college professors who teach computer science and software engineering. It has been run both as a week-long boot camp and as a weekend experience, but in either case, the professors are split into groups and learn about the open source development model by immersion: getting familiar with wikis, distributed source code management, and communicating only by online means. Dziallas mentioned that (in at least one case) the professors were instructed to communicate with each other only over IRC during the project; IRC, like other tools common in open source projects, is rarely used in academia.

At the end of a POSSE training course, the expectation is that the professors will use real-world open source projects as exercises and learning opportunities in their own classes — anywhere from serving as source material to assigning semester-long projects that get the students involved in actual development. In addition, the professors leave POSSE with valuable contacts in the open source community, including people who they can turn to when they have questions or when something goes wrong (such as a project delaying its next release to an inopportune time of year).

Dziallas is currently a student at Olin College, and had worked as an intern at Red Hat in the summer of 2011. Based on that internship and his experience with POSSE, he presented his insights on the cultural differences between open source software and academia, and how understanding them could help bridge the gap.

For starters, he pointed out that open source and academia have radically different timing on a number of fronts. Many Linux-related open source projects now operate on steady, six-month release cycles, while universities typically only re-evaluate their curriculum every four years. Planning is also different: open source projects vary from those with completely ad-hoc roadmaps to those that plan a year in advance — but academia thinks in two-to-five-year cycles for everything from hardware refreshes to accreditation. The "execution time" of the two worlds differs, too, with the lifecycle of a typical software release being six to twelve months, but the lifespan of a particular degree taking four to five years.

As a result, he said, from the open source perspective the academic world seems glacially slow, but from academia's vantage point, open source is chaotic and unpredictable. But the differences do not stop there. In open source, jumping in and doing something without obtaining permission first is the preferred technique — while in academia it is anathema. Open source is always preoccupied with the problem of finding and recruiting more contributors, he said, while academia is currently interested in "mentoring," "portfolio material," and the "workplace readiness" of students. Industry has been quick to connect with universities, recruiting interns and new employees, but open source has so far not been as successful.

Challenges for POSSE

POSSE is Red Hat's effort to bridge the gap and find common ground between open source in the wild and academia. The professors are encouraged to find an existing project that they care about, not to simply pick one at random, in the hopes of building a sustainable relationship. The "immersion" method of learning the open source methodology is supposed to be a quicker path to understanding it than any written explanation can provide. But ultimately, building connections between the interested professors and actual developers is one of the biggest benefits of the program.

Dziallas calculated that of all of the college professors with an interest in learning more about open source, only 50% can make it to a POSSE event (for budgetary or time reasons). In addition, about 30% have some sort of "institutional blocker" that precludes their attendance beyond just logistical issues, and a tiny percentage drop out for loss of interest or other reasons.

Thus POSSE is only reaching a fraction of the educators it would like to, but the challenge does not stop there. Among POSSE alumni, the challenge is maintaining a long-term relationship; the amount of support a professor receives after POSSE correlates directly with that professor's success rate. Although some are able to use institutional funds to further their involvement with open source (such as travel support to attend a conference, or to bring in a developer to give a guest lecture), most are not. POSSE has only been in operation since 2009, so its long-term sustainability has yet to be proven. But, Dziallas noted, regardless of whether or not the current formula is sustainable, "we must keep trying."

As was the case with Krumbach's talk, the audience question-and-answer segment of the session was taken up largely by the question of how to make inroads into institutions where there is currently no Linux or open source presence. At the college level, of course, the specifics are different. One audience member asked how to combat purchasing decisions that locked out open source, to which Dziallas replied that there is a big difference between the software that students use to do their homework, and what shapes the education experience: if understanding open source and participating in the community is the goal, that goal can be accomplished on a computer running Microsoft software.

Another audience member weighed in on the topic by suggesting that open source advocates take a closer look at the community colleges and technical colleges in their area, not just the four-year "liberal arts" institutions. In the United States, "community" and "technical" colleges typically have a different mandate, the argument went, one that puts more emphasis on job training and on learning real-world skills. As a result, they move at a different pace than traditional institutions and respond to different factors.

In both sessions, then, the speakers shared their successes, but the audience expressed an ongoing frustration with cracking into the educational computing space. Of course, selling Linux on the desktop has always been a tougher undertaking than selling it in the server room, but it is clear from the conversations at OSSIE that advocating open source in education is far more complicated than substituting "administrator" for "executive" and "classroom" for "office." Both Partimus and POSSE are gaining valuable insights through their own work about the distinct expectations, timing, and interaction it takes to present a compelling case to educators. They still have more information to gather, but even now other open source projects can learn from their progress.

Comments (3 posted)

Thoughts from LWN's UTF8 conversion

By Jonathan Corbet
February 1, 2012
There are a lot of things that one does not learn in engineering school. In your editor's case, anything related to character encodings has to be put onto that list. That despite the fact that your editor's first programs were written on a system with a six-bit character size; a special "shift out" mechanism was needed to represent some of the more obscure characters - like lower case letters. Text was not portable to machines with any other architecture, but the absence of a network meant that one rarely ran into such problems. And when one did, that was what EBCDIC conversion utilities were for.

Later machines, of course, standardized on eight-bit bytes and the ASCII character set. Having a standard meant that nobody had to worry about character set issues anymore; the fact that it was ill-suited for use outside of the United States didn't seem to matter. Even as computers spread worldwide, usage of ASCII stuck around for a long time. Thus, your editor has a ready-made excuse for not thinking much about character sets when he set out to write the "new LWN site code" in 2002. Additionally, the programming languages and web platforms available at the time did not exactly encourage generality in this area. Anything that wasn't ASCII by then was Latin-1 - for anybody with a sufficiently limited world view.

Getting past the Latin-1 limitation took a long time and a lot of work, but that seems to be accomplished and stable at this point. In the process, your editor observed a couple of things that were not immediately obvious to him. Perhaps those observations will prove useful to anybody else who has had a similarly sheltered upbringing.

Now, too, we have a standard for character representation; it is called "Unicode." In theory, all one needs to do is to work in Unicode, and all of those unpleasant character set problems will go away. Which is a nice idea, but there's a little detail that is easy to skip over: Unicode is not actually a standard for the representation of characters. It is, instead, a mapping between integer character numbers ("code points") and the characters themselves. Nobody deals directly with Unicode; they always work with some specific representation of the Unicode code points.
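To make the distinction concrete, here is a small Python 3 sketch (purely illustrative; it has nothing to do with LWN's actual site code): a code point is just an integer naming a character, and that integer is separate from any byte-level representation of the character.

```python
# A Unicode code point is an integer that names a character.
ch = "é"                        # U+00E9, LATIN SMALL LETTER E WITH ACUTE
assert ord(ch) == 0xE9          # character -> code point
assert chr(0xE9) == ch          # code point -> character

# The code point itself says nothing about bytes; producing bytes is
# the job of an encoding, and different encodings disagree.
print(ch.encode("utf-8"))       # b'\xc3\xa9' - two bytes
print(ch.encode("latin-1"))     # b'\xe9'    - one byte
```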

Suitably enlightened programming languages may well have a specific type for dealing with Unicode strings. How the language represents those strings is variable; many use an integer type large enough to hold any code point value, but there are exceptions. The abortive PHP6 attempt used a variable-width encoding based on 16-bit values, for example. With luck, the programmer need not actually know how Unicode is handled internally to a given language; it should Just Work.

But the use of a language-specific internal representation implies that any string obtained from the world outside a given program is not going to be represented in the same way. Of course, there are standards for string representations too - quite a few standards. The encoding used by LWN now - UTF8 - is a good choice for representing a wide range of code points while being efficient in LWN's still mostly-ASCII world. There are many other choices but, importantly, they are all encodings; they are not "Unicode."
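The efficiency point can be seen directly in Python 3 (again, just an illustration): mostly-ASCII text costs one byte per character in UTF-8, while the wider encodings pay for their generality up front.

```python
s = "Linux Weekly News"            # plain ASCII text, 17 characters

print(len(s.encode("utf-8")))      # 17 bytes - one per character
print(len(s.encode("utf-16-le")))  # 34 bytes
print(len(s.encode("utf-32-le")))  # 68 bytes

# Non-ASCII characters cost more in UTF-8, but a round trip through
# any of these encodings recovers the same code points.
assert "naïve".encode("utf-32-le").decode("utf-32-le") == "naïve"
```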

So programs dealing in Unicode text must know how outside-world strings are represented and convert those strings to the internal format before operating on them. Any program which does anything more complicated to text than copying it cannot safely do so if it does not fully understand how that text is represented; any general solution almost certainly involves decoding external text to a canonical internal form first.
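In Python 3 terms, the decode-operate-encode discipline looks like this (an illustrative sketch, not LWN's code):

```python
raw = b"caf\xc3\xa9 au lait"     # bytes from a file or socket, UTF-8 encoded

text = raw.decode("utf-8")       # 1. decode to the canonical internal form
fixed = text.upper()             # 2. operate on characters, not bytes
out = fixed.encode("utf-8")      # 3. re-encode before sending it back out

print(fixed)                     # CAFÉ AU LAIT
```

Uppercasing the raw bytes directly would have left the 'é' untouched (or worse, mangled it); only after decoding does the program know it is dealing with that character at all.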

This is an interesting evolution of the computing environment. Unix-like systems are supposed to be oriented around plain text whenever possible; everything should be human-readable. We still have the human-readable part - better than before for those humans whose languages are not well served by ASCII - but there is no such thing as "plain text" anymore. There is only text in a specific encoding. In a very real sense, text has become a sort of binary blob that must be decoded into something the program understands before it can be operated on, then re-encoded before going back out into the world. A lot of Unicode-related misery comes from a failure to understand (and act on) that fundamental point.

LWN's site code is written in Python 2. Version 2.x of the language is entirely able to handle Unicode, especially for relatively large values of x. To that end, it has a unicode string type, but this type is clearly a retrofit. It is not used by default when dealing with strings; even literal strings must be marked explicitly as Unicode, or they are just plain strings.

When Unicode was added to Python 2, the developers tried very hard to make it Just Work. Any sort of mixture between Unicode and "plain strings" involves an automatic promotion of those strings to Unicode. It is a nice idea, in that it allows the programmer to avoid thinking about whether a given string is Unicode or "just a string." But if the programmer does not know what is in a string - including its encoding - nobody does. The resulting confusion can lead to corrupted text or Python exceptions; as Guido van Rossum put it in the introduction to Python 3, "This value-specific behavior has caused numerous sad faces over the years." Your editor's experience, involving a few sad faces for sure, agrees with this; trying to make strings "just work" leads to code containing booby traps that may not spring until some truly inopportune time far in the future.

That is why Python 3 changed the rules. There are no "strings" anymore in the language; instead, one works with either Unicode text or binary bytes. As a general rule, data coming into a program from a file, socket, or other source is binary bytes; if the program needs to operate on that data as text, it must explicitly decode it into Unicode. This requirement is, frankly, a pain; there is a lot of explicit encoding and decoding to be done that didn't have to happen in a Python 2 program. But experience says that it is the only rational way; otherwise the program (and programmer) never really know what is in a given string.
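The payoff of that strictness is that mixing the two types is an immediate, visible error rather than a latent booby trap, as this small Python 3 sketch shows:

```python
data = b"caf\xc3\xa9"            # bytes: what a file or socket hands you

# Python 3 refuses to guess what the bytes mean:
try:
    "prefix: " + data
except TypeError as e:
    print("refused:", e)

# The program must state the encoding explicitly:
text = data.decode("utf-8")
print("prefix: " + text)         # prefix: café
```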

In summary: Unicode is not UTF8 (or any other encoding), and encoded text is essentially binary data. Once those little details get into a programmer's mind (quite a lengthy process, in your editor's case), most of the difficulties involved in dealing with Unicode go away. Much of the above is certainly obvious to anybody who has dealt with multiple character encodings for any period of time. But it is a bit of a foreign mind set to developers who have spent their time in specialized environments or with languages that don't recognize Unicode - kernel developers, for example. In the end, writing programs that are able to function in a multiple-encoding world is not hard; it's just one more thing to think about.

Comments (91 posted)

Page editor: Jonathan Corbet


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds