By Jonathan Corbet
February 1, 2012
The eLinux.org web site is currently promoting
a project to write
a replacement for Busybox under a permissive license. Normally, the
writing of more free software is seen as a good thing, but, in this case,
there have been
complaints about the
perceived motivation behind the project. What this
discussion shows is that there are some divisions within our community on
how our licenses should be enforced - and even what those licenses say.
One could imagine a number of reasons for wanting to rewrite Busybox. Over
time, that package has grown to the point that it's not quite the
minimal-footprint tool kit that it once was. Android-based systems can
certainly benefit from the addition of Busybox, but the Android world tends
to be allergic to GPL-licensed software; a non-GPL Busybox might even find
a place in the standard Android distribution. But the project page makes
another reason abundantly clear:
Busybox is arguably the most litigated piece of GPL software in the
world. Unfortunately, it is unclear what the remedy should be when
a GPL violation occurs with busybox. Litigants have sometimes
requested remedies outside the scope of busybox itself, such as
review authority over unrelated products, or right of refusal over
non-busybox modules. This causes concern among chip vendors and
suppliers.
What seems to be happening in particular is that the primary Busybox
litigant - the Software Freedom Conservancy (SFC) - has taken the termination
language in GPLv2 to mean that somebody who fails to comply with the
license loses all rights to the relevant software, even after they fix
their compliance problems. (See Android and
the GPLv2 death penalty for more information on this interpretation of
the license, which is not universally held). Under this interpretation,
they are withholding the restoration of a license to use and distribute
Busybox based on conditions that are not directly related to Busybox; among
other things, they require compliance for all other free software products
shipped by that company, including the Linux kernel.
Thus, according to Matthew Garrett:
The reason to replace Busybox isn't because they don't want to hand
over the source to Busybox - it's because Busybox is being used as
a proxy to obtain the source code for more interesting GPLed
works. People want a Busybox replacement in order to make it easier
to infringe the kernel's license.
There is some truth to the notion that, on its own, license enforcement for
Busybox is not hugely interesting. Encouraging compliance with the GPL is
a good thing, but, beyond that, there is little to be gained by prying the
source for a Busybox distribution from a vendor's hands. There just isn't
anything interesting being added to Busybox by those vendors. Rob Landley,
who was once one of the Software Freedom Law Center's plaintiffs (before
the enforcement work moved to the Software Freedom Conservancy) once wrote:
From a purely pragmatic perspective: I spent over a year doing
busybox license enforcement, and a dozen lawsuits later I'm still
unaware of a SINGLE LINE OF CODE added to the busybox repository as
a result...
Rob has been working since 2006 on the Toybox
project, which happens to be the effort that would someday like to
replace Busybox.
So, beyond the generation of bad publicity for a violator and some cash for
the litigants, Busybox enforcement on its own could perhaps be said to
achieve relatively little for the community. But a vendor that can't be
bothered to provide a tarball for an unmodified Busybox distribution is
highly unlikely to have its act together with regard to other projects,
starting with the kernel. And that vendor's kernel code can often be the
key to understanding their hardware and supporting it in free software. So
it is not surprising that a group engaging in GPL enforcement would insist
on the release of the kernel source as well.
On its face, it does seem surprising that vendors would object to
this requirement - unless they overtly wish to get away with GPL
infringement. Tim Bird, who is promoting the Busybox
replacement project, has stated that this
is not the case. Instead, Tim says:
It is NOT the goal of this to help people violate the GPL, but
rather to decrease the risk of some nuclear outcome, should a
mistake be made somewhere in the supply chain for a product. For
example, it is possible for a mistake made by an ODM (like
providing the wrong busybox source version) could result in the
recall of millions of unrelated products. As it stands, the demands
made by the SFC in order to bring a company back into compliance
are beyond the value that busybox provides to a company.
Mistakes do happen. Companies are often surprisingly unaware of where their
code comes from or what version they may have shipped in a given product.
Often, the initial violation comes from upstream suppliers, with the final
vendor being entirely unaware that they are shipping GPL-licensed software
at all. What is being claimed here is that SFC's demands are causing the
consequences of any such mistakes to be more than companies are willing to
risk.
What does the SFC require of infringers? The SFC demands, naturally
enough, that the requirements of the GPL be met for the version of Busybox
shipped by an
infringer. There are also demands for an unknown amount of financial
compensation, both to the SFC (the SFC's FY2010
IRS filings show just over $200,000 in revenue from legal
settlements) and to the Busybox developer (Erik Andersen)
that the SFC is representing.
Then there are the demands for compliance
for all other GPL-licensed products shipped by the vendor, demands that, it
is alleged, extend to the source for binary-only kernel modules. The SFC also
evidently demands that future products be submitted to them for a
compliance audit before being shipped to customers.
Such demands may well be appropriate for habitual GPL infringers; they are,
arguably, a heavy penalty for a mistake. Whether the cases filed by the
SFC relate to habitual behavior or mistakes is not necessarily clear; there
have been plenty of allegations either way. One person's mistake is
another person's intentional abuse.
If Busybox is, for whatever reason, especially mistake-prone, then
replacing it with a mistake-proof, BSD-licensed version might make sense.
Not using the software at all is certainly a way to avoid infringing its
license. What is a bit more surprising is that some developers are
lamenting the potential loss of Busybox as a lever for the enforcement of
conditions on the use of the kernel. There are a couple of concerns here:
- The use of the GPL death penalty is worrisome, in that it gives
any copyright holder extreme power over anybody who can be said to
have infringed the license in any trivial way. Even if one is fully
in agreement with the SFC's use of the termination clause, there are,
beyond doubt, entities out there who would like to use it in ways that
are not in the interests of the free software community.
- One could argue that enforcement of the licenses for other software
packages should be left up to the developers who wrote that code.
They may have different ideas about how it should be done or even
what compliance means. Any
developer with copyrights in the kernel (or any other product) is
entirely capable of going to the SFC if they want SFC-style
enforcement of their rights.
For such a developer to go to the SFC is exactly what Matthew is asking for
in his post. Thus far, despite a search for plaintiffs on the SFC's part,
that has not happened. Why that might be is not entirely clear. Perhaps
kernel developers are not comfortable with how the SFC goes about its
business, or perhaps it's something else. It's worth noting that most
kernel developers are employed by companies these days, with the result
that much of their output is owned by their employers. For whatever reason,
companies have shown remarkably little taste for GPL enforcement in
any form, so a lot of kernel code is not available to be put into play in
an enforcement action.
That last point is, arguably, a real flaw in the corporate-sponsored free
software development model - at least, if the viability of copyleft
licenses matters. The GPL, like almost any set of rules, will require
occasional enforcement if its rules are to be respected; if corporations
own the bulk of the code, and they are unwilling to be part of that
enforcement effort, respect for the GPL will decrease. One could argue
that scenario is exactly what is playing out now; one could also argue that
it is causing Busybox, by virtue of being the only project for which active
enforcement is happening, to be unfairly highlighted as the bad guy. If
GPL enforcement were spread across a broader range of projects, it would be
harder for companies to route around - unless, as some fear, that
enforcement would drive companies away from GPL-licensed software altogether.
Situations like this one show that there is an increasing amount of
frustration building in our community. Some vendors and some developers
are clearly unhappy with how some license enforcement is being done, and
they are taking action in response. But there is also a lot of anger over
the blatant disregard for the requirements of the GPL at many companies;
that, too, is inspiring people to act. There are a number of undesirable
worst-case outcomes that could result. On one side, GPL infringement could
reach a point where projects like the kernel are, for all practical effect,
BSD-licensed. Or GPL enforcement could become so oppressive that vendors
flee to code that is truly BSD licensed. Avoiding these outcomes will
almost certainly require finding a way to enforce our licenses that most of
us can agree with.
February 1, 2012
This article was contributed by Nathan Willis
The classroom presents a special challenge
to Linux and open source advocates. At first glance it seems like a slam
dunk: free software lowers costs, and provides students with unique
opportunities to learn. But even as FOSS adoption grows into big business
for enterprises, start-ups, and mom-and-pop shops, it continues to be a
minority player in public schools and universities. There are outreach
efforts fighting the good fight, but progress is slow, and learning how to
adapt the message to the needs of educators is far from a solved problem.
Open Source Software In Education (OSSIE) was a dedicated Saturday track at SCALE 10X in Los Angeles. The sessions included talks about FOSS aimed at educators and talks about promoting open source usage in schools. The track was running concurrently with the rest of the conference, which made it difficult to attend every session, but the overlap between two of the talks raised more than enough questions for the open source community — namely, how to adapt outreach strategies for success in the often intransigent education sector.
Elizabeth Krumbach's "Bringing
Linux into Public Schools and Community Centers" was an overview of the
Partimus project's work in the San
Francisco Bay area (and similar efforts), setting up and maintaining
computer labs for K-12 students. Sebastian Dziallas's "Undergraduate
Education Strategies" was a look at the Red Hat-run Professors' Open Source Summer
Experience (POSSE), which is a workshop for college professors interested in bringing open source to the classroom.
Case studies from Partimus
Krumbach is both a volunteer and a board member with Partimus, a
volunteer-driven nonprofit that accepts hardware donations, outfits them
with free software, and provides
them to San Francisco area schools. As she explained, Partimus's
involvement includes not only the desktop systems used by students, but the
network tools and system administration support required to keep the labs
running. That frequently means setting up thin clients for the lab
machines, plus network-mounted drives and imaging servers to provision or
replace clients, and often setting up the infrastructure for the network
itself: running Ethernet and power to all of the seats. The software
stack is based on Ubuntu, Firefox, and LibreOffice on the client side,
plus the OpenLDAP directory service and the DansGuardian filtering system, which fulfills a legal requirement for most schools.
The talk examined three education deployments in depth, and the lessons interested projects could draw from each. The Mount Airy Learning Tree (MALT) is a community center in Philadelphia, and a Partimus-inspired effort by the Ubuntu Pennsylvania team worked with the center to build its first-ever computer lab. The deployment was initially a success, but it did not end well when MALT relocated to a new venue on the other side of the city. The volunteers who had been supporting the lab found it impossible to make the numerous trips required to support the new facility on an ongoing basis, and the new MALT staff were uninterested in the lab. Although community centers are often easier to work with than public schools, Krumbach said, the MALT experience underlines the necessity of having on-the-ground volunteers available, and of having buy-in by the community center staff itself.
The Creative Arts Charter School (CACS) is a San Francisco charter
school, meaning that it is publicly funded but can make autonomous
decisions apart from the general school district. CACS is one of
Partimus's flagship projects, an ongoing relationship that involves both
labs and individual installs for various teachers. In the CACS case,
Krumbach highlighted that supporting the computers required Partimus
volunteers willing to go to the schools and inspect the machines in person.
Teachers, being driven by the demands of the fixed academic calendar,
rarely call in to report hardware or software failures: they simply work
around them.
The ASCEND charter school in Oakland is another Partimus effort, but one
with a distinctly different origin story. Robert Litt, a teacher at
ASCEND, learned about Linux and open source from an acquaintance, and
sought out help himself. Partimus donated a server to the school, but acts
more like a technology consultancy, providing help and educational
resources, while the labs are run and maintained by Litt. Krumbach used
the example as evidence of the value of a local champion: Litt is
a forward-thinking, technology-aware teacher in other respects as well; he
runs multiple blogs to communicate with and provide assignments to his elementary-age classes.
Schools, grants, and budgets
A successful school deployment is not primarily a technological challenge, Krumbach said: the software is all there, and getting modern hardware donations is relatively easy. Instead, the challenges center around the individuals. She called attention to the "enthusiastic" leadership of Partimus director Christian Einfeldt, who is an effective advocate for the software and motivator of the volunteers. But on-the-ground supporters and strong allies at the schools themselves are vital as well. Finally, she emphasized that "selling" schools on open source software required demonstrating it and providing training classes so that the teachers could gain firsthand experience, not merely enumerating a list of benefits.
The audience in the session included many who either worked in education or who had firsthand experience advocating open source software in the classroom, which at times made for impassioned discussion. The topic that occupied the most time was how to respond when a Linux lab is challenged by the sudden appearance of a grant-funded (or corporate-donated) rival lab built on Windows. Apparently, in those situations it is common for the donation or the grant to stipulate that the new hardware be used only in a particular way — which precludes installing another operating system. Krumbach said that Partimus had encountered such a dilemma, and quoted Einfeldt as saying "it's wrong, but it sort of makes me glad when I walk into that lab and one third of the Windows computers don't boot. And they call us back in when half of them don't boot."
Grants and corporate-sponsored donations relate to another important issue, which is that public schools do not deal with budgets like businesses do. They do have a budget (even a technology budget), Krumbach said, but the mindset is completely different: a school's budget is fixed, it is determined by outsiders, and the school has very little input into the process.
In other words, schools don't deal with income and expenses like businesses do, and thus the "you'll start saving money now" argument common in the small business and enterprise market simply carries no weight. A better strategy is to directly connect open source software to opportunities to do new things: a new course, an optional extra-curricular activity, or a faster and simpler way to teach a particular subject. That approach makes charter schools an especially viable market, she said; anyone interested in promoting open source software would do well to pay attention to when local charter schools are in the planning stages.
The higher-ed gap
While Partimus is interested in the primary and secondary education market (and generally only at the desktop-user level), Red Hat's POSSE targets college professors who teach computer science and software engineering. It has been run both as a week-long boot camp and as a weekend experience, but in either case, the professors are split into groups and learn about the open source development model by immersion: getting familiar with wikis, distributed source code management, and communicating only by online means. Dziallas mentioned that (in at least one case) the professors were instructed to communicate with each other only over IRC during the project; IRC, like other tools common in open source projects, is rarely used in academia.
At the end of a POSSE training course, the expectation is that the
professors will use real-world open source projects as exercises and
learning opportunities in their own classes — anywhere from serving
as source material to assigning semester-long projects that get the
students involved in actual development. In addition, the professors leave
POSSE with valuable contacts in the open source community, including people
who they can turn to when they have questions or when something goes wrong (such as a project delaying its next release to an inopportune time of year).
Dziallas is currently a student at Olin College, and had worked as an intern at Red Hat in the summer of 2011. Based on that internship and his experience with POSSE, he presented his insights on the cultural differences between open source software and academia, and how understanding them could help bridge the gap.
For starters, he pointed out that open source and academia have radically different timing on a number of fronts. Many Linux-related open source projects now operate on steady, six-month release cycles, while universities typically only re-evaluate their curriculum every four years. Planning is also different: open source projects vary from those with completely ad-hoc roadmaps to those that plan a year in advance — but academia thinks in two-to-five-year cycles for everything from hardware refreshes to accreditation. The "execution time" of the two worlds differs, too, with the lifecycle of a typical software release being six to twelve months, but the lifespan of a particular degree taking four to five years.
As a result, he said, from the open source perspective the academic world seems glacially slow, but from academia's vantage point, open source is chaotic and unpredictable. But the differences do not stop there. In open source, jumping in and doing something without obtaining permission first is the preferred technique — while in academia it is anathema. Open source is always preoccupied with the problem of finding and recruiting more contributors, he said, while academia is currently interested in "mentoring," "portfolio material," and the "workplace readiness" of students. Industry has been quick to connect with universities, recruiting interns and new employees, but open source has so far not been as successful.
Challenges for POSSE
POSSE is Red Hat's effort to bridge the gap and find common ground between open source in the wild and academia. The professors are encouraged to find an existing project that they care about, not to simply pick one at random, in the hopes of building a sustainable relationship. The "immersion" method of learning the open source methodology is supposed to be a quicker path to understanding it than any written explanation can provide. But ultimately, building connections between the interested professors and actual developers is one of the biggest benefits of the program.
Dziallas calculated that of all of the college professors with an
interest in learning more about open source, only 50% can make it to a
POSSE event (for budgetary or time reasons). In addition, about 30% have
some sort of "institutional blocker" that precludes their attendance beyond
just logistical issues, and a
tiny percentage drop out for loss of interest or other reasons.
Thus POSSE is only reaching a fraction of the educators it would like to, but the challenge does not stop there. Among POSSE alumni, the challenge is maintaining a long-term relationship. The amount of support a professor receives after POSSE corresponds to the success rate. Although some are able to use institutional funds to further their involvement with open source (such as travel support to attend a conference, or to bring in a developer to give a guest lecture), most are not. POSSE has only been in operation since 2009, so its long-term sustainability has yet to be proven. But, Dziallas noted, regardless of whether or not the current formula is sustainable, "we must keep trying."
As was the case with Krumbach's talk, the audience question-and-answer segment of the session was taken up largely by the question of how to make inroads into institutions where there is currently no Linux or open source presence. At the college level, of course, the specifics are different. One audience member asked how to combat purchasing decisions that locked out open source, to which Dziallas replied that there is a big difference between the software that students use to do their homework, and what shapes the education experience: if understanding open source and participating in the community is the goal, that goal can be accomplished on a computer running Microsoft software.
Another audience member weighed in on the topic by suggesting that open source advocates take a closer look at the community colleges and technical colleges in their area, not just the four-year, "liberal arts" institutions. In the United States, "community" and "technical" colleges typically have a different mandate, the argument went, and one that puts more emphasis on job training and on learning real-world skills. As a result, they move at a different pace than traditional institutions and respond to different factors.
In both sessions, then, the speakers shared their
successes, but the audience expressed an ongoing frustration with cracking into
the educational computing space. Of course, selling Linux on the desktop
has always been a tougher undertaking than selling it in the server room,
but it is clear from the conversations at OSSIE that advocating open source
in education is far more complicated than substituting "administrator" for
"executive" and "classroom" for "office." Both Partimus and POSSE are
gaining valuable insights through their own work about the distinct
expectations, timing, and interaction it takes to present a compelling case
to educators. They still have more information to gather, but even now
other open source projects can learn from their progress.
By Jonathan Corbet
February 1, 2012
There are a lot of things that one does not learn in engineering school.
In your editor's case, anything related to character encodings has to be
put onto that list. That despite the fact that your editor's first
programs were written on a system with a six-bit character size; a special
"shift out" mechanism was needed to represent some of the more obscure
characters - like lower case letters. Text was not portable to machines
with any other architecture, but the absence of a network meant that one
rarely ran into such problems. And when one did, that was what EBCDIC
conversion utilities were for.
Later machines, of course, standardized on eight-bit bytes and the ASCII
character set. Having a standard meant that nobody had to worry about
character set issues anymore; the fact that it was ill-suited for use
outside of the United States didn't seem to matter. Even as computers
spread worldwide, usage of ASCII stuck around for a long time. Thus, your
editor has a ready-made excuse for not thinking much about character sets
when he set out to write the "new LWN site code" in 2002. Additionally,
the programming languages and web platforms available at the time did not
exactly encourage generality in this area. Anything that
wasn't ASCII by then was Latin-1 - for anybody with a sufficiently limited
world view.
Getting past the Latin-1 limitation took a long time and a lot of work, but
that seems to be accomplished and stable at this point. In the process,
your editor observed a couple of things that were not immediately obvious
to him. Perhaps those observations will prove useful to anybody else who
has had a similarly sheltered upbringing.
Now, too, we have a standard for character representation; it is called "Unicode."
In theory, all one needs to do is to work in Unicode, and all of those
unpleasant character set problems will go away. Which is a nice idea, but
there's a little detail that is easy to skip over: Unicode is not actually
a standard for the representation of characters. It is, instead, a mapping
between integer character numbers ("code points") and the characters
themselves. Nobody deals directly with Unicode; they always work with some
specific representation of the Unicode code points.
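The split between a code point and its representation can be made concrete with a small sketch. It is written in Python 3 purely for illustration; the sample character and the choice of encodings are arbitrary, not anything taken from LWN's code.

```python
# A code point is just an integer; an encoding turns it into bytes.
ch = "\u00e9"                 # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)          # the Unicode code point, as an integer

# The same code point has a different byte representation in each encoding.
# The "-le" variants are used so that no byte-order mark is prepended.
representations = {
    name: ch.encode(name)
    for name in ("utf-8", "utf-16-le", "utf-32-le", "latin-1")
}

print(f"U+{code_point:04X}")
for name, raw in representations.items():
    print(f"{name:10} {raw.hex()}")
```

One integer, four different byte sequences: none of them is "Unicode" by itself, and a program receiving the bytes has to be told which one it is looking at.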
Suitably enlightened programming languages may well have a specific type
for dealing with Unicode strings. How the language represents those
strings is variable; many use an integer type large enough to hold any code
point value, but there are exceptions. The abortive PHP6 attempt used a variable-width
encoding based on 16-bit values, for example. With luck, the programmer
need not actually know how Unicode is handled internally to a given
language; it should Just Work.
But the use of a language-specific internal representation implies that any
string obtained from the world outside a given
program is not going to be represented in the same way. Of course, there
are standards for string representations too - quite a few standards. The
encoding used by LWN now - UTF8 - is a good choice for representing a wide
range of code points while being efficient in LWN's still mostly-ASCII
world. There are many other choices but, importantly, they are all
encodings; they are not "Unicode."
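The efficiency argument is easy to check. The following Python 3 sketch compares the size of two short sample strings in three encodings; the helper name `encoded_sizes` and both samples are made up for this illustration.

```python
def encoded_sizes(text):
    """Return the number of bytes needed for text in a few encodings."""
    # The "-le" variants avoid counting a byte-order mark.
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}

mostly_ascii = "caf\u00e9"            # 4 characters, one of them non-ASCII
cjk = "\u65e5\u672c\u8a9e"            # 3 characters, none of them ASCII

for label, sample in (("mostly ASCII", mostly_ascii), ("CJK", cjk)):
    print(label, encoded_sizes(sample))
```

For mostly-ASCII text UTF-8 is the most compact of the three, while for the CJK sample the 16-bit encoding is smaller. Which encoding is "efficient" depends on the text being stored.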
So programs
dealing in Unicode text must know how outside-world strings are represented
and convert those strings to the internal format before operating on them.
Any program which does anything more complicated to text than copying it
cannot safely do so if it does not fully understand how that text is
represented; any general solution almost certainly involves decoding
external text to a canonical internal form first.
This is an interesting evolution of the computing environment. Unix-like
systems are supposed to be oriented around plain text whenever possible;
everything should be human-readable. We still have the human-readable part
- better than before for those humans whose languages are not well served
by ASCII - but there is no such thing as "plain text" anymore. There is
only text in a specific encoding. In a very real sense, text has become a
sort of binary blob that must be decoded into something the program
understands before it can be operated on, then re-encoded before going back
out into the world. A lot of Unicode-related misery comes from a failure
to understand (and act on) that fundamental point.
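A minimal Python 3 sketch of that decode, operate, re-encode cycle follows. The function name `shout` and the sample data are hypothetical; the point is only to show what goes wrong when encoded bytes are treated as if they were text.

```python
def shout(raw, encoding):
    """Upper-case encoded text: decode, operate on text, re-encode."""
    text = raw.decode(encoding)             # bytes -> text
    return text.upper().encode(encoding)    # text -> bytes

incoming = "caf\u00e9".encode("utf-8")      # bytes arriving from outside

# Operating directly on the bytes quietly gets it wrong: bytes.upper()
# only touches ASCII letters, and len() counts bytes, not characters.
wrong = incoming.upper()
right = shout(incoming, "utf-8")

# Decoding with the wrong encoding "works" too, but yields garbage.
misread = incoming.decode("latin-1")

print(len(incoming), len(incoming.decode("utf-8")))
print(wrong.decode("utf-8"), right.decode("utf-8"), misread)
```

None of the incorrect paths raises an error, which is what makes this class of bug hard to notice until somebody looks at the output.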
LWN's site code is written in Python 2. Version 2.x of the language is
entirely able to handle Unicode, especially for relatively large values
of x. To that end, it has a unicode string type, but this
type is clearly a retrofit. It is not used by default when dealing with
strings; even literal strings must be marked explicitly as Unicode, or they
are just plain strings.
When Unicode was added to Python 2, the developers tried very hard to make
it Just Work. Any sort of mixture between Unicode and "plain strings"
involves an automatic promotion of those strings to Unicode. It is a nice
idea, in that it allows the programmer to avoid thinking about whether a
given string is Unicode or "just a string." But if the programmer does not
know what is in a string - including its encoding - nobody does. The
resulting confusion can lead to corrupted text or Python exceptions; as
Guido van Rossum put it in the
introduction to Python 3, "This value-specific behavior has
caused numerous sad faces over the years." Your editor's
experience, involving a few sad faces for sure, agrees with this; trying to
make strings "just work" leads to code containing booby traps that may not
spring until some truly inopportune time far in the future.
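The nature of the booby trap can be sketched as follows. This is Python 3 code that only imitates Python 2's implicit promotion, on the assumption that the promotion used the default ASCII codec; the function name and the sample names are hypothetical.

```python
def py2_style_concat(text, raw):
    """Imitate Python 2 mixing a unicode object with a byte string.

    Python 2 decoded the byte string implicitly, using the default
    (ASCII) codec; here the same step is spelled out.
    """
    return text + raw.decode("ascii")

title = "Comment by "

# Works as long as every name happens to be pure ASCII...
print(py2_style_concat(title, b"Rob"))

# ...and fails on the first UTF-8 encoded name that shows up.
sprung = False
try:
    py2_style_concat(title, "Ren\u00e9".encode("utf-8"))
except UnicodeDecodeError as err:
    sprung = True
    print("booby trap sprung:", err.reason)
```

Whether the code works depends on the value of the data rather than its type, so testing with ASCII-only input will never expose the problem.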
That is why Python 3 changed the rules. There are no "strings" anymore in
the language; instead, one works with either Unicode text or binary bytes.
As a general
rule, data coming into a program from a file, socket, or other source is
binary bytes; if the program needs to operate on that data as text, it must
explicitly decode it into Unicode. This requirement is, frankly, a pain;
there is a lot of explicit encoding and decoding to be done that didn't
have to happen in a Python 2 program. But experience says that it is
the only rational way; otherwise the program (and programmer) never really
know what is in a given string.
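A short Python 3 sketch of those rules, using an in-memory `io.BytesIO` object as a stand-in for a file or socket (the sample data and variable names are invented for the example):

```python
import io

# Stand-in for a file or socket opened in binary mode: reads yield bytes.
source = io.BytesIO("na\u00efve caf\u00e9\n".encode("utf-8"))
data = source.readline()

# Mixing bytes and text is an immediate TypeError, whatever the contents.
try:
    "Read: " + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The explicit route: decode on the way in, encode on the way out.
text = data.decode("utf-8").strip()
reply = ("Read: " + text).encode("utf-8")

print(type(data).__name__, mixed_ok, text)
```

The failure here depends on the types involved, not on whether the data happens to be ASCII, so the mistake shows up the first time the code runs rather than at some later date.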
In summary: Unicode is not UTF8 (or any other encoding), and encoded text
is essentially binary data. Once those little details get into a
programmer's mind (quite a lengthy process, in your editor's case), most of
the difficulties involved in dealing with Unicode go away.
Much of the above is certainly obvious to anybody who has dealt with
multiple character encodings for any period of time. But it is a bit of a
foreign mind set to developers who have spent their time in specialized
environments or with languages that don't recognize Unicode - kernel
developers, for example. In the end, writing programs that are able to
function in a multiple-encoding world is not hard; it's just one more thing
to think about.
Page editor: Jonathan Corbet