
LWN.net Weekly Edition for July 5, 2012

Akademy: Defensive publications

By Jake Edge
July 4, 2012

One way to reduce the number of software patents that are issued is to document interesting ideas before someone locks them up. Defensive publications are a way to express those ideas in an accessible form for patent examiners so that they will be aware of prior art during the application process. Open Invention Network (OIN) chief operating officer Raffi Gostanian came to Akademy in Tallinn, Estonia to describe defensive publications, explain how OIN can help in the process of creating them, and encourage KDE developers to start filing them.

[Raffi Gostanian]

OIN has a broad mandate to create a patent "no-fly zone" around Linux, Gostanian said. It does that by purchasing patents which can be used by members. It entices companies to join by having interesting patents available. OIN focuses on particular segments of the market, like finance or automotive, to put together groups of patents in those areas. Companies can join OIN if they promise not to use their patents against Linux.

As part of the Linux Defenders project, OIN has also worked on various efforts to invalidate patents either before or after they issued (through the Peer to Patent projects). That depends on finding prior art that shows the idea was not new at the time of the application. Defensive publications are a way to codify ideas from the free and open source software world that could be used to reject patent applications.

Defensive publications don't have to be about a program or something that has been implemented; they could just be an idea that someone has (or has had). Linux Defenders will publish the defensive publications in a database that can be searched by patent examiners (on ip.com) so that patent applications will be stopped from proceeding, he said.

Patent suits

There have been lots of patent suits already, and there will be more. Typically, it is not a single patent that is used in lawsuits, but a cluster of related patents. There are entities out there antagonistic toward Linux, "we know that", Gostanian said. It is difficult for some to see how they can "compete with free", so they turn to the court system. One way to try to combat open source is to claim that "you get what you pay for", but Android serves as a counterexample, which is part of why it attracts so much lawsuit attention.

Gostanian pointed to the lawsuit filed by Microsoft against Barnes & Noble for its Android-based Nook tablet as an example of the tactics used. There was a "war of words" between the companies about the suit, but in the end, Barnes & Noble ended up in a relationship with Microsoft. He is certain that the next Nook will not be running Android, which shows that sometimes suits are filed to force outcomes other than one side paying patent royalties.

Another example he cited was Microsoft's FAT patent (really a patent on a way to have long file names in VFAT), which was applied for in 1996 and granted in 1998. It went through the reexamination process, which upheld the patent; so did several courts, which emboldened Microsoft, who went on to use it for patent aggression.

But, more recently, an administrative law judge for the US International Trade Commission (ITC) ruled that the patent is invalid based on a 1992 post by Linus Torvalds. In that note to comp.os.minix, Torvalds essentially describes the patented technique as an idea for the Minix filesystem. It was not implemented, but just the description of the idea was enough for the judge.

Had the patent examiner known of that, it is likely the patent would not have been granted. In that case, there would have been "less FUD around Linux" and the lawsuits would not have happened, Gostanian said.

Attacking the root cause

When you have a problem—bad software patents being issued for example—you want to look for a root cause. In this case, he asked, is the real problem that people are filing for junk patents? That is sometimes true, but in most cases people think they are describing something innovative.

It takes three or four years after an application is filed before a patent examiner looks at it; when they do, they have around eight hours to consider the patent. That is the total amount of time they can spend, which includes some amount of back and forth with the filer. There are things they rely on to try to find prior art, but that generally does not include scientific publications, technical reports, conference proceedings or talks, blog postings, mailing lists, and so on. The examiners don't have a lot of time, and it would be difficult for them to find important prior art in that time. The idea behind defensive publications is to make it easier for them to do their job, Gostanian said.


Defensive publications are "in a sense" the "anti-patent", he said. By taking various concepts and ideas that have already been "invented" and making them easier to find, those ideas can get in front of the examiners during the application process. Linux Defenders will be working with teams or individuals to help them prepare defensive publications to be filed at ip.com. The filing cost will be picked up by Linux Defenders as well. The idea is that it is easier (and cheaper) to invalidate a patent before it is granted rather than doing so once it has been.

Unlike patents, defensive publications are fairly simple documents. Typically one or two pages of text is all that is required, along with a figure that describes the interaction between the components. It "could literally take two hours" to create one, he said. The short length is helpful to the examiners who don't want to wade through a long description. Defensive publications can certainly be longer than a page or two, but they don't need to be, he said.

Gostanian encouraged anyone in the audience with an "idea that you think is cool" or one that they are enthusiastic about to consider defensive publication. It doesn't matter if the idea has ever been implemented, and additional defensive publications can be made on newer iterations of the idea, each of which has the potential to stop patents.

It costs nothing to do a defensive publication as OIN will pay for filing those that come in via Linux Defenders. In addition, Linux Defenders will review and assist in writing the document. "Whatever's necessary, we'll do", Gostanian said, to assist in getting more defensive publications filed.

Creating defensive publications is something concrete that developers or projects can do to help fight bad patents. Each submission to ip.com gets a public number assigned to it that can be used in resumes or CVs to publicize one's involvement, though defensive publications can also be submitted anonymously. While there are lots of different opinions about the patent system, it isn't going away anytime soon, if ever, so this is one thing that can be done to reduce patent problems in the interim.

There are tens of thousands of defensive publications in existence already, Gostanian said in answer to a question from the audience, though most have not come via OIN. This is the first conference he has spoken at to publicize the effort, though he will be speaking at GUADEC and other team members will be attending COSCUP to help bring the message to developers at those events. While "tens of thousands" sounds big, he said, they are scattered around in many different technical fields. Linux Defenders would like to see a more concentrated effort in the areas surrounding Linux and open source, eventually resulting in tens of thousands being filed via the project.

[ The author would like to thank KDE e.V. for travel assistance to Tallinn for Akademy. ]


Akademy: Plasma Active and Make Play Live

By Jake Edge
July 4, 2012

Creating an open device is a difficult challenge; the software is (mostly) there, but the hardware is a different story. Aaron Seigo has been working on the Vivaldi tablet as part of the Make Play Live effort and reported on some of the hurdles that have been encountered trying to produce the device at Akademy. There are lots of pieces that go into such a device, so finding a combination that works and can be sold is a non-trivial task.

Plasma Active

[Aaron Seigo]

Plasma Active, the touch-enabled version of KDE's Plasma environment, came out of a discussion among various people working on Plasma about the technology, in which they asked "where do we go from here?", Seigo said. KDE has a desktop suite, with office, email, and many other applications, but "is that all we want to do?". To him and others, it felt like KDE was treading water, but the discussions made it clear that some in the project were not happy with just that.

He believes very strongly in freedom and technology, and he started looking beyond the desktop and laptops where KDE has traditionally been focused. When you look at mobile devices, set top boxes, and other systems like those, you don't see the freedom and openness that we have come to expect. There is an inherent need for some humans to hack, but devices are "increasingly not places where you can hack, unless Apple says you can".

There is this idea that a "tablet is a tablet, a laptop is a laptop", he said, but that is "increasingly silly". There is a continuum of devices, without sharp divisions between them. We have started to see others picking up on that, and releasing hybrid devices recently, like a media center that is controlled by a tablet or phone, tablets with keyboards, and so on.

So, Plasma Active arose out of Plasma Netbook and Plasma Desktop. It provides one KDE and Qt-based technology that can be used across all kinds of devices. The difference between Plasma Active and the netbook/desktop versions is 10,000-15,000 lines of code, out of a code base of some third of a million lines. So there are "tiny differences" between the two, and things written for one will work on the other. This is a "compelling reason" to use Plasma Active on all these different kinds of devices, Seigo said.

Android: best friend and worst enemy

Seigo cited Android as the "best friend and worst enemy of open devices". It uses the Linux kernel, and it is great that there are so many devices out there running Linux. But Google does no GPL enforcement, which results in mostly binary-only devices. For device manufacturers, getting Android to boot is the end goal so that the device can be sold. Once they can deliver a working binary kernel to their customers, they are done.

We are dealing with cultural and business barriers when trying to deliver open devices, he said. All the manufacturing for these devices is in China, and increasingly the design is being done there as well. To build open devices, you must work with the cultures of Asia, but most KDE developers are based in Europe or North America and are not familiar with those cultures. In addition, the manufacturers are "all about volume" and, at least so far, open devices are not selling in quantities that make them interesting. He is not just reporting these problems "because it sucks", he said, but because "I think there are things we can do about it together".

He asked: Can we overcome these problems? His experience shows that it is "a very big mountain to climb", but it is something that the community has to take on itself. These problems are not something that big companies are interested in, so "we need to take our destiny in our own hands".

From an implementation standpoint, there are three pieces of software that need to work well together: the kernel, a user space that works with it, and a "human experience" on top of that. Seigo uses the term "human experience" rather than "user experience" because "humans are not users", he said.

As a community, KDE does the human experience part, and there are folks in the project with some experience in the other two. Seigo asked how many in the room had written a kernel module and got a few raised hands. "We need you guys", he said, and asked that they bring their friends. These days, user space is tightly coupled to the kernel, he said, so the two need to be in sync.

Plasma Active itself is ready to go to provide the human experience; version 3 will be released in the next few months. One recent addition is a "nice touch-friendly file manager", Seigo said, and Plasma Active is more than just a desktop shell. It is enabling other applications, like Calligra and Marble, to work well on touch devices. In addition, a recent two or three day effort turned Okular into a touch enabled e-book reader using QML.

Lots of code has been taken from the desktop for Plasma Active, but there are parts that will flow back to Plasma in the future as well. Many KDE applications can be made touch-friendly relatively easily, he said, and developers of any applications that might ever run on a tablet, phone, set top box, etc. should be thinking about that. When he hears application developers talking about separating the business logic from the presentation, that's a sign things are headed in the right direction.

Vivaldi progress


Seigo and a partner started Make Play Live (MPL) to create "ethically correct devices" that are hackable. A business ecosystem has also been built around it to support the effort. The plan is to create a tablet called Vivaldi, but there have been some problems along the way.

Seigo held up the tablet, noting that it was the second revision of the hardware that was received from MPL's hardware partner. Using that hardware, the company got 98% of the way there, he said, and was demonstrating the device widely. It just needed a "little more polish" before it was ready to ship. Then, the third revision of the hardware arrived.

The new hardware looked identical—on the outside—but was "completely different" internally. MPL found out about the changes after the fact, and was not able to provide input into the new design. Because the volume of devices that MPL could promise to sell was fairly low, the manufacturer had little interest in consulting or even notifying the company about the changes.

The earlier revision had been running a modified Mer user space atop the Android kernel distributed by the manufacturer, but that no longer works on the most recent hardware revision. There is a "solution in the pipes" to that problem, Seigo said, but that set Vivaldi back. The device manufacturers don't really want to invest in Linux per se, but want to focus on Android, which is a different thing.

In the Q&A session, Seigo further explained some of the problems that MPL had run into. Unless it can promise a quarter of a million (or some other six-digit number) of units, MPL won't be able to get any input into the process. "Our order is a rounding error" on the total number of units the device manufacturers are targeting. He certainly doesn't blame the hardware companies as they are focused on their bottom line. It would be great if MPL (or open devices in general) could rely on large companies to take the baton and dangle that kind of volume in front of the manufacturers, but that doesn't seem likely.

Part of the problem is that there is "little respect for the GPL" in Asia, Seigo said. When you ask for the source to the kernel for a device, you first get pointed at kernel.org. Once you make it clear that there is more needed, you will get a tarball with "amazing stuff", some of which has nothing to do with the device in question. Comparing that to what's running on the device shows differences, so you have to ask: "Now can we really get the source?". There is also often resistance from the hardware salespeople to the whole idea of getting the source as they think the company will go bankrupt if they give it away. When setting out on this task, Seigo said that he had no idea "how hard it would be to get GPL source" from the vendors.

The MPL partner network consists of nine companies so far who are concentrating on various pieces of the problem, like human experience or device integration. There is room for more hardware and software companies in that network, he said. If some other company were to come out with an open device, he would see that as a success for the project, even though it might be a competitor to Vivaldi.

Human-centric experience

The MPL philosophy is one of "human-centric experience" rather than the "app-centric experience" offered by other mobile OS vendors (e.g. Google and Apple). Vivaldi and other MPL devices are meant to be usable from the outset and not require the purchase of a lot of apps. That limits the "app store story" a bit, but it makes for a more compelling device. When he puts a Vivaldi tablet into people's hands, they start immediately talking about how they want to use it, Seigo said.

He noted that while tech pundits have written off the tablet market as a two-horse race, they are not seeing the full picture because MPL devices will not be competing in the same space. "If tech pundits were food critics, they would be fired," he said. He likened the way pundits looked at things to a food critic who said that French food is just great, so "Italian food will never sell". Once devices start shipping, "we'll do just fine", because MPL is not competing in the same space as iPad and Android.

When asked about where an interested person might be able to find paying work on the MPL project, Seigo noted that some of the partners have been employing people to work on it, as has his company, Coherent Theory LLC. That model is not sustainable in the long term, but once devices start shipping, there will be more money available for that kind of thing. There are volunteers as well, of course; "not everything is done for money", he said.

Enlisting aid from KDE developers and other interested people was one of the themes of Seigo's talk. Much has been accomplished, but there is lots still to be done. MPL needs "more people who care" to "join us and make this a reality". He and others are committed to making open devices available; with some help they can get there faster.

[ The author would like to thank KDE e.V. for travel assistance to Tallinn for Akademy. ]


Updates and an announcement from LWN

By Jonathan Corbet
July 4, 2012

LWN prefers to report news from the Linux development community over news about itself. But there have been some requests recently for a status update. Beyond that, we have some important news to pass on to our readers. So please bear with us for a brief exercise in journalistic self-examination.

Toward the beginning of this year, we announced our desire to bring in another author/editor with the goals of making the operation more robust and, eventually, expanding our content mix. That process seemingly came to an end with our announcement that Nathan Willis was joining the staff at the end of April. That whole process has gone better than expected, and LWN is better for it. But there is a part of that story that we have not been able to tell until now.

We had a surprising number of strong candidates for the position at LWN. In the end, it came down to two people, either of whom would have been an outstanding addition to LWN's staff. After agonizing over the decision for a while, we realized that the skills of the two candidates complemented each other nicely and that what we really needed to do was to hire both of them. Causing that to happen took a while — our second candidate is a busy person who needed some time to make a change — but things are finally falling into place.

Thus, we are pleased to announce that our other new editor will be Michael Kerrisk. Michael describes himself this way:

Michael is a software engineer, writer, and trainer who started using UNIX in 1987, and Linux in the late 1990s. Since 2004, Michael has maintained the Linux man-pages project and has been one of its most prolific contributors. He is also the author of "The Linux Programming Interface" (see Jake's review). Michael is a New Zealander, based in Munich, Germany.

We have big plans for Michael; he'll be supplementing our kernel-oriented coverage and helping us to expand it in a number of related areas including, possibly, embedded systems and software development. Expect to see his work showing up on LWN's pages later this month.

This move is a bit of a risk on everybody's part for the simple reason that LWN's current cash flow is not sufficient to carry two new editors. The good news is that we have been able to set aside some reserves over the last couple of years, so we have plenty of time in which to ramp things up and get back to a sustainable operating condition. Getting there will definitely require that we find ways to increase our subscriber base, though.

We have a number of ideas for how that might be achieved. An expanded and broader content mix, we hope, will appeal to a wider range of readers. LWN's "new site code" just celebrated its tenth anniversary; it's no secret that it could use updating in any number of ways. We need to find ways to provide additional value to the subscribers who keep us going. There are some interesting related ideas that we wish to pursue, once time allows. And we could maybe even try actively promoting the site rather than just sort of hoping that readers will find and appreciate us.

Certainly something needs to be done. In the last two years, the number of individual subscribers has leveled out and even declined slightly—not the sort of trend we were hoping to see. Group subscriptions have been a little more robust, fortunately. Special thanks are due to our "Supporter" subscribers who exist in sufficient numbers to make a real difference. Supporters: none of you have yet exercised your unique privilege to have the beverage of your choice at LWN's expense at any conference where we are present; we may yet find ourselves having to resort to sending you yet another laptop bag instead.

If we have learned anything over the years, it's the nature of businesses that something always needs to be done. It's a rare business that just generates the money needed to sustain it without constant adjustments. It has been almost exactly ten years since we posted The end of the road, wherein we explained our conclusion that the time had come to shut LWN down. Things have improved a lot since then. We are confident that, if we think and work hard toward the creation of a site that brings more value to our readers, things will continue to improve.

LWN's greatest strength is one of the best reader communities out there. We do not thank you all anywhere near often enough. But we'll say it now: thanks for your solid support for this site since its beginning in 1998; we wouldn't be here without you. And we are very much looking forward to making LWN better in the coming months—stay tuned!


Page editor: Jonathan Corbet

Security

Can FreedomBox be an alternative to commercial home routers?

By Nathan Willis
July 4, 2012

Recent actions by network hardware behemoth Cisco have irked a number of people who feel that the company is not respecting its customers' privacy. In response, members of the FreedomBox project have begun discussing whether the freedom-protecting device could adequately serve as a home router replacement. Such a move would mark a slight shift in focus for the project, but it may enable FreedomBox to offer the best alternative for those concerned over remote spying and other privacy threats.

Cisco raised the ire of online privacy advocates in June when it rolled out "Cisco Cloud Connect," a cloud-based configuration and management system for recent Linksys WiFi routers. The terms of service specifically state that Cisco may record users' Internet history (among several other types of information the service will track). In addition, the new cloud-based service was deployed to existing consumers' devices without their prior notice or consent. Device owners were first made aware of the change when they attempted to log in to their routers' web administration interfaces and could not — with a message instructing them to go register for the new cloud service instead.

Bloomberg reported on a response from Cisco's home networking chief, who said that the company was "absolutely not tracking Internet history, nor do we intend to" and chalked the issue up to "unclear" wording. Cisco has subsequently altered the wording in question, which now says that "usage" information is only associated with a randomly-generated ID number controlled by the device owner. The new wording also explains how consumers (including those whose devices have already been "upgraded" to the new cloud service) can opt out of the service and revert to the old administration interface: by calling a Cisco telephone support number.

But that may not be enough to mollify privacy advocates. After all, court orders, warrants, or other means could force Cisco to reveal its stored information to other parties, at which point device owners have to trust that the randomly-generated ID is truly untraceable. Admittedly, the ISP has access to the same information, but replicating it elsewhere still makes one more vulnerable, not less. Add to that the fact that Cisco reserves the right to unilaterally modify its terms of service whenever it feels like it, and giving someone else control over one's router may not sound like a good trade-off just for the convenience of managing it through The Cloud.

Whither FreedomBox?

That chain of events led Sean Alexandre to write to the FreedomBox discussion list and ask whether or not serving as a home gateway router should be a target for the first stable FreedomBox release:

I remember from Eben's original talk on FreedomBox he described it as something people would use to replace their home wireless routers. They go to the store to buy a new wireless router, and buy a FreedomBox instead of a WeSpyOnYouBox.

FreedomBox, of course, is an effort to develop a "personal server" image that delivers secure, privacy-respecting software for common applications like email, social networking, and media delivery. Eben Moglen kickstarted the project in 2010, and the initial target hardware was so-called "plug computers." Thus, Alexandre's proposal does represent a shift in emphasis: although some routing tasks (such as firewalling) have been discussed, serving as a router-replacement or wireless access point has not been prominent on the development roadmap.

But replacing a WiFi router would be a useful, well-defined use case, he suggested, and allow the project to roll a usable release "sooner rather than later." Later releases of the software could add additional functionality. The practical problem, he said, was whether or not FreedomBox's Debian base could be made to run on home wireless router hardware with the features most consumers expect.

Alexandre's router-first concept would give FreedomBox an attainable goal, which would benefit the project. After all, despite its clout and technical prowess, the project is still a considerable way from delivering the end goal of a plug-and-play email and cloud-computing experience with GnuPG-hardened encryption. That is not because the project isn't up to the challenge, but because of the sheer size of that challenge. FreedomBox developers are hard at work on a number of difficult problems, such as enabling two firewall-protected boxes to locate each other and establish a connection (the project's solution piggybacks on the Tor network). Rolling a routing-centric release would raise the project's profile while permitting development to continue.

The software angle

The FreedomBox distribution is intended to run on a range of hardware, and the project elected to build it on top of Debian in order to provide broad compatibility (among other goals). Clearly Debian itself is more than capable of serving as a NAT gateway, router, and firewall. But there are other considerations that might make building a router-centric FreedomBox release more difficult.

For starters, network configuration for a plug-and-play box needs to be straightforward, and ideally provide a working "first run" experience. Even the aftermarket router firmware projects (such as OpenWRT) struggle to make configuration simple, and FreedomBox strives to eventually enable the user to configure all sorts of additional services — some of which require tasks like key generation. The project has yet to select a configuration system; OpenWRT's Unified Configuration Interface (UCI) seems like a natural choice for the router use-case, but it may not extend easily to FreedomBox's other applications.

A separate issue, raised on the list by Jonathan Wilkes, is whether ISPs will allow users to bring their own routers. Some service providers rent wireless routers to customers; others supply their own devices (which do NAT and firewalling) that are combination units with DSL or cable modem functionality built into a wireless router. In both cases, the area of concern is that the ISP requires that its device be the one doing NAT. A double-NAT configuration might be possible, but would not be simple to configure or troubleshoot. As Wilkes put it, such a departure from the plug-and-play server concept is more complicated from a user's point of view:

I think from the user perspective, plugging in a FB _behind_ what their ISP already has installed is way easier to set up and immediately start using, but less powerful (I'm thinking of the setup discussed recently where it's basically piggybacking over Tor make connections). Of course replacing one's router with a FB-- if there isn't a double-NAT-- opens up many more possibilities for what you can do with it.

Maybe the best of both worlds would be to make the UI for the easy solution (i.e., FB behind the router), at least initially. Even though it's less power for the non-techie user, it's less potential frustration. (A FB that the user can't get working certainly won't improve their privacy.)

In the ensuing discussion, the big unknown remained that no one has adequate data on which ISPs (or what percentage of all ISP users) face such restrictions. But then again, ISP restrictions are not a new problem for FreedomBox; the project has always been interested in running its own services, which inherently involves making incoming connections accessible from the outside — and which many ISPs frown upon.
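
As a rough illustration of the double-NAT concern raised in the discussion, the following sketch (not taken from the FreedomBox project) checks whether the address a router receives on its upstream interface falls in a non-public IPv4 range, which would suggest that an ISP device further upstream is already doing NAT. The function name behind_upstream_nat and the sample addresses are hypothetical, and the heuristic assumes IPv4 only.

```python
import ipaddress

# RFC 1918 private ranges, plus the RFC 6598 shared range that ISPs may
# use for carrier-grade NAT. A WAN address in any of these is not
# directly reachable from the public Internet.
NON_PUBLIC_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),
]


def behind_upstream_nat(wan_address: str) -> bool:
    """Return True if the WAN-side IPv4 address looks like it was handed
    out by an upstream NAT device rather than being publicly routable."""
    addr = ipaddress.ip_address(wan_address)
    return any(addr in net for net in NON_PUBLIC_NETWORKS)


if __name__ == "__main__":
    # Hypothetical sample addresses, for illustration only.
    for sample in ("192.168.1.2", "100.64.0.1", "8.8.8.8"):
        print(sample, behind_upstream_nat(sample))
```

A check like this only hints at the situation; it cannot tell whether the ISP permits the upstream device to be put into a bridge mode, which is the question the list could not answer.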

The hardware angle

The other challenge to deploying FreedomBox on a home router is the availability of suitable hardware at an affordable price point. For the plug-and-play server design, there are a number of inexpensive plug computer options already known to the project. But few of them offer multiple network interfaces, which is a necessity for routers.

On the other hand, the aftermarket router firmware community typically must maintain multiple builds targeted at individual products, in order to cope with peculiarities of design (such as the vendor changing the internal flash memory without changing the model number) and with binary-blob drivers. Consequently, getting Debian to run on a commercially-available router is likely to prove difficult. Alexandre noted that Debian already runs on some Linksys routers, but with major caveats: "The wireless driver is a binary kernel module (first problem), and it needs a 2.4 kernel (second problem.)"

A third possibility he discusses is ALIX boards, which are low-power x86 devices available in several configurations, including some with multiple network interfaces. There is an active Debian port to the ALIX, although Alexandre admitted he was unsure if it was free of binary-only drivers.

The proposed router-centric milestone release is still an ongoing discussion topic at FreedomBox. As the Cisco incident reveals, there is clearly a need for a privacy-and-freedom-respecting router. OpenWRT and similar projects are decent options for those comfortable flashing the firmware and voiding their warranty, but those projects can never provide an out-of-the-box experience. Taking on that challenge may be too far afield for FreedomBox, though. It is at least feature-creep, which is generally taken to be a bad thing. But it may be a more attainable target, in which case it could do a lot to attract new talent to the FreedomBox project, which would be a win in the long run.

Comments (25 posted)

Brief items

Security quotes of the week

Virology is not computer science. A biological virus is not the same as a computer virus. A vulnerability that affects every individual copy of Windows is not as bad as a vulnerability that affects every individual person. Still, the lessons from computer security are valuable to anyone considering policies intended to encourage life-saving research in virology while at the same time prevent that research from being used to cause harm. This debate will not go away; it will only get more urgent.
-- Bruce Schneier

Considering that the members of the security disclosure list are public (http://www.xen.org/projects/security_vulnerability_process.html) and considering that some of them are service providers, if I am a [customer], why would I ever choose a provider that is not in that list?

Having that list on the website is like writing: "please choose one of the providers in the list below as they have a better security response".

-- Stefano Stabellini (Thanks to George Dunlap.)

To defend against hackers, filtered computers are standard in the government, but they are problematic for officials who are trying to discover dishonest activity on the Web; it's a bit like telling a cop he can't patrol in high-crime neighborhoods. A handful of unfiltered computers are available in restricted labs at the FTC's [US Federal Trade Commission] headquarters on Pennsylvania Avenue and its satellite offices on New Jersey Avenue and M Street, but this is an ungainly setup. Rather than leaving their office, waiting for an elevator, swiping their ID badges across a sensor at the lab's locked door and logging into a computer soaked with malware (because the lab computers are used to test suspicious applications and websites), the technologists have instead stayed in their office and tethered their personal laptops to their personal cellphones. The office does not have a window, and the cell signals are not strong; even by phone standards, their Web connection is slow.
-- Peter Maass at ProPublica

The [UK] Government has been forced to suspend an online consultation into pornography controls after a security breach exposed respondents’ confidential answers and contact details.
-- Nick Clark in The Independent

Comments (none posted)

New vulnerabilities

accountsservice: file permission bypass

Package(s):accountsservice CVE #(s):CVE-2012-2737
Created:June 29, 2012 Updated:April 8, 2013
Description:

From the Ubuntu advisory:

Florian Weimer discovered that AccountsService incorrectly handled privileges when copying certain files to the system cache directory. A local attacker could exploit this issue to read arbitrary files, bypassing intended permissions.

Alerts:
Ubuntu USN-1485-1 2012-06-28
Fedora FEDORA-2012-10120 2012-07-02
openSUSE openSUSE-SU-2012:0845-1 2012-07-06
Mageia MGASA-2012-0153 2012-07-10
Mandriva MDVSA-2013:060 2013-04-08

Comments (none posted)

bcfg2: code execution

Package(s):bcfg2 CVE #(s):CVE-2012-3366
Created:June 29, 2012 Updated:October 29, 2012
Description:

From the Debian advisory:

It was discovered that malicious clients can trick the server component of the Bcfg2 configuration management system to execute commands with root privileges.

Alerts:
Debian DSA-2503-1 2012-06-28
Fedora FEDORA-2012-10391 2012-10-28
Fedora FEDORA-2012-10402 2012-10-28

Comments (none posted)

boost: code execution

Package(s):boost CVE #(s):CVE-2012-2677
Created:June 28, 2012 Updated:March 22, 2013
Description:

From the Red Hat bugzilla:

A security flaw was found in the way ordered_malloc() routine implementation in Boost, the free peer-reviewed portable C++ source libraries, performed 'next_size' and 'max_size' parameters sanitization, when allocating memory. If an application, using the Boost C++ source libraries for memory allocation, was missing application-level checks for safety of 'next_size' and 'max_size' values, a remote attacker could provide a specially-crafted application-specific file (requiring runtime memory allocation for it to be processed correctly) that, when opened would lead to that application crash, or, potentially arbitrary code execution with the privileges of the user running the application.

Alerts:
Fedora FEDORA-2012-9818 2012-06-28
Fedora FEDORA-2012-9029 2012-07-03
Mageia MGASA-2012-0151 2012-07-10
Red Hat RHSA-2013:0668-01 2013-03-21
CentOS CESA-2013:0668 2013-03-21
Oracle ELSA-2013-0668 2013-03-22
Scientific Linux SL-boos-20130321 2013-03-21
Mandriva MDVSA-2013:065 2013-04-08

Comments (none posted)

chromium: multiple vulnerabilities

Package(s):chromium, v8 CVE #(s):CVE-2012-2807 CVE-2012-2815 CVE-2012-2816 CVE-2012-2817 CVE-2012-2818 CVE-2012-2819 CVE-2012-2820 CVE-2012-2821 CVE-2012-2823 CVE-2012-2825 CVE-2012-2826 CVE-2012-2829 CVE-2012-2830 CVE-2012-2831 CVE-2012-2834
Created:July 3, 2012 Updated:September 26, 2012
Description: From the openSUSE advisory:

- Update Chromium to 22.0.1190

  * Security Fixes (bnc#769181):
  * CVE-2012-2815: Leak of iframe fragment id
  * CVE-2012-2816: Prevent sandboxed processes interfering with each other
  * CVE-2012-2817: Use-after-free in table section handling
  * CVE-2012-2818: Use-after-free in counter layout
  * CVE-2012-2819: Crash in texture handling
  * CVE-2012-2820: Out-of-bounds read in SVG filter handling
  * CVE-2012-2821: Autofill display problem
  * CVE-2012-2823: Use-after-free in SVG resource handling
  * CVE-2012-2826: Out-of-bounds read in texture conversion
  * CVE-2012-2829: Use-after-free in first-letter handling
  * CVE-2012-2830: Wild pointer in array value setting
  * CVE-2012-2831: Use-after-free in SVG reference handling
  * CVE-2012-2834: Integer overflow in Matroska container
  * CVE-2012-2825: Wild read in XSL handling
  * CVE-2012-2807: Integer overflows in libxml
  * Fix update-alternatives within the spec-file
Alerts:
openSUSE openSUSE-SU-2012:0813-1 2012-07-03
Mageia MGASA-2012-0177 2012-07-21
Debian DSA-2521-1 2012-08-04
Mandriva MDVSA-2012:126 2012-08-08
openSUSE openSUSE-SU-2012:0975-1 2012-08-09
Mageia MGASA-2012-0213 2012-08-12
Gentoo 201208-03 2012-08-14
Red Hat RHSA-2012:1288-01 2012-09-18
CentOS CESA-2012:1288 2012-09-18
Oracle ELSA-2012-1288 2012-09-18
Oracle ELSA-2012-1288 2012-09-18
Scientific Linux SL-libx-20120918 2012-09-18
CentOS CESA-2012:1288 2012-09-20
Fedora FEDORA-2012-13820 2012-09-26
Fedora FEDORA-2012-13824 2012-09-27
Ubuntu USN-1587-1 2012-09-27
Mandriva MDVSA-2013:047 2013-04-05
Mandriva MDVSA-2013:056 2013-04-08

Comments (none posted)

gallery3: multiple vulnerabilities

Package(s):gallery3 CVE #(s):
Created:June 28, 2012 Updated:July 4, 2012
Description:

From the Gallery release notes:

After several extensive internal and external security audits which discovered 22 distinct vulnerabilities, we are releasing Gallery 3.0.4 as a security release. All of the issues require that someone with malicious intent either have an account with edit permissions, or trick a user with edit permissions into clicking on a malicious link. In most cases, this can only lead to a possible XSS vulnerability, but in several instances it allows arbitrary PHP code execution.

Alerts:
Fedora FEDORA-2012-9666 2012-06-28
Fedora FEDORA-2012-9705 2012-06-28

Comments (none posted)

gc: code execution

Package(s):gc CVE #(s):CVE-2012-2673
Created:June 28, 2012 Updated:October 3, 2012
Description:

From the Red Hat bug report:

A security flaw was found in the way malloc() and calloc() routines implementation of gc, a Boehm-Demers-Weiser conservative garbage collector, performed parameters sanitization, when allocating memory. If an application using the gc collector was missing application-level malloc() and calloc() routines parameters validity checks, a remote attacker could provide a specially-crafted application-specific input file that, when opened in that application would lead to application crash or, potentially, arbitrary code execution with the privileges of the user running the application.

Alerts:
Fedora FEDORA-2012-9637 2012-06-28
Fedora FEDORA-2012-9556 2012-06-28
Ubuntu USN-1546-1 2012-08-28
Mageia MGASA-2012-0249 2012-08-30
Mandriva MDVSA-2012:158 2012-10-03

Comments (none posted)

kvm: symlink attacks

Package(s):kvm CVE #(s):CVE-2012-2652
Created:July 4, 2012 Updated:August 10, 2012
Description: From the openSUSE advisory:

- fix vulnerability to temporary file symlink attacks in snapshot file mode.

Alerts:
openSUSE openSUSE-SU-2012:0832-1 2012-07-04
Mageia MGASA-2012-0185 2012-07-30
Ubuntu USN-1522-1 2012-08-02
Fedora FEDORA-2012-11305 2012-08-09
Fedora FEDORA-2012-11302 2012-08-09
Debian DSA-2542-1 2012-09-08
Debian DSA-2545-1 2012-09-08
SUSE SUSE-SU-2012:1202-1 2012-09-18
Gentoo 201210-04 2012-10-18
Mandriva MDVSA-2013:121 2013-04-10

Comments (none posted)

libapache-mod-security: cross-site scripting

Package(s):libapache-mod-security CVE #(s):CVE-2012-2751
Created:July 3, 2012 Updated:December 24, 2012
Description: From the Debian advisory:

Qualys Vulnerability & Malware Research Labs discovered a vulnerability in ModSecurity, a security module for the Apache webserver. In situations where both 'Content-Disposition: attachment' and 'Content-Type: multipart' were present in HTTP headers, the vulnerability could allow an attacker to bypass policy and execute cross-site script (XSS) attacks through properly crafted HTML documents.

Alerts:
Debian DSA-2506-1 2012-07-02
Mageia MGASA-2012-0158 2012-07-10
Mandriva MDVSA-2012:118 2012-07-27
Mandriva MDVSA-2012:182 2012-12-23

Comments (none posted)

libspring-2.5-java: information disclosure

Package(s):libspring-2.5-java CVE #(s):CVE-2011-2730
Created:June 29, 2012 Updated:August 20, 2012
Description:

From the Debian advisory:

It was discovered that the Spring Framework contains an information disclosure vulnerability in the processing of certain Expression Language (EL) patterns, allowing attackers to access sensitive information using HTTP requests.

Alerts:
Debian DSA-2504-1 2012-06-28
Mageia MGASA-2012-0217 2012-08-18

Comments (none posted)

libtiff: code execution

Package(s):libtiff CVE #(s):CVE-2012-2088 CVE-2012-2113
Created:July 3, 2012 Updated:July 20, 2012
Description: From the Red Hat advisory:

libtiff did not properly convert between signed and unsigned integer values, leading to a buffer overflow. An attacker could use this flaw to create a specially-crafted TIFF file that, when opened, would cause an application linked against libtiff to crash or, possibly, execute arbitrary code. (CVE-2012-2088)

Multiple integer overflow flaws, leading to heap-based buffer overflows, were found in the tiff2pdf tool. An attacker could use these flaws to create a specially-crafted TIFF file that would cause tiff2pdf to crash or, possibly, execute arbitrary code. (CVE-2012-2113)

Alerts:
Red Hat RHSA-2012:1054-01 2012-07-03
CentOS CESA-2012:1054 2012-07-03
Mandriva MDVSA-2012:101 2012-07-04
Oracle ELSA-2012-1054 2012-07-03
Oracle ELSA-2012-1054 2012-07-03
openSUSE openSUSE-SU-2012:0829-1 2012-07-04
Ubuntu USN-1498-1 2012-07-05
Scientific Linux SL-libt-20120705 2012-07-05
Mageia MGASA-2012-0137 2012-07-09
Scientific Linux SL-libt-20120709 2012-07-09
CentOS CESA-2012:1054 2012-07-10
Fedora FEDORA-2012-10081 2012-07-15
Fedora FEDORA-2012-10089 2012-07-15
SUSE SUSE-SU-2012:0894-1 2012-07-19
Gentoo 201209-02 2012-09-23
Debian DSA-2552-1 2012-09-26
Mandriva MDVSA-2013:046 2013-04-05

Comments (none posted)

nova: privilege escalation

Package(s):nova CVE #(s):CVE-2012-3360 CVE-2012-3361
Created:July 3, 2012 Updated:August 23, 2012
Description: From the Ubuntu advisory:

Matthias Weckbecker discovered that, when using the OpenStack API to setup libvirt-based hypervisors, an authenticated user could inject files in arbitrary locations on the file system of the host running Nova. A remote attacker could use this to gain root privileges. This issue only affects Ubuntu 12.04 LTS. (CVE-2012-3360)

Pádraig Brady discovered that an authenticated user could corrupt arbitrary files of the host running Nova. A remote attacker could use this to cause a denial of service or possibly gain privileges. (CVE-2012-3361)

Alerts:
Ubuntu USN-1497-1 2012-07-03
Fedora FEDORA-2012-10418 2012-07-19
Fedora FEDORA-2012-10420 2012-07-19
Ubuntu USN-1545-1 2012-08-22

Comments (none posted)

openjpeg: code execution

Package(s):openjpeg CVE #(s):CVE-2009-5030
Created:June 28, 2012 Updated:July 11, 2012
Description:

From the Red Hat bug report:

An out-of heap-based buffer bounds read and write flaw, leading to invalid free, was found in the way a tile coder / decoder (TCD) implementation of OpenJPEG, an open-source JPEG 2000 codec written in C language, performed releasing of previously allocated memory for the TCD encoder handle by processing certain Gray16 TIFF images. A remote attacker could provide a specially-crafted TIFF image file, which once converted into the JPEG 2000 file format with an application linked against OpenJPEG (such as 'image_to_j2k'), would lead to that application crash, or, potentially arbitrary code execution with the privileges of the user running the application.

Alerts:
Fedora FEDORA-2012-9628 2012-06-28
Fedora FEDORA-2012-9602 2012-06-28
Mageia MGASA-2012-0152 2012-07-10
Red Hat RHSA-2012:1068-01 2012-07-11
CentOS CESA-2012:1068 2012-07-11
Mandriva MDVSA-2012:104 2012-07-12
Oracle ELSA-2012-1068 2012-07-11
Scientific Linux SL-open-20120711 2012-07-11
Debian DSA-2629-1 2013-02-25
Mandriva MDVSA-2013:110 2013-04-10

Comments (none posted)

rubygem-actionpack: restriction bypass

Package(s):rubygem-actionpack CVE #(s):CVE-2012-2694
Created:July 2, 2012 Updated:August 21, 2012
Description: From the CVE entry:

actionpack/lib/action_dispatch/http/request.rb in Ruby on Rails before 3.0.14, 3.1.x before 3.1.6, and 3.2.x before 3.2.6 does not properly consider differences in parameter handling between the Active Record component and the Rack interface, which allows remote attackers to bypass intended database-query restrictions and perform NULL checks via a crafted request, as demonstrated by certain "['xyz', nil]" values, a related issue to CVE-2012-2660.

Alerts:
Fedora FEDORA-2012-9606 2012-06-30
Fedora FEDORA-2012-9636 2012-06-30
openSUSE openSUSE-SU-2012:0978-1 2012-08-09
SUSE SUSE-SU-2012:1012-1 2012-08-21
SUSE SUSE-SU-2012:1014-1 2012-08-21
SUSE SUSE-SU-2012:1015-1 2012-08-21
openSUSE openSUSE-SU-2012:1066-1 2012-08-30
Red Hat RHSA-2013:0582-01 2013-02-28

Comments (none posted)

rubygem-activerecord: SQL injection

Package(s):rubygem-activerecord CVE #(s):CVE-2012-2695
Created:July 2, 2012 Updated:August 21, 2012
Description: From the CVE entry:

The Active Record component in Ruby on Rails before 3.0.14, 3.1.x before 3.1.6, and 3.2.x before 3.2.6 does not properly implement the passing of request data to a where method in an ActiveRecord class, which allows remote attackers to conduct certain SQL injection attacks via nested query parameters that leverage improper handling of nested hashes, a related issue to CVE-2012-2661.

Alerts:
Fedora FEDORA-2012-9635 2012-06-30
Fedora FEDORA-2012-9639 2012-06-30
openSUSE openSUSE-SU-2012:0978-1 2012-08-09
SUSE SUSE-SU-2012:1011-1 2012-08-21
SUSE SUSE-SU-2012:1012-1 2012-08-21
SUSE SUSE-SU-2012:1014-1 2012-08-21
openSUSE openSUSE-SU-2012:1066-1 2012-08-30
openSUSE openSUSE-SU-2013:0278-1 2013-02-12
openSUSE openSUSE-SU-2013:0280-1 2013-02-12
Red Hat RHSA-2013:0582-01 2013-02-28
SUSE SUSE-SU-2013:0508-1 2013-03-20

Comments (none posted)

sticky-notes: multiple vulnerabilities

Package(s):sticky-notes CVE #(s):
Created:July 2, 2012 Updated:December 3, 2012
Description: Version 0.3.09062012.4 fixes some security issues (cross-site scripting and SQL injection).
Alerts:
Fedora FEDORA-2012-9739 2012-06-30
Fedora FEDORA-2012-9714 2012-06-30
Fedora FEDORA-2012-18396 2012-12-01

Comments (none posted)

viewvc: multiple vulnerabilities

Package(s):viewvc CVE #(s):CVE-2012-3356 CVE-2012-3357
Created:July 4, 2012 Updated:July 23, 2012
Description: From the

Version 1.1.15 of viewvc contains a couple of security fixes.

The viewvc changelog has details.

Alerts:
openSUSE openSUSE-SU-2012:0831-1 2012-07-04
Fedora FEDORA-2012-9371 2012-07-11
Fedora FEDORA-2012-9433 2012-07-11
Mageia MGASA-2012-0175 2012-07-21
Debian DSA-2563-1 2012-10-23
Mandriva MDVSA-2013:134 2013-04-10

Comments (none posted)

vte: denial of service

Package(s):vte CVE #(s):CVE-2012-2738
Created:July 3, 2012 Updated:April 11, 2013
Description: From the Red Hat bugzilla:

A denial of service flaw was found in the way VTE, a terminal emulator widget, processed certain escape sequences with large repeat counts. A remote attacker could provide a specially-crafted file, which once opened in a terminal using the VTE terminal emulator could lead to excessive CPU consumption.

Alerts:
Fedora FEDORA-2012-9575 2012-07-03
Fedora FEDORA-2012-9546 2012-07-03
Mageia MGASA-2012-0163 2012-07-14
openSUSE openSUSE-SU-2012:0931-1 2012-08-01
openSUSE openSUSE-SU-2012:0933-1 2012-08-01
Mandriva MDVSA-2013:135 2013-04-10

Comments (none posted)

zendframework: information disclosure

Package(s):zendframework CVE #(s):CVE-2012-3363
Created:July 2, 2012 Updated:April 3, 2013
Description: From the Debian advisory:

An XML External Entities inclusion vulnerability was discovered in Zend Framework, a PHP library. This vulnerability may allow attackers to access local files, depending on how the framework is used.

Alerts:
Debian DSA-2505-1 2012-06-29
Fedora FEDORA-2012-9979 2012-07-14
Fedora FEDORA-2012-9978 2012-07-14
Mageia MGASA-2012-0200 2012-08-06
Fedora FEDORA-2013-4387 2013-04-03
Fedora FEDORA-2013-4404 2013-04-03

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 3.5-rc5, released on June 30. Linus says: "So nothing really worrisome in here. Despite the networking merge (which tends to be fairly big), -rc5 is a smaller patch than -rc4 was, even if there are a couple more commits in there. So things seem to be going in the right direction."

Stable updates: no stable updates have been released in the last week. The 3.2.22 update is in the review process as of this writing; it can be expected at any time.

Comments (none posted)

Quotes of the week

Please find large crayon and write on forehead "when fixing a bug, be sure to describe the end-user impact of that bug".
Andrew Morton

Fundamentally, 8k stacks on x86-64 are too small for our increasingly complex storage layers and the 100+ function deep call chains that occur.
Dave Chinner

Comments (4 posted)

Kernel patchwork returns

Kernel.org administrator John Hawley has announced that the kernel patchwork system is finally back on the air. "All the old user account still exist, though it is *HIGHLY* recommended that once you log in you change your password."

Full Story (comments: none)

A UEFI secure boot and TianoCore info page

James Bottomley has distilled his hard-earned knowledge of how to set up UEFI secure boot with QEMU and the TianoCore system and placed it into a web page. It has a lot of information for anybody needing to work in this area. "Intel has produced a project called TianoCore as an open firmware reference implementation of UEFI. One of the sub projects within TianoCore is OVMF which stands for Open Virtual Machine Firmware. It is OVMF that we are using to produce the virtual machine image for qemu that will run the UEFI secure boot environment. TianoCore secure boot is only really working as of version r13466 of the svn repository. This version has not yet been released as a downloadable zip file."

Full Story (comments: 2)

Kernel development news

Missing the AF_BUS

By Jonathan Corbet
July 3, 2012
The D-Bus interprocess communication mechanism has, over the years, become a standard component of the Linux desktop. For almost as long, developers have been trying to find ways to make D-Bus faster. The latest attempt comes in the form of a kernel patch set adding a new socket address family (called AF_BUS) to the networking layer. Significant performance improvements are claimed, but, like previous attempts, this one may have a hard time getting into the mainline kernel.

D-Bus implements a mechanism by which processes can send messages to each other. Multicast functionality is inherently a part of the protocol; one message can be sent to multiple recipients. D-Bus promises reliable delivery, where "reliable" means that messages arrive in the order in which they were sent and multicast messages will either be delivered to all recipients or, if that is not possible, to none. There is a security model built into the protocol whereby messages can be limited to specific recipients. All of these features are used by contemporary systems, which expect the system to be robust and secure, with as little latency and overhead as possible.
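The "all recipients or none" rule can be pictured with a small sketch. This is only an illustration of the delivery semantics described above, not D-Bus code; the Recipient class, its bounded inbox, and the multicast() helper are all hypothetical names invented for the example.

```python
from collections import deque


class Recipient:
    """A hypothetical receiver with a bounded inbox (not a D-Bus type)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.inbox = deque()

    def has_room(self):
        return len(self.inbox) < self.capacity


def multicast(recipients, message):
    """Deliver message to every recipient, or to none of them.

    Returns True if the message was delivered everywhere.
    """
    # Check first, then commit: if any inbox is full, nobody sees the message.
    if not all(r.has_room() for r in recipients):
        return False
    for r in recipients:
        # Appending in send order keeps each inbox in the order sent.
        r.inbox.append(message)
    return True


a = Recipient("a", capacity=2)
b = Recipient("b", capacity=1)

first = multicast([a, b], "m1")   # both have room
second = multicast([a, b], "m2")  # b is full, so a must not receive m2 either
```

The point of the check-then-commit structure is that a slow receiver causes the whole send to fail rather than leaving recipients with different views of the message stream.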

The current D-Bus implementation uses Unix-domain sockets and a central routing daemon. It works, but the routing daemon adds context switches, overhead, and latency to each message it handles. The kernel is unable to help get high-priority messages delivered first, so all messages cause wakeups that slow down the processing of the most important ones; see this message for a description of how these problems can affect a running system. It has been evident for some time to the developers involved that a better solution must be found.
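As a rough picture of where the extra cost comes from, the sketch below relays one message through a stand-in "daemon" using two pairs of Unix-domain sockets. It runs in a single process and assumes a Unix-like system; the variable names and the hop counter are made up for the example, and the real routing daemon does far more than forward bytes.

```python
import socket

# Two links: sender <-> daemon, and daemon <-> receiver.
sender, daemon_in = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
daemon_out, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

hops = 0

# Hop 1: the sender hands the message to the daemon.
sender.send(b"ping")
hops += 1

# The daemon has to be scheduled to read the message and route it.
routed = daemon_in.recv(1024)

# Hop 2: the daemon forwards the message to the final recipient.
daemon_out.send(routed)
hops += 1

delivered = receiver.recv(1024)

for s in (sender, daemon_in, daemon_out, receiver):
    s.close()
```

Every message pays for two sends, two receives, and a wakeup of the intermediary; a kernel-side mechanism would aim to remove the middle step.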

There have been a number of attempts in that direction. The previous time this topic came up, it was around a set of patches adding multicast capabilities to Unix-domain sockets. This idea was rejected with the claim that the Unix-domain socket code is already too complicated and there was not enough justification to make things worse by adding multicast capabilities. The D-Bus developers were told to simply use IPv4 sockets, which already have multicast support, instead.

What those developers actually did was to implement AF_BUS, a new address family designed to meet the needs of D-Bus. It provides the reliable delivery that D-Bus requires; it also has the ability to pass file descriptors and credentials from one process to another. The security mechanism is built in, with the netfilter code (augmented with a new D-Bus message parser) used to control which messages can actually be delivered to any specific process. The end result, it is claimed, is a significant reduction in D-Bus overhead due to reduced system calls; submitter Vincent Sanders claims "a doubling in throughput and better than halving of latency." See the associated documentation for details on how this address family works.
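File-descriptor passing is something existing Unix-domain sockets can already do, and that is the behavior AF_BUS is described as keeping. The sketch below shows it with AF_UNIX, not AF_BUS, using socket.send_fds() and socket.recv_fds() from the Python standard library (Python 3.9 or later, Unix only). Both ends live in one process here to keep the example short.

```python
import os
import socket

# A connected pair of Unix-domain sockets, standing in for two processes.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Create a pipe and put some data into it; the read end is what gets passed.
read_end, write_end = os.pipe()
os.write(write_end, b"hello via passed descriptor")
os.close(write_end)

# Send one byte of ordinary data plus the descriptor as ancillary data.
socket.send_fds(left, [b"x"], [read_end])

# The receiver gets a new descriptor number referring to the same pipe.
data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
received = os.read(fds[0], 64)

for fd in (read_end, fds[0]):
    os.close(fd)
left.close()
right.close()
```

An IPv4 socket has no equivalent of this ancillary-data channel, which is one reason a purely IP-based transport is a poor fit for this workload.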

A factor-of-two improvement in a component that is widely used in Linux systems would certainly be welcome. The patch set, however, was not; networking maintainer David Miller immediately stated his intention to simply ignore the patch set entirely. His objections seem to be that IPv4 sockets are sufficient for the task and that reliable delivery of multicast messages cannot be done, even in the limited manner needed by D-Bus. He expressed doubts that the IPv4 approach had even been tried, and decreed: "We are not creating a full address family in the kernel which exists for one, and only one, specific and difficult user."

Vincent responded that a number of approaches have been tried and found wanting. IPv4 sockets cannot provide the needed delivery guarantees and do not allow for the passing of file descriptors and credentials. It is also important, he said, for D-Bus to be up and running before the networking subsystem has been configured; setting up IP interfaces on a contemporary system often requires communication over D-Bus. There really is no better solution, he said.

He found support from a few other developers, including Alan Cox, who pointed out that there is no shortage of interprocess communication systems out there with requirements similar to D-Bus:

In fact if you look up the stack you'll find a large number of multicast messaging systems which do reliable transport built on top of IP. In fact Red Hat provides a high level messaging cluster service that does exactly this. (as well as dbus which does it on the desktop level) plus a ton of stuff on top of that (JGroups etc)

Everybody at the application level has been using these 'receiver reliable' multicast services for years (Websphere MQ, TIBCO, RTPGM, OpenPGM, MS-PGM, you name it). There are even accelerators for PGM based protocols in things like Cisco routers and Solarflare can do much of it on the card for 10Gbit.

He added that latency concerns are paramount on contemporary systems and that one of the best ways of reducing latency is to cut back on context switches and middleman processes. Chris Friesen added that his company uses "an out-of-tree datagram multicast messaging protocol family based on AF_UNIX" that could almost certainly be replaced by something like AF_BUS, were AF_BUS to be added to the mainline kernel.

There have been various other local messaging patch sets posted over the years. So it seems clear that there is a significant level of interest in having this sort of capability built into the Linux kernel. But interest alone is not sufficient justification for the merging of a large patch set; there must also be agreement from the developers who are charged with ensuring that Linux has a top-quality networking stack in the long term. That agreement is not yet there, so there may be a significant amount of multicast interpersonal messaging required before we have multicast interprocess messaging in the kernel.

Comments (46 posted)

Better documentation: the window of naive interest

July 3, 2012

This article was contributed by Neil Brown

Sometimes a casual comment can capture your imagination and not let go until you do something with it. So it was for me with a comment made by Heikki Orsila on some observations that Greg Kroah-Hartman made about documentation in the Linux kernel tree:

Greg, the documentation is very bad.

The specific documentation that he or Greg were thinking of may well be very different from the specific documentation that my thoughts turned to, but as both a producer and a consumer of some parts of linux/Documentation I can at least agree that some of it isn't very good.

Heikki continued: "Linux is badly documented, but so what?" and often that would have been the end of it. But we are an open development community where putting up with mediocrity is neither necessary nor encouraged. If things are broken then there is always the possibility of fixing them if only we know how. How, then, can we fix the documentation?

"Documentation" is a broad category and I would like to start by narrowing our focus a little and excluding reference documentation from consideration. By this I mean documents used by a person knowledgeable in the subject who needs to clarify some detail such as the arguments to some function or the required ordering between two locks. For these details the source code is by far the best resource - as it cannot get out-of-date - and, when the code itself is not sufficient, placing the documentation in the source code will provide the greatest likelihood of it being found, read, and kept up-to-date. It doesn't really belong in a separate Documentation directory.

The class of documentation that is of interest is documentation for the new developer, not necessarily new to development but new to a particular project or subsystem. Such a developer combines a lack of knowledge with a genuine interest and this is a combination that is not stable: if one component does not disappear soon, the other is likely to. The task of good documentation is to ensure the lack of knowledge disappears before the interest.

I was exposed to this instability when trying to understand some details of power management in Linux. The documentation simply didn't help and I had to look elsewhere. However, when I went back to assess the documentation while preparing for this article I discovered that it wasn't as bad as I remembered. I now had enough experience that it all made sense. The paucity of the documentation was now only in my memory and I couldn't be sure I had given that documentation a fair trial. The temptation to just move on might have won had Heikki Orsila's observation not encouraged me on.

To understand what makes good documentation we need to mine the experiences from that short window of naive interest to find out what works and what doesn't. A question that seems most suited for digging is "What were you looking for that you didn't find?". I'm sure my kind reader will have their own answers to offer, but here are three that I have found on my travels.

Wire-frame outlines

When I first went to the Linux power management documentation I was after a "big picture" understanding. I wanted more detail than "this code manages power" but not quite "these are the entry points that a driver must provide". I wanted to know what the important parts were and, significantly, how they connected together and impacted each other. I picture this as a collection of key concepts together with the linkage between them. These are nodes and edges in a graph, entities and relationships, or for the more spatially oriented, vertices and edges of a wire-frame polyhedron. This gives the shape of the project without getting bogged down in details.

For me it is vital to have this framework first as I can only take in and retain new details if I have something to attach them to and a place to attach them. Without it I'll either attach new ideas to the wrong place, or forget them completely - which is probably the safer of the two.

The image of a wire-frame is a little misleading as it presents all vertices as of equal value and this is rarely the case. Some concepts are bigger and should be named and described first. Others can come later. So maybe a ball-and-stick model might be a better picture, with big and small balls, joined by thin and fat sticks.

In the case of Linux power management, one key concept that gives shape to the whole is the number of multiplicities: there are multiple sequencing states when moving away from or towards full functionality, multiple power saving approaches such as runtime, suspend, and hibernate, and many different sorts of devices that need to fit into the frameworks. Another concept, already hinted at but often recurring, is that there is generally one "fully functional" state but several "low power" states, where moving between two low power states involves returning to fully-functional and then reducing power a different way.
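That last concept is compact enough to write down as a toy model. The state names below are placeholders chosen for the example, and the real kernel has many more states and transitions; the sketch only captures the rule that there is no direct edge between two low-power states.

```python
# Hypothetical state names; the real kernel states and transitions are richer.
FULL = "fully-functional"
LOW_POWER = {"runtime-suspended", "suspended", "hibernated"}


def path(current, target):
    """Return the sequence of states visited when moving to target."""
    if current == target:
        return [current]
    if current in LOW_POWER and target in LOW_POWER:
        # No direct edge between low-power states: go back up first.
        return [current, FULL, target]
    return [current, target]


route = path("runtime-suspended", "suspended")
```

Here route passes through "fully-functional" on the way from one low-power state to another, which is the star-shaped graph the paragraph describes: one hub with several spokes.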

Why, not what.

"Swap over NFS" is a set of functionality that some people find valuable, but is not at all straightforward to implement. There is a need to avoid deadlocks in memory management, and to do so without slowing down either the networking code or the memory allocation code, both of which are quite performance sensitive. There is a set of patches which provides this functionality but getting it ready for mainline inclusion has been a slow process.

Andrew Morton was recently good enough to provide some review of these patches and, while reading the commit-log entries and code comments is a little different from seeking out more coherent documentation, it does provide a good window into the thoughts of someone who, while generally knowledgeable, is both new to the project specifics and still interested. It can thus answer the question "what were you looking for that you didn't find?".

One observation that he made repeatedly is most clearly embodied in

The comment should explain "why", not "what". Particularly when the "what" was bleedin obvious ;)

or more humorously in: "s/"what"/"why"/ !".

Documenting what a function does is very important in closed-source projects, but less so in open source where the code can be directly read. Of course if the code is long and complex it might be easier to read some documentation; however, the effort of writing the documentation might be better spent in breaking up the code and making it more readable.

Documenting why is much more valuable, whether it is "why do it this way" or "why even do this at all". The "why" of a project is rarely explicit in just one place of the code. Rather it permeates throughout and can touch various fragments in different ways. Sometimes the "why" is not technical at all but is historical, cultural, or simply subjective. In these cases it really cannot be extracted by reading the code and must be documented, or lost.
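
A small, invented illustration of the difference may help. The two functions below are identical apart from their comments; the retry scenario, the function names, and the 30-second figure are all made up for the example and are not taken from any of the patches under discussion.

```python
def retry_delay_what(attempt):
    # Multiply 0.5 by 2 to the power of attempt and cap the result at 30.
    return min(0.5 * 2 ** attempt, 30.0)

def retry_delay_why(attempt):
    # Back off exponentially so that a struggling server is not hammered by
    # retries; cap at 30 seconds because (in this made-up scenario) callers
    # give up anyway once they have waited longer than that.
    return min(0.5 * 2 ** attempt, 30.0)
```

The first comment can be reconstructed by anyone who reads the expression; the second records a decision that the code alone cannot reveal.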

Were I to properly document the Linux "md" driver, for which I get occasional requests, I would need to explain its relationship with "dm" - for it isn't only internal edges of our wire-frame that are interesting, but also external edges. The "why"s here are mostly historical accident, though there would be value in observing that "md" focuses on reliability through redundancy, while "dm" focuses more on flexibility by hiding all the other restrictions imposed by storage hardware. This, I think, gives the "why" for continuing to have two separate frameworks, even if it isn't a strong technical justification.

To continue with the analogy of the wire-frame model, if the concepts and relationships provide the shape of the model, then the "Why"s provide the fabric that they give shape to. They are the substance that gives purpose and the force that gives direction. They may not always be visible, especially once we put some skin on our model, but understanding them is key to understanding the whole.

Examples, examples, examples.

One of the documents that I maintain is the set of manual pages for mdadm. I recall some years ago being challenged that there weren't enough examples in that documentation. At the time I didn't really know what to do with the challenge as, after all, there was an "Examples" section at the end of the man page and there was plenty of explanatory material from which you can deduce your own examples. Though I didn't give it much attention then, this challenge clearly stuck in my mind even to today and on reflection I now think quite differently to how I thought then. Examples matter.

For those of us who enjoy binary taxonomies, there are two sorts of reasoning processes: deductive and inductive. These are described in various ways in the literature. One that is particularly succinct and helpful is from Naked Science which describes the distinction as:

Deductive reasoning arrives at a specific conclusion based on generalizations. Inductive reasoning takes events and makes generalizations.

In the context of documentation, reasoning is the process of turning the words in the document into a model in your mind. Different people appear to vary in which style of reasoning they are most comfortable with, so good documentation must attempt to play to both styles.

Documentation that plays to deductive reasoning will be filled with generalizations. This doesn't mean that it avoids details (as generalities would) but that it attempts to describe exactly - in complete generality - what each interface does, or how each concept applies, or what role each interaction plays. Such documentation can be very useful, but it can also lead to a feeling that you are drowning in detail. It can be a challenge to extract meaning and importance from such details. A lot of technical documentation seems to tend to this extreme.

Documentation that plays to inductive reasoning will be full of examples of specific cases. It may explain each case very well but the coverage of the cases can never be perfect and it will inevitably leave out some information, typically the particular information that the reader is looking for. "How-tos" are a good example of this sort of document with maybe the extreme case being recipe books for cooking - they are full of sample recipes with very little space dedicated to explaining what makes a good recipe. These are very good if they have chosen just the right example, fairly good and quite accessible if they have chosen a good variety of examples, but usually lacking when you want to get down to the nitty-gritty.

Documentation that plays to both types of reasoning will mix examples in with the generalizations, using them to embellish and explain those generalizations and as an excuse to make diversions into tangentially related topics. Examples are particularly good at highlighting contrasts which are themselves an important part of describing key concepts and clarifying why choices are made. The various multiplicities noted for Linux power management can doubtlessly provide lots of contrasts such as that between a "UART" serial driver that must be ready to receive full-rate data whenever it is not off, as opposed to a "USB" serial driver which only needs to be able to respond to a wake-up signal and has plenty of time to prepare itself for full data-rate messages. These would necessarily make different decisions about allowable power states.

Returning to our wire-frame model which gives shape to some substance, it hopefully is not too much of a stretch to see examples as the skin on the model. These are the bits we can directly see, they reveal the texture or taste of the whole, and only hint at the bigger picture behind them. But they are an important part in closing the gaps that are left out of the big-picture descriptions.

A worked example?

Having all these goals for introductory documentation may be nice, but are they actually useful? Can they lead to truly "good" documentation? Clearly they are not enough by themselves, but when combined with enough knowledge and experience, with some story-telling ability and an occasional touch of humor I believe that they can. To put this to the test, I've used them as a guide to producing some introductory documentation on Linux power management. The results will be presented next week when you, dear reader, can be the judge of whether the resulting documentation is actually "good".

Leaping seconds and looping servers

By Jonathan Corbet
July 2, 2012
As most of the net is likely to have heard by now, Linux servers displayed a notable tendency to misbehave during the leap second event at the end of the day on June 30. The problem often presented itself as abrupt and sustained load spikes on the affected machines. The bug that caused this behavior has been tracked down (thanks to a determined effort by John Stultz); a look at what happened shines an interesting light on the trickiness of dealing with time in software systems.

The earth's rotation is slowing over time; contrary to some public claims, this slowing is not caused by Republican administrations, government spending, or proprietary software. In an attempt to keep the official Coordinated Universal Time (UTC) in sync with the earth's behavior, the powers that be occasionally insert an additional second (a "leap second") into a day; 25 such seconds have been inserted since the practice began in 1972. This habit is not without its detractors, and there are constant calls for its abolition, but, for now, leap seconds are a reality that the world (and the kernel) must deal with. For the curious, the Wikipedia leap second page has more detail than almost anybody could want.

The kernel's core time is kept in a timespec structure:

    struct timespec {
        __kernel_time_t tv_sec;     /* seconds */
        long            tv_nsec;    /* nanoseconds */
    };

It is, in essence, a count of seconds since the beginning of the epoch. Unfortunately, that count is defined to not include leap seconds. So when a leap second happens, the system time must be explicitly corrected; that is done by setting the system clock back one second at the end of that leap second. The code that handles this change is quite old and works pretty much as advertised. It is the source of this message that most Linux systems should have (in some form) in their logs:

    Jun 30 19:59:59 dt kernel: Clock: inserting leap second 23:59:60 UTC
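
What "defined to not include leap seconds" means in practice can be seen from user space with nothing more than Python's standard `calendar.timegm()`. This is a sketch of the arithmetic only, not of anything the kernel does; the variable names are arbitrary.

```python
import calendar

# The last ordinary second of June 30, 2012 and the first second of July 1 (UTC).
before = calendar.timegm((2012, 6, 30, 23, 59, 59))
after = calendar.timegm((2012, 7, 1, 0, 0, 0))

# Two seconds of real time separate these instants (23:59:60 sits between
# them), but the count of seconds since the epoch advances by only one.
elapsed_by_count = after - before
```

The missing second is the one that the kernel accounts for by setting the clock back.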

The kernel's high-resolution timer (hrtimer) code does not use this version of the system time, though — at least, not directly. Instead, hrtimers have a couple of internal time bases that are offset from the system time. These time bases allow the implementation of different clocks; the "realtime" clock should adjust with the time, while the "monotonic" clock must always move forward, for example. Importantly, these timer bases are CPU-specific, since realtime clocks can differ between one CPU and the next in the same system. The hrtimer offsets allow the timer subsystem to quickly turn a system time into a time value appropriate for a specific processor's realtime clock.

If the system time changes, those offsets must be adjusted accordingly. There is a function called clock_was_set() that handles this task. As long as any system time change is followed by a call to clock_was_set(), all will be well. The problem, naturally, is that the kernel failed to call clock_was_set() after the leap second adjustment, which certainly qualifies as a system time change. So the hrtimer subsystem's idea of the current time moved forward while the system time was held back for a second; hrtimers were thereafter operating one second in the future. The result of that offset is that timers started expiring one second sooner than they should have; that is not quite what the timer developers had in mind when they used the term "high resolution."

For many applications, having a timer go off one second early is not a big problem. But there are plenty of situations where timers are set for less than one second in the future; all such timers will naturally expire immediately if the timer subsystem is operating one second ahead of the system time. Many of these timers are also recurring timers; they will be re-set immediately after expiration, at which point they will immediately expire again — and so on. The resulting loop is the source of the load spikes reported by victims of this bug across the net.
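
The looping behavior can be reproduced in a toy model. The sketch below assumes nothing about the kernel's real data structures: `skew` stands for how far ahead of the system time the timer subsystem believes it is, and `expirations_without_progress` is a hypothetical name. The clock is deliberately held still, so any expirations counted are pure spinning.

```python
def expirations_without_progress(interval, skew, max_spins=1000):
    """Count how often a recurring timer fires while the clock stands still."""
    system_now = 0.0                 # the time seen by whoever arms the timer
    timer_now = system_now + skew    # the time the timer subsystem believes in
    spins = 0
    expiry = system_now + interval   # arm the timer relative to system time
    # max_spins only exists so that the sketch terminates.
    while expiry <= timer_now and spins < max_spins:
        spins += 1                       # the timer "expires" immediately...
        expiry = system_now + interval   # ...and is re-armed, still in the past
    return spins
```

With `skew=1.0`, a 0.1-second timer spins until the cap is hit, while a 2-second timer merely fires early and does not loop; with `skew=0.0` nothing fires at all until time actually passes.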

The fix is to call clock_was_set() in the leap second code—a call that had been removed in 2007. But it's not quite that simple. The work done by clock_was_set() must happen on every CPU, since each CPU has its own set of timer bases. That's not something that can be done in atomic context. So John's patch detects a call in atomic context and defers the work to a workqueue in that case. With this patch in place, the kernel's leap second handling should work again.

How could such a bug come about? Time-related code is notoriously tricky in general; bugs are common. But the situation is far worse when the code in question is almost never executed. Prior to June 30, 2012, the last leap second was at the end of 2008. That is 3½ years in which the leap second code could have been broken without anybody noticing. If the kernel had a regularly-run regression test that verified the correct functioning of hrtimers in the presence of leap second adjustments, this problem might just have been caught before it affected production systems, but nobody has made a habit of running such tests thus far.

Perhaps that will change in the future; if nothing else, distributors with support obligations are likely to run some tests ahead of the next scheduled leap second adjustment. Hopefully, that will catch any problems in this particular little piece of code, should they happen to slip in again. Beyond that, one can always hope for an end to leap seconds. The kernel could also contemplate a switch to international atomic time (TAI), which does not have leap seconds, for its internal representation. Using TAI internally has its own challenges, though, including a need to avoid changing the time representation as seen by user space, meaning that the kernel would still have to track leap seconds internally. So it seems that, one way or another, leap seconds are likely to continue to be a source of irritation and bugs in the future.

Page editor: Jonathan Corbet

Distributions

An early CyanogenMod 9.0 review

By Jonathan Corbet
July 4, 2012
The CyanogenMod project produces the best-known rebuild of the Android operating system; unlike a lot of other "modders," CyanogenMod rebuilds its distribution from the Android Open Source Project source and functions increasingly like an ordinary free software project. The announcement for the first CyanogenMod 9.0 release candidate hit the net on June 26. Your editor, never one to miss a chance to brick a nice handset with pre-release software, decided it was time to see how CM9.0-rc1 would behave on a Galaxy Nexus device.

The current stable CyanogenMod release is 7.2.0, based on the Android "Gingerbread" release. 7.2.0 adds a lot to stock Android, making the switch to CyanogenMod worthwhile even for those with reasonably good stock Android installations. Given that the project is going from 7.2.0 to the upcoming 9.0 release, one might well wonder what happened to 8.0; might the CyanogenMod developers be engaging in some sort of version number inflation? The truth is rather more boring than that. CyanogenMod release numbers are tied to the position in the alphabet of the first letter of the associated Android release name. CM8.x would have been based on the Honeycomb release, but, since a usable source release of Honeycomb never happened, CM8.x didn't happen either. The upcoming CM9.0 release, of course, is based on the famous Android "Ice Cream Sandwich" version.
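
For those who like to see the arithmetic spelled out, the numbering rule can be sketched in a few lines. The function name is hypothetical, and the sketch simply assumes that the letter-to-number rule described above holds for every release.

```python
def cyanogenmod_major(android_codename):
    """Map an Android release name to the matching CyanogenMod major version."""
    first_letter = android_codename.strip()[0].upper()
    # "A" maps to 1, "B" to 2, and so on through the alphabet.
    return ord(first_letter) - ord("A") + 1
```

Under that assumption "Gingerbread" gives 7, "Honeycomb" gives the skipped 8, and "Ice Cream Sandwich" gives 9.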

One of the advantages of buying an Android device directly from Google is that there is very little hassle involved in unlocking the device. No jailbreaking required. A simple fastboot command is enough to unlock the bootloader; one should take the warning that the device will be wiped seriously, though. Another fastboot command installs the ClockworkMod recovery image which, in turn, can be used to flash the actual CM9.0 installation. The hardest part, arguably, is figuring out the magic sequence needed by each device to get it into the recovery mode; with the Galaxy Nexus, one has to use the volume keys to scroll through hidden options to get to the "recovery" choice. That done, the CM9.0-rc1 installation went without a hitch.

Except, of course, for the part about having to completely reconfigure the device from scratch again. Lots of the requisite information is now helpfully stored on the Google mothership, offering a degree of convenience that can make one overlook the fact that somebody else is holding a lot of your important data. But one still must configure K9, re-pair Bluetooth devices, set various display options, turn off annoying notifications, and so on. The life of a device distribution reviewer is often difficult and unglamorous, but somebody has to do it.

What's new

Previous CyanogenMod releases have featured vast numbers of configuration options, allowing the user to tweak just about any aspect of the experience. For 9.x, the developers have decided that maybe they needed to cut back a bit. Such a decision might lead one to fear a GNOME-like dedication to removing any feature that proves unable to run away quickly enough. The truth of the matter, though, is that, from your editor's point of view, they have not taken away much of great importance. The knobs that really make a difference are still there; a lot of clutter is gone. So far, so good.

[Cid] But one interesting result of this decision is that CyanogenMod 9.x actually looks a lot like the Android release upon which it is based. It is much closer to the original than 7.x ever was. It is, in fact, close enough that one might well wonder whether it's worth the trouble to make the change. What does CM9.x provide that an Android device doesn't offer from the outset, beyond the new, slightly creepy mascot?

To start with, there's still a higher degree of control over the interface. There are little things, like the ability to get a numeric percentage value for the current battery charge, for example. The "Trebuchet" launcher, while lacking the massive set of configuration options found in previous CyanogenMod launchers, still allows a lot of basic tweaks, including the number of rows and columns on the home screen. Trebuchet restores the ability for the home screen to rotate to match the handset's orientation. It also allows the removal of the Google search bar at the top of the home screen, something Google's own distribution won't let the user do.

There is a built-in "profiles" feature that can be used to store and load sets of configuration options for different places. As with traditional cellphone profile implementations, basic features like ringer volume can be controlled, but there is a lot more than that. Profiles can control wireless behavior, synchronization, notifications on a per-application level, and more. One could easily set up, for example, an "international travel" profile that turns off synchronization and cuts out most, but not all, notifications. Unlike the profiles implemented by some add-on applications, CyanogenMod profiles can't be tied to times or locations, but that is probably good enough for most users.

Your editor still misses the highly configurable CM7.x power widget; 9.0-rc1 only contains the stock Android version. There is, however, a separate power widget that can be enabled when the notification bar is dragged down, and that one is configurable indeed. That makes it easy to quickly toggle features like airplane mode or mobile data. (Those wanting a massively configurable power widget can get it by installing an application like Widgetsoid.)

Stock Android on this device included an option to go straight into the camera application from the lock screen, which is useful for grabbing a quick picture. CyanogenMod extends that functionality to make it possible to go directly to a number of applications from the lock screen, or at least that seems to be the intent; it did not want to work on your editor's device. One can also do some limited reconfiguration of the dedicated buttons at the bottom of the screen; if one really wants the search button back, one can have it.

Those who truly want to tweak things can go into the "performance" menu, ignoring the scary warning on the way in. There, things like CPU frequency policies can be tweaked. It is also possible to turn on kernel samepage merging (KSM), allowing the kernel to merge pages with duplicate contents and making more memory available. Unlike sometimes in the past, there is little in this menu that looks truly scary.
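
The idea behind samepage merging can be illustrated with a toy that has nothing to do with how the kernel actually implements it (there is no scanning, copy-on-write, or page table handling here). Pages are modeled as byte strings, and `merge_identical_pages` is a hypothetical helper that keeps one copy of each distinct page.

```python
def merge_identical_pages(pages):
    """Return (unique_pages, mapping); mapping[i] indexes into unique_pages."""
    unique_pages = []
    index_by_content = {}
    mapping = []
    for page in pages:
        if page not in index_by_content:
            # First time this content is seen: keep a single copy of it.
            index_by_content[page] = len(unique_pages)
            unique_pages.append(page)
        mapping.append(index_by_content[page])
    return unique_pages, mapping

# Four 4096-byte "pages", three of which are all zeroes.
pages = [b"\x00" * 4096, b"A" * 4096, b"\x00" * 4096, b"\x00" * 4096]
unique_pages, mapping = merge_identical_pages(pages)
```

Here four pages collapse into two, which is the memory saving that the option is after.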

There's one other thing worth mentioning, even though it's not really a CyanogenMod feature: recent versions of the Android browser have a "quick controls" option found under "Labs" in the settings menu. Turning that on enables a nice two-dimensional menu obtained by swiping in from the side; it makes the browsing experience much faster and more straightforward.

In summary: CyanogenMod 9.0 looks set to be another solid release with some nice functionality and little in the way of obvious problems. Users who appreciate the more open nature of the CyanogenMod community or who want the extra configurability will certainly want to make the switch at some point. For the rest, given how stock Android has caught up to CyanogenMod in a number of ways, it may well be worth asking whether switching to CyanogenMod is worth the effort. Gaining control over a mobile device and flashing new firmware into it is not always a simple or stress-free exercise, after all. Users who have a pure Android installation to start with (as found on devices from Google, for example) might happily stay where they are. On the other hand, devices afflicted with extra "features" imposed by carriers, or those that will not otherwise be upgraded to a current Android release, may be significantly improved by a CyanogenMod installation. Either way, the upcoming CyanogenMod 9.0 release will be a nice option to have.

Brief items

Distribution quote of the week

No problems here, it [the leap second] has affected debian is a big way from all reports, but ya know what they're like, their modern distro is still 5 years behind redhat, letalone near as current as Slackware or Gentoo
-- Noel Butler

The FSF's advice to distributors on UEFI secure boot

The Free Software Foundation has published a paper describing its position on UEFI secure boot and how Fedora and Ubuntu are implementing it. "Software signed with self-generated keys has the downside of not working on the majority of computers right off the shelf, without the user taking some extra steps. We acknowledge that this is an issue, but in addition to insisting on (and contributing to) documentation to make the necessary process easy to follow, we will strive to solve this problem through political action against manufacturers and proprietary software companies who impede free software adoption. Encouraging free software distributors and users to trust Microsoft or any other proprietary software company as a precondition to exercising their freedoms is simply not an acceptable solution."

Fedora 17 for IBM System z 64bit official release

A port of Fedora 17 for the IBM System z (s390x) architecture has been released. Architecture specific release notes are available.

Linaro 12.06 released

Linaro 12.06 has been released. "Linaro 12.06 contains components delivered by all Linaro Teams --Working Groups, Landing Teams and Platform Teams-- and brings an abundance of exciting updates and new features which are integrated on top of Android and Ubuntu. Linaro through these updates, fixes, and new features continue to build the future of Linux on ARM and the 12.06 Linaro release delivers another winning combination off these components."

Oracle Linux 6.3

Oracle has announced the release of Oracle Linux 6.3. Oracle's enterprise kernel is installed and booted by default, and the Red Hat compatible kernel is also installed by default. See the release notes for more information.

Ubuntu 12.10 (Quantal Quetzal) Alpha 2 Released

Ubuntu has released the second alpha of Ubuntu 12.10, scheduled for final release in October 2012. Notably, 12.10 will deprecate Python 2.x from the main image: "For 12.10, we intend to ship only Python 3 with the Ubuntu desktop image, not Python 2. Alpha-2 continues this process. If you have your own programs based on Python 2, fear not! Python 2 will continue to be available (as the python package) for the foreseeable future."

Distribution News

Debian GNU/Linux

bits from the DPL: June 2012

Debian Project Leader Stefano Zacchiroli has a few bits for June. Topics include the Wheezy freeze, DebConf12, secure boot, a spring tour, sprints, Debian assets, and more.

Zacchiroli: working with FSF on Debian Free-ness assessment

Debian Project Leader Stefano Zacchiroli has announced a new initiative to make Debian qualify for the Free Software Foundation's list of free distributions. "What I'm proposing is basically a soft approach in verifying if all remaining issues that cause friction among Debian and the FSF can be solved in the most typical Debian way. The approach might fail, e.g. due to disagreements on bug validity. But at that point we will have obtained a list of blockers, that could than be used as documentation for Debian users who wonder why Debian and FSF disagree on the Free-ness of Debian."

Newsletters and articles of interest

Google plans to ease the Android update problem (The H)

The H reports on Google's plan to alleviate the OS upgrade struggles of Android device vendors. "To achieve this, Android executive Hugo Barra announced a 'Platform Development Kit' (PDK). Barra said that the kit contains the 'required source code' to allow manufacturers to port a forthcoming Android version to their hardware. He added that Google will make the PDK available to its partners two to three months before a new version is released. The executive didn't mention what criteria Google will use to select these partners." The PDK is intended to help vendors port newer Android releases to older hardware, then push the update out to end users.

Linux Distro Digest (The H)

The H covers the spring releases of several Linux distributions. "Not all Linux distributions have to be an everyday operating system; bootable Linux-based systems like Parted Magic, Clonezilla Live and SystemRescueCD don't require installation and can be used for hard disk partitioning and duplication, as well as fixing other operating systems and even removing viruses on Windows systems. Unlike traditional desktop distributions, these tend to be updated more frequently to upgrade the included applications and add new features."

Page editor: Rebecca Sobol

Development

Data mining with Orange

By Nathan Willis
July 4, 2012

Orange is a GPLv3 Python module for mining, classifying, and visualizing data. The main problem it endeavors to help you solve is machine learning — analyzing and modeling a set of test data so that you can use it to make predictions about new data collected in the wild. Although you can use it to write standard interpreted Python scripts, the project also comes with a "visual programming" interface. Whether visual programming proves useful may depend as much on the programmer as on the data, but Orange makes it simple to explore your data set either way.

The Orange project site provides nightly-build tar downloads, as well as a .deb package repository. In addition to Python (2.6 or 2.7 only), Orange uses the Graphviz library extensively to build visualizations. Orange Canvas, the visual programming tool, requires Qt4.

The fundamentals

Orange is designed to ingest text-based data files; it understands the C4.5 file format popular in the machine learning crowd, but it has a native, tab-delimited file format, too. In Orange's format, the first three lines are reserved for domain information: the first line holds the attribute names, the second line holds the data type for each attribute, and the third line lets you denote special features of specific attributes. The most important special feature is class, which designates an attribute as the distinguishing characteristic of the statistical classes you are out to investigate. The remainder of the file is data, one case per line.

For example, if you have collected data on people who have registered for the forum on your project site, you may have a range of attributes including their country of origin, number of posts, age, whether they use a custom avatar, number of "thumbs up" votes, and OS. If you are interested in exploring what makes a forum member eventually become a contributor, however, a submittedPatch attribute is the one you would mark as a class. That way you can have Orange examine all of the other attributes to find out which ones (or which combinations) accurately predict what will turn a forum member into a contributor.
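
To make the three-line header concrete, here is a sketch that builds a tiny file of this shape with Python's standard csv module and reads it back. It does not use Orange at all; the attribute names and data are invented, and the "discrete" and "continuous" type keywords are an assumption about what the second line accepts that should be checked against the Orange documentation.

```python
import csv
import io

rows = [
    ["country", "posts", "avatar", "submittedPatch"],     # line 1: attribute names
    ["discrete", "continuous", "discrete", "discrete"],   # line 2: types (assumed keywords)
    ["", "", "", "class"],                                # line 3: special features
    ["US", "42", "yes", "yes"],                           # data, one case per line
    ["Germany", "3", "no", "no"],
]

buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
tab_file_text = buffer.getvalue()

# Reading it back: the first three lines are the domain, the rest is data.
reader = csv.reader(io.StringIO(tab_file_text), delimiter="\t")
names, kinds, flags, *cases = reader
class_attribute = names[flags.index("class")]
```

The third header line is mostly empty; only the column that carries the class marker needs an entry.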

Orange provides functions to automatically compute simple statistical summaries of your data: means, mean square errors, frequencies and other basic facts. For example, the following example code loads a data file from forum_users.tab, calls the orange.DomainDistributions() function on it, and iterates through the data's discrete attributes, reporting the frequency in which each value occurs.

    import orange

    # Load the data set from forum_users.tab.
    data = orange.ExampleTable("forum_users")
    # Compute a value distribution for every attribute in the domain.
    datastats = orange.DomainDistributions(data)

    print "Distributions:"
    for i, a in enumerate(data.domain.attributes):
        # Only discrete attributes have a fixed list of values to count.
        if a.varType == orange.VarTypes.Discrete:
            print "%s:" % a.name
            for j, value in enumerate(a.values):
                print "  %s: %d" % (value, int(datastats[i][j]))

The output would be of the form:

    Distributions:
    country:
      US: 123
      UK: 87
      Germany: 38
      El Salvador: 19
    os:
      linux: 200
      windows: 15
      osx: 9
      vms: 36
      unknown: 7

Orange exposes continuous attributes (i.e., those with floating point data) in a similar fashion through orange.VarTypes.Continuous. There are functions for paring down your data set by filtering on attribute values or by pseudo-random sampling. You can even take a "stratified" sample, which means that Orange will grab a sample set that has the same proportions of each class attribute as the entire set.
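
The idea of a stratified sample is easy to sketch without Orange. The function below is a plain standard-library illustration, not Orange's sampling API; `stratified_sample`, its parameters, and the invented user data are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(cases, class_of, fraction, seed=0):
    """Sample about `fraction` of the cases, keeping each class's share intact."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for case in cases:
        by_class[class_of(case)].append(case)
    sample = []
    for members in by_class.values():
        # Take the same fraction from every class.
        keep = round(len(members) * fraction)
        sample.extend(rng.sample(members, keep))
    return sample

# 100 invented users: 20 submitted a patch, 80 did not.
cases = [("user%d" % i, "yes" if i < 20 else "no") for i in range(100)]
sample = stratified_sample(cases, class_of=lambda case: case[1], fraction=0.25)
```

The 25-case sample contains 5 "yes" and 20 "no" cases, the same one-to-four ratio as the full set, whichever individual users happen to be picked.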

Classifying, predicting, and evaluating

You can also see from the example code that Orange tips its hand a little in its class names. The Python class into which you load your data is named ExampleTable rather than (say) DataTable. That is because Orange is intended to load in "example" data sets that have already been classified (i.e., there is at least one class attribute), so that you can use them to generate a classifier algorithm that can correctly predict to which class items will belong.

Orange includes several different learning algorithms into which you can feed data sets in order to deduce a good classifier. Called learners in Orange-speak, some of them are simplistic and are useful mostly for testing, such as the k-nearest neighbor algorithm, while others are more robust, such as the naive Bayes algorithm. But the project's centerpiece is its own implementation of a decision tree algorithm, which lives in its own module named orngTree.

A decision tree classifies a sample by stepping through each attribute — hopefully in the most efficient order possible based on the available examples. The edges of the tree take you either to a final decision (such as "this user will not become a contributor") or to the next attribute to evaluate. In Orange, you create the tree by "training" the learner in one fell swoop with the orngTree.TreeLearner() function, passing it your data set and any options you require. This function creates and returns a classifier that you can subsequently call (on new, unclassified data) through orngTree.TreeClassifier.
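
The stepping-through can be pictured with a hand-built tree. This sketch does not use Orange and involves no learning: the tree, its attributes, and its decisions are invented for illustration, whereas a real learner would derive them from the example data.

```python
# Internal nodes name the attribute to test; leaves are plain strings.
tree = {
    "attribute": "posts_over_50",
    "branches": {
        "no": "will not contribute",
        "yes": {
            "attribute": "os",
            "branches": {
                "linux": "will contribute",
                "windows": "will not contribute",
            },
        },
    },
}

def classify(node, case):
    """Follow edges until a leaf (a final decision) is reached."""
    while isinstance(node, dict):
        value = case[node["attribute"]]
        node = node["branches"][value]
    return node

decision = classify(tree, {"posts_over_50": "yes", "os": "linux"})
```

A case that fails the first test never has its second attribute examined, which is why the order in which attributes are evaluated matters for efficiency.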

Or at least that is how it is supposed to work. In reality, you typically have to train, evaluate, tweak, re-train, and re-evaluate several times. Orange provides a slew of regression tests, statistical and probability features, and tuning options for better modeling your data. Most of them will feel vaguely familiar to people whose last statistics class was more than a couple of years ago.

There are also a number of related features for digging into your data set and finding correlations and hidden relationships between attributes. Among the available options are association rules (i.e., learning that certain combinations of values tend to occur together), clustering (attempting to partition the data into discrete groups), and self-organizing maps (which attempt to find patterns in the data by examining its topological features in 2D or 3D). In some cases, the end goal of these analytical techniques might be to build or optimize your classifier, but you may simply be out to find unusual properties in the data set or locate interesting outliers.

A lot of the data mining options implemented in Orange are outside my personal or professional experience, although the reference documentation does an admirable job of providing background information. That said, I was a bit disappointed that the tutorial section on the Orange site covers only a smattering of the feature set. I worked through as many as I could, however, and I will say Orange provides a very easy to explore data mining tool set. Most (and perhaps all) of the statistical functions are available in other open source packages (such as R), but both the convenience of working in Python and the number of built-in analysis functions make getting started simple.

Visualize!

[Orange Canvas]

The ease-of-getting-started point goes double for Orange Canvas, the project's visual programming interface. Rather, the data mining process is easy, once you figure out the "visual programming" paradigm itself. Orange Canvas works by letting you drag function blocks from the toolbar onto an infinite-in-two-dimensions workspace, then connect the blocks together with hoses. The output from A goes to the input for B, and so on.

It is the same basic idea as a dozen other visual programming editors. My only real criticism of it is that too few of the blocks' properties are visible on the canvas; you must hover the mouse over a block to see more about it and right-click it to open a property editor.

The reason that this matters is that many of Orange's classes have complex relationships; they often require multiple input connections (such as both data and a learner) that look essentially the same on screen. Some of the block structures struck me as counter-intuitive, too. Conceptually, when you are writing in Python, the TreeLearner is a function that accepts input, and creates a TreeClassifier as its output. But in the visual interface, a "Classification Tree" block has a "learner" as its output node. The reason is that in the block-and-hose design you are intended to hook the learner output directly into a "prediction" block and make use of it, not to manipulate it on its own. But it takes some getting used to.

Still, the visual interface has one killer feature: the ability to instantly hook up blocks to visualizations and manipulate them. Everything from scatterplots to "sieve multigrams" is provided, including a number of chart types you may need to look up in a reference book. Double-clicking on any of the visualization blocks opens a separate window, in which you can manipulate the variables included (and most other properties) and get a live-updated graph of the results. For example, you can plot various attributes against each other in a scatterplot and look for clumps, which would tell you that those attributes are tightly correlated. In contrast, with your own Python code you naturally must edit and re-execute your script to look at each new permutation of attributes.

[Orange Canvas scatterplot]

The Graphviz library does the heavy lifting on the visualizations. When you launch a visualization, you can also use VizRank, a feature that iteratively searches through the possible parameters looking for the ones that produce the most interesting results. For instance, in our forum users example, VizRank will iterate through all possible pairs of attributes (age versus OS, age versus avatar, avatar versus OS, etc.) and rank the pairings by clumpiness. The upshot is that you are spared the time it takes to plot every pair and assess the outcome (not to mention the work of ensuring that you did not accidentally forget a permutation). You can export the visualizations (or selected regions) to image files with a single click.
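
The pair-ranking idea can be sketched in a few lines of plain Python. This is not VizRank's code: the "clumpiness" score below (leave-one-out nearest-neighbor accuracy on each two-attribute projection) is a stand-in chosen for this sketch, and the function names, attribute names, and data are all hypothetical.

```python
from itertools import combinations


def loo_nearest_neighbor_accuracy(points, labels):
    """Fraction of 2D points whose nearest other point has the same label."""
    hits = 0
    for i, (x, y) in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (points[j][0] - x) ** 2 + (points[j][1] - y) ** 2,
        )
        hits += labels[nearest] == labels[i]
    return hits / len(points)


def rank_pairs(rows, attributes, target):
    """Score every unordered attribute pair and return them best-first."""
    labels = [row[target] for row in rows]
    scored = []
    for a, b in combinations(attributes, 2):
        points = [(row[a], row[b]) for row in rows]
        scored.append((loo_nearest_neighbor_accuracy(points, labels), a, b))
    return sorted(scored, reverse=True)


# Hypothetical numeric attributes for six forum users.
users = [
    {"age": 20, "posts": 1, "logins": 3, "contributor": "yes"},
    {"age": 21, "posts": 9, "logins": 1, "contributor": "yes"},
    {"age": 22, "posts": 5, "logins": 9, "contributor": "yes"},
    {"age": 40, "posts": 1, "logins": 4, "contributor": "no"},
    {"age": 41, "posts": 9, "logins": 2, "contributor": "no"},
    {"age": 42, "posts": 5, "logins": 8, "contributor": "no"},
]

ranking = rank_pairs(users, ["age", "posts", "logins"], "contributor")
for score, a, b in ranking:
    print(f"{a} versus {b}: {score:.2f}")
```

Using itertools.combinations guarantees that every pair is visited exactly once, which is the bookkeeping the tool spares you from doing by hand. A real implementation would also need to normalize the attributes so that one with a large numeric range does not dominate the distances.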

I am one of those "visual learners" you hear about every now and then; to me, searching through the plots looking for patterns is a far more revealing way of getting to know a data set. I worked with data sets in the hundreds-of-entries range, which does not give one a feel for how Orange might perform on terabyte-scale "big data" mining. That does not mean that Orange is incapable of scaling that high; I simply cannot vouch for it. But that is hardly the point: hundreds and thousands of records are more than enough that a specialized data exploration tool is well worth your while. Orange gives you access to a good Python data-mining toolbox, and Orange Canvas gives you a nearly foolproof way to examine it visually. The project has an active community and is running multiple Google Summer of Code projects to develop new tools, so if the module does not provide the classification or visualization method you need, the odds are good that you can still make it work.

Brief items

Quotes of the week

There’s absolutely no way that the Mozilla Foundation can personally host events to teach web making to the world. But I do think that if we do this right, we can build a movement that others build on, and make their own.

We can teach people to teach people to teach people to fish. And soon there will be no fish left in the sea. (But in this case, that means “everyone learns web making” so it’s not quite so ecological disastery. … I hope.)

Michelle Levesque

I'm all for code cleanups, UI cleanups, and anything that can improve our state of affairs. Forgive me for using a strong word, but destroying working functionality without explaining exactly what made it bad, and without trying to fix it first, is just vandalism.

Federico Mena Quintero

A quick clarification - -building- LibreOffice tends to discover poor thermal management in systems better than anything else I've seen ;-)

Michael Meeks

GNU C library 2.16 released

Version 2.16 of the GNU C library is out. Significant changes include support for the x32 ABI, various bits of ISO C11 support, a number of performance improvements, and lots of bug fixes. This version of glibc is not supported on Linux kernels prior to 2.6.

GRUB 2.00 released

Version 2.00 of the GNU GRUB bootloader has been released. "Since this version has a round number it has been paid special attention to, and hopefully, represents higher quality." Improvements include ports to a number of relatively obscure architectures (Itanium, Fuloong2F, ...), improved device and filesystem support, some additional boot protocols, and more.

Rakudo Star release 2012.06

The Rakudo Perl distribution has released "Rakudo Star," which it describes as "a useful, usable, 'early adopter' distribution of Perl 6." There are numerous improvements listed, including enhanced list and .map handling and the addition of the same regular expression engine used in user-space, which fixes several parsing bugs.

Newsletters and articles

Development newsletters from the last week

Linksvayer: 5 years of GPLv3

Mike Linksvayer shares his reflections on the importance of the GPLv3, which was released five years ago today. "I suggest that number (add qualifiers of and scaling by importance, quality, etc, as you wish) of works under GPLv3 or use of GPLv3 relative to other licenses are less important markers of GPLv3's success, and that of the broader FLOSS community, than the number and preponderance of works under GPLv3-compatible terms."

Vinyl cutting on Linux: the real deal (Libre Graphics World)

Libre Graphics World is running a comparison of Linux applications used to drive cutting machines for vinyl, cardstock, and other materials. "Most cutting devices rely on HPGL printer control language and its versions such as CAMM-HPGL (Roland). So the job is, essentially, to take a vector graphics file and convert it to HPGL, then send it to the device along with control commands such as blade speed and pressure." Looks like several options are available, including Inkscape extensions and stand-alone programs.

Why learn C? (O'Reilly Radar)

O'Reilly Radar has published an interview with author David Griffiths on the continued relevance of teaching C. The main interview is in video form, with text excerpts highlighting specific points, such as "For example, it teaches how memory works in a more profound way (a concept systems programmers will likely already know, though new programmers in specialized fields might not)" and "It's an important, foundational language that requires you to understand the full stack of the technology."

Page editor: Nathan Willis

Announcements

Brief items

Boot to Gecko phones coming in 2013

Mozilla Corporation has announced plans for the first set of phone handsets to be built using its "Boot to Gecko" technology. "Device manufacturers TCL Communication Technology (under the Alcatel One Touch brand) and ZTE today announced their intentions to manufacture the first devices to feature the new Firefox OS, using Snapdragon processors from Qualcomm Incorporated, the leader in smartphone platforms. The first Firefox OS powered devices are expected to launch commercially in Brazil in early 2013 through Telefónica’s commercial brand, Vivo."

Articles of interest

Controversial anti-piracy agreement rejected by EU (BBC)

The BBC reports that the "Anti-Counterfeiting Trade Agreement" (ACTA) has been rejected by the European Parliament. "Wednesday's vote is seen by most observers as the final blow to the treaty in its current form. It means no member states will be able to join the agreement. A total of 478 MEPs voted against the deal, with 39 in favour. There were 165 abstentions."

Harihareswara: Be Bold: An Origin Story

"Be Bold: An Origin Story" is the title of a keynote address by Sumana Harihareswara at the Open Source Bridge conference. She has shared her notes for the talk on how she became an open source contributor. "One of the greatest gifts you can give your children, your employees, the people to whom you are a role model, is the knowledge that some field of endeavor is in a sense No Big Deal. Knowledge -- belief backed up by experience -- that they can do interesting and rewarding projects in it without fear of public embarrassment. I grew up thinking that writing, editing, publishing, public speaking, and community leadership were No Big Deal. I use these lessons in my open source work all the time. We know how to do this."

FSFE Newsletter - July 2012

The Free Software Foundation Europe newsletter for July is available. Topics include secure boot, European Court of Justice fines Microsoft, OpenRelief, and more.

Free Software Supporter -- Issue 51, June 2012

The Free Software Foundation (FSF) has released Issue 51 of its monthly newsletter. This edition looks at ACTA, software patents in Europe, Secure Boot, GNU GPLv3 turns 5, and several other topics.

New Books

The Art of Community, 2nd Edition--New from O'Reilly Media

O'Reilly Media has released "The Art of Community, 2nd Edition" by Jono Bacon.

Mobile JavaScript Application Development--New from O'Reilly Media

O'Reilly Media has released "Mobile JavaScript Application Development" by Adrian Kosmaczewski.

Calls for Presentations

14th Real Time Linux Workshop - 2nd Call for Papers

The 2012 Real Time Linux Workshop will be held October 18-20 in Chapel Hill, North Carolina. The call for papers will be open until July 23. "Authors from regulatory bodies, academics, industry as well as the user-community are invited to submit original work dealing with general topics related to Open Source and Free Software based real-time systems research, experiments and case studies, as well as issues of integration of open-source real-time and embedded OS. A special focus will be on industrial case studies and safety related systems."

Upcoming Events

Elizabeth Garbee to Keynote at OhioLinuxFest 2012

The OhioLinuxFest has announced that Elizabeth Garbee will be a keynote speaker at this year's event. OhioLinuxFest takes place September 28-30, 2012 in Columbus, Ohio. "Elizabeth is a second-year undergraduate student who is part of a pulsar astronomy research team at Oberlin College in the US, with a long history of involvement in the open source community. She installed her first Debian machine (with some help!) at age 9 - since then, her main computing interest has been digital art, when she isn't serving as a revision control goddess and programmer for her research team. She is a frequent speaker at Linux conferences, and spoke last year at Linux Conf Australia. She last spoke at Ohio LinuxFest in 2008."

Events: July 5, 2012 to September 3, 2012

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
June 30 - July 6 | Akademy (KDE conference) 2012 | Tallinn, Estonia
July 1 - July 7 | DebConf 2012 | Managua, Nicaragua
July 2 - July 8 | EuroPython 2012 | Florence, Italy
July 5 | London Lua user group | London, UK
July 6 - July 8 | 3. Braunschweiger Atari & Amiga Meeting | Braunschweig, Germany
July 7 - July 8 | 10th European Tcl/Tk User Meeting | Munich, Germany
July 7 - July 12 | Libre Software Meeting / Rencontres Mondiales du Logiciel Libre | Geneva, Switzerland
July 8 - July 14 | DebConf12 | Managua, Nicaragua
July 9 - July 11 | GNU Tools Cauldron 2012 | Prague, Czech Republic
July 10 - July 11 | AdaCamp Washington, DC | Washington, DC, USA
July 10 - July 15 | Wikimania | Washington, DC, USA
July 11 | PuppetCamp Geneva @RMLL/LSM | Geneva, Switzerland
July 11 - July 13 | Linux Symposium | Ottawa, Canada
July 14 - July 15 | Community Leadership Summit 2012 | Portland, OR, USA
July 16 - July 20 | OSCON | Portland, OR, USA
July 26 - July 29 | GNOME Users And Developers European Conference | A Coruña, Spain
August 3 - August 4 | Texas Linux Fest | San Antonio, TX, USA
August 8 - August 10 | 21st USENIX Security Symposium | Bellevue, WA, USA
August 18 - August 19 | PyCon Australia 2012 | Hobart, Tasmania
August 20 - August 21 | Conference for Open Source Coders, Users and Promoters | Taipei, Taiwan
August 20 - August 22 | YAPC::Europe 2012 in Frankfurt am Main | Frankfurt/Main, Germany
August 25 | Debian Day 2012 Costa Rica | San José, Costa Rica
August 27 - August 28 | GStreamer conference | San Diego, CA, USA
August 27 - August 28 | XenSummit North America 2012 | San Diego, CA, USA
August 27 - August 29 | Kernel Summit | San Diego, CA, USA
August 28 - August 30 | Ubuntu Developer Week | IRC
August 29 - August 31 | LinuxCon North America | San Diego, CA, USA
August 29 - August 31 | 2012 Linux Plumbers Conference | San Diego, CA, USA
August 30 - August 31 | Linux Security Summit | San Diego, CA, USA
August 31 - September 2 | Electromagnetic Field | Milton Keynes, UK
September 1 | Panel Discussion Indonesia Linux Conference 2012 | Malang, Indonesia
September 1 - September 2 | VideoLAN Dev Days 2012 | Paris, France
September 1 - September 2 | Kiwi PyCon 2012 | Dunedin, New Zealand

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds