
LWN.net Weekly Edition for February 10, 2011

Moglen on Freedom Box and making a free net

By Jonathan Corbet
February 8, 2011
Eben Moglen is usually an inspiring speaker, and his keynote at FOSDEM 2011 did not disappoint. Free software remains, as always, at the core of his talks, but he has adopted a wider political vision and thinks that the community should do the same. Our freedom, he said, depends on reengineering the network to replace vulnerable, centralized services with alternatives which resist government control.

The publication of Larry Lessig's Code, Eben said, drew our attention to the fact that, in the world we live in, code increasingly functions as law. Code does the work of the state, but it can also serve revolution against the state. We are seeing an enormous demonstration of the power of code now, he said. At the same time, there is a lot of attention being paid to the publication of Evgeny Morozov's The Net Delusion, which makes the claim that the net is being co-opted to control freedom worldwide. The book is meant to be a warning to technology optimists. Eben is, he said, one of those optimists. The lesson he draws from current events is that the right net brings freedom, but the wrong net brings tyranny.

We have spent a lot of time making free software. In the process, we have joined forces with other elements of the free culture world. Those forces include people like Jimmy Wales, but also people like Julian Assange. Wikipedia and Wikileaks, he said, are two sides of the same coin. At FOSDEM, he said, one could see "the third side" of the coin. We are all people who have organized to change the world without creating new hierarchies in the process. At the end of 2010, Wikileaks was seen mainly as a criminal operation. Events in Tunisia changed that perception, though. Wikileaks turns out to be an attempt to help people learn about their world. Wikileaks, he said, is not destruction - it's freedom.

But now there are a lot of Egyptians out there whose freedom depends on the ability to communicate through commercial operations which will respond to pressure from the government. We are now seeing in real time the vulnerabilities which come from the bad engineering in the current system.

Social networking, he said, changes the balance of power away from the state and toward people. Events in countries like Iran, Tunisia, and Egypt demonstrate its importance. But current forms of social communication are "intensely dangerous" to use. They are too centralized and vulnerable to state control. Their design is motivated by profit, not by freedom. As a result, political movements are resting on a fragile foundation: the courage of Mr. Zuckerberg or Google to resist the state - the same state which can easily shut them down.

Likewise, real time information for people trying to build freedom currently depends on a single California-based microblogging service which must turn a profit. This operation is capable of deciding, on its own, to donate its entire history to the US Library of Congress. Who knows what types of "donations" it may have made elsewhere?

We need to fix this situation, he said, and quickly. We are "behind the curve" of freedom movements which depend heavily on code. The longer we wait, the more we become part of the system. That will bring tragedy soon. Egypt is inspiring, but things there could have been far worse. The state was late to control the net and unready to be as tough as it could have been. It is, Eben said, not hard to decapitate a revolution when everybody is in Mr. Zuckerberg's database.

It is time to think about the consequences of what we have built - and what we have not built yet. We have talked for years about replacing centralized services with federated services; overcentralization is a critical vulnerability which can lead to arrests, torture, and killings. People are depending on technology which is built to sell them out. If we care about freedom, we have to address this problem; we are running out of time, and people are in harm's way. Eben does not want people who are taking risks for freedom to be carrying an iPhone.

One thing that Egypt has shown us, as Iran did before, is that closed networks are harmful and that network "kill switches" will harm people who are seeking freedom. What can we do when the government has clamped down on network infrastructure? We must return to the idea of mesh networks, built with existing equipment, which can resist governmental control. And we must go back to secure, end-to-end communications over those networks. Can we do it, he asked? Certainly, but will we? If we don't, the promise of the free software movement will begin to be broken. Force will intervene and we will see more demonstrations that, despite the net, the state still wins.

North America, Eben said, is becoming the heart of a global data mining industry. When US President Dwight Eisenhower left office, he famously warned about the power of the growing military-industrial complex. Despite that warning, the US has, since then, spent more on defense than the rest of the world combined. Since the events of September 11, 2001, a new surveillance-industrial complex has grown. Eben strongly recommended reading the Top Secret America articles published by the Washington Post. It is eye-opening to see just how many Google-like operations there are, all under the control of the government.

Europe's data protection laws have worked, in that they have caused all of that data to move to North America where its use is uncontrolled. Data mining, like any industry, tends to move to the areas where there is the least control. There is no way that the US government is going to change that situation; it depends on it too heavily. As a presidential candidate, Barack Obama was against giving immunity to the telecom industry for its role in spying on Americans. That position did not even last through the general election. Obama's actual policies are not notably different from those of his predecessor - except in the areas where they are more aggressive.

Private industry will not change things either; the profit motive will not produce privacy or defense for people in the street. Companies trying to earn a profit cannot do so without the good will of the government. So we must build under the assumption that the net is untrustworthy, and that centralized services can kill people. We cannot, he said, fool around with this; we must replace things which create these vulnerabilities.

We know how to engineer our way out of this situation. We need to create plug servers which are cheap and require little power, and we must fill them with "sweet free software." We need working mesh networking, self-constructing phone systems built with tools like OpenBTS and Asterisk, federated social services, and anonymous publication platforms. We need to keep our data within our houses where it is shielded by whatever protections against physical searches remain. We need to send encrypted email all the time. These systems can also provide perimeter defense for more vulnerable systems and proxy servers for circumvention of national firewalls. We can do all of it, Eben said; it is easily done on top of the stuff we already have.

Eben concluded with an announcement of the creation of the Freedom Box Foundation, which is dedicated to making all of this stuff available and "cheaper than phone chargers." A generation ago, he said, we set out to create freedom, and we are still doing it. But we have to pick up the pace, and we have to aim our engineering more directly at politics. We have friends in the street; if we don't help them, they will get hurt. The good news is that we already have almost everything we need and we are more than capable of doing the rest.

[Editor's note: as of this writing, the Freedom Box Foundation does not appear to have a web site - stay tuned.]

[Update: Added link to The FreedomBox Foundation]

Comments (70 posted)

FOSDEM: Mapping WikiLeaks using open-source tools

February 9, 2011

This article was contributed by Koen Vervloesem

This year, FOSDEM had a Data Analytics developer room, which turned out to be quite popular with the assembled geeks in Brussels: during many of the talks the room was packed full. This first meeting about analyzing and learning from data had talks covering information retrieval, large-scale data processing, machine learning, text mining, data visualization, and Linked Open Data, all of it implemented using open source tools.

Mapping WikiLeaks cables

[Zoomed view]

One of the most inspiring talks in the data analytics track, which showed just how much you can do with open source tools in data visualization, was Mapping WikiLeaks' Cablegate using Python, mongoDB, Neo4j and Gephi by Elias Showk and Julian Bilcke, two software engineers at the Centre National de la Recherche Scientifique (National Center for Scientific Research in France). Their goal was to analyze the full text of all published WikiLeaks diplomatic cables, to produce occurrence and co-occurrence networks of topics and cables, and finally to visualize how the discussions in the cables relate to each other. In short, they did this by analyzing the 3,300 cables with Python and some data extraction libraries, then they used MongoDB and Neo4j to store the documents and generate graphs, and finally they visualized and explored the graphs with Gephi.

The first step in this process, presented by Showk, is importing the cables. Luckily, the WikiLeaks cables follow a simple structure that makes this relatively easy. Showk based his work on the cablegate Python code by Mark Matienzo that scrapes data from the cables in HTML form and converts this to Python objects. For the HTML scraping, the code is using Beautiful Soup, a well-known Python HTML/XML parser that automatically converts the web pages to Unicode and can cope with errors in the HTML tree. Moreover, with a SoupStrainer object, you can tell the Beautiful Soup parser to target a specific part of the document and forget about all the boilerplate parts such as the header, footer, sidebars, and supporting information.
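As a rough illustration of that step (using the current bs4 API rather than the BeautifulSoup 3 interface the cablegate code was written against, and with a made-up file name and tag class), a SoupStrainer can restrict parsing to just the element that holds the cable text:

    # A minimal sketch, not the actual cablegate code: the "cable.html" file
    # name and the "cable" class are hypothetical.
    from bs4 import BeautifulSoup, SoupStrainer

    with open("cable.html") as f:
        html = f.read()

    # Parse only the div that contains the cable body, skipping the header,
    # footer, sidebars, and other boilerplate.
    only_cable = SoupStrainer("div", attrs={"class": "cable"})
    soup = BeautifulSoup(html, "html.parser", parse_only=only_cable)

    text = soup.get_text()   # plain Unicode text, ready for the NLTK step below
    print(text[:200])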

After the parsing, the Python natural language toolkit NLTK is used on the text body to bring some structure to the jumble of words, with the goal of extracting topics. The first step is tokenization: NLTK makes it easy to break a text into sentences and each sentence into its individual words. Then the stem of each word is determined, so that all words are grouped by their root. For example, to analyze the topics of the WikiLeaks cables, it doesn't matter whether the word in a text is "language" or "languages", so both are grouped under their root "languag". An SHA-256 hash value of each stem is then used as a database index.
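A minimal sketch of that pipeline (tokenize, stem, hash), assuming NLTK and its "punkt" tokenizer data are installed, might look like this:

    # Illustrative only: tokenize a text, stem each word, and derive a stable
    # database key from the SHA-256 hash of the stem.
    import hashlib
    import nltk
    from nltk.stem import PorterStemmer

    text = "Languages evolve. Every language borrows words from its neighbours."
    stemmer = PorterStemmer()

    for sentence in nltk.sent_tokenize(text):
        for word in nltk.word_tokenize(sentence):
            stem = stemmer.stem(word.lower())
            key = hashlib.sha256(stem.encode("utf-8")).hexdigest()
            # "Languages" and "language" both end up under the stem "languag"
            print(stem, key[:12])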

MongoDB, a document-oriented database, is used to store all of this data. MongoDB transparently inserts and reads records as Python dictionaries, automatically serializing and deserializing the objects. Showk then queried the MongoDB database to extract the most frequent occurrences and co-occurrences of words, and converted the result to a graph using the Neo4j graph database.
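The storage and counting steps could be sketched along these lines (the database, collection, and field names are hypothetical, and the real project built its graph in Neo4j rather than in a Python Counter):

    # A rough sketch of storing cables in MongoDB and counting topic
    # co-occurrences; not the researchers' actual code.
    from collections import Counter
    from itertools import combinations
    from pymongo import MongoClient

    db = MongoClient().cablegate          # connects to localhost by default

    # Each cable is stored as a plain Python dictionary.
    db.cables.insert_one({"id": "example-cable-1",
                          "topics": ["nuclear", "iran", "trade"]})

    # Count how often pairs of topics appear in the same cable.
    cooccur = Counter()
    for cable in db.cables.find({}, {"topics": 1}):
        for a, b in combinations(sorted(set(cable["topics"])), 2):
            cooccur[(a, b)] += 1

    print(cooccur.most_common(10))        # the heaviest co-occurrences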

[Gephi view]

For the final step, visualizing and analyzing the data, Bilcke used Gephi, an open source desktop application for the visualization of complex networks. Gephi, to which Bilcke is an active contributor, is a research-oriented graph visualization tool that has been used in the past to visualize some interesting graphs, like open source communities and social networks on LinkedIn. It's based on Java and OpenGL, but it also has a headless library, the Gephi Toolkit.

So Bilcke imported the graph from the Neo4j graph database into Gephi and did some manual data cleaning. The graph is quite dense and contains a lot of meaningless content, so some post-processing is needed, such as sorting and filtering. Bilcke chose the OpenOrd layout, one of the few force-directed layout algorithms that can scale to over one million nodes, which makes it ideal for the WikiLeaks graph. He only had to remove some artifacts, tweak the appearance slightly, and finally export the graph to PDF and GEXF (Gephi's native file format).

In total, the two French researchers spent a full week coding, writing 600 lines of code that use four external libraries, two database systems, and one visualization program. All of the tools they used are open source, as is their own code, so this is a nice testimonial to what can be done with open source tools in the field of big-data visualization. Running the whole workflow, from the original WikiLeaks HTML files to the final graph, takes around five hours.

Showk and Bilcke did all this in their free time in order to learn about these technologies. Their goal was to show that any hacker can convert a corpus of textual data into a graph that makes exploring topics easier. This could be used to find interesting new things, but the two researchers lacked the time to do so and were more interested in the technical side. In an email, Bilcke clarified:

Since we worked on the publicly released cables, we didn't expect any more secrets than what had been already published by media like The Guardian. So, we didn't find any unexpected secrets. Moreover, don't forget that there is a publication bias: we only see what has been released and censored by WikiLeaks. Our maps are mostly a tool for exploration, to help people dig into large datasets of intricate topics. In a sense, our visualization helps seeing the general topics and dig into the hierarchy, level after level. You can see potentially interesting cable stories at a glance, just by looking at what seem to be clusters (sub-networks) in the map, and zoom in for details. We believe this can be used as a complement to other cablegate tools we have seen so far.

The result is published in the form of two graphs, which can be explored by anyone who wants to dig into the WikiLeaks cables. One graph, with 43,179 nodes and 237,058 edges, links topics to the cables they occur in. The other graph, with 39,808 nodes and 177,023 edges, only shows the topics and links them when they co-occur in the same cable. Interested readers can view the PDF or SVG files, but the best way is to load the .gephi files into Gephi, so you can interactively explore the graphs. For graphs of this size, though, the Gephi system requirements suggest 2 GB of RAM.

Semantic data heaven

One of the other talks was about Datalift, an experimental research project funded by the already-mentioned National Center for Scientific Research in France. Its goal is to convert structured data in various formats (like relational databases, CSV, or XML) into semantic data that can be interlinked. According to François Scharffe, only when open data meets the semantic web will we truly see a data revolution: big chunks of data sitting on their own island are difficult to re-use, while data in a common format with semantic information (like RDF) paves the way for richer web applications, more precise search engines, and a lot of other advanced applications. Scharffe referred to Tim Berners-Lee's five-star rating system:

If your data is available on the web with an open license, it gets one star. If it's available as machine-readable structured data, such as an Excel file instead of an image scan of a table, it gets two stars. The interesting things begin when you use a non-proprietary format like CSV, which gives the data three stars. But to become part of the semantic web, you need open standards to identify and search for things, like RDF and SPARQL, which brings it to four stars. And to reach data heaven (five stars), you finally need to link your data to other people's data so that you benefit from the network effect.

The Datalift project is currently developing tools to facilitate all steps in the process from raw data to published linked data, from selection of the right vocabulary (e.g. FOAF for persons, or GeoNames for geographic locations), conversion to RDF, and publishing of the data on the web, to interlinking the data with other existing data sources. There are already open source solutions for all these steps. For example, D2R Server maps a relational database on-the-fly to RDF, and Triplify does the same for web applications like a blog or content management system. For the publication of RDF in a human-readable form, there is the Tabulator Firefox extension. The Datalift project is trying to streamline this whole process.
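To make the vocabulary-and-conversion step concrete, here is a toy example (not Datalift itself; the names and URIs are invented) that turns one "person" record into RDF triples with the rdflib library, using the FOAF vocabulary and a link out to someone else's data:

    # Illustrative only: convert a single record to RDF using FOAF.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("http://example.org/people/")   # hypothetical namespace

    g = Graph()
    g.bind("foaf", FOAF)

    person = EX["alice"]
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal("Alice Example")))
    # Linking to an external URI is what earns the fifth star.
    g.add((person, FOAF.knows, URIRef("http://dbpedia.org/resource/Tim_Berners-Lee")))

    print(g.serialize(format="turtle"))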

No shortage of tools

All of the talks in the data analytics developer room were quite short, from 15 to 30 minutes. This allowed a lot of projects to pass in review. Apart from the WikiLeaks and Datalift talks, there were talks about graph databases and NoSQL databases, about query languages, about analyzing and understanding large corpora of text data using Apache Hadoop, about various tools and methods for data extraction from HTML pages, about machine learning with Python, and about a real-time search engine using Apache Lucene and S4.

The whole data analytics track showed that there's no shortage of open source tools for dealing with large amounts of data. That's good news for statisticians and other "data scientists", whose job Google's Hal Varian called "the sexy job in the next ten years". In an article in The McKinsey Quarterly from January 2009, he wrote: "The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades." Looking at all the talks at the data analytics track at FOSDEM, it's clear that open source software will play a big role in this trend. If the track is hosted again at FOSDEM 2012, it's going to need a bigger room.

Comments (1 posted)

The eyeOS web desktop

February 9, 2011

This article was contributed by Nathan Willis

The eyeOS project describes itself as an open source "web desktop" — by which it means an operating system that emulates a functional desktop environment entirely within a single web page. The benefits certainly sound appealing: privacy, access to your files from anywhere on the Internet, and the ability to collaborate in real time with other users. In addition, eyeOS is AGPLv3-licensed, which puts it high on the list of software-freedom-protecting web services. Still, as of February 2011, there are scores of competing ways to accomplish remote desktop access and real-time collaboration, so eyeOS has a tough case to make for many users.

To get a clearer picture of what makes eyeOS different from other Web-enabled desktop suites, one has to look at both the interface and the architecture. After all, AbiWord, Zoho, Google Docs, EtherPad, and even Microsoft Office now allow online collaboration on standard office tasks.

Although it runs inside the browser, to the user eyeOS looks and feels like a standard office desktop: from the menu system and toolbars to the file manager and user preferences. To get started, you need an account on the remote eyeOS server; once you log in, a full desktop session starts on the server and in the browser window. In fact, the desktop computing paradigm is built into the core of eyeOS's design. The back-end of the desktop environment runs as a PHP application on the server, and the browser is used only to run a JavaScript-based interface, which communicates with the server process via AJAX.

Any application that the user opens (word processor, email client, or even file manager) is launched as a separate server-side process that communicates with the eyeOS desktop process on the back-end, while its front-end communicates with the JavaScript GUI running in the browser. That may sound straightforward — the JavaScript front-end mirroring the behavior of the window system, and the back-end behaving like a traditional OS — but it is a distinctly different model from the one used by other web-based services like Zoho, which does not use a separate overseer process to manage individual application components.
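The overseer model is easier to see in miniature. The following is emphatically not eyeOS code (the real back-end is PHP and the front-end JavaScript); it is just a small Python stand-in showing the shape of the design: a desktop process that launches one back-end process per application and relays front-end messages to it.

    # A toy stand-in for the eyeOS "overseer" model described above.
    import json, subprocess, sys

    class Desktop:
        def __init__(self):
            self.apps = {}                       # app name -> back-end process

        def launch(self, name, argv):
            # Each application runs as its own server-side process.
            self.apps[name] = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                               stdout=subprocess.PIPE, text=True)

        def relay(self, name, message):
            # In eyeOS this message would arrive from the browser via AJAX;
            # here it is just a JSON line passed to the back-end process.
            proc = self.apps[name]
            proc.stdin.write(json.dumps(message) + "\n")
            proc.stdin.flush()
            return json.loads(proc.stdout.readline())

    if __name__ == "__main__":
        # A trivial "application" back-end that echoes each request.
        child = [sys.executable, "-c",
                 "import sys, json\n"
                 "for line in sys.stdin:\n"
                 "    req = json.loads(line)\n"
                 "    print(json.dumps({'app': 'editor', 'echo': req}), flush=True)"]
        desktop = Desktop()
        desktop.launch("editor", child)
        print(desktop.relay("editor", {"action": "open", "file": "notes.txt"}))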

The base eyeOS system comes with a small contingent of installed applications: networking, office tools, utilities, and so on, but both the front-end and back-end APIs are open. Over the years a respectable collection of third-party eyeOS software has been developed by the community. Installing a new application requires administrative access to the eyeOS server, but it can be done through the eyeOS desktop interface itself.

Test driving

[EyeOS opening folder]

You can create a new user account and test an eyeOS session on the public eyeOS server to get a feel for the system. Visiting the try it now page on eyeOS.org, you will notice two options, version 1.x and version 2.x, both of which are marked as "stable" and "to be used in production environments." The version 1.x code (which appears to be at release 1.9) has been in development since 2007. Version 2.x is a rewrite begun in March of 2010. The instance running on the public server sometimes shows 2.1 and sometimes 2.2 as its version number; presumably it is 2.2 with the occasional overlooked HTML update.

The project advertises more than 250 applications available for eyeOS 1.x and more than 20 for 2.x, but those numbers count third-party applications (most hosted at eyeos-apps.org), not what is available on the demo servers. I spent some time with version 2.x first, but the lack of available applications inspired me to test-drive 1.x as well. Even after digging through the wiki, discussion forum, and main web site, I am still not sure whether the public "try it now" server is a free service offered by eyeOS or a demo limited in storage space, time, or some other resource. You are required to create an account and provide an email address to sign on as a new user, but the email address is not validated as part of the user creation process. I hope I have not been subscribed to a marketing list as a result.

After having worked with both incarnations of eyeOS for about a day, I have mixed feelings about the implementation. On the one hand, the system is surprisingly fast: the GUI is understandably slower than working with native desktop applications, but it is approximately on par with running a local virtual machine inside VirtualBox. Features that I expected to be flaky, such as sound and mouse wheel support, worked without incident. On the other hand, there is a surprising lack of polish, particularly in the older 1.x system, where one would expect the kinks to have been worked out years ago.

[EyeOS theme change]

The user interface is inconsistent from application to application — particularly with the desktop widgets — as is the terminology (for example, there is an application variously called "eyeOS Board" and "eyeBoard" at different places in the interface). The icons are a mix of flat, 3-D, head-on, and 45-degree-angle perspectives, drawn in different styles and color schemes. Some of the default UI widgets are impossible to read, such as white text on light gray. Changing the theme requires you to "restart" your session — but there is no session restart option, only logging out entirely and logging back in manually. The system menu is in the bottom-right corner and uses an icon that is not quite the eyeOS logo but looks confusingly similar to the familiar "power button" symbol. Most surprisingly, when I attempted to open the "Documents" folder in my home directory, the built-in file manager did not know how to do so, and popped up an application-selection window asking me to find the right launcher.

Version 2.x is a bit better in terms of UI consistency, although it too suffers from mix-and-match iconography. More seriously, the application launchers on the desktop did not work, although that hiccup is ironically mitigated by the two additional sets of launchers that are always visible on the upper and lower desktop panels. Several of the default applications had non-functioning menu items (most noticeably the calendar, where calendar feed properties cannot be edited). If you open a new file in the text editor, it will throw away all of your changes to the current document without giving you the opportunity to save them.

Looking past the UI issues, however, there are some intrinsic properties of the system that I grew frustrated with rather quickly. For starters, I noticed that all applications start off at very small window sizes when launched, generally too small to be used without resizing. Upon reflection, though, that behavior is probably a workaround to account for the fact that the entire desktop is running in a frame within a window within your existing desktop environment: there is simply less real estate to go around.

The scarcity of screen space is exacerbated by the use of the desktop metaphor itself: things like having another task manager inside the browser and having window title bars for every application eat up space, but they don't make the applications more usable. By a similar token, eyeOS requires the user to manually log out of an eyeOS session (like one would on a desktop system) — simply closing the browser or navigating away does not close the session. That behavior makes sense if the goal is emulating a desktop OS, but it results in a security hole that undermines one of the stated goals of the project: keeping your files safe.

I was also disappointed in the default application set. Were it not for the novelty of running inside the browser, eyeOS would be a pretty weak desktop product: the calendaring application cannot subscribe to remote calendars, the word processor is minimalist, and the calculator is four-function only. It is unclear to me why eyeOS 1.x needs a web browser application (although I was pleased to discover that you can run eyeOS from within the eyeOS browser).

Beyond the desktop

Perhaps eyeOS defenders would point me towards the still-growing library of additional applications available for installation as a way to enhance the experience. To a point, they are entirely right: if you run your own server, you can provide a considerably richer environment for your eyeOS users. It is even possible to enable Microsoft Office file format support within eyeOS's applications; this is done by installing OpenOffice.org on the server, along with the xvfb X server. There are eyeOS applications that enhance the default desktop experience with better PDF support, improved email, and additional communication tools like IRC.

But ultimately I am not persuaded that running eyeOS applications within the eyeOS environment in the browser offers a better computing experience than simply running existing open source web applications. If you browse the eyeos-apps.org application repository, most of the non-trivial applications are ports of existing projects, like RoundCube, Moodle, or Zoho. Considering that you need access to a full LAMP stack to run your own instance of eyeOS, I see little advantage to running any of those applications within the containerized environment simply because it emulates the existence of a desktop underneath. It is certainly not easier to deploy RoundCube within eyeOS than it is to deploy it on your own server, nor is it easier to secure, nor will it run faster or be easier for users to learn.

As always, there is a trade-off involved, particularly in the configuration work required to support a large group of users. In my estimation, the default eyeOS applications don't provide a powerful enough experience to say that eyeOS's simpler deployment process ultimately makes the administrator's job easier than individually setting up other open source file sharing and collaboration tools. EyeOS has a basic new user "skeleton" directory system, but at the moment it lacks robust tools for managing and pre-configuring applications for big deployments.

Standing alone, the term "web desktop" could be interpreted to mean a variety of different things. ChromeOS, for example, tries to be a "web desktop" by replacing all client-side applications with web apps. On the other end of the spectrum, more and more GNOME and KDE applications are gaining the ability to seamlessly integrate network collaboration, and with the popularity of "cloud storage" services like Ubuntu One, Dropbox, and the like, it is even possible for a Linux user to store his or her desktop preferences on a remote server, thus making the same environment available everywhere. EyeOS is certainly an innovative approach to the "web desktop," but at the moment, I'm not sure it offers a compelling advantage over the web-and-desktop integration already occurring in other areas.

Comments (1 posted)

Page editor: Jonathan Corbet

Security

Linux autorun vulnerabilities?

By Jake Edge
February 9, 2011

The Windows "AutoRun" feature, which automatically (or semi-automatically after a user prompt) runs programs from removable storage devices, has been a regular source of security problems. It has been present since Windows 95, but Microsoft finally recognized the problem and largely disabled the "feature" in Windows 7—and issued an update on February 8 that disables it for XP and Vista. Various attacks (ab)used AutoRun on USB storage devices to propagate, including Conficker and Stuxnet. Could Linux suffer from a similar flaw? The answer, from a ShmooCon 2011 presentation, is, perhaps unsurprisingly, "yes".

At ShmooCon, Jon Larimer demonstrated a way to circumvent the screensaver lock on an Ubuntu 10.10 system just by inserting a USB storage device. Because the system will automatically mount the USB drive and the Nautilus file browser will try to thumbnail any documents it finds there, he was able to shut down the screensaver and access the system. While his demo disabled both address-space layout randomization (ASLR) and AppArmor, that was only done to make the demo run quickly. On 32-bit systems, ASLR can be brute-forced to find the needed library addresses, given some time. AppArmor is more difficult to bypass, but he has some plausible ideas on doing that as well.

Larimer's exploit took advantage of a hole in the evince-thumbnailer, which was fixed back in January (CVE-2010-2640). A crafted DVI file could be constructed and used to execute arbitrary code when processed by evince. In his presentation [PDF], he shows in some detail how to use this vulnerability to execute a program stored on the USB device.

Killing the screensaver is just one of the things that could be done by the executed program, of course. Larimer points to possibilities like putting a .desktop file into ~/.config/autostart, which will then be executed every time the user logs in. The same kind of thing could be done using .bash_profile or similar files. Either of those could make for a Conficker-like attack against Linux systems. In addition, because the user is logged in, any encrypted home directory or partition will be decrypted and available for copying the user's private data.
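To make that persistence trick concrete, here is a deliberately harmless sketch (the file name and payload are invented for the example) of what dropping an autostart entry looks like; an attacker's payload would simply replace the echo command:

    # Illustrative only: write a .desktop file into ~/.config/autostart so
    # that a command runs at every login. The payload here is harmless.
    import os

    autostart = os.path.expanduser("~/.config/autostart")
    os.makedirs(autostart, exist_ok=True)

    entry = ("[Desktop Entry]\n"
             "Type=Application\n"
             "Name=Autostart Demo\n"
             "Exec=sh -c 'echo autostart ran >> /tmp/autostart-demo.log'\n")

    with open(os.path.join(autostart, "autostart-demo.desktop"), "w") as f:
        f.write(entry)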

While the specifics of Larimer's attack may be of little practical use, there is much to be considered in the rest of his presentation. As he points out, automatically mounting USB storage devices and accessing their contents invokes an enormous amount of code, from the USB drivers and filesystem code, to the desktop daemons and applications that display the contents of those devices. Each of those components could have—many have had—security vulnerabilities.

That should give anyone pause about automatically mounting those kinds of devices. One could certainly imagine crafted devices or filesystems that exploit holes in the kernel code, which would be a route that would likely avoid AppArmor (or SELinux) entirely. While Linux may not automatically run code from USB storage devices, it does enough processing of the, quite possibly malicious, data on them that the effect may be largely the same.

Larimer offers some recommendations to avoid this kind of problem, starting with the obvious: turn off auto-mounting of removable storage. He also recommends disabling the automatic thumbnailing of files on removable media. In addition, using grsecurity/PaX makes brute-forcing ASLR harder on 32-bit systems because it uses more bits of entropy to randomize the library locations. Of course, a 64-bit system allows a much wider range of potential library addresses, so that makes breaking ASLR harder still.

One clear theme of his talk is that "automatically" doing things can be quite dangerous. It may be easier and more convenient, but it can also lead to potentially serious holes. Convenience and security are often at odds.

Comments (16 posted)

Brief items

Security quotes of the week

The world of open source is full of cases where openness of information and process allow properly-functioning open-by-rule communities to address security issues fast. This is the real meaning of the idea that open source is good for security; no magic, just symbiosis.
-- Simon Phipps

Okay, so he's an idiot. And a bastard. But the real piece of news here is how easy it is for a UK immigration officer to put someone on the no-fly list with absolutely no evidence that that person belongs there. And how little auditing is done on that list. Once someone is on, they're on for good.

That's simply no way to run a free country.

-- Bruce Schneier

Comments (2 posted)

PostgreSQL 9.0.3, 8.4.7, 8.3.14 and 8.2.20 released

The PostgreSQL project has issued a new set of releases to fix a security problem. "This update includes a security fix which prevents a buffer overrun in the contrib module intarray's input function for the query_int type. This bug is a security risk since the function's return address could be overwritten by malicious code." Sites which are not using the "intarray" contrib module are not vulnerable.

Full Story (comments: none)

Mozilla has published version 2.0 of its CA Certificate Policy

An updated version of the Mozilla CA Certificate Policy has been released. The policy governs how Mozilla will add Certification Authorities' (CAs) root certificates into Mozilla products, the responsibilities of the CAs so that their certificates remain in the Mozilla root stores, and how the policy will be enforced. The changes made from version 1.2 of the policy can be tracked in Mozilla bug #609945.

Comments (none posted)

New vulnerabilities

asterisk: arbitrary code execution

Package(s): asterisk    CVE #(s): CVE-2011-0495
Created: February 4, 2011    Updated: February 21, 2011
Description: From the CVE entry:

Stack-based buffer overflow in the ast_uri_encode function in main/utils.c in Asterisk Open Source before 1.4.38.1, 1.4.39.1, 1.6.1.21, 1.6.2.15.1, 1.6.2.16.1, 1.8.1.2, 1.8.2.; and Business Edition before C.3.6.2; when running in pedantic mode allows remote authenticated users to execute arbitrary code via crafted caller ID data in vectors involving the (1) SIP channel driver, (2) URIENCODE dialplan function, or (3) AGI dialplan function.

Alerts:
Debian DSA-2171-1 asterisk 2011-02-21
Fedora FEDORA-2011-0794 asterisk 2011-01-26
Fedora FEDORA-2011-0774 asterisk 2011-01-26

Comments (none posted)

bugzilla: multiple vulnerabilities

Package(s): bugzilla    CVE #(s): CVE-2010-4568 CVE-2010-2761 CVE-2010-4411 CVE-2010-4572 CVE-2010-4569 CVE-2010-4570 CVE-2010-4567 CVE-2011-0048 CVE-2011-0046
Created: February 3, 2011    Updated: October 10, 2011
Description:

From the bugzilla advisory:

CVE-2010-4568: It was possible for a user to gain unauthorized access to any Bugzilla account in a very short amount of time (short enough that the attack is highly effective). This is a critical vulnerability that should be patched immediately by all Bugzilla installations.

CVE-2010-2761, CVE-2010-4411, CVE-2010-4572: By inserting particular strings into certain URLs, it was possible to inject both headers and content to any browser.

CVE-2010-4569: Bugzilla 3.7.x and 4.0rc1 have a new client-side autocomplete mechanism for all fields where a username is entered. This mechanism was vulnerable to a cross-site scripting attack.

CVE-2010-4570: Bugzilla 3.7.x and 4.0rc1 have a new mechanism on the bug entry page for automatically detecting if the bug you are filing is a duplicate of another existing bug. This mechanism was vulnerable to a cross-site scripting attack.

CVE-2010-4567, CVE-2011-0048: Bugzilla has a "URL" field that can contain several types of URL, including "javascript:" and "data:" URLs. However, it does not make "javascript:" and "data:" URLs into clickable links, to protect against cross-site scripting attacks or other attacks. It was possible to bypass this protection by adding spaces into the URL in places that Bugzilla did not expect them. Also, "javascript:" and "data:" links were *always* shown as clickable to logged-out users.

CVE-2011-0046: Various pages were vulnerable to Cross-Site Request Forgery attacks. Most of these issues are not as serious as previous CSRF vulnerabilities. Some of these issues were only addressed on more recent branches of Bugzilla and not fixed in earlier branches, in order to avoid changing behavior that external applications may depend on. The links below in "References" describe which issues were fixed on which branches.

Alerts:
Gentoo 201110-03 bugzilla 2011-10-10
Debian DSA-2322-1 bugzilla 2011-10-10
Ubuntu USN-1129-1 perl 2011-05-03
SUSE SUSE-SR:2011:005 hplip, perl, subversion, t1lib, bind, tomcat5, tomcat6, avahi, gimp, aaa_base, build, libtiff, krb5, nbd, clamav, aaa_base, flash-player, pango, openssl, subversion, postgresql, logwatch, libxml2, quagga, fuse, util-linux 2011-04-01
SUSE SUSE-SR:2011:003 gnutls, tomcat6, perl-CGI-Simple, pcsc-lite, obs-server, dhcp, java-1_6_0-openjdk, opera 2011-02-08
Fedora FEDORA-2011-0741 bugzilla 2011-01-25
Fedora FEDORA-2011-0755 bugzilla 2011-01-25

Comments (none posted)

dhcp: denial of service

Package(s): dhcp    CVE #(s): CVE-2011-0413
Created: February 4, 2011    Updated: April 19, 2011
Description: From the CVE entry:

The DHCPv6 server in ISC DHCP 4.0.x and 4.1.x before 4.1.2-P1, 4.0-ESV and 4.1-ESV before 4.1-ESV-R1, and 4.2.x before 4.2.1b1 allows remote attackers to cause a denial of service (assertion failure and daemon crash) by sending a message over IPv6 for a declined and abandoned address.

Alerts:
Fedora FEDORA-2011-0848 dhcp 2011-01-28
Debian DSA-2184-1 isc-dhcp 2011-03-05
Red Hat RHSA-2011:0256-01 dhcp 2011-02-15
Pardus 2011-36 dhcp 2011-02-14
SUSE SUSE-SR:2011:003 gnutls, tomcat6, perl-CGI-Simple, pcsc-lite, obs-server, dhcp, java-1_6_0-openjdk, opera 2011-02-08
Mandriva MDVSA-2011:022 dhcp 2011-02-07
openSUSE openSUSE-SU-2011:0098-1 dhcp 2011-02-04

Comments (none posted)

exim: symlink attack

Package(s): exim    CVE #(s): CVE-2011-0017
Created: February 8, 2011    Updated: February 22, 2011
Description: From the CVE entry:

The open_log function in log.c in Exim 4.72 and earlier does not check the return value from (1) setuid or (2) setgid system calls, which allows local users to append log data to arbitrary files via a symlink attack.

Alerts:
Gentoo 201401-32 exim 2014-01-27
SUSE SUSE-SR:2011:004 exim, krb5, git, dbus-1 2011-02-22
Ubuntu USN-1060-1 exim4 2011-02-10
openSUSE openSUSE-SU-2011:0105-1 exim 2011-02-08

Comments (none posted)

kernel: denial of service

Package(s): kernel    CVE #(s):
Created: February 8, 2011    Updated: February 9, 2011
Description: From rPath RPL-3199:

When Intel VT is enabled in the BIOS of some systems which use intel_iommu, a kernel oops, and possibly a system crash, may occur. Adding intel_iommu=off to the boot parameter list works around the issue.

Alerts:
rPath rPSA-2011-0010-1 kernel 2011-02-07

Comments (none posted)

krb5: denial of service

Package(s): krb5    CVE #(s): CVE-2010-4022 CVE-2011-0281 CVE-2011-0282
Created: February 9, 2011    Updated: April 15, 2011
Description: The krb5 server suffers from three independent vulnerabilities allowing a remote attacker to crash or hang the "key distribution center" process.
Alerts:
Gentoo 201201-13 mit-krb5 2012-01-23
CentOS CESA-2011:0199 krb5 2011-04-14
Pardus 2011-48 mit-kerberos 2011-02-28
SUSE SUSE-SR:2011:004 exim, krb5, git, dbus-1 2011-02-22
Fedora FEDORA-2011-1210 krb5 2011-02-09
Fedora FEDORA-2011-1225 krb5 2011-02-09
Ubuntu USN-1062-1 krb5 2011-02-15
openSUSE openSUSE-SU-2011:0111-1 krb5 2011-02-14
Red Hat RHSA-2011:0199-01 krb5 2011-02-08
Mandriva MDVSA-2011:025 krb5 2011-01-09
Mandriva MDVSA-2011:024 krb5 2011-01-09
Red Hat RHSA-2011:0200-01 krb5 2011-02-08

Comments (none posted)

opera: multiple vulnerabilities

Package(s): Opera    CVE #(s): CVE-2011-0681 CVE-2011-0682 CVE-2011-0683 CVE-2011-0684 CVE-2011-0685 CVE-2011-0686 CVE-2011-0687
Created: February 7, 2011    Updated: February 9, 2011
Description: From the CVE entries:

Opera before 11.01 does not properly restrict the use of opera: URLs, which makes it easier for remote attackers to conduct clickjacking attacks via a crafted web site. (CVE-2011-0683)

Opera before 11.01 does not properly handle redirections and unspecified other HTTP responses, which allows remote web servers to obtain sufficient access to local files to use these files as page resources, and consequently obtain potentially sensitive information from the contents of the files, via an unknown response manipulation. (CVE-2011-0684)

The Delete Private Data feature in Opera before 11.01 does not properly implement the "Clear all email account passwords" option, which might allow physically proximate attackers to access an e-mail account via an unattended workstation. (CVE-2011-0685)

Unspecified vulnerability in Opera before 11.01 allows remote attackers to cause a denial of service (application crash) via unknown content on a web page, as demonstrated by vkontakte.ru. (CVE-2011-0686)

Opera before 11.01 does not properly implement Wireless Application Protocol (WAP) dropdown lists, which allows user-assisted remote attackers to cause a denial of service (application crash) via a crafted WAP document. (CVE-2011-0687)

The Cascading Style Sheets (CSS) Extensions for XML implementation in Opera before 11.01 recognizes links to javascript: URLs in the -o-link property, which makes it easier for remote attackers to bypass CSS filtering via a crafted URL. (CVE-2011-0681)

Opera before 11.01 does not properly handle large form inputs, which allows remote attackers to execute arbitrary code or cause a denial of service (memory corruption) via a crafted HTML document. (CVE-2011-0682)

Alerts:
Gentoo 201206-03 opera 2012-06-15
SUSE SUSE-SR:2011:003 gnutls, tomcat6, perl-CGI-Simple, pcsc-lite, obs-server, dhcp, java-1_6_0-openjdk, opera 2011-02-08
openSUSE openSUSE-SU-2011:0103-1 Opera 2011-02-07

Comments (none posted)

postgresql: arbitrary code execution

Package(s): postgresql-8.3    CVE #(s): CVE-2010-4015
Created: February 4, 2011    Updated: April 15, 2011
Description: From the CVE entry:

Buffer overflow in the gettoken function in contrib/intarray/_int_bool.c in the intarray array module in PostgreSQL 9.0.x before 9.0.3, 8.4.x before 8.4.7, 8.3.x before 8.3.14, and 8.2.x before 8.2.20 allows remote authenticated users to cause a denial of service (crash) and possibly execute arbitrary code via integers with a large number of digits to unspecified functions.

Alerts:
Gentoo 201110-22 postgresql-base 2011-10-25
CentOS CESA-2011:0198 postgresql84 2011-04-14
CentOS CESA-2011:0197 postgresql 2011-04-14
SUSE SUSE-SR:2011:005 hplip, perl, subversion, t1lib, bind, tomcat5, tomcat6, avahi, gimp, aaa_base, build, libtiff, krb5, nbd, clamav, aaa_base, flash-player, pango, openssl, subversion, postgresql, logwatch, libxml2, quagga, fuse, util-linux 2011-04-01
openSUSE openSUSE-SU-2011:0254-1 postgresql 2011-03-31
Pardus 2011-37 postgresql-doc postgresql-lib postgresql-pl postgresql-server 2011-02-14
Fedora FEDORA-2011-0963 postgresql 2011-02-01
Fedora FEDORA-2011-0990 postgresql 2011-02-01
Mandriva MDVSA-2011:021 postgresql 2011-02-07
CentOS CESA-2011:0197 postgresql 2011-02-04
Ubuntu USN-1058-1 postgresql-8.1, postgresql-8.3, postgresql-8.4 2011-02-03
Red Hat RHSA-2011:0198-01 postgresql84 2011-02-03
Red Hat RHSA-2011:0197-01 postgresql 2011-02-03
Debian DSA-2157-1 postgresql-8.3 2011-02-03

Comments (none posted)

vlc: code execution

Package(s): vlc vlc-firefox    CVE #(s): CVE-2011-0522
Created: February 3, 2011    Updated: February 9, 2011
Description:

From the VUPEN advisory:

Two vulnerabilities have been identified in VLC Media Player, which could be exploited by attackers to compromise a vulnerable system. These issues are caused by buffer overflow errors in the "StripTags()" function within the USF and Text subtitles decoders ["modules/codec/subtitles/subsdec.c" and "modules/codec/subtitles/subsusf.c"] when processing malformed data, which could be exploited by attackers to crash an affected application or execute arbitrary code by convincing a user to open a malicious media file.

Alerts:
Gentoo 201411-01 vlc 2014-11-05
Pardus 2011-23 vlc vlc-firefox 2011-02-02

Comments (none posted)

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.38-rc4, released on February 7. "There's nothing much that stands out here. Some arch updates (arm and powerpc), the usual driver updates: dri (radeon/i915), network cards, sound, media, scisi, some filesystem updates (cifs, btrfs), and some random stuff to round it all out (networking, watchpoints, tracepoints, etc)." The short-form changelog is in the announcement, or see the full changelog for all the details.

Stable updates: the 2.6.35.11 long-term update was released on February 7 with a long list of important fixes.

The 2.6.27.58 update was released on February 9. This one contains a couple dozen important fixes.

Comments (none posted)

Quotes of the week

Anyone who programs ATA controllers on the basis of common sense rather than documentation, errata sheets and actually testing rather than speculating is naïve.
-- Alan Cox

I'm invoking the anti-discrimination statutes here on behalf of those of us who don't like beer.
-- James Bottomley pushes the limits of tolerance

Realize that 50% of today's professional programmers have never written a line of code that had to be compiled.
-- Casey Schaufler

As you can see in these posts, Ralink is sending patches for the upstream rt2x00 driver for their new chipsets, and not just dumping a huge, stand-alone tarball driver on the community, as they have done in the past. This shows a huge willingness to learn how to deal with the kernel community, and they should be strongly encouraged and praised for this major change in attitude.
-- Greg Kroah-Hartman

There are no "rules", things have to work and that is the only rule.
-- Markus Rechberger (context here)

Comments (15 posted)

Linus Torvalds: looking back, looking forward (ITWire)

ITWire has posted a lengthy interview with Linus Torvalds. "On the other hand, one of the things I've always enjoyed in Linux development has been how it's stayed interesting by evolving. So maybe it's less 'fun' in the crazy-go-lucky sense, but on the other hand the much bigger development team and the support brought in by all the companies around Linux has also added its own very real fun. It's a lot more social, for example. So the project may have lost something, but it gained something else to compensate."

Comments (32 posted)

Increasing the TCP initial congestion window

By Jonathan Corbet
February 9, 2011
The TCP slow start algorithm, initially developed by Van Jacobson, was one of the crucial protocol tweaks which made TCP/IP actually work on the Internet. Slow start works by limiting the amount of data which can be in flight over a new connection and ramping the transmission speed up slowly until the carrying capacity of the connection is found. In this way, TCP is able to adapt to the actual conditions on the net and avoid overloading routers with more data than can be accommodated. A key part of slow start is the initial congestion window, which puts an upper bound on how much data can be in flight at the very beginning of a conversation.

That window has been capped by RFC 3390 at four segments (just over 4KB) for the better part of a decade. In the meantime, connection speeds have increased and the amount of data sent over a given connection has grown despite the fact that connections live for shorter periods of time. As a result, many connections never ramp up to their full speed before they are closed, so the four-segment limit is now seen as a bottleneck which increases the latency of a typical connection considerably. That is one reason why contemporary browsers use many connections in parallel, despite the fact that the HTTP specification says that a maximum of two connections should be used.
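A back-of-the-envelope calculation shows why that matters for latency. Under a deliberately simplified model (the congestion window doubles every round trip, with no losses and no receive-window or delayed-ACK effects), the number of round trips needed to deliver a typical response drops when the initial window grows:

    # Simplified slow-start model: count round trips to deliver a response.
    import math

    def round_trips(total_segments, initial_window):
        sent, cwnd, rtts = 0, initial_window, 0
        while sent < total_segments:
            sent += cwnd      # send a full window this round trip
            cwnd *= 2         # classic slow start: window doubles per RTT
            rtts += 1
        return rtts

    # A ~45KB response split into 1460-byte segments:
    segments = math.ceil(45 * 1024 / 1460)
    print(round_trips(segments, 4))    # 4 round trips with the old window
    print(round_trips(segments, 10))   # 3 round trips with the proposed window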

Some developers at Google have been agitating for an increase in the initial congestion window for a while; in July 2010 they posted an IETF draft pushing for this change and describing the motivation behind it. Evidently Google has run some large-scale tests and found that, by increasing the initial congestion window, user-visible latencies can be reduced by 10% without creating congestion problems on the net. They thus recommend that the window be increased to 10 segments; the draft suggests that 16 might actually be a better value, but more testing is required.

David Miller has posted a patch increasing the window to 10; that patch has not been merged into the mainline, so one assumes it's meant for 2.6.39.

Interestingly, Google's tests worked with a number of operating systems, but not with Linux, which uses a relatively small initial receive window of 6KB. Most other systems, it seems, use 64KB instead. Without a receive window at least as large as the congestion window, a larger initial congestion window will have little effect. That problem will be fixed in 2.6.38, thanks to a patch from Nandita Dukkipati raising the initial receive window to 10 segments.

Comments (23 posted)

Bounding GPL compliance times

A quick look at any development conference will reveal that quite a few Linux hackers are currently carrying phones made by HTC. They obviously like the hardware, but kernel developers have been getting increasingly annoyed by HTC's policy of delaying source code releases for up to 120 days after a given handset ships. In response, Matthew Garrett has suggested an addition to the top-level COPYING file in the kernel source:

While this version of the GPL does not place an explicit timeframe upon fulfilment of source distribution under section 3(b), it is the consensus viewpoint of the Linux community that such distribution take place as soon as is practical and certainly no more than 14 days after a request is made.

About the only response so far has been from Alan Cox, who has suggested that getting a lawyer's opinion on the matter might be useful. Linus, over whose name the new text would appear, has not commented on it. So it's not clear if the change will go in or whether it will inspire any changes in vendor behavior if it is merged. But it does, at least, make the developers' feelings on the matter known.

Comments (8 posted)

Kernel development news

Removing ext2 and/or ext3

By Jonathan Corbet
February 9, 2011
The ext4 filesystem has, at this point, moved far beyond its experimental phase. It is now available in almost all distributions and is used by default in many of them. Many users may soon be in danger of forgetting that the ext2 and ext3 filesystems even exist in the kernel. But those filesystems do exist, and they require ongoing resources to maintain. Keeping older, stable filesystems around makes sense when the newer code is stabilizing, but somebody is bound to ask, sooner or later, whether it is time to say "goodbye" to the older code.

The question, as it turns out, came sooner - February 3, to be exact - when Jan Kara suggested that removing ext2 and ext3 could be discussed at the upcoming storage, filesystems, and memory management summit. Jan asked:

Of course it costs some effort to maintain them all in a reasonably good condition so once in a while someone comes and proposes we should drop one of ext2, ext3 or both. So I'd like to gather input what people think about this - should we ever drop ext2 / ext3 codebases? If yes, under what condition do we deem it is OK to drop it?

One might protest that there will be existing filesystems in the ext3 (and even ext2) formats for the indefinite future. Removing support for those formats is clearly not something that can be done. But removing the ext2 and/or ext3 code is not the same as removing support: ext4 has been very carefully written to be able to work with the older formats without breaking compatibility. One can mount an ext3 filesystem using the ext4 code and make changes; it will still be possible to mount that filesystem with the ext3 code in the future.

So it is possible to remove ext2 and ext3 without breaking existing users or preventing them from going back to older implementations. Beyond that, mounting an ext2/3 filesystem under ext4 allows the system to use a number of performance enhancing techniques - like delayed allocation - which do not exist in the older implementations. In other words, ext4 can replace ext2 and ext3, maintain compatibility, and make things faster at the same time. Given that, one might wonder why removing the older code even requires discussion.

There appear to be a couple of reasons not to hurry into this change, both of which have to do with testing. As Eric Sandeen noted, some of the more ext3-like options are not tested as heavily as the native modes of operation:

ext4's more, um ... unique option combinations probably get next to no testing in the real world. So while we can say that noextent, nodelalloc is mostly like ext3, in practice, does that ever really get much testing?

There is also concern that ext4, which is still seeing much more change than its predecessors, is more likely to introduce instabilities. That's a bit of a disturbing idea; there are enough production users of ext4 now that the introduction of serious bugs would not be pleasant. But, again, the backward-compatible operating modes of ext4 may not be as heavily tested as the native mode, so one might argue that operation with older filesystems is more likely to break regardless of how careful the developers are.

So, clearly, any move to get rid of ext2 and ext3 would have to be preceded by the introduction of better testing for the less-exercised corners of ext4. The developers involved understand that clearly, so there is no need to be worried that the older code could be removed too quickly.

Meanwhile, there are also concerns that the older code, which is not seeing much developer attention, could give birth to bugs of its own. As Jan put it:

The time I spend is enough to keep ext3 in a good shape I believe but I have a feeling that ext2 is slowly bitrotting. Sometime when I look at ext2 code I see stuff we simply do differently these days and that's just a step away from the code getting broken... It would not be too much work to clean things up and maintain but it's a work with no clear gain (if you do the thankless job of maintaining old code, you should at least have users who appreciate that ;) so naturally no one does it.

Developers have also expressed concern that new filesystem authors might copy code from ext2, which, at this point, does not serve as a good example for how Linux filesystems should be written.

The end result is that, once the testing concerns have been addressed, everybody involved might be made better off by the removal of ext2 and ext3. Users with older filesystems would get better performance and a code base which is seeing more active development and maintenance. Developers would be able to shed an older maintenance burden and focus their efforts on a single filesystem going forward. Thanks to the careful compatibility work which has been done over the years, it may be possible to safely make this move in the relatively near future.

Comments (36 posted)

Supporting multiple LSMs

By Jake Edge
February 9, 2011

With some regularity, the topic of allowing multiple Linux Security Modules (LSMs) to all be active comes up in the Linux kernel community. There have been some attempts at "stacking" or "chaining" LSMs in the past, but nothing has ever made it into the mainline. On the other hand, every time a developer comes up with some kind of security hardening patch for the kernel, they are generally directed toward the LSM interface. Because the "monolithic" security solutions (like SELinux, AppArmor, and others) tend to have already taken the single existing LSM slot in many distributions, these simpler, more targeted LSMs generally cannot be used. But a discussion on the linux-security-module mailing list suggests that work is being done that just might solve this problem.

The existing implementation of LSMs uses a single set of function pointers in a struct security_operations for the "hooks" that get called when access decisions need to be made. Once a security module gets registered (typically at boot time using the security= flag), its implementation is stored in the structure and any other LSM is out of luck. The idea behind LSM stacking would be to keep multiple versions of the security_operations structure around and to call each registered LSM's hooks for an access decision. While that sounds fairly straightforward, there are some subtleties that need to be addressed, especially if different LSMs give different answers for a particular access.

This problem with the semantics of "composing" two (or more) LSMs has been discussed at various points, without any real global solution for composing arbitrary LSMs. As Serge E. Hallyn warned over a year ago:

The problem is that composing any two security policies can quickly have subtle, unforeseen, but dangerous effects. That's why so far we have stuck with the status quo where only one LSM is 'active', but that LSM can manually call hooks from other LSMs.

There is one example of stacking LSMs as Hallyn describes in the kernel already; the capabilities LSM is called directly from other LSMs where necessary. That particular approach is not very general, of course, as LSM maintainers are likely to lose patience with adding calls for every other possible LSM. A more easily expandable solution is required.

David Howells posted a set of patches that would add that expansion mechanism. It does that by allowing multiple calls to the register_security() initialization function, each with its own set of security_operations. Instead of the current situation, where each LSM manages its own data for each kind of object (credentials, keys, files, inodes, superblocks, IPC, and sockets), Howells's security framework will allocate and manage that data for the LSMs.

The security_operations structure gets new *_data_size and *_data_offset fields for each kind of object, with the former filled in by the LSM before calling register_security() and the latter being managed by the framework. The data size field tells the framework how much space is needed for the LSM-specific data for that type of object, and the offset is used by the framework to find each LSM's private data. For struct cred, struct key, struct file, and struct super_block, the extra data for each registered LSM is tacked onto the end of the structure rather than going through an intermediate pointer (as is required for the others). Wrappers are defined that will allow an LSM to extract its data from an object based on the new fields in the operations table.

The framework then maintains a list of registered LSMs and puts the capabilities LSM in the first slot of the list. When one of the security hooks is called, the framework iterates over the list and calls the corresponding hook for each registered LSM. Depending on the specific hook, different kinds of iterators are used, but the usual iterator looks for a non-zero response from an LSM's hook, which would indicate a denial of some kind, and returns that to the framework. The other iterators are used for specialized calls, for example when there is no return value or when only the first hook found should be called. The upshot is that the hooks for registered LSMs get called in order (with capabilities coming first), and the first to deny the access "wins". Because the capabilities calls are pulled out separately, that also means that the other LSMs no longer have to make those calls themselves; instead the framework will handle it for them.
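
As a conceptual illustration of that "first denial wins" walk, consider the minimal sketch below. It is written in Python purely for readability; the real framework is kernel C code, and the hook names and the two toy modules here are invented for the example.

    # Illustration only: the real stacking framework is kernel C code.
    # Both "LSMs" below and the hook list are invented for this sketch.
    EACCES = 13

    def selinux_like_inode_permission(inode, mask):
        return 0                       # this module allows the access

    def hardening_inode_permission(inode, mask):
        return -EACCES if inode.get('restricted') else 0

    registered_hooks = [selinux_like_inode_permission,
                        hardening_inode_permission]

    def security_inode_permission(inode, mask):
        # The usual iterator: call each registered hook in order and
        # return the first non-zero (denial) result.
        for hook in registered_hooks:
            ret = hook(inode, mask)
            if ret != 0:
                return ret
        return 0

    print(security_inode_permission({'restricted': True}, mask=4))   # prints -13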

But there are a handful of hooks that do not work very well in a multi-LSM environment, in particular the secid (an LSM-specific security label ID) handling routines (e.g. secid_to_secctx(), task_getsecid(), etc.). Howells's current implementation just calls the hook of the first LSM it finds that implements it, which is not going to make it possible to use multiple LSMs that all implement those hooks (currently just SELinux and Smack). Howells's solution is to explicitly ban that particular combination:

I think the obvious thing is to reject any chosen module that implements any of these interfaces if we've already selected a module that implements them. That would mean you can choose either Smack or SELinux, but not both.

But Smack developer Casey Schaufler isn't convinced that is the right course: "That kind of takes the wind out of the sails, doesn't it?" He would rather see a more general solution that allows multiple secids, and the related secctxs (security contexts), to be handled by the framework:

It does mean that there needs to be a standard for a secctx that allows for the presence of multiple concurrent LSMs. There will have to be an interface whereby either the composer/stacker can break a secctx into its constituent parts or with which an LSM can pull the bit it cares about out. In either case the LSMs may need to be [updated] to accept a secctx in a standardized format.

Another interesting part of Schaufler's message is that he has been working on an "alternative approach" to the multi-LSM problem that he calls "Glass". The code is, as yet, unreleased, but Schaufler describes Glass as an LSM that composes other LSMs:

The Glass security blob is an array of pointers, one for each available LSM, including commoncap, which is always in the last slot. The Glass LSM is always registered first. As subsequent LSMs register they are added to the glass LSM vector. When a hook is invoked glass goes through its vector and if the LSM provides a hook it gets called, and the return remembered. If any other LSM provided a hook the commoncap hook is skipped, but if no LSM was invoked commoncap is called.

Unlike Howells's proposal, Glass would leave the calls to the capabilities LSM (aka commoncap) in the existing LSMs, and only call commoncap if no LSM implemented a given hook. The idea is that the LSMs already handle the capabilities calls in their hooks as needed, so a call into commoncap is only required when none of those hooks gets called. In addition, Glass leaves the allocation and management of the security "blobs" (LSM-specific data for objects) to the LSMs rather than centralizing them in the framework as Howells's patches do.

In addition to various other differences, there is a more fundamental difference in the way that the two solutions handle multiple LSMs that all have hooks for a particular security operation. Glass purposely calls each hook in each registered LSM, whereas Howells's proposal typically short-circuits the chain of hooks once one of them has denied the access. Schaufler's idea is that an LSM should be able to maintain state, which means that skipping its hooks could potentially skew the access decision:

My dreaded case is an LSM that bases controls on statistical frequency of access to files. There is no way you could skip any of its hooks, and I don't see off hand any file access hook it wouldn't use. I have heard people (think credit card companies) suggest such things, so although I don't have use for it I can't discount the potential for it.

There are plenty of other issues to resolve, including things like handling /proc/self/attr/current (which contains the security ID for the current process) because various user-space programs already parse the output of that file, though it is different depending on which LSM is active. A standardized format for that file, which takes multiple LSMs into account, might be better, but it would break the kernel ABI and is thus not likely to pass muster. Overall, though, Howells and Schaufler were making some good progress on defining the requirements for supporting multiple LSMs. Schaufler is optimistic that the collaboration will bear fruit: "I think that we may be able to get past the problems that have held multiple LSMs back this time around."

So far, there is only the code from Howells to look at, but Schaufler has promised to make Glass available soon. With luck, that will lead to a multi-LSM solution that the LSM developers can coalesce behind, whether it comes from Howells, Schaufler, or a collaboration between them. There may still be a fair amount of resistance from Linus Torvalds and other kernel hackers, but the lack of any way to combine LSMs comes up too often for it to be ignored forever.

Comments (2 posted)

Mesh networking with batman-adv

By Jonathan Corbet
February 8, 2011
Your editor has recently seen two keynote presentations on two continents which, using two very different styles, conveyed the same message: the centralization of the Internet and the services built on it has given governments far too much control. Both speakers called for an urgent effort to decentralize the net at all levels, including the transport level. An Internet without centralized telecommunications infrastructure can be hard to envision; when people try, the term that usually comes out is "mesh networking." As it happens, the kernel has a mesh networking implementation which made the move from the staging tree into the mainline proper in 2.6.38.

Mesh networking, as its name implies, is meant to work via a large number of short-haul connections without any sort of centralized control. A proper mesh network should configure itself dynamically, responding to the addition and removal of nodes and changes in connectivity. In a well-functioning mesh, networking "just happens" without high-level coordination; such a net should be quite hard to disrupt. What the kernel offers now falls somewhat short of that ideal, but it is a good demonstration of how hard mesh networking can be.

The "Better Approach To Mobile Ad-hoc Networking" (BATMAN) protocol is described in this draft RFC. A BATMAN mesh is made up of a set of "originators" which communicate via network interfaces - normal wireless interfaces, for example. Every so often, each originator sends out an "originator message" (OGM) as a broadcast to all of its neighbors to tell the world that it exists. Each neighbor is supposed to note the presence of the originator and forward the message onward via a broadcast of its own. Thus, over time, all nodes in the mesh should see the OGM, possibly via multiple paths, and thus each node will know (1) that it can reach the originator, and (2) which of its neighbors has the best path to that originator. Each node maintains a routing table listing every other node it has ever heard of and the best neighbor by which to reach each one.

This protocol has the advantage of building and maintaining the routing tables on the fly; no central coordination is needed. It should also find near-optimal routes to each originator. If a node goes away, the routing tables will reconfigure themselves to function in its absence. There is also no node in the network which has a complete view of how the mesh is built; nodes only know who is out there and the best next hop. This lack of knowledge should add to the security and robustness of the mesh.
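
To make the idea concrete, here is a rough sketch of the per-node routing state described above. It is written in Python purely as an illustration; the real protocol lives in the kernel, and the names and the simple "higher quality wins" metric used here are invented.

    # Conceptual sketch of BATMAN-style route learning; not the kernel code.
    class MeshNode(object):
        def __init__(self):
            # originator address -> (best neighbor, best link quality seen)
            self.routes = {}

        def handle_ogm(self, originator, neighbor, quality):
            """Process an OGM from 'originator' heard via 'neighbor'.

            Returns True if the OGM looks like an improvement and should be
            rebroadcast to our own neighbors, False if it can be dropped.
            """
            best = self.routes.get(originator)
            if best is None or quality > best[1]:
                self.routes[originator] = (neighbor, quality)
                return True
            return False

    node = MeshNode()
    node.handle_ogm('originator-A', 'neighbor-1', quality=200)
    node.handle_ogm('originator-A', 'neighbor-2', quality=120)   # worse route: dropped
    print(node.routes)   # {'originator-A': ('neighbor-1', 200)}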

Nodes with a connection to the regular Internet can set a bit in their OGMs to advertise that fact; that allows others without such a connection to route packets to and from the rest of the world.

The original BATMAN protocol uses UDP for the OGM messages. That design allows routing to be handled with the normal kernel routing tables, but it also imposes a couple of unfortunate constraints: nodes must obtain an IP address from somewhere before joining the mesh, and the protocol is tied to IPv4. The BATMAN-adv protocol found in the Linux kernel has changed a few things to get around these problems, making it a rather more flexible solution. BATMAN-adv works entirely at the link layer, exchanging non-UDP OGMs directly with neighboring nodes. The routing table is maintained within a special virtual network device, which makes all nodes on the mesh appear to be directly connected via that virtual interface. Thus the system can join the mesh before it has a network address, and any protocol can be run over the mesh.

BATMAN-adv removes some of the limitations found in BATMAN, but readers who have gotten this far will likely be thinking of the limitations that remain. The flooding of broadcast OGMs through the net can only scale so far before a significant amount of bandwidth is consumed by network overhead. The protocol trims OGMs which are obviously not of interest - those which describe a route which is known to be worse than others, for example - but the OGM traffic will still be significant if the mesh gets large. The routing tables will also grow, since every node must keep track of every other node in existence. The overhead for these tables is probably manageable for a mesh of 1,000 nodes; it is probably hopeless for 1,000,000 nodes. Mobile devices - which are targeted by this protocol - are especially likely to suffer as the table gets larger.

Security is also a concern in this kind of network. Simple bandwidth-consuming denial of service attacks would seem relatively straightforward. Sending bogus OGMs could cause the size of routing tables to explode or disrupt the routing within the mesh. A more clever attack could force traffic to route through a hostile node, enabling man-in-the-middle exploits. And so on. The draft RFC quickly mentions some of these issues, but it seems clear that security has not been a major design goal.

So it would seem clear that BATMAN-adv, while interesting, is not the solution to the problem of an overly-centralized network. It could be a useful way to extend connectivity through a building or small neighborhood, but it is not meant to operate on a large scale or in an overtly hostile environment. The bigger problem is a hard one to solve, to say the least. The experience gained with protocols like BATMAN-adv may well prove valuable in the search for that solution, but there is clearly some work to be done still.

Comments (5 posted)

Patches and updates

Kernel trees

Architecture-specific

Build system

Core kernel code

Development tools

Device drivers

Documentation

Filesystems and block I/O

Memory management

Networking

Security-related

Virtualization and containers

Benchmarks and bugs

Miscellaneous

Page editor: Jonathan Corbet

Distributions

FOSDEM: Collaboration (or the lack thereof) between distributions

By Jonathan Corbet
February 9, 2011
The Free and Open Source Developers' European Meeting (FOSDEM) is an interesting event. Entry is free, so there is no way to know how many people show up - except that the number is clearly large. The organization sometimes seems chaotic, the rooms are packed beyond their capacity, and the hallways are often impassable. With over a dozen tracks running simultaneously, choosing what to see can be a challenge. But all the right people tend to attend, and a great deal of work and valuable discussion happens. As an example, consider the various distribution-related sessions described below which, as a whole, combine to give a good picture of what the distributors are concerned about.

Distribution collaboration manifesto. One session which arguably didn't live up to its potential was the "distribution collaboration manifesto." It did, however, let us see Debian leader Stefano Zacchiroli, Fedora leader Jared Smith, and openSUSE community manager Jos Poortvliet together on the same stage.

[Project leaders]

The discussion wandered around the topics of how nice it would be to cooperate more, cooperative application installer work, and making better use of the distributions list on freedesktop.org. It was friendly, but somewhat lacking in specifics.

Downstream packaging collaboration. A more focused session was led by Hans de Goede and Michal Hrušecký of Red Hat and openSUSE, respectively. According to Hans, packaging software is not normally a difficult task. When one is dealing with less-than-optimal upstreams, though, things get harder. One must make sure that the entire package is freely licensed; ancillary files (like artwork) can often be problematic. The package must be tweaked for filesystem hierarchy standard compliance, integration with distribution policies (including writing man pages if needed), fixing build problems, getting rid of bundled libraries, and fixing the occasional bug.

That's all just part of a packager's job, but, Hans asked, what should be done with the results of that work? The obvious thing to do is to send it back upstream, and to educate the upstream project about problems like bundled libraries. But what happens if the upstream is unresponsive - or if there is no functioning upstream at all? This situation arises more often than one might expect, especially with games, it seems. Assuming that the code itself is still worth shipping, it would make sense for distributors to work together to provide a working upstream for this kind of project.

Again, specific suggestions were relatively scarce, but Hans did say that having a set of package-specific email aliases at freedesktop.org would be useful. For any given problematic package (xaw3d was one such listed), packagers at each distribution could subscribe to the appropriate list to discuss the work they have done. The list would also receive commit notifications from each distributor's version control system, so everybody could see changes being made by other distributors, comment on them, and, perhaps, pick them up as well.

Michal talked about setting up a mechanism designed specifically to let packagers share patches. He seemed to envision a shared directory somewhere where packagers would put their specific changes; subsequent discussion made it clear that some people, at least, would rather see some sort of source code management system used. Michal also called for the adoption of a set of conventions for patch metadata to describe the purpose of the patch, who shipped it, etc. Bdale Garbee suggested from the audience that what people really seem to want is a set of simple pointers to everybody's git repositories. He added that anybody who is a package maintainer and does not know who his or her counterparts are in other distributions is failing at the job and needs to go out and start meeting people.

Forking is difficult. A rather different approach to collaboration - and the lack thereof - could be found in a session led by Anne Nicolas and Michael Scherer. Anne and Michael are two of the founders of the Mageia distribution, which is a fork of Mandriva. According to Anne, Mandriva was built on a good foundation and with a great "users first" policy, but, when things started to go bad, it was a "disturbing experience." From that experience Mageia was born.

Mageia is built on the principles of "people first" and trust in the community. The distribution wants to make life as easy as possible for both users and packagers. Actually getting there is proving to be a challenge, though, with every step on the way taking far longer than had been expected. There is now a legal association in place, though, and an initial pass at a design for project governance has been done. The build system is mostly ready, and training of packagers is underway.

In the process, the developers have found that simply forking an established distribution is a lot of work. It has taken about three months to get a base set of 4100 packages ready. As they do this job, they are trying to make the job of changing the name and the look of the distribution easier for the next group that has to take it on. That should improve life for anybody who might, down the road, choose to fork Mageia; it is also aimed at making the creation of Mageia derivatives easier.

The first Mageia alpha will, with luck, be released on February 15. Current plans are to make the first stable release on June 1. June is also the target for having the full organization and governance mechanism in place. This governance is expected to be made up of around ten teams, an elected council and an elected board.

The other challenge that the Mageia developers are facing is that of creating an "economic model" which will support the work going forward. From the discussion, it seems that the main source of income at the moment is donations and T-shirts, which is unlikely to sustain a serious effort in the long term.

Fixing Gentoo. Finally, Petteri Räty led a session on the reform and future of Gentoo. Contrary to what some people may think, the Gentoo distribution is alive and well with some 235 developers maintaining packages. That said, there are some issues which need attention.

Many of these problems are organizational; the project's meta-structure has been neglected over the years. There is little accountability for people working in specific roles. Nobody can really say what Gentoo projects are ongoing, and which of those are really alive. Nobody really knows what to do about dead projects either. The relationship between the Gentoo Council and the Gentoo Foundation is not particularly clear. And there is an unfortunate split between Gentoo's users and its developers. Mentoring for new developers is in short supply.

There are plans to reinvigorate Gentoo's meta-structure project, giving it the responsibility of tracking the other outstanding projects. That should give some visibility into what is going on. The current corporate structure was described as a "two-headed monster" that needs to be straightened out. To that end, the Gentoo Foundation is finally getting close to its US 501(c)(3) status, making it an official nonprofit organization. The Foundation is expected to handle legal issues and the distribution's "intellectual property," while the council will be charged with technical leadership.

In summary: it seems that the Gentoo project has a number of challenges to overcome, but the project remains strong and people are working on addressing the issues.

Conclusion: the FOSDEM cross-distro track included far more talks than are listed here; there are only so many that your editor was able to attend. It's clear that the conference served as a valuable meeting point for developers who are often working independently of each other. Linux distributors are, at one level, highly competitive with each other. But they are all based on the work of one community. If they can do more of their work at the community level, that will give each distributor more time to work on the things which make their project special. The discussions at FOSDEM can only have helped to increase understanding and collaboration across distributions, and that must be a good thing.

Comments (12 posted)

Brief items

Distribution quotes of the week

I just squeezed Debian all over my BeagleBoards!
-- Jon Masters (posted on Facebook)

openSUSE is on a bit of an unusual release schedule. On one hand, you've got Fedora and Ubuntu which come out every six months (give or take, in the case of Fedora). On the other, you've got Debian, which comes out whenever the Hell the Debian team decides that it's bloody well ready. Somewhere in the middle, there's openSUSE, which is on an eight-month release cycle.
-- Joe "Zonker" Brockmeier (by way of Linux Magazine)

Comments (1 posted)

Debian squeeze (6.0) released

The Debian 6.0 release is now available. "Debian 6.0 includes over 10,000 new packages like the browser Chromium, the monitoring solution Icinga, the package management frontend Software Center, the network manager wicd, the Linux container tools lxc and the cluster framework Corosync. With this broad selection of packages, Debian once again stays true to its goal of being the universal operating system. It is suitable for many different use cases: from desktop systems to netbooks; from development servers to cluster systems; and for database, web or storage servers. At the same time, additional quality assurance efforts like automatic installation and upgrade tests for all packages in Debian's archive ensure that Debian 6.0 fulfils the high expectations that users have of a stable Debian release. It is rock solid and rigorously tested." The next development phase, code-named "wheezy," starts now.

Comments (40 posted)

Distribution News

Debian GNU/Linux

bits from the DPL: collab.

Stefano Zacchiroli has a few (pre-squeeze release) bits on collaboration, communication, delegations, and several other topics.

Full Story (comments: none)

squeeze - wheezy ready for development

Debian 6.0 "squeeze" is out and Wheezy is open for development. The first point release for Squeeze is planned for about a month from now with Jonathan Wiltshire coordinating security fixes. Wheezy will bring in some changes: "In terms of expected larger changes, the upload of KDE 4.6 to the archive is anticipated in early March, the Ocaml team would like to move to a new upstream version and GNOME 3 is due for release in April. The GNOME team is already staging packages in experimental but this is a major upstream release and will certainly lead to its fair share of disruption when it hits unstable."

Full Story (comments: 1)

The Debian website has a new layout

The Debian website has received a face lift. "After about 13 years with nearly the same design, the layout and design of the website changed with today's release of Debian Squeeze."

Full Story (comments: 1)

Fedora

FUDCon EMEA Bidding now open

The bidding process has opened for FUDCon EMEA 2011. "Any interested parties are invited to submit their bids. Once you have prepared a bid, please send an email to the fudcon-planning list. Bids will be accepted up until the end of the day on March 15th, 2011."

Full Story (comments: none)

Ubuntu family

Natty Alpha-2 Released

Natty Narwhal Alpha 2 (Ubuntu 11.04) is available for testing. New packages showing up for the first time include LibreOffice 3.3 (which has replaced OpenOffice.org 3.2), X.org Server 1.10 and Mesa 7.10, and Linux Kernel 2.6.38-rc2. "Unity is now the default in the Ubuntu Desktop session. It is only partially implemented at this stage, so keep an eye on the daily builds, new features and bug fixes are emerging daily!"

Full Story (comments: none)

Minutes from the Ubuntu Technical Board meeting, 2011-02-08

Click below for the minutes from the February 8 meeting of the Ubuntu Technical Board. Topics include default ntpd configuration and a review of open bugs.

Full Story (comments: none)

Newsletters and articles of interest

Distribution newsletters

Comments (none posted)

Honeycomb is here: Google unveils Android 3.0, new Web-based Market (ars technica)

Ars technica reports on the launch event for Google's Android 3.0 ("Honeycomb"), which has a new UI targeted for tablet devices. "The company also emphasized its commitment to offering richer APIs for developers. Google regards the home screen as a part of the platform, not just a dumping ground for application icons. The home screen widget system has been greatly enhanced to enable the development of more interactive data-driven widgets. The notification system also got an overhaul for Honeycomb, making it possible for developers to expose more information and interactive functionality in notifications."

Comments (13 posted)

Why Debian matters more than ever (NetworkWorld)

Joe "Zonker" Brockmeier looks at the relevance of Debian. "Debian is the raw material that is used to create Ubuntu, Linux Mint, and dozens of other Linux distributions that look more modern and easier to use. However, the reason that Ubuntu and the rest are able to ship fancified Linux distros that are easier to use is because they're able to start with Debian. If the Debian Project ceased tomorrow, it would be an enormous — possibly fatal — blow to its derivatives."

Comments (none posted)

Page editor: Rebecca Sobol

Development

Moving to Python 3

February 9, 2011

This article was contributed by Ian Ward

Python 3.0 was released at the end of 2008, but so far only a relatively small number of packages have been updated to support the latest release; the majority of Python software still only supports Python 2. Python 3 introduced changes to Unicode and string handling, module importing, integer representation and division, print statements, and a number of other differences. This article will cover some of the changes that cause the most problems when porting code from Python 2 to Python 3, and will present some strategies for managing a single code base that supports both major versions.

The changes that made it into Python 3 were originally part of a plan called "Python 3000", a name that was something of a joke about language changes that could only happen in the distant future. The changes made up a laundry list of inconsistencies and inconvenient designs in the Python language that would have been really nice to fix, but had to wait because fixing them meant breaking all existing Python code. Eventually the weight of all the changes led the Python developers to decide to just fix the problems in a real, stable release, and to accept that it would take a few years for most packages and users to make the switch.

So what's the big deal?

The biggest change is to how strings are handled in Python 3. Python 2 has 8-bit strings and Unicode text, whereas Python 3 has Unicode text and binary data. In Python 2 you can play fast and loose with strings and Unicode text, using either type for parameters; conversion is automatic when necessary. That's great until some 8-bit data ends up in a string and some function (anywhere, in your code or deep in some library you're using) needs Unicode text. Then it all falls apart. Python 2 tries to decode strings as 7-bit ASCII to get Unicode text, leaving the developer, or worse yet the end user, with one of these:

    Traceback (most recent call last):
    ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 3: \
                        ordinal not in range(128)

In Python 3 there are no more automatic conversions, and the default is Unicode text almost everywhere. While Python 2 treats 'all\xf4' as an 8-bit string with four bytes, Python 3 treats the same literal as Unicode text with U+00F4 as the fourth character.
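
The difference is easy to see at the interactive prompt; the UTF-8 encoding step in the last line is just one way of getting bytes back out of the Python 3 text:

    >>> type('all\xf4'), len('all\xf4')     # Python 2: 8-bit string, four bytes
    (<type 'str'>, 4)
    >>> type('all\xf4'), len('all\xf4')     # Python 3: Unicode text, four characters
    (<class 'str'>, 4)
    >>> 'all\xf4'.encode('utf-8')           # Python 3: encoding yields five bytes
    b'all\xc3\xb4'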

Files opened in text mode (the default, including for sys.stdin, sys.stdout, and sys.stderr) in Python 3 return Unicode text from read() and expect Unicode text to be passed to write(). Files opened in binary mode operate on binary data only. This change affects Python users on Linux and other Unix-like operating systems more than Windows and Mac users: in Python 2 on Linux, files opened in binary mode are almost indistinguishable from files opened in text mode, while Windows and Mac users are used to Python at least munging their line breaks in text mode.
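
A short Python 3 example (using a hypothetical notes.txt file) shows the split between the two modes:

    # Python 3: text mode works with str (Unicode), binary mode works with bytes
    with open('notes.txt', 'w', encoding='utf-8') as f:
        f.write('caf\xe9\n')           # text in; passing bytes here raises TypeError
    with open('notes.txt', 'rb') as f:
        print(f.read())                # b'caf\xc3\xa9\n' -- the raw UTF-8 bytes
    with open('notes.txt', encoding='utf-8') as f:
        print(f.read())                # café -- decoded back to Unicode text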

This means that much code that used to "work" (where "work" means handling ASCII text only) is now broken. But once that code is updated to properly account for which inputs and outputs are encoded text and which are binary, it can be used comfortably by people whose native languages or names don't fit in ASCII. That's a pretty nice result.

Python 3's bytes type for binary data is quite different from Python 2's 8-bit strings. Python 2.6 and later define bytes to be an alias for the str type, which is a little strange because the Python 3 interface is significantly different:

    >>> bytes([2,3,4]) # Python 2
    '[2, 3, 4]'
    >>> [x for x in 'abc']
    ['a', 'b', 'c']

In Python 3 b'' is used for byte literals:

    >>> bytes([2,3,4]) # Python 3
    b'\x02\x03\x04'
    >>> [x for x in b'abc']
    [97, 98, 99]

Python 3's bytes type can be treated like an immutable list of values between 0 and 255. That's convenient for doing bit arithmetic and other numeric operations common when dealing with binary data, but it's quite different from the strings-of-length-1 that Python 2 programmers expect.

Integers have changed as well. There is no distinction between long integers and normal integers and sys.maxint is gone. Integer division has changed too. Anyone with a background in Python (or C) will tell you that:

    >>> 1/2
    0
    >>> 1.0/2
    0.5

But no longer. Python 3 returns 0.5 for both expressions. Fortunately Python 2.2 and later have an operator for floor division (//). Use it and you can be certain of an integer result.
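
For example, floor division behaves the same way in both versions, while true division only behaves this way in Python 3 (or in Python 2 with "from __future__ import division"):

    >>> 7 // 2      # same in Python 2 and Python 3
    3
    >>> 7 / 2       # Python 3
    3.5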

The last big change I'll point out is to comparisons. In Python 2 comparisons (<, <=, >=, >) are always defined between all objects. When no explicit ordering is defined, all the objects of one type will be arbitrarily considered either greater or less than all the objects of another type. So you could take a list with a mix of types, sort it, and all the different types would be grouped together. Most of the time, though, you really don't want to order different types of objects, and this feature just hides some nasty bugs.

Python 3 now raises a TypeError any time you compare objects with incompatible types, as it should. Note that equality (==, !=) is still defined for all types.
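
For example (the exact wording of the error message varies between Python 3 releases):

    >>> 1 < 'one'     # Python 2: arbitrary, but consistent, ordering
    True
    >>> 1 < 'one'     # Python 3
    Traceback (most recent call last):
      ...
    TypeError: unorderable types: int() < str()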

Module importing has changed. In Python 2 the directory containing the source file is searched first when importing (called a "relative import"), then the directories in the system path are tried in order. In Python 3 relative imports must be made explicit:

    from . import my_utils

The print statement has become a function in Python 3. This Python 2 code that prints a string to sys.stderr with a space instead of a newline at the end:

    import sys
    print >>sys.stderr, 'something bad happened:',

becomes:

    import sys
    print('something bad happened:', end=' ', file=sys.stderr)

These are just some of the biggest changes. The complete list is here.

That list is huge. How do I deal with all that?

Fortunately a large number of the little incompatibilities are taken care of by the 2to3 tool that ships with Python. 2to3 takes Python 2 source code and performs some automated replacements to prepare the code to run in Python 3. Print statements become functions, Unicode text literals drop their "u" prefix, relative imports are made explicit, and so on.

Unfortunately the rest of the changes need to be made by hand.

It is reasonable to maintain a single code base that works across Python 2 and Python 3 with the help of 2to3. In the case of my library "Urwid" I am targeting Python 2.4 and up, and this is part of the compatibility code I use. When you really have to write code that takes different paths for Python 2 and Python 3 it's nice to be clear with an "if PYTHON3:" statement:

    import sys
    PYTHON3 = sys.version_info >= (3, 0)

    try: # define bytes for Python 2.4, 2.5
        bytes = bytes
    except NameError:
        bytes = str
    
    if PYTHON3: # for creating byte strings
        B = lambda x: x.encode('latin1')
    else:
        B = lambda x: x

String handling and literal strings are the most common areas that need to be updated. Some guidelines:

  • Use Unicode literals (u'') for all literal text in your source. That way your intention is clear and behaviour will be the same in Python 3 (2to3 will turn these into normal text strings).

  • Use byte literals (b'') for all literal byte strings or the B() function above if you are supporting versions of Python earlier than 2.6. B() uses the fact that the first 256 code points in Unicode map to Latin-1 to create a binary string from Unicode text.

  • Use normal strings ('') only in cases where 8-bit strings are expected in Python 2 but Unicode text is expected in Python 3. These cases include attribute names, identifiers, docstrings, and __repr__ return values.

  • Document whether your functions accept bytes or Unicode text and guard against the wrong type being passed in (e.g. assert isinstance(var, unicode)), or convert to Unicode text immediately if you must accept both types.

Clearly labeling text as text and binary as binary in your source serves as documentation and may prevent you from writing code that will fail when run under Python 3.
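
As a hypothetical illustration of the last guideline, written for Python 3 (under Python 2 the text type to check for would be unicode rather than str):

    def normalize_title(title):
        """Accept bytes or text, but work with Unicode text internally."""
        if isinstance(title, bytes):            # convert at the boundary...
            title = title.decode('utf-8')
        assert isinstance(title, str), title    # ...then insist on text
        return title.strip().title()

    normalize_title(b'  moving to python 3  ')   # 'Moving To Python 3'
    normalize_title('  moving to python 3  ')    # same result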

Handling binary data across Python versions can be done a few ways. If you replace all individual byte accesses such as data[i] with data[i:i+1] then you will get a byte-string-of-length-1 in both Python 2 and Python 3. However, I prefer to follow the Python 3 convention of treating byte strings as lists of integers with some more compatibility code:

    if PYTHON3: # for operating on bytes
        ord2 = lambda x: x
        chr2 = lambda x: bytes([x])
    else:
        ord2 = ord
        chr2 = chr

ord2 returns the ordinal value of a byte in Python 2 or Python 3 (where it's a no-op) and chr2 converts back to a byte string. Depending on how you are processing your binary data, it might be noticeably faster to operate on the integer ordinal values instead of byte-strings-of-length-1.
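
For instance, a hypothetical function that XORs every byte of a byte string with a key can be written once and run under both versions (assuming Python 2.6 or later, where the b'' literal exists):

    import sys
    PYTHON3 = sys.version_info >= (3, 0)
    ord2 = (lambda x: x) if PYTHON3 else ord        # the same helpers as above
    chr2 = (lambda x: bytes([x])) if PYTHON3 else chr

    def xor_bytes(data, key):
        # work on integer ordinals, then rebuild a byte string
        return b''.join(chr2(ord2(b) ^ key) for b in data)

    print(xor_bytes(b'secret', 0x2a))    # b'YOIXO^' under Python 3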

Python "doctests" are snippets of test code that appear in function, class and module documentation text. The test code resembles an interactive Python session and includes the code run and its output. For simple functions this sort of testing is often enough, and it's good documentation. Doctests create a challenge for supporting Python 2 and Python 3 from the same code base, however.

2to3 can convert doctest code in the same way as the rest of the source, but it doesn't touch the expected output. Python 2 will put an "L" at the end of long integer output and a "u" in front of Unicode strings, neither of which will be present in Python 3, but print-ing the value will always work the same. Make sure that other code run from doctests outputs the same text all the time; if you can't, you might be able to use the ELLIPSIS flag and ... in your output to paper over small differences.
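
A hypothetical doctest using that trick might look like this; the trailing ... matches the "L" suffix that Python 2 may print as well as the empty string that Python 3 prints:

    def big_number():
        """Return a value that is a long on 32-bit Python 2 builds.

        >>> big_number()  # doctest: +ELLIPSIS
        2147483649...
        """
        return 2 ** 31 + 1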

There are a number of easy changes you need to make as well, including:

  • Use // everywhere you want floor division (mentioned above).

  • Derive exception classes from BaseException.

  • Use k in my_dict instead of my_dict.has_key(k).

  • Use my_list.sort(key=custom_key_fn) instead of my_list.sort(custom_sort).

  • Use distribute instead of Setuptools.

There are two additional resources that may be helpful: Porting Python Code to 3.0 and Writing Forwards Compatible Python Code.

So if I do all that, what's in it for me?

Python 3 is unarguably a better language than Python 2. Many people new to the language are starting with Python 3, particularly users of proprietary operating systems. Many more current Python 2 users are interested in Python 3 but are held back by the code or a library they are using.

By adding Python 3 support to an application or library you help:

  • make it available to the new users just starting with Python 3

  • encourage existing users to adopt it, knowing it won't stop them from switching to Python 3 later

  • clean up ambiguous use of text and binary data and find related bugs

And as a little bonus that software can then be listed among the packages with Python 3 support in the Python Packaging Index, one click from the front page.

Many popular Python packages haven't yet made the switch, but it's certainly on everyone's radar. In my case I was lucky: members of the community had already done most of the hard work of porting my library to Python 3; I only had to update my tests and find ways to make the changes work with old versions of Python as well.

There is currently a divide in the Python community because of the significant differences between Python 2 and Python 3. But with some work, that divide can be bridged. It's worth the effort.

Comments (86 posted)

Brief items

Quotes of the week

Hence, PyPy 50% faster than C on this carefully crafted example. The reason is obvious - static compiler can't inline across file boundaries. In C, you can somehow circumvent that, however, it wouldn't anyway work with shared libraries. In Python however, even when the whole import system is completely dynamic, the JIT can dynamically find out what can be inlined.
-- Maciej Fijalkowski

PostgreSQL does not have query hints because we are not a for-profit company
-- Josh Berkus

Comments (6 posted)

GNU Octave 3.4.0

Version 3.4.0 of the GNU Octave "quite similar to Matlab" language interpreter has been released. The list of changes and new features appears to be quite long; see the NEWS file for the details.

Comments (none posted)

OpenSSH 5.8 released

OpenSSH 5.8 is available. This version fixes a vulnerability in legacy certificate signing and some bugs in Portable OpenSSH.

Full Story (comments: 17)

Psycopg 2.4 beta1 released

Psycopg is a popular PostgreSQL adapter for Python. The first 2.4 beta release is out; it adds support for PostgreSQL composite types, but the biggest news is probably that Python 3 is now supported. Now is probably a good time for people with Python 3-compatible programs to test it out.

Full Story (comments: none)

ulatencyd 0.4.0 released

Ulatencyd is "a scriptable daemon which constantly optimises the Linux kernel for best user experience." In particular, it uses a set of rules written in Lua to dynamically put processes into scheduler groups with the idea of increasing desktop interactivity. The 0.4.0 release adds a D-Bus interface, improved task grouping, some GNOME and KDE workarounds, and more.

Full Story (comments: none)

Newsletters and articles

Development newsletters from the last week

Comments (none posted)

Neary: Drawing up a roadmap

Dave Neary looks at the importance of roadmaps. "The end result of a good roadmap process is that your users know where they stand, more or less, at any given time. Your developers know where you want to take the project, and can see opportunities to contribute. Your core team knows what the release criteria for the next release are, and you have agreed together mid-term and long-term goals for the project that express your common vision. As maintainer, you have a powerful tool to explain your decisions and align your community around your ideas. A good roadmap is the fertile soil on which your developer community will grow."

Comments (2 posted)

Page editor: Jonathan Corbet

Announcements

Brief items

The Ada Initiative launches

The Ada Initiative - an organization intended to promote women in open technology and culture - has announced its existence. "The Ada Initiative will concentrate on focused, direct action programs, including recruitment and training for women, education for community members, and working with companies and projects to improve their outreach to women." The Initiative is the work of longtime community members Valerie Aurora and Mary Gardiner.

Full Story (comments: 11)

Mandriva Joins Open Invention Network as a Licensee

Mandriva has joined OIN. "Open Invention Network (OIN), the company formed to enable and protect Linux, today extended its community with the signing of Mandriva as a licensee. By becoming a licensee, Mandriva has joined the growing list of organizations that recognize the importance of leveraging the Open Invention Network to further spur open source innovation."

Comments (none posted)

Articles of interest

Sony lawyers now targeting anyone who posts PlayStation 3 hack (ars technica)

Ars technica covers Sony's escalating lawsuits over the PS3 hacks. "Sony is also trying to haul the so-called "failOverflow hacking team" into court. But first, Sony needs to learn the identities and whereabouts of the group's members. They are accused of posting a rudimentary hack in December. It was refined by Hotz weeks later when he accessed the console's so-called "metldr keys," or root keys that trick the system into running unauthorized programs."

Comments (20 posted)

Hemel: Long-term Management

Adriaan de Groot has posted a message from Armijn Hemel on his blog about considering what might happen to your free software (or other) project after you are gone. It's "a post that can easily ruin your mood", but one worth thinking about. "The common theme is that these people were very passionate about what they did. They truly loved their work and it work was appreciated by many. But when fate struck it turned out that they had not taken care of what would happen after they would pass away. I am very sure that they didn't expect this to happen so soon, or never realized that this could be an issue. But in the digital world, with lapsing domain name registrations, databases and webspace being deleted because of unpaid bills, offline development trees and uninformed heirs this is becoming more and more of a risk."

Comments (none posted)

Open hardware can yield dividends (ITWire)

Sam Varghese talks with Jonathan Oxer about open hardware. ""This is where one of the interesting differences between open hardware and open software come in - with open software it's quite easy to publish the source code and the whole tool chain, like compilers or whatever else is necessary. You can give everybody, at zero cost essentially, everything they need to reproduce your work and to develop and build on it. With open hardware it's quite different. I can give someone the design parts for a project but then they need the actual materials or the tools and resources to reproduce it in order to improve on it or collaborate with me.""

Comments (1 posted)

Review: Hands on LibreOffice 3.3 (Linux.com)

Linux.com has a review of LibreOffice. "The remainder of LibreOffice Writer's new features were also useful. I liked the page numbering tool, and I really appreciated the new Print dialog box (which is present in all of the LibreOffice tools). I know, it's a little odd to get excited about a dialog box, but I always have found the OpenOffice.org Print dialog box rather clunky, so it's LibreOffice counterpart is a breath of fresh air."

Comments (3 posted)

A possible game changer for invalidating bad software patents (Opensource.com)

Red Hat's VP and assistant general counsel Rob Tiller writes about the amicus brief filed with the US Supreme Court by Red Hat and a diverse group of other companies in the Microsoft v. i4i case. "Once a patent is inappropriately granted, it is possible, in theory, for a party accused of infringing it to show that it is invalid. In practice, this is quite difficult. When software patents are at issue, the technical issues are often complicated and difficult for a lay jury to understand. Jurors frequently mistakenly assume that the patent examination process was careful and exhaustive, and so have a tendency to assume that a patent must be valid. On top of all this potential confusion, jurors are instructed under current rules that they may only invalidate a patent if they find the evidence for invalidity clear and convincing. Even when there's strong evidence that a patent should never have been granted, it's difficult for lay juries to conclude that the technical issues are clear."

Comments (2 posted)

Nokia drops MeeGo phone before launch (Reuters)

Reuters has a vague report saying that Nokia is dropping MeeGo. "In a leaked internal memo, Chief Executive Stephen Elop wrote: 'We thought MeeGo would be a platform for winning high-end smartphones. However, at this rate, by the end of 2011, we might have only one MeeGo product in the market.'"

Update: the full memo has been posted on Engadget; what Nokia will do is far from clear at this point.

Comments (78 posted)

New Books

Using JRuby--New from Pragmatic Bookshelf

Pragmatic Bookshelf has released "Using JRuby", by Charles O. Nutter, Thomas Enebo, Nick Sieger, Ola Bini and Ian Dees.

Full Story (comments: none)

Contests and Awards

2010 LinuxQuestions.org Members Choice Award Winners

The winners of the LinuxQuestions.org Members Choice Awards have been announced. "Ubuntu, Android, MySQL, Cassandra, VLC, Puppet and Django are among the winners."

Full Story (comments: none)

Calls for Presentations

Linux Australia seeks 2013 conference hosts (builder.au)

John Ferlito, president of Linux Australia, has announced that applications are open for hosting linux.conf.au 2013. Formal submissions will be accepted until May 15, 2011. The winner will be announced at the close of linux.conf.au 2012, which will be hosted at Ballarat University in Victoria.

Comments (none posted)

Camp KDE 2011

The 3rd annual Camp KDE has been announced. This year's Camp KDE will take place in San Francisco, California on April 4-5, 2011, preceding the Linux Foundation's Collaboration Summit. Registration is open and proposals for talks will be accepted until March 2, 2011.

Comments (none posted)

Upcoming Events

Bacon: Community Leadership Summit 2011 Announced!

Jono Bacon has announced this year's Community Leadership Summit on his blog. It will be held the weekend before OSCON, July 23-24, at the Oregon Convention Center in Portland. "For those of you who are unfamiliar with the CLS, it is an entirely free event designed to bring together community leaders and managers and the projects and organizations that are interested in growing and empowering a strong community. The event provides an unconference style schedule in which attendees can discuss, debate and explore topics. This is augmented with a range of scheduled talks, panel discussions, networking opportunities and more."

Comments (none posted)

conf.kde.in Announces Talks, Keynotes and Registration

Three keynote speakers have been announced for conf.kde.in, along with a list of talks and presentations. Early bird registration ends February 25. conf.kde.in will be held in Bengaluru (Bangalore), India, March 9-13, 2011.

Comments (none posted)

FSFE: Celebrating Document Freedom Day 2011

The Free Software Foundation Europe (FSFE) will be celebrating Document Freedom Day (DFD) on March 30, 2011. "DFD is a global day to celebrate Open Standards and open document formats and its importance. Open Standards ensure the freedom to access your data, and the freedom to build Free Software to write and read data in specific formats."

Full Story (comments: none)

PyCon 2011 - Announcing "Startup Stories"

PyCon 2011 (March 9-17, 2011 in Atlanta, GA) will be featuring some "startup stories". "These are the stories of companies that have built and shipped (or, in the case of Threadless - about to ship) Python systems at scale. Or, in the case of Open Stack - it is the story of the next generation "Open Cloud" platform for Python at scale."

Comments (none posted)

PyCon Australia 2011

The second PyCon Australia will be held August 20-21, 2011 in Sydney. "International guests should note that Kiwi PyCon is to run on the following weekend, making it a great opportunity to attend a couple of awesome Down Under conferences and hopefully do some sprinting with the locals."

Full Story (comments: none)

Python Game Programming Challenge 12

The 12th Python Game Programming Challenge (PyWeek) starts April 3, 2011. Entrants have one week to write a game from scratch either as an individual or as a team.

Full Story (comments: none)

All about IPv6 At SCALE 9X

SCALE will have a couple of talks on getting ready for IPv6. "Most people in the tech industry know that the IPv4 address space will run dry over the next few months. The forced march to IPv6, with all of its potential downsides -- including network slowdowns and downright outages -- has only just begun. Has your organization started the move? If not, consider attending SCALE!" The 9th Annual Southern California Linux Expo (SCALE 9X) will be held February 25-27, 2011 in Los Angeles, CA.

Full Story (comments: 1)

Events: February 17, 2011 to April 18, 2011

The following event listing is taken from the LWN.net Calendar.

February 25: Build an Open Source Cloud (Los Angeles, CA, USA)
February 25 - February 27: Southern California Linux Expo (Los Angeles, CA, USA)
February 25: Ubucon (Los Angeles, CA, USA)
February 26: Open Source Software in Education (Los Angeles, CA, USA)
March 1 - March 2: Linux Foundation End User Summit 2011 (Jersey City, NJ, USA)
March 5: Open Source Days 2011 Community Edition (Copenhagen, Denmark)
March 7 - March 10: Drupalcon Chicago (Chicago, IL, USA)
March 9 - March 11: ConFoo Conference (Montreal, Canada)
March 9 - March 11: conf.kde.in 2011 (Bangalore, India)
March 11 - March 13: PyCon 2011 (Atlanta, Georgia, USA)
March 19: Open Source Conference Oita 2011 (Oita, Japan)
March 19 - March 20: Chemnitzer Linux-Tage (Chemnitz, Germany)
March 19: OpenStreetMap Foundation Japan Mappers Symposium (Tokyo, Japan)
March 21 - March 22: Embedded Technology Conference 2011 (San Jose, Costa Rica)
March 22 - March 24: OMG Workshop on Real-time, Embedded and Enterprise-Scale Time-Critical Systems (Washington, DC, USA)
March 22 - March 25: Frühjahrsfachgespräch (Weimar, Germany)
March 22 - March 24: UKUUG Spring 2011 Conference (Leeds, UK)
March 22 - March 25: PgEast PostgreSQL Conference (New York City, NY, USA)
March 23 - March 25: Palmetto Open Source Software Conference (Columbia, SC, USA)
March 26: 10. Augsburger Linux-Infotag 2011 (Augsburg, Germany)
March 28 - April 1: GNOME 3.0 Bangalore Hackfest | GNOME.ASIA SUMMIT 2011 (Bangalore, India)
March 28: Perth Linux User Group Quiz Night (Perth, Australia)
March 29 - March 30: NASA Open Source Summit (Mountain View, CA, USA)
April 1 - April 3: Flourish Conference 2011! (Chicago, IL, USA)
April 2 - April 3: Workshop on GCC Research Opportunities (Chamonix, France)
April 2: Texas Linux Fest 2011 (Austin, Texas, USA)
April 4 - April 5: Camp KDE 2011 (San Francisco, CA, USA)
April 4 - April 6: SugarCon ’11 (San Francisco, CA, USA)
April 4 - April 6: Selenium Conference (San Francisco, CA, USA)
April 6 - April 8: 5th Annual Linux Foundation Collaboration Summit (San Francisco, CA, USA)
April 8 - April 9: Hack'n Rio (Rio de Janeiro, Brazil)
April 9: Linuxwochen Österreich - Graz (Graz, Austria)
April 9: Festival Latinoamericano de Instalación de Software Libre
April 11 - April 14: O'Reilly MySQL Conference & Expo (Santa Clara, CA, USA)
April 11 - April 13: 2011 Embedded Linux Conference (San Francisco, CA, USA)
April 13 - April 14: 2011 Android Builders Summit (San Francisco, CA, USA)
April 16: Open Source Conference Kansai/Kobe 2011 (Kobe, Japan)

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds