The publication of Larry Lessig's Code, Eben said, drew our attention to the fact that, in the world we live in, code increasingly functions as law. Code does the work of the state, but it can also serve revolution against the state. We are seeing an enormous demonstration of the power of code now, he said. At the same time, there is a lot of attention being paid to the publication of Evgeny Morozov's The Net Delusion, which makes the claim that the net is being co-opted to control freedom worldwide. The book is meant to be a warning to technology optimists. Eben is, he said, one of those optimists. The lesson he draws from current events is that the right net brings freedom, but the wrong net brings tyranny.
We have spent a lot of time making free software. In the process, we have joined forces with other elements of the free culture world. Those forces include people like Jimmy Wales, but also people like Julian Assange. Wikipedia and Wikileaks, he said, are two sides of the same coin. At FOSDEM, he said, one could see "the third side" of the coin. We are all people who have organized to change the world without creating new hierarchies in the process. At the end of 2010, Wikileaks was seen mainly as a criminal operation. Events in Tunisia changed that perception, though. Wikileaks turns out to be an attempt to help people learn about their world. Wikileaks, he said, is not destruction - it's freedom.
But now there are a lot of Egyptians out there whose freedom depends on the ability to communicate through commercial operations which will respond to pressure from the government. We are now seeing in real time the vulnerabilities which come from the bad engineering in the current system.
Social networking, he said, changes the balance of power away from the state and toward people. Events in countries like Iran, Tunisia, and Egypt demonstrate its importance. But current forms of social communication are "intensely dangerous" to use. They are too centralized and vulnerable to state control. Their design is motivated by profit, not by freedom. As a result, political movements are resting on a fragile foundation: the courage of Mr. Zuckerberg or Google to resist the state - the same state which can easily shut them down.
Likewise, real time information for people trying to build freedom currently depends on a single California-based microblogging service which must turn a profit. This operation is capable of deciding, on its own, to donate its entire history to the US Library of Congress. Who knows what types of "donations" it may have made elsewhere?
We need to fix this situation, he said, and quickly. We are "behind the curve" of freedom movements which depend heavily on code. The longer we wait, the more we become part of the system. That will bring tragedy soon. Egypt is inspiring, but things there could have been far worse. The state was late to control the net and unready to be as tough as it could have been. It is, Eben said, not hard to decapitate a revolution when everybody is in Mr. Zuckerberg's database.
It is time to think about the consequences of what we have built - and what we have not built yet. We have talked for years about replacing centralized services with federated services; overcentralization is a critical vulnerability which can lead to arrests, torture, and killings. People are depending on technology which is built to sell them out. If we care about freedom, we have to address this problem; we are running out of time, and people are in harm's way. Eben does not want people who are taking risks for freedom to be carrying an iPhone.
One thing that Egypt has shown us, as Iran did before, is that closed networks are harmful and network "kill switches" will harm people who are seeking freedom. What can we do when the government has clamped down on network infrastructure? We must return to the idea of mesh networks, built with existing equipment, which can resist governmental control. And we must go back to secure, end-to-end communications over those networks. Can we do it, he asked? Certainly, but will we? If we don't, the promise of the free software movement will begin to be broken. Force will intervene and we will see more demonstrations that, despite the net, the state still wins.
North America, Eben said, is becoming the heart of a global data mining industry. When US President Dwight Eisenhower left office, he famously warned about the power of the growing military-industrial complex. Despite that warning, the US has, since then, spent more on defense than the rest of the world combined. Since the events of September 11, 2001, a new surveillance-industrial complex has grown. Eben strongly recommended reading the Top Secret America articles published by the Washington Post. It is eye-opening to see just how many Google-like operations there are, all under the control of the government.
Europe's data protection laws have worked, in that they have caused all of that data to move to North America where its use is uncontrolled. Data mining, like any industry, tends to move to the areas where there is the least control. There is no way that the US government is going to change that situation; it depends on it too heavily. As a presidential candidate, Barack Obama was against giving immunity to the telecom industry for its role in spying on Americans. That position did not even last through the general election. Obama's actual policies are not notably different from those of his predecessor - except in the areas where they are more aggressive.
Private industry will not change things either; the profit motive will not produce privacy or defense for people in the street. Companies trying to earn a profit cannot do so without the good will of the government. So we must build under the assumption that the net is untrustworthy, and that centralized services can kill people. We cannot, he said, fool around with this; we must replace things which create these vulnerabilities.
We know how to engineer our way out of this situation. We need to create plug servers which are cheap and require little power, and we must fill them with "sweet free software." We need working mesh networking, self-constructing phone systems built with tools like OpenBTS and Asterisk, federated social services, and anonymous publication platforms. We need to keep our data within our houses where it is shielded by whatever protections against physical searches remain. We need to send encrypted email all the time. These systems can also provide perimeter defense for more vulnerable systems and proxy servers for circumvention of national firewalls. We can do all of it, Eben said; it is easily done on top of the stuff we already have.
Eben concluded with an announcement of the creation of the Freedom Box Foundation, which is dedicated to making all of this stuff available and "cheaper than phone chargers." A generation ago, he said, we set out to create freedom, and we are still doing it. But we have to pick up the pace, and we have to aim our engineering more directly at politics. We have friends in the street; if we don't help them, they will get hurt. The good news is that we already have almost everything we need and we are more than capable of doing the rest.
[Editor's note: as of this writing, the Freedom Box Foundation does not appear to have a web site - stay tuned.]
This year, FOSDEM had a Data Analytics developer room, which turned out to be quite popular with the assembled geeks in Brussels: during many of the talks the room was packed full. This first meeting about analyzing and learning from data had talks covering information retrieval, large-scale data processing, machine learning, text mining, data visualization, and Linked Open Data, all implemented using open source tools.
One of the most inspiring talks in the data analytics track, which showed just how much you can do with open source tools in data visualization, was Mapping WikiLeaks' Cablegate using Python, mongoDB, Neo4j and Gephi by Elias Showk and Julian Bilcke, two software engineers at the Centre National de la Recherche Scientifique (National Center for Scientific Research in France). Their goal was to analyze the full text of all published WikiLeaks diplomatic cables, to produce occurrence and co-occurrence networks of topics and cables, and finally to visualize how the discussions in the cables relate to each other. In short, they did this by analyzing the 3,300 cables with Python and some data extraction libraries, then they used MongoDB and Neo4j to store the documents and generate graphs, and finally they visualized and explored the graphs with Gephi.
The first step in this process, presented by Showk, is importing the cables. Luckily, the WikiLeaks cables follow a simple structure that makes this relatively easy. Showk based his work on the cablegate Python code by Mark Matienzo that scrapes data from the cables in HTML form and converts this to Python objects. For the HTML scraping, the code is using Beautiful Soup, a well-known Python HTML/XML parser that automatically converts the web pages to Unicode and can cope with errors in the HTML tree. Moreover, with a SoupStrainer object, you can tell the Beautiful Soup parser to target a specific part of the document and forget about all the boilerplate parts such as the header, footer, sidebars, and supporting information.
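The targeted-parsing trick is easy to demonstrate. Below is a minimal sketch using the modern bs4 API (the 2011 code used the older Beautiful Soup 3 API); the HTML structure and the `cable-body` class are invented stand-ins for the real WikiLeaks markup:

```python
# Parse only the interesting part of a page with a SoupStrainer.
# The markup below is a made-up stand-in for a WikiLeaks cable page.
from bs4 import BeautifulSoup, SoupStrainer

html = """
<html><head><title>boilerplate</title></head>
<body>
  <div id="header">navigation, logos...</div>
  <pre class="cable-body">CONFIDENTIAL SECTION 01 OF 02 ...</pre>
  <div id="footer">footer links...</div>
</body></html>
"""

# Only elements matching the strainer are built into the parse tree;
# the header, footer, and other boilerplate never make it in.
only_body = SoupStrainer("pre", class_="cable-body")
soup = BeautifulSoup(html, "html.parser", parse_only=only_body)

cable_text = soup.get_text()
```

Besides making the extraction code simpler, skipping the boilerplate this way also saves memory and parsing time, since the discarded elements are never added to the tree at all.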
After the parsing, the Python natural language toolkit NLTK is used on the text body to bring more structure to the word scramble, with the goal of extracting some topics. The first step is tokenization: NLTK makes it easy to break up a text into sentences and each sentence into its separate words. Then the stem of each word is determined, so that all words are grouped by their root. For example, to analyze the topics of the WikiLeaks cables, it doesn't matter whether the word in a text is "language" or "languages", so both are grouped under their root "languag". A SHA-256 hash value of each stem is then used as a database index.
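The tokenize/stem/hash steps can be sketched as follows. NLTK provides real tokenizers and stemmers (such as the Porter stemmer); the crude `naive_stem()` below is a self-contained stand-in, chosen only because it happens to reproduce the "languag" example without external dependencies:

```python
# Toy version of the tokenize/stem/hash pipeline. NLTK's tokenizers and
# PorterStemmer do this properly; naive_stem() is a crude stand-in so
# the sketch needs no external libraries or downloaded models.
import hashlib

def naive_stem(word):
    """Strip a few common suffixes; a real stemmer handles far more."""
    w = word.lower()
    for suffix in ("ing", "es", "s", "e"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 4:
            return w[:-len(suffix)]
    return w

text = "Languages evolve. Each language changes over time."
tokens = [t.strip(".,;:") for t in text.split()]

# "Languages" and "language" both collapse to the root "languag".
stems = [naive_stem(t) for t in tokens]

# Each stem is indexed in the database by its SHA-256 digest.
index = {hashlib.sha256(s.encode("utf-8")).hexdigest(): s for s in stems}
```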
MongoDB, a document-oriented database, is used as document storage for all this data. MongoDB allows transparently inserting and reading records as Python dictionaries, with automatic serialization and deserialization of the objects. Showk then queried the MongoDB database to extract the heaviest occurrences and co-occurrences of words, and converted the results to a graph using the Neo4j graph database.
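The counting behind those graphs is straightforward; here is a pure-Python sketch of extracting occurrences and co-occurrences, with plain containers standing in for MongoDB and Neo4j (the cable IDs and topics are made up):

```python
# Occurrence and co-occurrence counting, with Python containers standing
# in for MongoDB (document storage) and Neo4j (the resulting graph).
from collections import Counter
from itertools import combinations

# One record per cable, roughly as it might come back from MongoDB.
cables = [
    {"_id": "08PARIS001", "topics": {"energi", "nuclear", "iran"}},
    {"_id": "09CAIRO042", "topics": {"protest", "iran", "energi"}},
    {"_id": "10TUNIS007", "topics": {"protest", "iran"}},
]

occurrences = Counter()    # topic -> number of cables it appears in
cooccurrences = Counter()  # (topic_a, topic_b) -> shared-cable count

for cable in cables:
    occurrences.update(cable["topics"])
    # Sorting makes each pair's key order-independent.
    for pair in combinations(sorted(cable["topics"]), 2):
        cooccurrences[pair] += 1

# The heaviest pairs become the strongest edges in the topic graph.
heaviest = cooccurrences.most_common(3)
```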
For the final step, visualizing and analyzing the data, Bilcke used Gephi, an open source desktop application for the visualization of complex networks. Gephi, to which Bilcke is an active contributor, is a research-oriented graph visualization tool that has been used in the past to visualize some interesting graphs, like open source communities and social networks on LinkedIn. It's based on Java and OpenGL, but it also has a headless library, the Gephi Toolkit.
So Bilcke imported the graph from the Neo4j graph database into Gephi, and then did some manual data cleaning. The graph is quite dense and has a lot of meaningless content, so there is some post-processing needed, like sorting and filtering. Bilcke chose the OpenOrd layout, which is one of the few force-directed layout algorithms that can scale to over 1 million nodes, which makes it ideal for the WikiLeaks graph. He only had to remove some artifacts, tweak the appearance slightly, and finally export the graph to PDF and GEXF (Gephi's native file format).
In total, the two French researchers did a full week of coding, during which they wrote 600 lines of code using four external libraries, two database systems, and one visualization program. All the tools they used are open source, as is their code, so this is a nice testimonial to what can be done with open source tools in the field of big data visualization. Running the whole work flow, from the original WikiLeaks HTML files to the final graph, takes around five hours.
Showk and Bilcke did all this in their free time in order to learn about these technologies. Their goal was to show that any hacker can convert a corpus of textual data into a graph that makes it easier to explore the topics. This could be used to find interesting new things, but the two researchers lacked the time to do so and were more interested in the technical side. In an email, Bilcke clarifies:
The result is published in the form of two graphs, which can be explored by anyone who wants to dig into the WikiLeaks cables. One graph, with 43,179 nodes and 237,058 edges, links topics to the cables they occur in. The other graph, with 39,808 nodes and 177,023 edges, only shows the topics and links them when they co-occur in the same cable. Interested readers can view the PDF or SVG files, but the best way is to load the .gephi files into Gephi, so you can interactively explore the graphs. For graphs of this size, though, the Gephi system requirements suggest 2 GB of RAM.
One of the other talks was about Datalift, an experimental research project funded by the already-mentioned National Center for Scientific Research in France. Its goal is to convert structured data in various formats (like relational databases, CSV, or XML) into semantic data that can be interlinked. According to François Scharffe, only when open data meets the semantic web will we truly see a data revolution: big chunks of data sitting on an island are difficult to re-use, while data in a common format with semantic information (like RDF) paves the way for richer web applications, more precise search engines, and many other advanced applications. Scharffe referred to Tim Berners-Lee's five-star rating system:
The Datalift project is currently developing tools to facilitate all steps in the process from raw data to published linked data, from selection of the right vocabulary (e.g. FOAF for persons, or GeoNames for geographic locations), conversion to RDF, and publishing of the data on the web, to interlinking the data with other existing data sources. There are already open source solutions for all these steps. For example, D2R Server maps a relational database on-the-fly to RDF, and Triplify does the same for web applications like a blog or content management system. For the publication of RDF in a human-readable form, there is the Tabulator Firefox extension. The Datalift project is trying to streamline this whole process.
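As a tiny, concrete illustration of the conversion step, here is one tabular row turned into RDF N-Triples using the FOAF vocabulary. The base URI is a made-up example, and real tools like D2R Server or Triplify handle URI minting, escaping, and datatypes far more carefully:

```python
# Convert one row of tabular data into RDF N-Triples using FOAF.
# The base URI is a hypothetical example namespace.
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/people/"

row = {"id": "42", "name": "Ada Lovelace", "homepage": "http://example.org/ada"}

# Mint a subject URI for the row, then emit one triple per column.
subject = f"<{BASE}{row['id']}>"
name = row["name"]
triples = [
    f'{subject} <{FOAF}name> "{name}" .',
    f'{subject} <{FOAF}homepage> <{row["homepage"]}> .',
]
```

Once data is in this shape, the `foaf:homepage` link is what makes interlinking possible: any other dataset that refers to the same URI is automatically connected.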
All of the talks in the data analytics developer room were quite short, from 15 to 30 minutes, which allowed a lot of projects to pass in review. Apart from the WikiLeaks and Datalift talks, there were talks about graph databases and NoSQL databases, about query languages, about analyzing and understanding large corpora of text data using Apache Hadoop, about various tools and methods for data extraction from HTML pages, about machine learning with Python, and about a real-time search engine using Apache Lucene and S4.
The whole data analytics track showed that there's no shortage of open source tools for dealing with big amounts of data. That's good news for statisticians and other "data scientists", a role that Google's Hal Varian called "the sexy job in the next ten years". In an article in The McKinsey Quarterly from January 2009, he wrote: "The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades." Looking at all the talks at the data analytics track at FOSDEM, it's clear that open source software will play a big role in this trend. If the track is hosted again at FOSDEM 2012, it's going to need a bigger room.
The eyeOS project describes itself as an open source "web desktop" — by which it means an operating system that emulates a functional desktop environment entirely within a single web page. The benefits certainly sound appealing: privacy, access to your files from anywhere on the Internet, and the ability to collaborate in real-time with other users. In addition, eyeOS is AGPLv3-licensed, which puts it high on the list of software-freedom-protecting web services. Still, as of February 2011, there are scores of competing ways to accomplish remote desktop access and real-time collaboration, so eyeOS has a tough case to make for many users.
To get a clearer picture of what makes eyeOS different from other Web-enabled desktop suites, one has to look at both the interface and the architecture. After all, AbiWord, Zoho, Google Docs, EtherPad, and even Microsoft Office now allow online collaboration on standard office tasks.
The base eyeOS system comes with a small contingent of installed applications: networking, office tools, utilities, and so on, but both the front-end and back-end APIs are open. Over the years a respectable collection of third-party eyeOS software has been developed by the community. Installing a new application requires administrative access to the eyeOS server, but it can be done through the eyeOS desktop interface itself.
You can create a new user account and test an eyeOS session on the public eyeOS server to get a feel for the system. Visiting the try it now page on eyeOS.org, you will notice two options, version 1.x and version 2.x, both of which are marked as "stable" and "to be used in production environments." The version 1.x code (which appears to be at release 1.9) has been in development since 2007. Version 2.x is a rewrite begun in March of 2010. The instance running on the public server sometimes shows 2.1 and sometimes 2.2 as its version number; presumably it is 2.2 with the occasional overlooked HTML update.
The project advertises more than 250 applications available for eyeOS 1.x and more than 20 for 2.x, but those numbers count third-party applications (most hosted at eyeos-apps.org), not what is available on the demo servers. I spent some time with version 2.x first, but the lack of available applications inspired me to test-drive 1.x as well. Even after digging through the wiki, discussion forum, and main web site, I am still not sure whether the public "try it now" server is a free service offered by eyeOS or a demo limited in storage space, time, or some other resource. You are required to create an account and provide an email address to sign on as a new user, but the email address is not validated as part of the user creation process. I hope I have not been subscribed to a marketing list as a result.
After having worked with both incarnations of eyeOS for about a day, I have mixed feelings about the implementation. On the one hand, the system is surprisingly fast: the GUI is understandably slower than working with native desktop applications, but it is approximately on par with running a local virtual machine inside VirtualBox. Features that I expected to be flaky, such as sound and mouse wheel support, worked without incident. On the other hand, there is a surprising lack of polish, particularly in the older 1.x system, where one would expect the kinks to have been worked out years ago.
The user interface is inconsistent from application to application — particularly with the desktop widgets — as is the terminology (for example, there is an application variously called "eyeOS Board" and "eyeBoard" in different places in the interface). The icons are a mix of flat, 3-D, head-on, and 45-degree-angle perspectives, drawn in different styles and color schemes. Some of the default UI widgets are impossible to read, such as white text on light gray. Changing the theme requires you to "restart" your session — but there is no session restart option, only logging out entirely and logging back in manually. The system menu is in the bottom right corner and uses an icon that is not quite the eyeOS logo but looks confusingly similar to the familiar "power button" symbol. Most surprisingly, when I attempted to open the "Documents" folder in my home directory, the built-in file manager did not know how to do so, and popped up an application-selection window asking me to find the right launcher.
Version 2.x is a bit better in terms of UI consistency, although it too suffers from mix-and-match iconography. More seriously, the application launchers on the desktop did not work, although that hiccup is, ironically, mitigated by the two additional sets of launchers always visible on the upper and lower desktop panels. Several of the default applications had non-functioning menu items (most noticeably the calendar, where calendar feed properties cannot be edited). If you open a new file in the text editor, it will throw away all of your changes to the current document without giving you the opportunity to save them.
Looking past the UI issues, however, there are some intrinsic properties of the system that I grew frustrated with rather quickly. For starters, I noticed that all applications start off at very small window sizes when launched, generally too small to be used without resizing. Upon reflection, though, that behavior is probably a workaround to account for the fact that the entire desktop is running in a frame within a window within your existing desktop environment: there is simply less real estate to go around.
The scarcity of screen space is exacerbated by the use of the desktop metaphor itself: things like having another task manager inside the browser and having window title bars for every application eat up space, but they don't make the applications more usable. By a similar token, eyeOS requires the user to manually log out of an eyeOS session (like one would on a desktop system) — simply closing the browser or navigating away does not close the session. That behavior makes sense if the goal is emulating a desktop OS, but it results in a security hole that undermines one of the stated goals of the project: keeping your files safe.
I was also disappointed in the default application set. Were it not for the novelty of running inside the browser, eyeOS would be a pretty weak desktop product: the calendaring application cannot subscribe to remote calendars, the word processor is minimalist, and the calculator is four-function only. It is unclear to me why eyeOS 1.x needs to have a web browser application (although I was pleased to discover that you can run eyeOS from within the eyeOS browser).
Perhaps eyeOS defenders would point me towards the still-growing library of additional applications available for installation as a way to enhance the experience. To a point, they are entirely right: if you run your own server, you can provide a considerably richer environment for your eyeOS users. It is even possible to enable Microsoft Office file format support within eyeOS's applications; this is done by installing OpenOffice.org on the server, along with the xvfb X server. There are eyeOS applications that enhance the default desktop experience with better PDF support, improved email, and additional communication tools like IRC.
But ultimately I am not persuaded that running eyeOS applications within the eyeOS environment in the browser offers a better computing experience than simply running existing open source web applications. If you browse the eyeos-apps.org application repository, most of the non-trivial applications are ports of existing projects, like RoundCube, Moodle, or Zoho. Considering that you need access to a full LAMP stack to run your own instance of eyeOS, I see little advantage to running any of those applications within the containerized environment simply because it emulates a desktop underneath them. It is certainly not easier to deploy RoundCube within eyeOS than on your own server, nor is it easier to secure, nor will it run faster or be easier for users to learn.
As always, there is a trade-off, particularly in the configuration work required to support a large group of users. In my estimation, the default eyeOS applications don't provide a powerful enough experience to say that eyeOS's simpler deployment process ultimately makes the administrator's job easier than individually setting up other open source file sharing and collaboration tools. EyeOS has a basic new-user "skeleton" directory system, but at the moment it lacks robust tools for managing and pre-configuring applications for big deployments.
Standing alone, the term "web desktop" could be interpreted to mean a variety of different things. ChromeOS, for example, tries to be a "web desktop" by replacing all client-side applications with web apps. On the other end of the spectrum, more and more GNOME and KDE applications are gaining the ability to seamlessly integrate network collaboration, and with the popularity of "cloud storage" services like Ubuntu One, Dropbox, and the like, it is even possible for a Linux user to store his or her desktop preferences on a remote server, thus making the same environment available everywhere. EyeOS is certainly an innovative approach to the "web desktop," but at the moment, I'm not sure it offers a compelling advantage over the web-and-desktop integration already occurring in other areas.
The Windows "AutoRun" feature, which automatically (or semi-automatically after a user prompt) runs programs from removable storage devices, has been a regular source of security problems. It has been present since Windows 95, but Microsoft finally recognized the problem and largely disabled the "feature" in Windows 7—and issued an update on February 8 that disables it for XP and Vista. Various attacks (ab)used AutoRun on USB storage devices to propagate, including Conficker and Stuxnet. Could Linux suffer from a similar flaw? The answer, from a ShmooCon 2011 presentation, is, perhaps unsurprisingly, "yes".
At ShmooCon, Jon Larimer demonstrated a way to circumvent the screensaver lock on an Ubuntu 10.10 system just by inserting a USB storage device. Because the system will automatically mount the USB drive and the Nautilus file browser will try to thumbnail any documents it finds there, he was able to shut down the screensaver and access the system. While his demo disabled both address-space layout randomization (ASLR) and AppArmor, that was only done to make the demo run quickly. On 32-bit systems, ASLR can be brute-forced to find needed library addresses, given some time. AppArmor is more difficult to bypass, but he has some plausible ideas on doing that as well.
Larimer's exploit took advantage of a hole in the evince-thumbnailer, which was fixed back in January (CVE-2010-2640). A crafted DVI file could be constructed and used to execute arbitrary code when processed by evince. In his presentation [PDF], he shows in some detail how to use this vulnerability to execute a program stored on the USB device.
Killing the screensaver is just one of the things that could be done from that shell script, of course. Larimer points to possibilities like putting a .desktop file into ~/.config/autostart, which will then be executed every time the user logs in. The same kind of thing could be done using .bash_profile or similar files. Either of those could make for a Conficker-like attack against Linux systems. In addition, because the user is logged in, any encrypted home directory or partition will be decrypted and available for copying the user's private data.
While Larimer's demonstration is interesting, even if the specifics of his attack are of little practical use, there is much to consider in the rest of his presentation. As he points out, automatically mounting USB storage devices and accessing their contents invokes an enormous amount of code, from the USB drivers and filesystem code to the desktop daemons and applications that display the contents of those devices. Each of those components could have—many have had—security vulnerabilities.
That should give anyone pause about automatically mounting those kinds of devices. One could certainly imagine crafted devices or filesystems that exploit holes in the kernel code, which would be a route that would likely avoid AppArmor (or SELinux) entirely. While Linux may not automatically run code from USB storage devices, it does enough processing of the, quite possibly malicious, data on them that the effect may be largely the same.
Larimer offers some recommendations to avoid this kind of problem, starting with the obvious: turn off auto-mounting of removable storage. He also recommends disabling the automatic thumbnailing of files on removable media. In addition, using grsecurity/PaX makes brute-forcing ASLR harder on 32-bit systems because it uses more bits of entropy to randomize the library locations. Of course, a 64-bit system allows a much wider range of potential library addresses, so that makes breaking ASLR harder still.
One clear theme of his talk is that "automatically" doing things can be quite dangerous. It may be easier and more convenient, but it can also lead to potentially serious holes. Convenience and security are often at odds.
Created: February 4, 2011
Updated: February 21, 2011
Description: From the CVE entry:
Stack-based buffer overflow in the ast_uri_encode function in main/utils.c in Asterisk Open Source, in releases prior to the fixed 1.2.x, 1.4.x, 1.6.x, and 1.8.x versions (through 1.8.2), and in Business Edition before C.3.6.2, when running in pedantic mode, allows remote authenticated users to execute arbitrary code via crafted caller ID data in vectors involving the (1) SIP channel driver, (2) URIENCODE dialplan function, or (3) AGI dialplan function.
Package(s): bugzilla
CVE #(s): CVE-2010-4568 CVE-2010-2761 CVE-2010-4411 CVE-2010-4572 CVE-2010-4569 CVE-2010-4570 CVE-2010-4567 CVE-2011-0048 CVE-2011-0046
Created: February 3, 2011
Updated: October 10, 2011
Description: From the Bugzilla advisory:
CVE-2010-4568: It was possible for a user to gain unauthorized access to any Bugzilla account in a very short amount of time (short enough that the attack is highly effective). This is a critical vulnerability that should be patched immediately by all Bugzilla installations.
CVE-2010-2761, CVE-2010-4411, CVE-2010-4572: By inserting particular strings into certain URLs, it was possible to inject both headers and content to any browser.
CVE-2010-4569: Bugzilla 3.7.x and 4.0rc1 have a new client-side autocomplete mechanism for all fields where a username is entered. This mechanism was vulnerable to a cross-site scripting attack.
CVE-2010-4570: Bugzilla 3.7.x and 4.0rc1 have a new mechanism on the bug entry page for automatically detecting if the bug you are filing is a duplicate of another existing bug. This mechanism was vulnerable to a cross-site scripting attack.
CVE-2011-0046: Various pages were vulnerable to Cross-Site Request Forgery attacks. Most of these issues are not as serious as previous CSRF vulnerabilities. Some of these issues were only addressed on more recent branches of Bugzilla and not fixed in earlier branches, in order to avoid changing behavior that external applications may depend on. The links below in "References" describe which issues were fixed on which branches.
Created: February 4, 2011
Updated: April 19, 2011
Description: From the CVE entry:
The DHCPv6 server in ISC DHCP 4.0.x and 4.1.x before 4.1.2-P1, 4.0-ESV and 4.1-ESV before 4.1-ESV-R1, and 4.2.x before 4.2.1b1 allows remote attackers to cause a denial of service (assertion failure and daemon crash) by sending a message over IPv6 for a declined and abandoned address.
Created: February 8, 2011
Updated: February 22, 2011
Description: From the CVE entry:
The open_log function in log.c in Exim 4.72 and earlier does not check the return value from (1) setuid or (2) setgid system calls, which allows local users to append log data to arbitrary files via a symlink attack.
Created: February 8, 2011
Updated: February 9, 2011
Description: From rPath RPL-3199:
When Intel VT is enabled in the BIOS of some systems which use intel_iommu, a kernel oops, and possibly a system crash, may occur. Adding intel_iommu=off to the boot parameter list works around the issue.
Package(s): krb5; CVE #(s): CVE-2010-4022 CVE-2011-0281 CVE-2011-0282
Created: February 9, 2011; Updated: April 15, 2011
Description: The krb5 server suffers from three independent vulnerabilities allowing a remote attacker to crash or hang the "key distribution center" process.
Package(s): Opera; CVE #(s): CVE-2011-0681 CVE-2011-0682 CVE-2011-0683 CVE-2011-0684 CVE-2011-0685 CVE-2011-0686 CVE-2011-0687
Created: February 7, 2011; Updated: February 9, 2011
Description: From the CVE entries:
Opera before 11.01 does not properly restrict the use of opera: URLs, which makes it easier for remote attackers to conduct clickjacking attacks via a crafted web site. (CVE-2011-0683)
Opera before 11.01 does not properly handle redirections and unspecified other HTTP responses, which allows remote web servers to obtain sufficient access to local files to use these files as page resources, and consequently obtain potentially sensitive information from the contents of the files, via an unknown response manipulation. (CVE-2011-0684)
The Delete Private Data feature in Opera before 11.01 does not properly implement the "Clear all email account passwords" option, which might allow physically proximate attackers to access an e-mail account via an unattended workstation. (CVE-2011-0685)
Unspecified vulnerability in Opera before 11.01 allows remote attackers to cause a denial of service (application crash) via unknown content on a web page, as demonstrated by vkontakte.ru. (CVE-2011-0686)
Opera before 11.01 does not properly implement Wireless Application Protocol (WAP) dropdown lists, which allows user-assisted remote attackers to cause a denial of service (application crash) via a crafted WAP document. (CVE-2011-0687)
Opera before 11.01 does not properly handle large form inputs, which allows remote attackers to execute arbitrary code or cause a denial of service (memory corruption) via a crafted HTML document. (CVE-2011-0682)
Created: February 4, 2011; Updated: April 15, 2011
Description: From the CVE entry:
Buffer overflow in the gettoken function in contrib/intarray/_int_bool.c in the intarray array module in PostgreSQL 9.0.x before 9.0.3, 8.4.x before 8.4.7, 8.3.x before 8.3.14, and 8.2.x before 8.2.20 allows remote authenticated users to cause a denial of service (crash) and possibly execute arbitrary code via integers with a large number of digits to unspecified functions.
Package(s): vlc vlc-firefox; CVE #(s): CVE-2011-0522
Created: February 3, 2011; Updated: February 9, 2011
Description: From the VUPEN advisory:
Two vulnerabilities have been identified in VLC Media Player, which could be exploited by attackers to compromise a vulnerable system. These issues are caused by buffer overflow errors in the "StripTags()" function within the USF and Text subtitles decoders ["modules/codec/subtitles/subsdec.c" and "modules/codec/subtitles/subsusf.c"] when processing malformed data, which could be exploited by attackers to crash an affected application or execute arbitrary code by convincing a user to open a malicious media file.
Page editor: Jake Edge
Brief items

The current development kernel prepatch was released on February 7. "There's nothing much that stands out here. Some arch updates (arm and powerpc), the usual driver updates: dri (radeon/i915), network cards, sound, media, scsi, some filesystem updates (cifs, btrfs), and some random stuff to round it all out (networking, watchpoints, tracepoints, etc)." The short-form changelog is in the announcement, or see the full changelog for all the details.
Stable updates: a long-term stable update was released on February 7 with a long list of important fixes.

Another stable update followed on February 9; it contains a couple dozen important fixes.
That window has been capped by RFC 3390 at four segments (just over 4KB) for the better part of a decade. In the meantime, connection speeds have increased and the amount of data sent over a given connection has grown despite the fact that connections live for shorter periods of time. As a result, many connections never ramp up to their full speed before they are closed, so the four-segment limit is now seen as a bottleneck which increases the latency of a typical connection considerably. That is one reason why contemporary browsers use many connections in parallel, despite the fact that the HTTP specification says that a maximum of two connections should be used.
Some developers at Google have been agitating for an increase in the initial congestion window for a while; in July 2010 they posted an IETF draft pushing for this change and describing the motivation behind it. Evidently Google has run some large-scale tests and found that, by increasing the initial congestion window, user-visible latencies can be reduced by 10% without creating congestion problems on the net. They thus recommend that the window be increased to 10 segments; the draft suggests that 16 might actually be a better value, but more testing is required.
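Google's argument is easier to appreciate with a quick back-of-the-envelope calculation. The sketch below is a simplification (one fixed 1460-byte segment size, ideal slow-start doubling every round trip, no losses or delayed ACKs); it counts the round trips needed to deliver a 40KB response with the old and proposed initial windows:

```python
# Rough slow-start model: how many round trips does it take to
# deliver total_bytes, starting from an initial congestion window
# of initcwnd segments?  Idealized: the window doubles every RTT
# and nothing is lost.
MSS = 1460  # bytes per segment, a typical value

def rtts_to_send(total_bytes, initcwnd):
    window = initcwnd  # congestion window, in segments
    sent = 0
    rtts = 0
    while sent < total_bytes:
        sent += window * MSS
        window *= 2
        rtts += 1
    return rtts

# A 40KB response, a plausible size for a web page object:
print(rtts_to_send(40 * 1024, 4))   # 4 RTTs with the RFC 3390 window
print(rtts_to_send(40 * 1024, 10))  # 2 RTTs with the proposed window
```

Halving the number of round trips on a short transfer translates directly into lower user-visible latency, which is exactly the effect Google measured.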
David Miller has posted a patch increasing the window to 10; that patch has not been merged into the mainline, so one assumes it's meant for 2.6.39.
Interestingly, Google's tests worked with a number of operating systems, but not with Linux, which uses a relatively small initial receive window of 6KB. Most other systems, it seems, use 64KB instead. Without a receive window at least as large as the congestion window, a larger initial congestion window will have little effect. That problem will be fixed in 2.6.38, thanks to a patch from Nandita Dukkipati raising the initial receive window to 10 segments.

Some developers are annoyed by HTC's policy of delaying source code releases for up to 120 days after a given handset ships. In response, Matthew Garrett has suggested an addition to the top-level COPYING file in the kernel source:
About the only response so far has been from Alan Cox, who has suggested that getting a lawyer's opinion on the matter might be useful. Linus, over whose name the new text would appear, has not commented on it. So it's not clear if the change will go in or whether it will inspire any changes in vendor behavior if it is merged. But it does, at least, make the developers' feelings on the matter known.
Kernel development news
The question, as it turns out, came sooner - February 3, to be exact - when Jan Kara suggested that removing ext2 and ext3 could be discussed at the upcoming storage, filesystems, and memory management summit. Jan asked:
One might protest that there will be existing filesystems in the ext3 (and even ext2) formats for the indefinite future. Removing support for those formats is clearly not something that can be done. But removing the ext2 and/or ext3 code is not the same as removing support: ext4 has been very carefully written to be able to work with the older formats without breaking compatibility. One can mount an ext3 filesystem using the ext4 code and make changes; it will still be possible to mount that filesystem with the ext3 code in the future.
So it is possible to remove ext2 and ext3 without breaking existing users or preventing them from going back to older implementations. Beyond that, mounting an ext2/3 filesystem under ext4 allows the system to use a number of performance enhancing techniques - like delayed allocation - which do not exist in the older implementations. In other words, ext4 can replace ext2 and ext3, maintain compatibility, and make things faster at the same time. Given that, one might wonder why removing the older code even requires discussion.
There appear to be a couple of reasons not to hurry into this change, both of which have to do with testing. As Eric Sandeen noted, some of the more ext3-like options are not tested as heavily as the native modes of operation:
There is also concern that ext4, which is still seeing much more change than its predecessors, is more likely to introduce instabilities. That's a bit of a disturbing idea; there are enough production users of ext4 now that the introduction of serious bugs would not be pleasant. But, again, the backward-compatible operating modes of ext4 may not be as heavily tested as the native mode, so one might argue that operation with older filesystems is more likely to break regardless of how careful the developers are.
So, clearly, any move to get rid of ext2 and ext3 would have to be preceded by the introduction of better testing for the less-exercised corners of ext4. The developers involved understand that clearly, so there is no need to be worried that the older code could be removed too quickly.
Meanwhile, there are also concerns that the older code, which is not seeing much developer attention, could give birth to bugs of its own. As Jan put it:
Developers have also expressed concern that new filesystem authors might copy code from ext2, which, at this point, does not serve as a good example for how Linux filesystems should be written.
The end result is that, once the testing concerns have been addressed, everybody involved might be made better off by the removal of ext2 and ext3. Users with older filesystems would get better performance and a code base which is seeing more active development and maintenance. Developers would be able to shed an older maintenance burden and focus their efforts on a single filesystem going forward. Thanks to the careful compatibility work which has been done over the years, it may be possible to safely make this move in the relatively near future.
With some regularity, the topic of allowing multiple Linux Security Modules (LSMs) to all be active comes up in the Linux kernel community. There have been some attempts at "stacking" or "chaining" LSMs in the past, but nothing has ever made it into the mainline. On the other hand, though, every time a developer comes up with some kind of security hardening patch for the kernel, they are generally directed toward the LSM interface. Because the "monolithic" security solutions (like SELinux, AppArmor, and others) tend to have already taken the single existing LSM slot in many distributions, these simpler, more targeted LSMs are generally unable to be used. But a discussion on the linux-security-module mailing list suggests that work is being done that just might solve this problem.
The existing implementation of LSMs uses a single set of function pointers in a struct security_operations for the "hooks" that get called when access decisions need to be made. Once a security module gets registered (typically at boot time using the security= flag), its implementation is stored in the structure and any other LSM is out of luck. The idea behind LSM stacking would be to keep multiple versions of the security_operations structure around and to call each registered LSM's hooks for an access decision. While that sounds fairly straightforward, there are some subtleties that need to be addressed, especially if different LSMs give different answers for a particular access.
This problem with the semantics of "composing" two (or more) LSMs has been discussed at various points, without any real global solution for composing arbitrary LSMs. As Serge E. Hallyn warned over a year ago:
There is one example of stacking LSMs as Hallyn describes in the kernel already; the capabilities LSM is called directly from other LSMs where necessary. That particular approach is not very general, of course, as LSM maintainers are likely to lose patience with adding calls for every other possible LSM. A more easily expandable solution is required.
David Howells posted a set of patches that would add that expansion mechanism. It does that by allowing multiple calls to the register_security() initialization function, each with its own set of security_operations. Instead of the current situation, where each LSM manages its own data for each kind of object (credentials, keys, files, inodes, superblocks, IPC, and sockets), Howells's security framework will allocate and manage that data for the LSMs.
The security_operations structure gets new *_data_size and *_data_offset fields for each kind of object, with the former filled in by the LSM before calling register_security() and the latter being managed by the framework. The data size field tells the framework how much space is needed for the LSM-specific data for that type of object, and the offset is used by the framework to find each LSM's private data. For struct cred, struct key, struct file, and struct super_block, the extra data for each registered LSM is tacked onto the end of the structure rather than going through an intermediate pointer (as is required for the others). Wrappers are defined that will allow an LSM to extract its data from an object based on the new fields in the operations table.
The framework then maintains a list of registered LSMs and puts the capabilities LSM in the first slot of the list. When one of the security hooks is called, the framework iterates over the list and calls the corresponding hook for each registered LSM. Depending on the specific hook, different kinds of iterators are used, but the usual iterator looks for a non-zero response from an LSM's hook, which would indicate a denial of some kind, and returns that to the framework. The other iterators are used for specialized calls, for example when there is no return value or when only the first hook found should be called. The upshot is that the hooks for registered LSMs get called in order (with capabilities coming first), and the first to deny the access "wins". Because the capabilities calls are pulled out separately, that also means that the other LSMs no longer have to make those calls themselves; instead the framework will handle it for them.
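As a conceptual model of that iteration, here is a sketch written in Python purely for illustration; all the names below are invented for the sketch, and the real framework is C code inside the kernel:

```python
# Toy model of the proposed multi-LSM framework: registered LSMs
# are kept in a list (capabilities first), and the usual iterator
# calls each LSM's hook until one returns a non-zero (denial)
# value, which "wins".

class LSM:
    def __init__(self, name, hooks):
        self.name = name
        self.hooks = hooks  # dict: hook name -> callable returning 0 or -errno

registered = []

def register_security(lsm):
    # In the real proposal this can now be called multiple times.
    registered.append(lsm)

def call_hook(hook_name, *args):
    # The usual iterator: first non-zero return (a denial) wins.
    for lsm in registered:
        hook = lsm.hooks.get(hook_name)
        if hook is not None:
            ret = hook(*args)
            if ret != 0:
                return ret
    return 0  # nobody objected

# Capabilities occupies the first slot; others follow in order.
register_security(LSM("capability", {"file_open": lambda path: 0}))
register_security(LSM("selinux",
    {"file_open": lambda path: -13 if path == "/etc/shadow" else 0}))

print(call_hook("file_open", "/etc/motd"))    # 0: allowed
print(call_hook("file_open", "/etc/shadow"))  # -13 (-EACCES): denied
```

Note how the second LSM's hook is never reached once an earlier one denies; that short-circuiting is precisely the behavior Schaufler objects to for stateful LSMs.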
But there are a handful of hooks that do not work very well in a multi-LSM environment, in particular the secid (an LSM-specific security label ID) handling routines (e.g. secid_to_secctx(), task_getsecid(), etc.). Howells's current implementation just calls the hook of the first LSM it finds that implements it, which is not going to make it possible to use multiple LSMs that all implement those hooks (currently just SELinux and Smack). Howells's solution is to explicitly ban that particular combination:
But Smack developer Casey Schaufler isn't convinced that is the right course: "That kind of takes the wind out of the sails, doesn't it?" He would rather see a more general solution that allows multiple secids, and the related secctxs (security contexts), to be handled by the framework:
Another interesting part of Schaufler's message is that he has been working on an "alternative approach" to the multi-LSM problem that he calls "Glass". The code is, as yet, unreleased, but Schaufler describes Glass as an LSM that composes other LSMs:
Unlike Howells's proposal, Glass would leave the calls to the capabilities LSM (aka commoncap) in the existing LSMs, and only call commoncap if no LSM implemented a given hook. The idea is that the LSMs already handle the capabilities calls in their hooks as needed, so only when none of those hooks is called does the framework need to call into commoncap itself. In addition, Glass leaves the allocation and management of the security "blobs" (LSM-specific data for objects) to the LSMs rather than centralizing them in the framework as Howells's patches do.
In addition to various other differences, there is a more fundamental difference in the way that the two solutions handle multiple LSMs that all have hooks for a particular security operation. Glass purposely calls each hook in each registered LSM, whereas Howells's proposal typically short-circuits the chain of hooks once one of them has denied the access. Schaufler's idea is that an LSM should be able to maintain state, which means that skipping its hooks could potentially skew the access decision:
There are plenty of other issues to resolve, including things like handling /proc/self/attr/current (which contains the security ID for the current process) because various user-space programs already parse the output of that file, though it is different depending on which LSM is active. A standardized format for that file, which takes multiple LSMs into account, might be better, but it would break the kernel ABI and is thus not likely to pass muster. Overall, though, Howells and Schaufler were making some good progress on defining the requirements for supporting multiple LSMs. Schaufler is optimistic that the collaboration will bear fruit: "I think that we may be able to get past the problems that have held multiple LSMs back this time around."
So far, there is only the code from Howells to look at, but Schaufler has promised to make Glass available soon. With luck, that will lead to a multi-LSM solution that the LSM developers can coalesce behind, whether it comes from Howells, Schaufler, or a collaboration between them. There may still be a fair amount of resistance from Linus Torvalds and other kernel hackers, but the lack of any way to combine LSMs comes up too often for it to be ignored forever.
Mesh networking, as its name implies, is meant to work via a large number of short-haul connections without any sort of centralized control. A proper mesh network should configure itself dynamically, responding to the addition and removal of nodes and changes in connectivity. In a well-functioning mesh, networking "just happens" without high-level coordination; such a net should be quite hard to disrupt. What the kernel offers now falls somewhat short of that ideal, but it is a good demonstration of how hard mesh networking can be.
The "Better Approach To Mobile Ad-hoc Networking" (BATMAN) protocol is described in this draft RFC. A BATMAN mesh is made up of a set of "originators" which communicate via network interfaces - normal wireless interfaces, for example. Every so often, each originator sends out an "originator message" (OGM) as a broadcast to all of its neighbors to tell the world that it exists. Each neighbor is supposed to note the presence of the originator and forward the message onward via a broadcast of its own. Thus, over time, all nodes in the mesh should see the OGM, possibly via multiple paths, and thus each node will know (1) that it can reach the originator, and (2) which of its neighbors has the best path to that originator. Each node maintains a routing table listing every other node it has ever heard of and the best neighbor by which to reach each one.
This protocol has the advantage of building and maintaining the routing tables on the fly; no central coordination is needed. It should also find near-optimal routes to each. If a node goes away, the routing tables will reconfigure themselves to function in its absence. There is also no node in the network which has a complete view of how the mesh is built; nodes only know who is out there and the best next hop. This lack of knowledge should add to the security and robustness of the mesh.
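The best-next-hop bookkeeping can be illustrated with a deliberately tiny toy model. The real protocol uses sliding windows of OGM sequence numbers and link-quality metrics; here we simply count OGMs received per neighbor:

```python
# Toy illustration of the BATMAN routing idea: each node remembers,
# for every originator it has heard of, which neighbor delivered
# that originator's OGMs most often, and uses that neighbor as the
# next hop.  (Greatly simplified relative to the draft RFC.)
from collections import defaultdict

class Node:
    def __init__(self):
        # (originator, neighbor) -> count of OGMs seen via that neighbor
        self.ogm_counts = defaultdict(int)

    def receive_ogm(self, originator, via_neighbor):
        self.ogm_counts[(originator, via_neighbor)] += 1

    def best_next_hop(self, originator):
        candidates = {n: c for (o, n), c in self.ogm_counts.items()
                      if o == originator}
        if not candidates:
            return None  # originator unknown; no route
        return max(candidates, key=candidates.get)

node = Node()
# OGMs from originator "D" arrive via two neighbors; neighbor "B"
# relays more of them, so it looks like the better path.
for _ in range(8):
    node.receive_ogm("D", "B")
for _ in range(3):
    node.receive_ogm("D", "C")
print(node.best_next_hop("D"))  # "B"
```

Because each node stores only a best next hop per originator, no node ever needs a complete picture of the mesh topology.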
Nodes with a connection to the regular Internet can set a bit in their OGMs to advertise that fact; that allows others without such a connection to route packets to and from the rest of the world.
The original BATMAN protocol uses UDP for the OGM messages. That design allows routing to be handled with the normal kernel routing tables, but it also imposes a couple of unfortunate constraints: nodes must obtain an IP address from somewhere before joining the mesh, and the protocol is tied to IPv4. The BATMAN-adv protocol found in the Linux kernel has changed a few things to get around these problems, making it a rather more flexible solution. BATMAN-adv works entirely at the link layer, exchanging non-UDP OGMs directly with neighboring nodes. The routing table is maintained within a special virtual network device, which makes all nodes on the mesh appear to be directly connected via that virtual interface. Thus the system can join the mesh before it has a network address, and any protocol can be run over the mesh.
BATMAN-adv removes some of the limitations found in BATMAN, but readers who have gotten this far will likely be thinking of the limitations that remain. The flooding of broadcast OGMs through the net can only scale so far before a significant amount of bandwidth is consumed by network overhead. The protocol trims OGMs which are obviously not of interest - those which describe a route which is known to be worse than others, for example - but the OGM traffic will still be significant if the mesh gets large. The routing tables will also grow, since every node must keep track of every other node in existence. The overhead for these tables is probably manageable for a mesh of 1,000 nodes; it is probably hopeless for 1,000,000 nodes. Mobile devices - which are targeted by this protocol - are especially likely to suffer as the table gets larger.
Security is also a concern in this kind of network. Simple bandwidth-consuming denial of service attacks would seem relatively straightforward. Sending bogus OGMs could cause the size of routing tables to explode or disrupt the routing within the mesh. A more clever attack could force traffic to route through a hostile node, enabling man-in-the-middle exploits. And so on. The draft RFC quickly mentions some of these issues, but it seems clear that security has not been a major design goal.
So it would seem clear that BATMAN-adv, while interesting, is not the solution to the problem of an overly-centralized network. It could be a useful way to extend connectivity through a building or small neighborhood, but it is not meant to operate on a large scale or in an overtly hostile environment. The bigger problem is a hard one to solve, to say the least. The experience gained with protocols like BATMAN-adv may well prove valuable in the search for that solution, but there is clearly some work to be done still.
Patches and updates
Core kernel code
Filesystems and block I/O
Virtualization and containers
Benchmarks and bugs
Page editor: Jonathan Corbet
Distribution collaboration manifesto. One session which arguably didn't live up to its potential was the "distribution collaboration manifesto." It did, however, let us see Debian leader Stefano Zacchiroli, Fedora leader Jared Smith, and openSUSE community manager Jos Poortvliet together on the same stage.
The discussion wandered around the topics of how nice it would be to cooperate more, cooperative application installer work, and making better use of the distributions list on freedesktop.org. It was friendly, but somewhat lacking in specifics.
Downstream packaging collaboration. A more focused session was led by Hans de Goede and Michal Hrušecký of Red Hat and openSUSE, respectively. According to Hans, packaging software is not normally a difficult task. When one is dealing with less-than-optimal upstreams, though, things get harder. One must make sure that the entire package is freely licensed; ancillary files (like artwork) can often be problematic. The package must be tweaked for filesystem hierarchy standard compliance, integration with distribution policies (including writing man pages if needed), fixing build problems, getting rid of bundled libraries, and fixing the occasional bug.
That's all just part of a packager's job, but, Hans asked, what should be done with the results of that work? The obvious thing to do is to send it back upstream, and to educate the upstream project about problems like bundled libraries. But what happens if the upstream is unresponsive - or if there is no functioning upstream at all? This situation arises more often than one might expect, especially with games, it seems. Assuming that the code itself is still worth shipping, it would make sense for distributors to work together to provide a working upstream for this kind of project.
Again, specific suggestions were relatively scarce, but Hans did say that having a set of package-specific email aliases at freedesktop.org would be useful. For any given problematic package (xaw3d was listed as one example), packagers at each distribution could subscribe to the appropriate list to discuss the work they have done. The list would also receive commit notifications from each distributor's version control system, so everybody could see changes being made by other distributors, comment on them, and, perhaps, pick them up as well.
Michal talked about setting up a mechanism designed specifically to let packagers share patches. He seemed to envision a shared directory somewhere where packagers would put their specific changes; subsequent discussion made it clear that some people, at least, would rather see some sort of source code management system used. Michal also called for the adoption of a set of conventions for patch metadata to describe the purpose of the patch, who shipped it, etc. Bdale Garbee suggested from the audience that what people really seem to want is a set of simple pointers to everybody's git repositories. He added that anybody who is a package maintainer and does not know who his or her counterparts are in other distributions is failing at the job and needs to go out and start meeting people.
Forking is difficult. A rather different approach to collaboration - and the lack thereof - could be found in a session led by Anne Nicolas and Michael Scherer. Anne and Michael are two of the founders of the Mageia distribution, which is a fork of Mandriva. According to Anne, Mandriva was built on a good foundation and with a great "users first" policy, but, when things started to go bad, it was a "disturbing experience." From that experience Mageia was born.
Mageia is built on the principles of "people first" and trust in the community. The distribution wants to make life as easy as possible for both users and packagers. Actually getting there is proving to be a challenge, though, with every step on the way taking far longer than had been expected. There is now a legal association in place, though, and an initial pass at a design for project governance has been done. The build system is mostly ready, and training of packagers is underway.
In the process, the developers have found that simply forking an established distribution is a lot of work. It has taken about three months to get a base set of 4100 packages ready. As they do this job, they are trying to make the job of changing the name and the look of the distribution easier for the next group that has to take it on. That should improve life for anybody who might, down the road, choose to fork Mageia; it is also aimed at making the creation of Mageia derivatives easier.
The first Mageia alpha will, with luck, be released on February 15. Current plans are to make the first stable release on June 1. June is also the target for having the full organization and governance mechanism in place. This governance is expected to be made up of around ten teams, an elected council and an elected board.
The other challenge that the Mageia developers are facing is that of creating an "economic model" which will support the work going forward. From the discussion, it seems that the main source of income at the moment is donations and T-shirts, which is unlikely to sustain a serious effort in the long term.
Fixing Gentoo. Finally, Petteri Räty led a session on the reform and future of Gentoo. Contrary to what some people may think, the Gentoo distribution is alive and well with some 235 developers maintaining packages. That said, there are some issues which need attention.
Many of these problems are organizational; the project's meta-structure has been neglected over the years. There is little accountability for people working in specific roles. Nobody can really say what Gentoo projects are ongoing, and which of those are really alive. Nobody really knows what to do about dead projects either. The relationship between the Gentoo Council and the Gentoo Foundation is not particularly clear. And there is an unfortunate split between Gentoo's users and its developers. Mentoring for new developers is in short supply.
There are plans to reinvigorate Gentoo's meta-structure project, giving it the responsibility of tracking the other outstanding projects. That should give some visibility into what is going on. The current corporate structure was described as a "two-headed monster" that needs to be straightened out. To that end, the Gentoo Foundation is finally getting close to its US 501(c)(3) status, making it an official nonprofit organization. The Foundation is expected to handle legal issues and the distribution's "intellectual property," while the council will be charged with technical leadership.
In summary: it seems that the Gentoo project has a number of challenges to overcome, but the project remains strong and people are working on addressing the issues.
Conclusion: the FOSDEM cross-distro track included far more talks than are listed here; there are only so many that your editor was able to attend. It's clear that the conference served as a valuable meeting point for developers who are often working independently of each other. Linux distributors are, at one level, highly competitive with each other. But they are all based on the work of one community. If they can do more of their work at the community level, that will give each distributor more time to work on the things which make their project special. The discussions at FOSDEM can only have helped to increase understanding and collaboration across distributions, and that must be a good thing.
Debian GNU/Linux. The release team is coordinating security fixes, and Wheezy will bring in some changes: "In terms of expected larger changes, the upload of KDE 4.6 to the archive is anticipated in early March, the Ocaml team would like to move to a new upstream version and GNOME 3 is due for release in April. The GNOME team is already staging packages in experimental but this is a major upstream release and will certainly lead to its fair share of disruption when it hits unstable." Meanwhile, the Debian website has a new look: "After about 13 years with nearly the same design, the layout and design of the website changed with today's release of Debian Squeeze."
Fedora. "Any interested parties are invited to submit their bids. Once you have prepared a bid, please send an email to the fudcon-planning list. Bids will be accepted up until the end of the day on March 15th, 2011."
Ubuntu family. "Unity is now the default in the Ubuntu Desktop session. It is only partially implemented at this stage, so keep an eye on the daily builds; new features and bug fixes are emerging daily!"
Newsletters and articles of interest
Page editor: Rebecca Sobol
Python 3.0 was released at the end of 2008, but so far only a relatively small number of packages have been updated to support the latest release; the majority of Python software still only supports Python 2. Python 3 introduced changes to Unicode and string handling, module importing, integer representation and division, print statements, and a number of other differences. This article will cover some of the changes that cause the most problems when porting code from Python 2 to Python 3, and will present some strategies for managing a single code base that supports both major versions.
The changes that made it into Python 3 were originally part of a plan called "Python 3000" as sort of a joke about language changes that could only be done in the distant future. The changes made up a laundry list of inconsistencies and inconvenient designs in the Python language that would have been really nice to fix, but had to wait because fixing them meant breaking all existing Python code. Eventually the weight of all the changes led the Python developers to decide to just fix the problems with a real stable release, and accept the fact that it will take a few years for most packages and users to make the switch.
The biggest change is to how strings are handled in Python 3. Python 2 has 8-bit strings and Unicode text, whereas Python 3 has Unicode text and binary data. In Python 2 you can play fast and loose with strings and Unicode text, using either type for parameters; conversion is automatic when necessary. That's great until you get some 8-bit data in a string and some function (anywhere — in your code or deep in some library you're using) needs Unicode text. Then it all falls apart. Python 2 tries to decode strings as 7-bit ASCII to get Unicode text, leaving the developer, or worse yet the end user, with one of these:
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 3:
ordinal not in range(128)
In Python 3 there are no more automatic conversions, and the default is Unicode text almost everywhere. While Python 2 treats 'all\xf4' as an 8-bit string with four bytes, Python 3 treats the same literal as Unicode text with U+00F4 as the fourth character.
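A minimal Python 3 sketch of the failure described above: the byte 0xf4 is not valid ASCII, so an explicit decode fails the same way Python 2's implicit decode did, while an encoding that covers the byte succeeds.

```python
data = b'all\xf4'            # four bytes of binary data

try:
    data.decode('ascii')     # Python 2 attempted this implicitly
except UnicodeDecodeError as e:
    error = str(e)           # "'ascii' codec can't decode byte 0xf4 ..."

# 0xf4 maps to U+00F4 in Latin-1, so this decode yields four
# characters of text.
text = data.decode('latin1')
assert text == 'all\xf4'
assert len(text) == 4
```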
Files opened in text mode (the default, including for sys.stdin, sys.stdout, and sys.stderr) in Python 3 return Unicode text from read() and expect Unicode text to be passed to write(). Files opened in binary mode operate on binary data only. This change affects Python users in Linux and other Unix-like operating systems more than Windows and Mac users — files in Python 2 on Linux that are opened in binary mode are almost indistinguishable from files opened in text mode, while Windows and Mac users have been used to Python at least munging their line breaks when in text mode.
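The text/binary split for files can be sketched like this (the file name and contents here are made up for illustration): text mode yields and expects str, binary mode yields and expects bytes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Text mode: write() expects str; newline='\n' pins down the on-disk
# line ending so the binary check below is deterministic.
with open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write('caf\xe9\n')                   # "café" plus a newline

with open(path, 'r', encoding='utf-8') as f:
    assert f.read() == 'caf\xe9\n'         # text mode returns str

with open(path, 'rb') as f:                # binary mode returns bytes
    assert f.read() == b'caf\xc3\xa9\n'    # UTF-8 encoding of é is C3 A9
```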
This means that much code that used to "work" (where work is defined for uses with ASCII text only) is now broken. But once that code is updated to properly account for which inputs and outputs are encoded text and which are binary, it can then be used comfortably by people whose native languages or names don't fit in ASCII. That's a pretty nice result.
Python 3's bytes type for binary data is quite different from Python 2's 8-bit strings. Python 2.6 and later define bytes as an alias for the str type, which is a little strange because the interface has changed significantly:
>>> bytes([2,3,4])  # Python 2
'[2, 3, 4]'
>>> [x for x in 'abc']
['a', 'b', 'c']
In Python 3 b'' is used for byte literals:
>>> bytes([2,3,4])  # Python 3
b'\x02\x03\x04'
>>> [x for x in b'abc']
[97, 98, 99]
Python 3's bytes type can be treated like an immutable list of integers with values between 0 and 255. That's convenient for bit arithmetic and other numeric operations common when dealing with binary data, but it's quite different from the strings-of-length-1 Python 2 programmers expect.
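The indexing behavior just described can be seen directly (Python 3 semantics):

```python
data = b'abc'

assert data[0] == 97           # indexing yields an int, not b'a'
assert data[0:1] == b'a'       # slicing keeps the bytes type
assert list(data) == [97, 98, 99]

# bytes is immutable; bytearray is the mutable counterpart.
buf = bytearray(data)
buf[0] = ord('A')
assert bytes(buf) == b'Abc'
```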
Integers have changed as well. There is no longer a distinction between long integers and normal integers, and sys.maxint is gone. Integer division has changed too. Anyone with a background in Python (or C) will tell you that:
>>> 1/2
0
>>> 1.0/2
0.5
But no longer. Python 3 returns 0.5 for both expressions. Fortunately Python 2.2 and later have a floor-division operator (//); use it and you get the same result under both versions.
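A quick sketch of the Python 3 division rules just described:

```python
# "/" is always true division in Python 3, even for two ints.
assert 1 / 2 == 0.5
assert 1.0 / 2 == 0.5

# "//" is floor division and gives the old Python 2 behavior.
assert 1 // 2 == 0
assert isinstance(1 // 2, int)
assert -7 // 2 == -4           # floors toward negative infinity
```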
The last big change I'll point out is to comparisons. In Python 2 the ordering comparisons (<, <=, >=, >) are defined between all objects. When no explicit ordering exists, all the objects of one type are arbitrarily considered greater or less than all the objects of another type, so you could take a list with a mix of types, sort it, and the different types would be grouped together. Most of the time, though, you really don't want to order objects of different types, and this feature just hides some nasty bugs.
Python 3 now raises a TypeError any time you compare objects with incompatible types, as it should. Note that equality (==, !=) is still defined for all types.
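This behavior can be demonstrated with a short sketch; note that sorting mixed data is still possible if you supply an explicit key (key=str here is just one choice):

```python
# Ordering incompatible types raises TypeError in Python 3.
try:
    sorted([3, 'two', 1])
    raised = False
except TypeError:
    raised = True
assert raised

# Equality across types is still defined; it is simply False.
assert (1 == 'one') is False

# To order mixed data deliberately, make the comparison explicit.
mixed = sorted([3, 'two', 1], key=str)   # compares '1' < '3' < 'two'
assert mixed == [1, 3, 'two']
```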
Module importing has changed. In Python 2 the directory containing the source file is searched first when importing (an "implicit relative import"), then the directories on the system path are tried in order. In Python 3 relative imports must be made explicit:
from . import my_utils
The print statement has become a function in Python 3. This Python 2 code, which prints a string to sys.stderr with a trailing space instead of a newline:

import sys
print >>sys.stderr, 'something bad happened:',

becomes the following in Python 3:

import sys
print('something bad happened:', end=' ', file=sys.stderr)
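The end and file keyword arguments can be seen in action by capturing output with io.StringIO standing in for sys.stderr (the messages here are invented for the example):

```python
import io

buf = io.StringIO()                      # stands in for sys.stderr
print('something bad happened:', end=' ', file=buf)
print('disk full', file=buf)             # default end='\n'
assert buf.getvalue() == 'something bad happened: disk full\n'
```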
These are just some of the biggest changes. The complete list is here.
Fortunately a large number of the little incompatibilities are taken care of by the 2to3 tool that ships with Python. 2to3 takes Python 2 source code and performs some automated replacements to prepare the code to run in Python 3. Print statements become functions, Unicode text literals drop their "u" prefix, relative imports are made explicit, and so on.
Unfortunately the rest of the changes need to be made by hand.
It is reasonable to maintain a single code base that works across Python 2 and Python 3 with the help of 2to3. In the case of my library "Urwid" I am targeting Python 2.4 and up, and this is part of the compatibility code I use. When you really have to write code that takes different paths for Python 2 and Python 3 it's nice to be clear with an "if PYTHON3:" statement:
import sys
PYTHON3 = sys.version_info >= (3, 0)

try: # define bytes for Python 2.4, 2.5
    bytes = bytes
except NameError:
    bytes = str

if PYTHON3: # for creating byte strings
    B = lambda x: x.encode('latin1')
else:
    B = lambda x: x
String handling and literal strings are the most common areas that need to be updated. Some guidelines:
Use Unicode literals (u'') for all literal text in your source. That way your intention is clear and behaviour will be the same in Python 3 (2to3 will turn these into normal text strings).
Use byte literals (b'') for all literal byte strings or the B() function above if you are supporting versions of Python earlier than 2.6. B() uses the fact that the first 256 code points in Unicode map to Latin-1 to create a binary string from Unicode text.
Use normal strings ('') only in cases where 8-bit strings are expected in Python 2 but Unicode text is expected in Python 3. These cases include attribute names, identifiers, docstrings, and __repr__ return values.
Document whether your functions accept bytes or Unicode text and guard against the wrong type being passed in (e.g. assert isinstance(var, unicode)), or convert to Unicode text immediately if you must accept both types.
Clearly labeling text as text and binary as binary in your source serves as documentation and may prevent you from writing code that will fail when run under Python 3.
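The "convert immediately if you must accept both types" guideline might be sketched like this (the function name and behavior are assumed for illustration, not taken from the article; Python 3 shown, so the text type is str rather than unicode):

```python
def as_text(value, encoding='utf-8'):
    """Return value as text, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError('expected bytes or text, got %r' % type(value))

# Callers can pass either type; internal code only ever sees text.
assert as_text(b'hello') == 'hello'
assert as_text('hello') == 'hello'
assert as_text(b'caf\xc3\xa9') == 'caf\xe9'
```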
Handling binary data across Python versions can be done a few ways. If you replace all individual byte accesses such as data[i] with data[i:i+1] then you will get a byte-string-of-length-1 in both Python 2 and Python 3. However, I prefer to follow the Python 3 convention of treating byte strings as lists of integers with some more compatibility code:
if PYTHON3: # for operating on bytes
    ord2 = lambda x: x
    chr2 = lambda x: bytes([x])
else:
    ord2 = ord
    chr2 = chr
ord2 returns the ordinal value of a byte in Python 2 or Python 3 (where it's a no-op) and chr2 converts back to a byte string. Depending on how you are processing your binary data, it might be noticeably faster to operate on the integer ordinal values instead of byte-strings-of-length-1.
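To illustrate the integers-first convention, here is a hypothetical byte-at-a-time transformation written against Python 3 semantics (under Python 2, the ord2/chr2 helpers above would make the same loop portable):

```python
def xor_bytes(data, key):
    """XOR every byte of data with the integer key."""
    return bytes(b ^ key for b in data)   # iterating bytes yields ints

scrambled = xor_bytes(b'abc', 0x20)
assert scrambled == b'ABC'                # 0x61 ^ 0x20 == 0x41 ('A')
assert xor_bytes(scrambled, 0x20) == b'abc'
```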
Python "doctests" are snippets of test code that appear in function, class and module documentation text. The test code resembles an interactive Python session and includes the code run and its output. For simple functions this sort of testing is often enough, and it's good documentation. Doctests create a challenge for supporting Python 2 and Python 3 from the same code base, however.
2to3 can convert doctest code in the same way as the rest of the source, but it doesn't touch the expected output. Python 2 will put an "L" at the end of a long integer output and a "u" in front of Unicode strings that won't be present in Python 3, but print-ing the value will always work the same. Make sure that other code run from doctests outputs the same text all the time, and if you can't you might be able to use the ELLIPSIS flag and ... in your output to paper over small differences.
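The ELLIPSIS trick might look like this (first_word() is a made-up example; the "..." in the expected exception message matches any detail text):

```python
import doctest

def first_word(text):
    """Return the first word of text.

    >>> first_word('hello world')
    'hello'
    >>> first_word('')          # doctest: +ELLIPSIS
    Traceback (most recent call last):
        ...
    IndexError: ...
    """
    return text.split()[0]

# Run the docstring examples programmatically and check they all pass.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(first_word, globs={'first_word': first_word}):
    runner.run(test)
assert runner.failures == 0
```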
There are a number of easy changes you need to make as well, including:
Use // everywhere you want floor division (mentioned above).
Derive exception classes from BaseException.
Use k in my_dict instead of my_dict.has_key(k).
Use my_list.sort(key=custom_key_fn) instead of my_list.sort(custom_sort).
Use distribute instead of Setuptools.
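The code-level items above look like this in current idioms (Python 3 shown; the names are made up for the example):

```python
# "k in my_dict" replaces my_dict.has_key(k).
my_dict = {'a': 1}
assert 'a' in my_dict

# sort() takes a key function rather than a comparison function.
my_list = ['bb', 'a', 'ccc']
my_list.sort(key=len)
assert my_list == ['a', 'bb', 'ccc']

# Exceptions must derive from BaseException (normally via Exception).
class MyError(Exception):
    pass

assert issubclass(MyError, BaseException)

# Floor division where an integer result is wanted.
assert 7 // 2 == 3
```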
Python 3 is unarguably a better language than Python 2. Many people new to the language are starting with Python 3, particularly users of proprietary operating systems. Many more current Python 2 users are interested in Python 3 but are held back by the code or a library they are using.
By adding Python 3 support to an application or library you help:
make it available to the new users just starting with Python 3
encourage existing users to adopt it, knowing it won't stop them from switching to Python 3 later
clean up ambiguous use of text and binary data and find related bugs
And as a little bonus, that software can then be listed among the packages with Python 3 support in the Python Package Index, one click from the front page.
Many popular Python packages haven't yet made the switch, but it's certainly on everyone's radar. In my case I was lucky: members of the community had already done most of the hard work of porting my library to Python 3; I only had to update my tests and find ways to make the changes work with older versions of Python as well.
There is currently a divide in the Python community because of the significant differences between Python 2 and Python 3. But with some work, that divide can be bridged. It's worth the effort.
Newsletters and articles
looks at the importance of roadmaps. "The end result of a good roadmap process is that your users know where they stand, more or less, at any given time. Your developers know where you want to take the project, and can see opportunities to contribute. Your core team knows what the release criteria for the next release are, and you have agreed together mid-term and long-term goals for the project that express your common vision. As maintainer, you have a powerful tool to explain your decisions and align your community around your ideas. A good roadmap is the fertile soil on which your developer community will grow."
Page editor: Jonathan Corbet
Brief items

The Ada Initiative - an organization intended to promote women in open technology and culture - has announced its existence. "The Ada Initiative will concentrate on focused, direct action programs, including recruitment and training for women, education for community members, and working with companies and projects to improve their outreach to women." The Initiative is the work of longtime community members Valerie Aurora and Mary Gardiner.

Mandriva has joined OIN. "Open Invention Network (OIN), the company formed to enable and protect Linux, today extended its community with the signing of Mandriva as a licensee. By becoming a licensee, Mandriva has joined the growing list of organizations that recognize the importance of leveraging the Open Invention Network to further spur open source innovation."
Articles of interest

covers Sony's escalating lawsuits over the PS3 hacks. "Sony is also trying to haul the so-called "failOverflow hacking team" into court. But first, Sony needs to learn the identities and whereabouts of the group's members. They are accused of posting a rudimentary hack in December. It was refined by Hotz weeks later when he accessed the console's so-called "metldr keys," or root keys that trick the system into running unauthorized programs."

posted a message from Armijn Hemel on his blog about considering what might happen to your free software (or other) project after you are gone. It's "a post that can easily ruin your mood", but one worth thinking about. "The common theme is that these people were very passionate about what they did. They truly loved their work and it was appreciated by many. But when fate struck it turned out that they had not taken care of what would happen after they would pass away. I am very sure that they didn't expect this to happen so soon, or never realized that this could be an issue. But in the digital world, with lapsing domain name registrations, databases and webspace being deleted because of unpaid bills, offline development trees and uninformed heirs this is becoming more and more of a risk."

talks with Jonathan Oxer about open hardware. "This is where one of the interesting differences between open hardware and open software come in - with open software it's quite easy to publish the source code and the whole tool chain, like compilers or whatever else is necessary. You can give everybody, at zero cost essentially, everything they need to reproduce your work and to develop and build on it. With open hardware it's quite different. I can give someone the design parts for a project but then they need the actual materials or the tools and resources to reproduce it in order to improve on it or collaborate with me."

a review of LibreOffice.
"The remainder of LibreOffice Writer's new features were also useful. I liked the page numbering tool, and I really appreciated the new Print dialog box (which is present in all of the LibreOffice tools). I know, it's a little odd to get excited about a dialog box, but I always have found the OpenOffice.org Print dialog box rather clunky, so its LibreOffice counterpart is a breath of fresh air."

writes about the amicus brief filed with the US Supreme Court by Red Hat and a diverse group of other companies in the Microsoft v. i4i case. "Once a patent is inappropriately granted, it is possible, in theory, for a party accused of infringing it to show that it is invalid. In practice, this is quite difficult. When software patents are at issue, the technical issues are often complicated and difficult for a lay jury to understand. Jurors frequently mistakenly assume that the patent examination process was careful and exhaustive, and so have a tendency to assume that a patent must be valid. On top of all this potential confusion, jurors are instructed under current rules that they may only invalidate a patent if they find the evidence for invalidity clear and convincing. Even when there's strong evidence that a patent should never have been granted, it's difficult for lay juries to conclude that the technical issues are clear."

a vague report saying that Nokia is dropping MeeGo. "In a leaked internal memo, Chief Executive Stephen Elop wrote: 'We thought MeeGo would be a platform for winning high-end smartphones. However, at this rate, by the end of 2011, we might have only one MeeGo product in the market.'"
Update: the full memo has been posted on Engadget; what Nokia will do is far from clear at this point.
Contests and Awards

"Ubuntu, Android, MySQL, Cassandra, VLC, Puppet and Django are among the winners."
Calls for Presentations

announced that applications are open for hosting linux.conf.au 2013. Formal submissions will be accepted until May 15, 2011. The winner will be announced at the close of linux.conf.au 2012, which will be hosted at Ballarat University in Victoria.

Camp KDE has been announced. This year's Camp KDE will take place in San Francisco, California on April 4-5, 2011, preceding the Linux Foundation's Collaboration Summit. Registration is open and proposals for talks will be accepted until March 2, 2011.
Upcoming Events

announced this year's Community Leadership Summit on his blog. It will be held the weekend before OSCON, July 23-24, at the Oregon Convention Center in Portland. "For those of you who are unfamiliar with the CLS, it is an entirely free event designed to bring together community leaders and managers and the projects and organizations that are interested in growing and empowering a strong community. The event provides an unconference style schedule in which attendees can discuss, debate and explore topics. This is augmented with a range of scheduled talks, panel discussions, networking opportunities and more."

announced for conf.kde.in, along with a list of talks and presentations. Early bird registration ends February 25. conf.kde.in will be held in Bengaluru (Bangalore), India, March 9-13, 2011.

DFD is a global day to celebrate Open Standards and open document formats and their importance. "Open Standards ensure the freedom to access your data, and the freedom to build Free Software to write and read data in specific formats."

"startup stories". "These are the stories of companies that have built and shipped (or, in the case of Threadless - about to ship) Python systems at scale. Or, in the case of Open Stack - it is the story of the next generation "Open Cloud" platform for Python at scale." International guests should note that Kiwi PyCon is to run on the following weekend, making it a great opportunity to attend a couple of awesome Down Under conferences and hopefully do some sprinting with the locals.

"Most people in the tech industry know that the IPv4 address space will run dry over the next few months. The forced march to IPv6, with all of its potential downsides -- including network slowdowns and downright outages -- has only just begun. Has your organization started the move? If not, consider attending SCALE!" The 9th Annual Southern California Linux Expo (SCALE 9X) will be held February 25-27, 2011 in Los Angeles, CA.
|February 25||Build an Open Source Cloud||Los Angeles, CA, USA|
|Southern California Linux Expo||Los Angeles, CA, USA|
|February 25||Ubucon||Los Angeles, CA, USA|
|February 26||Open Source Software in Education||Los Angeles, CA, USA|
|Linux Foundation End User Summit 2011||Jersey City, NJ, USA|
|March 5||Open Source Days 2011 Community Edition||Copenhagen, Denmark|
|Drupalcon Chicago||Chicago, IL, USA|
|ConFoo Conference||Montreal, Canada|
|conf.kde.in 2011||Bangalore, India|
|PyCon 2011||Atlanta, Georgia, USA|
|March 19||Open Source Conference Oita 2011||Oita, Japan|
|Chemnitzer Linux-Tage||Chemnitz, Germany|
|March 19||OpenStreetMap Foundation Japan Mappers Symposium||Tokyo, Japan|
|Embedded Technology Conference 2011||San Jose, Costa Rica|
|OMG Workshop on Real-time, Embedded and Enterprise-Scale Time-Critical Systems||Washington, DC, USA|
|UKUUG Spring 2011 Conference||Leeds, UK|
|PgEast PostgreSQL Conference||New York City, NY, USA|
|Palmetto Open Source Software Conference||Columbia, SC, USA|
|March 26||10. Augsburger Linux-Infotag 2011||Augsburg, Germany|
|GNOME 3.0 Bangalore Hackfest | GNOME.ASIA SUMMIT 2011||Bangalore, India|
|March 28||Perth Linux User Group Quiz Night||Perth, Australia|
|NASA Open Source Summit||Mountain View, CA, USA|
|Flourish Conference 2011!||Chicago, IL, USA|
|Workshop on GCC Research Opportunities||Chamonix, France|
|April 2||Texas Linux Fest 2011||Austin, Texas, USA|
|Camp KDE 2011||San Francisco, CA, USA|
|SugarCon 11||San Francisco, CA, USA|
|Selenium Conference||San Francisco, CA, USA|
|5th Annual Linux Foundation Collaboration Summit||San Francisco, CA, USA|
|Hack'n Rio||Rio de Janeiro, Brazil|
|April 9||Linuxwochen Österreich - Graz||Graz, Austria|
|April 9||Festival Latinoamericano de Instalación de Software Libre|
|O'Reilly MySQL Conference & Expo||Santa Clara, CA, USA|
|2011 Embedded Linux Conference||San Francisco, CA, USA|
|2011 Android Builders Summit||San Francisco, CA, USA|
|April 16||Open Source Conference Kansai/Kobe 2011||Kobe, Japan|
If your event does not appear here, please tell us about it.
Page editor: Rebecca Sobol
Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds