LWN.net Weekly Edition for February 10, 2011

Moglen on Freedom Box and making a free net

By Jonathan Corbet
February 8, 2011

Eben Moglen is usually an inspiring speaker, and his keynote at FOSDEM 2011 did not disappoint. Free software remains, as always, at the core of his talks, but he has adopted a wider political vision and thinks that the community should do the same. Our freedom, he said, depends on reengineering the network to replace vulnerable, centralized services with alternatives which resist government control.

The publication of Larry Lessig's Code, Eben said, drew our attention to the fact that, in the world we live in, code increasingly functions as law. Code does the work of the state, but it can also serve revolution against the state. We are seeing an enormous demonstration of the power of code now, he said. At the same time, there is a lot of attention being paid to the publication of Evgeny Morozov's The Net Delusion, which makes the claim that the net is being co-opted to control freedom worldwide. The book is meant to be a warning to technology optimists. Eben is, he said, one of those optimists. The lesson he draws from current events is that the right net brings freedom, but the wrong net brings tyranny.

We have spent a lot of time making free software. In the process, we have joined forces with other elements of the free culture world. Those forces include people like Jimmy Wales, but also people like Julian Assange. Wikipedia and Wikileaks, he said, are two sides of the same coin. At FOSDEM, he said, one could see "the third side" of the coin. We are all people who have organized to change the world without creating new hierarchies in the process. At the end of 2010, Wikileaks was seen mainly as a criminal operation. Events in Tunisia changed that perception, though. Wikileaks turns out to be an attempt to help people learn about their world. Wikileaks, he said, is not destruction - it's freedom.

But now there are a lot of Egyptians out there whose freedom depends on the ability to communicate through commercial operations which will respond to pressure from the government. We are now seeing in real time the vulnerabilities which come from the bad engineering in the current system.

Social networking, he said, changes the balance of power away from the state and toward people. Events in countries like Iran, Tunisia, and Egypt demonstrate its importance. But current forms of social communication are "intensely dangerous" to use. They are too centralized and vulnerable to state control. Their design is motivated by profit, not by freedom. As a result, political movements are resting on a fragile foundation: the courage of Mr. Zuckerberg or Google to resist the state - the same state which can easily shut them down.

Likewise, real time information for people trying to build freedom currently depends on a single California-based microblogging service which must turn a profit. This operation is capable of deciding, on its own, to donate its entire history to the US Library of Congress. Who knows what types of "donations" it may have made elsewhere?

We need to fix this situation, he said, and quickly. We are "behind the curve" of freedom movements which depend heavily on code. The longer we wait, the more we become part of the system. That will bring tragedy soon. Egypt is inspiring, but things there could have been far worse. The state was late to control the net and unready to be as tough as it could have been. It is, Eben said, not hard to decapitate a revolution when everybody is in Mr. Zuckerberg's database.

It is time to think about the consequences of what we have built - and what we have not built yet. We have talked for years about replacing centralized services with federated services; overcentralization is a critical vulnerability which can lead to arrests, torture, and killings. People are depending on technology which is built to sell them out. If we care about freedom, we have to address this problem; we are running out of time, and people are in harm's way. Eben does not want people who are taking risks for freedom to be carrying an iPhone.

One thing that Egypt has shown us, as Iran did before, is that closed networks are harmful and network "kill switches" will harm people who are seeking freedom. What can we do when the government has clamped down on network infrastructure? We must return to the idea of mesh networks, built with existing equipment, which can resist governmental control. And we must go back to secure, end-to-end communications over those networks. Can we do it, he asked? Certainly, but will we? If we don't, the promise of the free software movement will begin to be broken. Force will intervene and we will see more demonstrations that, despite the net, the state still wins.

North America, Eben said, is becoming the heart of a global data mining industry. When US President Dwight Eisenhower left office, he famously warned about the power of the growing military-industrial complex. Despite that warning, the US has, since then, spent more on defense than the rest of the world combined. Since the events of September 11, 2001, a new surveillance-industrial complex has grown. Eben strongly recommended reading the Top Secret America articles published by the Washington Post. It is eye-opening to see just how many Google-like operations there are, all under the control of the government.

Europe's data protection laws have worked, in that they have caused all of that data to move to North America where its use is uncontrolled. Data mining, like any industry, tends to move to the areas where there is the least control. There is no way that the US government is going to change that situation; it depends on it too heavily. As a presidential candidate, Barack Obama was against giving immunity to the telecom industry for its role in spying on Americans. That position did not even last through the general election. Obama's actual policies are not notably different from those of his predecessor - except in the areas where they are more aggressive.

Private industry will not change things either; the profit motive will not produce privacy or defense for people in the street. Companies trying to earn a profit cannot do so without the good will of the government. So we must build under the assumption that the net is untrustworthy, and that centralized services can kill people. We cannot, he said, fool around with this; we must replace things which create these vulnerabilities.

We know how to engineer our way out of this situation. We need to create plug servers which are cheap and require little power, and we must fill them with "sweet free software." We need working mesh networking, self-constructing phone systems built with tools like OpenBTS and Asterisk, federated social services, and anonymous publication platforms. We need to keep our data within our houses where it is shielded by whatever protections against physical searches remain. We need to send encrypted email all the time. These systems can also provide perimeter defense for more vulnerable systems and proxy servers for circumvention of national firewalls. We can do all of it, Eben said; it is easily done on top of the stuff we already have.

Eben concluded with an announcement of the creation of the Freedom Box Foundation, which is dedicated to making all of this stuff available and "cheaper than phone chargers." A generation ago, he said, we set out to create freedom, and we are still doing it. But we have to pick up the pace, and we have to aim our engineering more directly at politics. We have friends in the street; if we don't help them, they will get hurt. The good news is that we already have almost everything we need and we are more than capable of doing the rest.

[Editor's note: as of this writing, the Freedom Box Foundation does not appear to have a web site - stay tuned.]

[Update: Added link to The FreedomBox Foundation]

FOSDEM: Mapping WikiLeaks using open-source tools

February 9, 2011

This article was contributed by Koen Vervloesem

This year, FOSDEM had a Data Analytics developer room, which turned out to be quite popular with the assembled geeks in Brussels: during many of the talks the room was fully packed. This first meeting about analyzing and learning from data had talks looking at information retrieval, large scale data processing, machine learning, text mining, data visualization and Linked Open Data, all of it implemented using open source tools.

Mapping WikiLeaks cables

One of the most inspiring talks in the data analytics track, which showed just how much you can do with open source tools in data visualization, was Mapping WikiLeaks' Cablegate using Python, mongoDB, Neo4j and Gephi by Elias Showk and Julian Bilcke, two software engineers at the Centre National de la Recherche Scientifique (National Center for Scientific Research in France). Their goal was to analyze the full text of all published WikiLeaks diplomatic cables, to produce occurrence and co-occurrence networks of topics and cables, and finally to visualize how the discussions in the cables relate to each other. In short, they did this by analyzing the 3,300 cables with Python and some data extraction libraries, then they used MongoDB and Neo4j to store the documents and generate graphs, and finally they visualized and explored the graphs with Gephi.

The first step in this process, presented by Showk, is importing the cables. Luckily, the WikiLeaks cables follow a simple structure that makes this relatively easy. Showk based his work on the cablegate Python code by Mark Matienzo that scrapes data from the cables in HTML form and converts this to Python objects. For the HTML scraping, the code is using Beautiful Soup, a well-known Python HTML/XML parser that automatically converts the web pages to Unicode and can cope with errors in the HTML tree. Moreover, with a SoupStrainer object, you can tell the Beautiful Soup parser to target a specific part of the document and forget about all the boilerplate parts such as the header, footer, sidebars, and supporting information.
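The idea of restricting the parser to one part of the page can be sketched without Beautiful Soup at all. The example below uses only Python's standard html.parser module, not the SoupStrainer API, and it assumes, purely for illustration, that the cable text sits inside a <pre> element. The sample page and the names TagTextExtractor and extract_text are made up for this sketch.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect text found inside one kind of tag, ignoring everything else."""

    def __init__(self, target_tag):
        super().__init__(convert_charrefs=True)
        self.target_tag = target_tag
        self.depth = 0      # greater than zero while inside the target tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, target_tag="pre"):
    parser = TagTextExtractor(target_tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical page: boilerplate around a <pre> block holding the cable text.
SAMPLE_PAGE = """
<html><head><title>Cable viewer</title></head>
<body>
  <div id="header">Navigation, logo, links</div>
  <pre>SUBJECT: EXAMPLE CABLE
1. (U) This is the body text.</pre>
  <div id="footer">Footer text</div>
</body></html>
"""

cable_body = extract_text(SAMPLE_PAGE)
print(cable_body)
```

Everything outside the chosen element (title, header, footer) is never collected, which is the same effect the article attributes to a SoupStrainer.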

After the parsing, the Python natural language toolkit NLTK is used on the text body to bring more structure to the word scramble, with the goal of extracting some topics. The first step is tokenization: NLTK makes it easy to break up a text into sentences and each sentence into its separate words. Then the stem of each word is determined, so that all words are grouped by their root. For example, to analyze the topics of the WikiLeaks cables, it doesn't matter if the word in a text is "language" or "languages", so they are both grouped by their root "languag". An SHA-256 hash value of each stem is then used as a database index.
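The tokenize, stem, and hash sequence can be illustrated with a small standard-library sketch. It does not use NLTK: the sentence splitter is a naive regular expression and toy_stem is a deliberately crude stand-in for a real stemmer, chosen only so that the "language"/"languages" example from the talk works. All function names here are assumptions of this sketch, not the speakers' code.

```python
import hashlib
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lower-case alphabetic words only; digits and punctuation are dropped.
    return re.findall(r"[a-z]+", sentence.lower())


def toy_stem(word):
    # Toy stand-in for a real stemmer: strip a plural "s", then a final "e".
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    if len(word) > 3 and word.endswith("e"):
        word = word[:-1]
    return word


def stem_key(stem):
    # SHA-256 hex digest of the stem, usable as a database index.
    return hashlib.sha256(stem.encode("utf-8")).hexdigest()


def index_text(text):
    """Map each stem key to the stem and the number of times it occurs."""
    index = {}
    for sentence in split_sentences(text):
        for word in tokenize(sentence):
            stem = toy_stem(word)
            entry = index.setdefault(stem_key(stem), {"stem": stem, "count": 0})
            entry["count"] += 1
    return index


index = index_text("The language barrier matters. Both languages are official.")
print(index[stem_key("languag")])
```

Both "language" and "languages" land under the same key, so their occurrences are counted together.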

MongoDB, a document-oriented database, is used as document storage for all this data. MongoDB allows transparently inserting and reading records as Python dictionaries, as well as automatic serializing and deserializing of the objects. Then Showk queried the MongoDB database to extract the heaviest occurrences and co-occurrences of words, and converted that to a graph using the Neo4j graph database.
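How occurrence and co-occurrence edges fall out of stored documents can be shown with plain Python dictionaries standing in for database records. The record layout (an "_id" plus a list of stems) and the sample data are assumptions made for this sketch; no MongoDB or Neo4j calls are involved.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records, shaped like the dictionaries a document store returns.
cables = [
    {"_id": "cable-1", "stems": ["embassi", "trade", "oil"]},
    {"_id": "cable-2", "stems": ["trade", "oil", "sanction"]},
    {"_id": "cable-3", "stems": ["embassi", "visa"]},
]


def occurrence_edges(records):
    """Topic-to-cable edges: one edge per distinct (stem, cable) pair."""
    return sorted({(stem, rec["_id"]) for rec in records for stem in rec["stems"]})


def cooccurrence_weights(records):
    """Topic-to-topic edges weighted by the number of cables sharing both stems."""
    weights = Counter()
    for rec in records:
        # sorted(set(...)) makes each pair canonical and counted once per cable.
        for pair in combinations(sorted(set(rec["stems"])), 2):
            weights[pair] += 1
    return weights


def heaviest(weights, n):
    # Sort by descending weight, then alphabetically for a stable order.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


edges = occurrence_edges(cables)
weights = cooccurrence_weights(cables)
top = heaviest(weights, 1)
print(top)
```

The two functions correspond to the two kinds of network described in the talk: one linking topics to the cables they occur in, the other linking topics that share a cable.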

For the final step, visualizing and analyzing the data, Bilcke used Gephi, an open source desktop application for the visualization of complex networks. Gephi, to which Bilcke is an active contributor, is a research-oriented graph visualization tool that has been used in the past to visualize some interesting graphs, like open source communities and social networks on LinkedIn. It's based on Java and OpenGL, but it also has a headless library, the Gephi Toolkit.

So Bilcke imported the graph from the Neo4j graph database into Gephi, and then did some manual data cleaning. The graph is quite dense and has a lot of meaningless content, so some post-processing is needed, like sorting and filtering. Bilcke chose the OpenOrd layout, one of the few force-directed layout algorithms that can scale to over 1 million nodes, which makes it well suited to the WikiLeaks graph. He only had to remove some artifacts, tweak the appearance slightly, and finally export the graph to PDF and GEXF (the XML graph exchange format developed by the Gephi project).
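GEXF is an XML format, so a weighted topic graph can also be written out by hand for loading into Gephi. The sketch below is not the route the speakers took (they imported from Neo4j and exported from Gephi itself); it only shows, with the standard library, roughly what a minimal GEXF 1.2 document looks like. The function name to_gexf and the sample edges are assumptions.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(weighted_edges):
    """Serialize {(label_a, label_b): weight} as an undirected GEXF graph."""
    labels = sorted({label for pair in weighted_edges for label in pair})
    node_ids = {label: str(i) for i, label in enumerate(labels)}

    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "undirected"}
    )

    nodes = ET.SubElement(graph, "nodes")
    for label in labels:
        ET.SubElement(nodes, "node", {"id": node_ids[label], "label": label})

    edges = ET.SubElement(graph, "edges")
    for i, ((a, b), weight) in enumerate(sorted(weighted_edges.items())):
        ET.SubElement(edges, "edge", {
            "id": str(i),
            "source": node_ids[a],
            "target": node_ids[b],
            "weight": str(weight),
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical co-occurrence weights between topic stems.
sample = {("oil", "trade"): 2, ("embassi", "visa"): 1}
gexf_text = to_gexf(sample)
print(gexf_text)
```

Saving gexf_text to a file with a .gexf extension should give something Gephi can open, though layout, filtering, and styling would still be done inside the application.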

In total, the two French researchers did a full week of coding, during which they wrote 600 lines of code using four external libraries, two database systems, and one visualization program. All the tools they used are open source, as is their code, so this is quite a nice testimonial to what can be done with open source tools in the field of visualization of big data sources. Running the whole workflow, from the original WikiLeaks HTML files to the final graph, takes around five hours.

Showk and Bilcke did all this during their free time in order to learn about all these technologies. Their goal was to show that every hacker can convert a corpus of textual data to a graph that is easier for exploring topics. This could be used to find some interesting new things, but the two researchers lacked the time to do this and were more interested in the technical side. In an email, Bilcke clarifies:

Since we worked on the publicly released cables, we didn't expect any more secrets than what had been already published by media like The Guardian. So, we didn't find any unexpected secrets. Moreover, don't forget that there is a publication bias: we only see what has been released and censored by WikiLeaks. Our maps are mostly a tool for exploration, to help people dig into large datasets of intricate topics. In a sense, our visualization helps seeing the general topics and dig into the hierarchy, level after level. You can see potentially interesting cable stories at a glance, just by looking at what seem to be clusters (sub-networks) in the map, and zoom in for details. We believe this can be used as a complement to other cablegate tools we have seen so far.

The result is published in the form of two graphs, which can be explored by anyone who wants to dig into the WikiLeaks cables. One graph, with 43,179 nodes and 237,058 edges, links topics to the cables they occur in. The other graph, with 39,808 nodes and 177,023 edges, only shows the topics and links them when they co-occur in the same cable. Interested readers can view the PDF or SVG files, but the best way is to load the .gephi files into Gephi, so you can interactively explore the graphs. For graphs of this size, though, the Gephi system requirements suggest 2 GB of RAM.

Semantic data heaven

One of the other talks was about Datalift, an experimental research project funded by the previously mentioned National Center for Scientific Research in France. Its goal is to convert structured data in various formats (like relational databases, CSV, or XML) to semantic data that can be interlinked. According to François Scharffe, only when open data meets the semantic web will we truly see a data revolution: big chunks of data on an island are difficult to re-use, while data in a common format with semantic information (like RDF) paves the way for richer web applications, more precise search engines, and a lot of other advanced applications. Scharffe referred to the five-star rating system of Tim Berners-Lee:

If your data is available on the web with an open license, it gets one star. If it's available as machine-readable structured data, such as an Excel file instead of an image scan of a table, it gets two stars. The interesting things begin when you use a non-proprietary format like CSV, which gives the data three stars. But to become part of the semantic web, you need open standards to identify and search for things, like RDF and SPARQL, which earns four stars. And to reach data heaven (five stars), you finally need to link your data to other people's data to let you benefit from the network effect.

The Datalift project is currently developing tools to facilitate all steps in the process from raw data to published linked data, from selection of the right vocabulary (e.g. FOAF for persons, or GeoNames for geographic locations), conversion to RDF, and publishing of the data on the web, to interlinking the data with other existing data sources. There are already open source solutions for all these steps. For example, D2R Server maps a relational database on-the-fly to RDF, and Triplify does the same for web applications like a blog or content management system. For the publication of RDF in a human-readable form, there is the Tabulator Firefox extension. The Datalift project is trying to streamline this whole process.
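The conversion step, from tabular data to RDF using an existing vocabulary, can be sketched in a few lines. The example below is not Datalift or D2R Server code: it is a standard-library illustration that turns CSV rows into N-Triples using FOAF terms. The column names, the example.org base URI, and the function names are assumptions, and the literal escaping is minimal.

```python
import csv
import io

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/person/"   # hypothetical namespace for minted URIs


def literal(value):
    # Escape backslashes and double quotes inside the quoted literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def csv_to_ntriples(csv_text):
    """Turn rows with id, name and homepage columns into FOAF triples."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF}Person> .")
        lines.append(f"{subject} <{FOAF}name> {literal(row['name'])} .")
        if row["homepage"]:
            # A URI object rather than a literal: this is a link to other data.
            lines.append(f"{subject} <{FOAF}homepage> <{row['homepage']}> .")
    return "\n".join(lines)


SAMPLE_CSV = (
    "id,name,homepage\n"
    "1,Alice Example,http://example.org/alice\n"
    "2,Bob Example,\n"
)
triples = csv_to_ntriples(SAMPLE_CSV)
print(triples)
```

In the terms of the five-star scheme, the CSV input is three-star data, the triples are a step toward four stars, and URI-valued objects such as the homepage are where interlinking with other datasets begins.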

No shortage of tools

All of the talks in the data analytics developer room were quite short, from 15 to 30 minutes. This allowed a lot of projects to pass in review. Apart from the WikiLeaks and Datalift talks, there were talks about graph databases and NoSQL databases, about query languages, about analyzing and understanding large corpora of text data using Apache Hadoop, about various tools and methods for data extraction from HTML pages, about machine learning with Python, and about a real-time search engine using Apache Lucene and S4.

The whole data analytics track showed that there's no shortage of open source tools to deal with big amounts of data. That's good news for statisticians and other "data scientists", whose work Google's Hal Varian called "the sexy job in the next ten years". In an article in The McKinsey Quarterly from January 2009, he wrote: "The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades." Looking at all the talks at the data analytics track at FOSDEM, it's clear that open source software will play a big role in this trend. If the track is hosted again at FOSDEM 2012, it's going to need a bigger room.

The eyeOS web desktop

February 9, 2011

This article was contributed by Nathan Willis

The eyeOS project describes itself as an open source "web desktop" — by which it means an operating system that emulates a functional desktop environment entirely within a single web page. The benefits certainly sound appealing: privacy, access to your files from anywhere on the Internet, and the ability to collaborate in real-time with other users. In addition, eyeOS is AGPLv3-licensed, which makes it high on the list of software-freedom protecting web services. Still, as of February 2011, there are scores of competing ways to accomplish remote-desktop-access and real-time collaboration, so eyeOS has a tough case to make for many users.

To get a clearer picture of what makes eyeOS different from other Web-enabled desktop suites, one has to look at both the interface and the architecture. After all, AbiWord, Zoho, Google Docs, EtherPad, and even Microsoft Office now allow online collaboration on standard office tasks.

Although it runs inside the browser, to the user eyeOS looks and feels like a standard office desktop: from the menu system and toolbars to the file manager and user preferences. To get started, you have to have an account on the remote eyeOS server, and once you log in, a full desktop session starts on the server and in the browser window. In fact, the desktop computing paradigm is built into the core of eyeOS's design. The back-end of the desktop environment runs as a PHP application on the server, and the browser is used only to run a JavaScript-based interface, which communicates with the server process via AJAX.

Any application that the user opens (word processor, email client, or even file manager) is launched as a separate server-side process that communicates with the eyeOS desktop process on the back-end, while its front-end communicates with the JavaScript GUI running in the browser. That may sound straightforward (the JavaScript front-end mirrors the behavior of the window system, and the back-end behaves like a traditional OS), but it is a distinctly different model from that used by other web-based services like Zoho, which does not use a separate overseer process to manage individual application components.

The base eyeOS system comes with a small contingent of installed applications: networking, office tools, utilities, and so on, but both the front-end and back-end APIs are open. Over the years a respectable collection of third-party eyeOS software has been developed by the community. Installing a new application requires administrative access to the eyeOS server, but it can be done through the eyeOS desktop interface itself.

Test driving

You can create a new user account and test an eyeOS session on the public eyeOS server to get a feel for the system. Visiting the try it now page on eyeOS.org, you will notice two options, version 1.x and version 2.x, both of which are marked as "stable" and "to be used in production environments". The version 1.x code (which appears to be at release 1.9) has been in development since 2007. Version 2.x is a rewrite begun in March of 2010. The instance running on the public server sometimes shows 2.1 and sometimes 2.2 as its version number; presumably it is 2.2 with the occasional overlooked HTML update.

The project advertises more than 250 applications available for eyeOS 1.x and more than 20 for 2.x, but those numbers count third-party applications (most hosted at eyeos-apps.org), not what is available on the demo servers. I spent some time with version 2.x first, but the lack of application availability inspired me to test-drive 1.x as well. Even after digging through the wiki, discussion forum, and main web site, I am still not sure whether or not the public "try it now" server is a free service offered by eyeOS, or a demo limited in storage space, time, or some other resource. You are required to create an account to sign on as a new user, and provide an email address, but the email address is not validated as part of the user creation process. I hope I have not been subscribed to a marketing list as a result.

After having worked with both incarnations of eyeOS for about a day, I have mixed feelings about the implementation. On the one hand, the system is surprisingly fast: the GUI is understandably slower than working with native desktop applications, but it is approximately on par with running a local virtual machine inside VirtualBox. Features that I expected to be flaky, such as sound and mouse wheel support, worked without incident. On the other hand, there is a surprising lack of polish, particularly in the older 1.x system, where one would expect the kinks to have been worked out years ago.

The user interface is inconsistent from application to application (particularly with the desktop widgets), as is the terminology (for example, there is an application variously called "eyeOS Board" and "eyeBoard" at different places in the interface). The icons are a mix of flat, 3-D, head-on and 45-degree angle perspectives, drawn in different styles and color schemes. Some of the default UI widgets are impossible to read, such as white text on light gray. Changing the theme requires you to "restart" your session, but there is no session restart option, only logging out entirely and logging back in manually. The system menu is in the bottom-right corner, and uses an icon that is not quite the eyeOS logo, and looks confusingly similar to the familiar "power button" symbol. Most surprisingly, when I attempted to open the "Documents" folder in my home directory, the built-in file manager did not know how to do so, and popped up an application-selection window asking me to find the right launcher.

2.x is a bit better in terms of UI consistency, although it too suffers from mix-and-match iconography. More seriously, the application launchers on the desktop did not work, although that hiccup matters less than it might because two additional sets of launchers are always visible on the upper and lower desktop panels. Several of the default applications had non-functioning menu items (most noticeably the calendar, where calendar feed properties cannot be edited). If you open a new file in the text editor, it will throw away all of your changes to the current document without affording you the opportunity to save them.

Looking past the UI issues, however, there are some intrinsic properties of the system that I grew frustrated with rather quickly. For starters, I noticed that all applications start off at very small window sizes when launched, generally too small to be used without resizing. Upon reflection, though, that behavior is probably a workaround to account for the fact that the entire desktop is running in a frame within a window within your existing desktop environment: there is simply less real estate to go around.

The scarcity of screen space is exacerbated by the use of the desktop metaphor itself: things like having another task manager inside the browser and having window title bars for every application eat up space, but they don't make the applications more usable. By a similar token, eyeOS requires the user to manually log out of an eyeOS session (like one would on a desktop system) — simply closing the browser or navigating away does not close the session. That behavior makes sense if the goal is emulating a desktop OS, but it results in a security hole that undermines one of the stated goals of the project: keeping your files safe.

I was also disappointed in the default application set. Were it not for the novelty of running inside the browser, eyeOS would be a pretty weak desktop product: the calendaring application cannot subscribe to remote calendars, the word processor is minimalist, the calculator is four-function only. It is confusing to me why eyeOS 1.x needs to have a web browser application (although I was pleased to discover that you can run eyeOS from within the eyeOS browser).

Beyond the desktop

Perhaps eyeOS defenders would point me towards the still-growing library of additional applications available for installation as a way to enhance the experience. To a point, they are entirely right: if you run your own server, you can provide a considerably richer environment for your eyeOS users. It is even possible to enable Microsoft Office file format support within eyeOS's applications; this is done by installing OpenOffice.org on the server, along with the xvfb X server. There are eyeOS applications that enhance the default desktop experience with better PDF support, improved email, and additional communication tools like IRC.

But ultimately I am not persuaded that running eyeOS applications within the eyeOS environment in the browser offers a better computing experience than does simply running existing open source web applications. If you browse the eyeos-apps.org application repository, most of the non-trivial applications are ports of existing projects, like RoundCube, Moodle, or Zoho. Considering that you need access to a full LAMP stack to run your own instance of eyeOS, I see little advantage to running any of those applications within the containerized environment simply because it emulates the existence of a desktop underneath. It is certainly not easier to deploy RoundCube within eyeOS than it is to deploy it on your own server, nor is it easier to secure, nor will it run faster or be easier for users to learn.

As always, there is a trade-off involved, including the configuration work required when you are talking about supporting a large group of users. In my estimation, the default eyeOS applications don't provide a powerful enough experience to say that its simpler deployment process ultimately makes the administrator's job easier than individually setting up other open source file sharing and collaboration tools. EyeOS has a basic new user "skeleton" directory system, but at the moment lacks robust tools for managing and pre-configuring applications for big deployments.

Standing alone, the term "web desktop" could be interpreted to mean a variety of different things. ChromeOS, for example, tries to be a "web desktop" by replacing all client-side applications with web apps. On the other end of the spectrum, more and more GNOME and KDE applications are gaining the ability to seamlessly integrate network collaboration, and with the popularity of "cloud storage" services like Ubuntu One, Dropbox, and the like, it is even possible for a Linux user to store his or her desktop preferences on a remote server, thus making the same environment available everywhere. EyeOS is certainly an innovative approach to the "web desktop," but at the moment, I'm not sure it offers a compelling advantage over the web-and-desktop integration already occurring in other areas.

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

Security: Linux autorun vulnerabilities?; New vulnerabilities in asterisk, bugzilla, postgresql, vlc
Kernel: TCP initial congestion window; Removing ext2 and/or ext3; Supporting multiple LSMs; BATMAN-adv.
Distributions: FOSDEM: Collaboration between distributions; Debian 6.0; Android 3.0
Development: Moving to Python 3; OpenSSH, Psycopg, ulatencyd, ...
Announcements: Ada Initiative; Mandriva Joins OIN; PS3 hack; Nokia drops MeeGo; Camp KDE, Community Leadership Summit, conf.kde.in, PyCon, ...
