Eben Moglen is usually an inspiring speaker, and his keynote at FOSDEM 2011
did not disappoint. Free software remains, as always, at the core of his
talks, but he has adopted a wider political vision and thinks that the
community should do the same. Our freedom, he said, depends on
reengineering the network to replace vulnerable, centralized services with
alternatives which resist government control.
The publication of Larry Lessig's
Code, Eben said, drew our attention to the fact that, in the world we
live in, code increasingly functions as law. Code does the work of the state,
but it can also serve revolution against the state.
We are seeing an enormous demonstration of the power of code now, he said.
At the same time, there is a lot of attention being paid to the publication
of Evgeny Morozov's The Net
Delusion, which makes the claim that the net is being co-opted
to control freedom worldwide. The book is meant to be a warning to
technology optimists. Eben is, he said, one of those optimists. The
lesson he draws from current events is that the right net brings freedom,
but the wrong net brings tyranny.
We have spent a lot of time making free software. In the process, we have
joined forces with other elements of the free culture world. Those forces
include people like Jimmy Wales, but also people like Julian Assange.
Wikipedia and Wikileaks, he said, are two sides of the same coin. At
FOSDEM, he said, one could see "the third side" of the coin. We are all
people who have organized to change the world without creating new
hierarchies in the process.
At the end of 2010, Wikileaks was seen mainly as a criminal operation.
Events in Tunisia changed that perception, though. Wikileaks turns out to
be an attempt to help people learn about their world. Wikileaks, he said,
is not destruction - it's freedom.
But now there are a lot of Egyptians out there whose freedom depends on the
ability to communicate through commercial operations which will respond to
pressure from the government. We are now seeing in real time the
vulnerabilities which come from the bad engineering in the current system.
Social networking, he said, changes the balance of power away from the
state and toward people. Events in countries like Iran, Tunisia, and Egypt
demonstrate its importance. But current forms of social communication are
"intensely dangerous" to use. They are too centralized and vulnerable to
state control. Their design is motivated by profit, not by freedom. As a
result, political movements are resting on a fragile foundation: the
courage of Mr. Zuckerberg or Google to resist the state - the same state
which can easily shut them down.
Likewise, real time information for people trying to build freedom
currently depends on a single California-based microblogging service which must
turn a profit. This operation is capable of deciding, on its own, to donate
its entire history to the US Library of Congress. Who knows what types
of "donations" it may have made elsewhere?
We need to fix this situation, he said, and quickly. We are "behind the
curve" of freedom movements which depend heavily on code. The longer we
wait, the more we become part of the system. That will bring tragedy
soon. Egypt is inspiring, but things there could have been far worse. The
government was late to control the net and unready to be as tough as it could have
been. It is, Eben said, not hard to decapitate a revolution when everybody
is in Mr. Zuckerberg's database.
It is time to think about the consequences of what we have built - and what
we have not built yet. We have talked for years about replacing
centralized services with federated services; overcentralization is a
critical vulnerability which can lead to arrests, torture, and killings.
People are depending on technology which is built to sell them out. If we
care about freedom, we have to address this problem; we are running out of
time, and people are in harm's way. Eben does not want people who are
taking risks for freedom to be carrying an iPhone.
One thing that Egypt has shown us, like Iran did before, is that closed
networks are harmful and network "kill switches" will harm people who are
seeking freedom. What can we do when the government has clamped down on
network infrastructure? We must return to the idea of mesh networks, built
with existing equipment, which can resist governmental control. And we must
bring secure, end-to-end communications to those networks. Can we do it, he
asked? Certainly, but will we? If we don't, the promise of the free
software movement will begin to be broken. Force will intervene and we
will see more demonstrations that, despite the net, the state still wins.
North America, Eben said, is becoming the heart of a global data mining
industry. When US President Dwight Eisenhower left office, he famously
warned about the power of the growing military-industrial complex. Despite
that warning, the US has, since then, spent more on defense than the rest
of the world combined. Since the events of September 11, 2001, a new
surveillance-industrial complex has grown. Eben strongly recommended the
"Top Secret America" series of articles published by the Washington Post. It
is eye-opening to see just how many Google-like operations there are, all
under the control of the government.
Europe's data protection laws have worked, in that they have caused all of
that data to move to North America where its use is uncontrolled. Data
mining, like any industry, tends to move to the areas where there is the
least control. There is no way that the US government is going to change
that situation; it depends on it too heavily. As a presidential candidate,
Barack Obama was against giving immunity to the telecom industry for its
role in spying on Americans. That position did not even last through the
general election. Obama's actual policies are not notably different from
his predecessor's - except in the areas where they are more aggressive.
Private industry will not change things either; the profit motive will not
produce privacy or defense for people in the street. Companies trying to
earn a profit cannot do so without the good will of the government. So we
must build under the assumption that the net is untrustworthy, and that
centralized services can kill people. We cannot, he said, fool around with
this; we must replace things which create these vulnerabilities.
We know how to engineer our way out of this situation. We need to create
plug servers which are cheap and require little power, and we must fill
them with "sweet free software." We need working mesh networking,
self-constructing phone systems built with tools like OpenBTS and Asterisk,
federated social services, and anonymous publication platforms. We need to
keep our data within our houses where it is shielded by whatever
protections against physical searches remain. We need to send encrypted
email all the time. These systems can also provide perimeter defense for
more vulnerable systems and proxy servers for circumvention of national
firewalls. We can do all of it, Eben said; it is easily done on top of the
stuff we already have.
Eben concluded with an announcement of the creation of the Freedom
Box Foundation, which is dedicated to making all of this stuff available
and "cheaper than phone chargers." A generation ago, he said, we set out
to create freedom, and we are still doing it. But we have to pick up the
pace, and we have to aim our engineering more directly at politics. We
have friends in the street; if we don't help them, they will get hurt. The
good news is that we already have almost everything we need and we are more
than capable of doing the rest.
[Editor's note: as of this writing, the Freedom Box Foundation does not
appear to have a web site - stay tuned.]
[Update: Added link to The FreedomBox Foundation]
This year, FOSDEM had a Data
Analytics developer room, which turned out to be quite popular with the
assembled geeks in Brussels: during many of the talks the room was
packed. This first meeting about analyzing and learning from data had talks
covering information retrieval, large-scale data processing, machine learning, text mining, data visualization, and Linked Open Data, all of it implemented using open source tools.
Mapping WikiLeaks cables
One of the most inspiring talks in the data analytics track, which
showed just how much you can do with open source tools in data visualization, was Mapping WikiLeaks' Cablegate using Python, mongoDB, Neo4j and Gephi by Elias Showk and Julian Bilcke, two software engineers at the Centre National de la Recherche Scientifique (National Center for Scientific Research in France). Their goal was to analyze the full text of all published WikiLeaks diplomatic cables, to produce occurrence and co-occurrence networks of topics and cables, and finally to visualize how the discussions in the cables relate to each other. In short, they did this by analyzing the 3,300 cables with Python and some data extraction libraries, then they used MongoDB and Neo4j to store the documents and generate graphs, and finally they visualized and explored the graphs with Gephi.
The first step in this process, presented by Showk, is importing the cables. Luckily, the WikiLeaks cables follow a simple structure that makes this relatively easy. Showk based his work on the cablegate Python code by Mark Matienzo that scrapes data from the cables in HTML form and converts this to Python objects. For the HTML scraping, the code is using Beautiful Soup, a well-known Python HTML/XML parser that automatically converts the web pages to Unicode and can cope with errors in the HTML tree. Moreover, with a SoupStrainer object, you can tell the Beautiful Soup parser to target a specific part of the document and forget about all the boilerplate parts such as the header, footer, sidebars, and supporting information.
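The targeting idea that SoupStrainer provides can be sketched with Python's standard-library html.parser as well; the `<div class="cable">` wrapper below is a hypothetical stand-in for the cables' real markup, not the actual structure of the WikiLeaks pages:

```python
from html.parser import HTMLParser

class CableExtractor(HTMLParser):
    """Collect only the text inside <div class="cable">, ignoring
    headers, footers, and sidebars; the same targeting idea that a
    SoupStrainer object provides in Beautiful Soup."""
    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting level inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":    # track nested divs inside the target
                self.depth += 1
        elif tag == "div" and ("class", "cable") in attrs:
            self.depth = 1      # entered the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:          # keep text only from inside the target
            self.chunks.append(data.strip())

page = ('<html><div class="header">boilerplate</div>'
        '<div class="cable">CONFIDENTIAL SECTION 01</div></html>')
parser = CableExtractor()
parser.feed(page)
print(" ".join(c for c in parser.chunks if c))  # CONFIDENTIAL SECTION 01
```

Beautiful Soup adds error recovery and Unicode handling on top of this basic filtering, which is why the cablegate code uses it for real-world HTML.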
After the parsing, the Python natural language toolkit NLTK is used on the text body to bring more structure to the jumble of words, with the goal of extracting some topics. The first step is tokenization: NLTK makes it easy to break up a text into sentences and each sentence into its separate words. Then the stem of each word is determined, which means that all words are grouped by their root. For example, to analyze the topics of the WikiLeaks cables, it doesn't matter whether the word in a text is "language" or "languages", so both are grouped under their root "languag". A SHA-256 hash value of each stem is then used as a database index.
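A minimal standard-library sketch of the tokenize/stem/hash steps looks like this; the toy suffix-stripper below is not NLTK's Porter algorithm, just enough to show how "language" and "languages" end up under one database index:

```python
import hashlib
import re

def tokenize(sentence):
    """Split a sentence into lowercase word tokens (a crude stand-in
    for NLTK's tokenizers, which the talk's pipeline used)."""
    return re.findall(r"[a-z]+", sentence.lower())

def stem(word):
    """Toy suffix-stripping stemmer: NOT the Porter algorithm NLTK
    provides, only enough to illustrate grouping words by root."""
    for suffix in ("es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_key(word):
    """SHA-256 hash of the stem, used as the database index."""
    return hashlib.sha256(stem(word).encode("utf-8")).hexdigest()

print(stem("languages"))  # languag
print(stem("language"))   # languag
print(index_key("languages") == index_key("language"))  # True
```

Hashing the stem rather than the raw word means every inflected form of a topic word maps to the same index entry.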
MongoDB, a document-oriented database, is used as document storage for all this data. MongoDB allows transparently inserting and reading records as Python dictionaries, as well as automatic serializing and deserializing of the objects. Then Showk queried the MongoDB database to extract the heaviest occurrences and co-occurrences of words, and converted that to a graph using the Neo4j graph database.
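The occurrence and co-occurrence extraction can be sketched in pure Python, leaving out the MongoDB and Neo4j plumbing, assuming each cable has already been reduced to a list of stems (the cable contents here are invented examples):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(documents, min_weight=1):
    """Count stem occurrences and co-occurrences across documents.

    documents: list of lists of stems, one list per cable.
    Returns (occurrences, edges): edges maps a stem pair to the
    number of cables in which both stems appear.
    """
    occurrences = Counter()
    edges = Counter()
    for stems in documents:
        unique = sorted(set(stems))           # count each stem once per cable
        occurrences.update(unique)
        for a, b in combinations(unique, 2):  # every stem pair in this cable
            edges[(a, b)] += 1
    # keep only the heaviest co-occurrences, as the talk described
    heavy = {pair: w for pair, w in edges.items() if w >= min_weight}
    return occurrences, heavy

cables = [
    ["embassi", "languag", "visa"],
    ["embassi", "visa"],
    ["languag", "visa"],
]
occ, edges = cooccurrence_graph(cables, min_weight=2)
print(occ["visa"])                  # 3
print(edges[("embassi", "visa")])   # 2
```

The resulting pairs and weights are exactly the node and edge lists that a graph database like Neo4j would store, and that Gephi can then lay out and explore.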
For the final step, visualizing and analyzing the data, Bilcke used Gephi, an open source desktop application for the visualization of complex networks. Gephi, to which Bilcke is an active contributor, is a research-oriented graph visualization tool that has been used in the past to visualize some interesting graphs, like open source communities and social networks on LinkedIn. It's based on Java and OpenGL, but it also has a headless library, the Gephi Toolkit.
So Bilcke imported the graph from the Neo4j graph database into Gephi,
and then did some manual data cleaning. The graph is quite dense and has a
lot of meaningless content, so there is some post-processing needed, like
sorting and filtering. Bilcke chose the OpenOrd layout, which
is one of the few force-directed layout algorithms that can scale to over
1 million nodes, which makes it ideal for the WikiLeaks graph. He only
had to remove some artifacts, tweak the appearance slightly, and finally
export the graph to PDF and GEXF
(Gephi's native file format).
In total the two French researchers did a full week of coding, during
which they wrote 600 lines of code using four external libraries,
two database systems, and one visualization program. All the tools they used are open source, as is their code, so this is a nice testament to what can be done with open source tools in the field of visualizing big data sources. Running the whole workflow from the original WikiLeaks HTML files to the final graph takes around five hours.
Showk and Bilcke did all this in their free time in order to learn these technologies. Their goal was to show that any hacker can convert a corpus of textual data into a graph that makes it easier to explore topics. This could be used to find interesting new things, but the two researchers lacked the time to do so and were more interested in the technical side. In an email, Bilcke clarified:
Since we worked on the publicly released cables, we didn't expect any more secrets than what had been already published by media like The Guardian. So, we didn't find any unexpected secrets. Moreover, don't forget that there is a publication bias: we only see what has been released and censored by WikiLeaks.
Our maps are mostly a tool for exploration, to help people dig into large datasets of intricate topics. In a sense, our visualization helps seeing the general topics and dig into the hierarchy, level after level. You can see potentially interesting cable stories at a glance, just by looking at what seem to be clusters (sub-networks) in the map, and zoom in for details. We believe this can be used as a complement to other cablegate tools we have seen so far.
The result is published in the form of two graphs, which can be explored by anyone who wants to dig into the WikiLeaks cables. One graph, with 43,179 nodes and 237,058 edges, links topics to the cables they occur in. The other graph, with 39,808 nodes and 177,023 edges, only shows the topics and links them when they co-occur in the same cable. Interested readers can view the PDF or SVG files, but the best way is to load the .gephi files into Gephi, so you can interactively explore the graphs. For graphs of this size, though, the Gephi system requirements suggest 2 GB of RAM.
Semantic data heaven
One of the other talks was about Datalift, an experimental research project
funded by the already mentioned National Center for Scientific Research in
France. Its goal is to convert structured data in various formats (like
relational databases, CSV, or XML) to semantic data that can be
interlinked. According to François Scharffe, only when open data
meets the semantic web will we truly see a data revolution: big chunks of data on an island are difficult to re-use, while data in a common format with semantic information (like RDF) paves the way for richer web applications, more precise search engines, and a lot of other advanced applications. Scharffe referred to Tim Berners-Lee's five-star rating system:
If your data is available on the web with an open license, it gets one star. If it's available as machine-readable structured data, such as an Excel file instead of an image scan of a table, it gets two stars. The interesting things begin when you use a non-proprietary format like CSV, which gives the data three stars. But to become part of the semantic web, you need open standards to identify and search for things, like RDF and SPARQL, which amounts to four stars. And to reach data heaven (five stars), you finally need to link your data to other people's data to benefit from the network effect.
The Datalift project is currently developing tools to facilitate all
steps in the process from raw data to published linked data, from selection
of the right vocabulary (e.g. FOAF for persons, or GeoNames for geographic locations),
conversion to RDF, and publishing of the data on the web, to interlinking
the data with other existing data sources. There are already open source
solutions for all these steps. For example, D2R Server
maps a relational database on-the-fly to RDF, and Triplify does the same for web applications
like a blog or content management system. For the publication of RDF in a
human-readable form, there is the Tabulator Firefox
extension. The Datalift project is trying to streamline this whole process.
No shortage of tools
All of the talks in the data analytics developer room were quite short, from 15 to 30 minutes, which allowed a lot of projects to be presented. Apart from the WikiLeaks and Datalift talks, there were talks about graph databases and NoSQL databases, about query languages, about analyzing and understanding large corpora of text data using Apache Hadoop, about various tools and methods for data extraction from HTML pages, about machine learning with Python, and about a real-time search engine using Apache Lucene and S4.
The whole data analytics track showed that there's no shortage of open
source tools to deal with big amounts of data. That's good news for
statisticians and other "data scientists" - a job that Google's Hal Varian
called "the sexy job in the next ten years". In an article in
The McKinsey Quarterly from January 2009, he wrote: "The ability
to take data - to be able to understand it, to process it, to extract value
from it, to visualize it, to communicate it - that's going to be a hugely
important skill in the next decades." Looking at all the talks in
the data analytics track at FOSDEM, it's clear that open source software
will play a big role in this trend. If the track is hosted again at
FOSDEM 2012, it's going to need a bigger room.
The eyeOS project describes itself
as an open source "web desktop" — by which it means an
operating system that emulates a functional desktop environment entirely
within a single web page. The benefits certainly sound appealing: privacy,
access to your files from anywhere on the Internet, and the ability to collaborate in real time with other users. In addition, eyeOS is AGPLv3-licensed, which places it high on the list of freedom-protecting web services. Still, as of February 2011, there are scores of competing ways to accomplish remote desktop access and real-time collaboration, so eyeOS has a tough case to make for many users.
To get a clearer picture of what makes eyeOS different from other Web-enabled desktop suites, one has to look at both the interface and the architecture. After all, AbiWord, Zoho, Google Docs, EtherPad, and even Microsoft Office now allow online collaboration on standard office tasks.
The base eyeOS system comes with a small contingent of installed applications: networking, office tools, utilities, and so on, but both the front-end and back-end APIs are open. Over the years a respectable collection of third-party eyeOS software has been developed by the community. Installing a new application requires administrative access to the eyeOS server, but it can be done through the eyeOS desktop interface itself.
You can create a new user account and test an eyeOS session on the
public eyeOS server to get a feel for the system. Visiting the try it now
page on eyeOS.org, you will notice two options, version 1.x and version
2.x, both of which are marked as "stable" and "to be used
in production environments." The version 1.x
code (which appears to be at release 1.9) has been in development since
2007. Version 2.x
is a rewrite begun in March of 2010. The instance running on the public
server sometimes shows 2.1 and sometimes 2.2 as its version number;
presumably it is 2.2 with the occasional overlooked HTML update.
The project advertises more than 250 applications available for eyeOS 1.x and more than 20 for 2.x, but those numbers count third-party applications (most hosted at eyeos-apps.org), not what is available on the demo servers. I spent some time with version 2.x first, but the lack of available applications inspired me to test-drive 1.x as well. Even after digging through the wiki, discussion forum, and main web site, I am still not sure whether the public "try it now" server is a free service offered by eyeOS or a demo limited in storage space, time, or some other resource. You are required to create an account and provide an email address to sign on as a new user, but the address is not validated as part of the account-creation process. I hope I have not been subscribed to a marketing list as a result.
After having worked with both incarnations of eyeOS for about a day, I have mixed feelings about the implementation. On the one hand, the system is surprisingly fast: the GUI is understandably slower than working with native desktop applications, but it is approximately on par with running a local virtual machine inside VirtualBox. Features that I expected to be flaky, such as sound and mouse wheel support, worked without incident. On the other hand, there is a surprising lack of polish, particularly in the older 1.x system, where one would expect the kinks to have been worked out years ago.
The user interface is inconsistent from application to application — particularly with the desktop widgets — as is the terminology (for example, there is an application variously called "eyeOS Board" and "eyeBoard" in different places in the interface). The icons are a mix of flat, 3-D, head-on, and 45-degree-angle perspectives, drawn in different styles and color schemes. Some of the default UI widgets are impossible to read, such as white text on light gray. Changing the theme requires you to "restart" your session — but there is no session-restart option, only logging out entirely and logging back in manually. The system menu is in the bottom right corner and uses an icon that is not quite the eyeOS logo but looks confusingly similar to the familiar "power button" symbol. Most surprisingly, when I attempted to open the "Documents" folder in my home directory, the built-in file manager did not know how to do so, and popped up an application-selection window asking me to find the right launcher.
Version 2.x is a bit better in terms of UI consistency, although it too suffers from mix-and-match iconography. More seriously, the application launchers on the desktop did not work, although that hiccup is mitigated by the two additional sets of launchers that are always visible on the upper and lower desktop panels. Several of the default applications had non-functioning menu items (most noticeably the calendar, where calendar feed properties cannot be edited). If you open a new file in the text editor, it will throw away all of your changes to the current document without giving you the opportunity to save them.
Looking past the UI issues, however, there are some intrinsic properties of the system that I grew frustrated with rather quickly. For starters, I noticed that all applications start off at very small window sizes when launched, generally too small to be used without resizing. Upon reflection, though, that behavior is probably a workaround to account for the fact that the entire desktop is running in a frame within a window within your existing desktop environment: there is simply less real estate to go around.
The scarcity of screen space is exacerbated by the use of the desktop metaphor itself: things like having another task manager inside the browser and having window title bars for every application eat up space, but they don't make the applications more usable. By a similar token, eyeOS requires the user to manually log out of an eyeOS session (like one would on a desktop system) — simply closing the browser or navigating away does not close the session. That behavior makes sense if the goal is emulating a desktop OS, but it results in a security hole that undermines one of the stated goals of the project: keeping your files safe.
I was also disappointed in the default application set. Were it not for the novelty of running inside the browser, eyeOS would be a pretty weak desktop product: the calendaring application cannot subscribe to remote calendars, the word processor is minimalist, the calculator is four-function only. It is confusing to me why eyeOS 1.x needs to have a web browser application (although I was pleased to discover that you can run eyeOS from within the eyeOS browser).
Beyond the desktop
Perhaps eyeOS defenders would point me towards the still-growing library of additional applications available for installation as a way to enhance the experience. To a point, they are entirely right: if you run your own server, you can provide a considerably richer environment for your eyeOS users. It is even possible to enable Microsoft Office file format support within eyeOS's applications; this is done by installing OpenOffice.org on the server, along with the xvfb X server. There are eyeOS applications that enhance the default desktop experience with better PDF support, improved email, and additional communication tools like IRC.
But ultimately I am not persuaded that running eyeOS applications within
the eyeOS environment in the browser offers a better computing
experience than does simply running existing open source web applications.
If you browse the eyeos-apps.org application repository, most of the
non-trivial applications are ports of existing projects, like RoundCube,
Moodle, or Zoho. Considering that you need access to a full LAMP stack to
run your own instance of eyeOS, I see little advantage to running any of
those applications within the containerized environment, simply because it
emulates the existence of a desktop underneath. It is certainly not easier
to deploy RoundCube within eyeOS than it is to deploy it on your own
server, nor is it easier to secure, nor will it run faster or be easier for
users to learn.
As always, there is a trade-off
involved, including the configuration work required when you are talking
about supporting a large group of users. In my estimation, the default
eyeOS applications don't provide a powerful enough experience to say that
its simpler deployment process ultimately makes the administrator's job
easier than individually setting up other open source file sharing and
collaboration tools. EyeOS has a basic new user "skeleton" directory
system, but at the moment lacks robust tools for managing and
pre-configuring applications for big deployments.
Standing alone, the term "web desktop" could be interpreted to mean a variety of different things. ChromeOS, for example, tries to be a "web desktop" by replacing all client-side applications with web apps. On the other end of the spectrum, more and more GNOME and KDE applications are gaining the ability to seamlessly integrate network collaboration, and with the popularity of "cloud storage" services like Ubuntu One, Dropbox, and the like, it is even possible for a Linux user to store his or her desktop preferences on a remote server, thus making the same environment available everywhere. EyeOS is certainly an innovative approach to the "web desktop," but at the moment, I'm not sure it offers a compelling advantage over the web-and-desktop integration already occurring in other areas.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: Linux autorun vulnerabilities?; New vulnerabilities in asterisk, bugzilla, postgresql, vlc
- Kernel: TCP initial congestion window; Removing ext2 and/or ext3; Supporting multiple LSMs; BATMAN-adv.
- Distributions: FOSDEM: Collaboration between distributions; Debian 6.0; Android 3.0
- Development: Moving to Python 3; OpenSSH, Psycopg, ulatencyd, ...
- Announcements: Ada Initiative; Mandriva Joins OIN; PS3 hack; Nokia drops MeeGo; Camp KDE, Community Leadership Summit, conf.kde.in, PyCon, ...