LWN.net Weekly Edition for December 15, 2011

Apertium: An open source translation engine

December 14, 2011

This article was contributed by Nathan Willis

Understanding human language is a notoriously complicated, hard-to-reduce problem without simple software solutions. On the plus side, this is one of the few things that prevent the machines from becoming our masters, but it also complicates all manner of natural-language processing tasks, from grammar-checking to speech recognition. High on that list of tasks is automated (or semi-automated) translation from one human language to another. Most of us are well acquainted with the benefits of automatically-translated web pages, so it is easy to imagine other areas that would see a boost from better automatic translation: email, instant messaging, video subtitles, or code comments and documentation, to name just a few.

When we covered the Transifex collaborative translation tool in September, I started reading up on open source machine translation (MT) engines. MT is not a particularly fast-paced area of study; most of the projects are focused on academic research rather than on producing user-friendly applications and software modules. In addition, a large percentage of the projects are also focused on either one particular language, or on a particular language pair — which underscores one of the topic's major hurdles: once you stray outside of related languages, our forms of communication can become so different that it is hard for humans to accurately translate between them, much less machines.

Still, there are a few free software projects that both produce usable code and cover a broad enough set of languages to play around with. The leader of the pack is Apertium, a GPL-licensed package originally developed with funding from the Spanish government. Apertium is a rule-based (or transfer-based) MT system, meaning that it takes the input text in one language and attempts to break it down into an intermediate, abstract representation. The abstract message is then used to compose an equivalent text in the output language.

Apertium 101

In 2004, Apertium started off targeting the native languages of Spain (Castilian Spanish, Catalan, Basque, Galician, Asturian, and Occitan), but has subsequently branched out, and now boasts stable support for more than 20 languages. Currently the supported languages are all of European origin, but the unstable set includes considerably more, several of which hail from other continents.

The engine itself requires that language support be built in pairs, with each module consisting of three pieces: one "morphological dictionary" for each of the two languages (which includes both words and rules for how they are inflected), and a third file that holds word mappings between the languages. The project's New Language Pair HOWTO goes into further detail about the XML-based syntax of the dictionaries, and how they mark up various parts of speech and language features. Despite the one-to-one nature of the translation engine itself, the various front-end tools written for Apertium can chain together several of these one-to-one relationships and translate between many more languages. However, there are several languages that only exist in one language pair and thus effectively form closed loops — such as Norwegian Nynorsk and Norwegian Bokmål.
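The following Python sketch shows one way a front-end could chain one-to-one pairs. It treats each installed pair as a directed edge and runs a breadth-first search for the shortest chain. The pair list is a hypothetical subset chosen for illustration, not Apertium's actual catalogue, and the real front-ends may pick chains differently. The sketch also shows the closed-loop case: a pair like Nynorsk-Bokmål that connects to nothing else can never be reached from English.

```python
from collections import deque

# Hypothetical set of installed one-to-one pairs (source, target).
# This is NOT Apertium's real list of pairs; it is only for illustration.
INSTALLED_PAIRS = {
    ("es", "ca"), ("ca", "es"),
    ("en", "es"), ("es", "en"),
    ("nn", "nb"), ("nb", "nn"),
}


def find_chain(source, target, pairs=INSTALLED_PAIRS):
    """Breadth-first search for the shortest chain of pairs from source to target.

    Returns a list of language codes such as ["en", "es", "ca"], or None
    if no chain of installed pairs connects the two languages.
    """
    if source == target:
        return [source]
    graph = {}
    for src, dst in pairs:
        graph.setdefault(src, []).append(dst)
    previous = {source: None}  # language -> language it was reached from
    queue = deque([source])
    while queue:
        lang = queue.popleft()
        for nxt in graph.get(lang, []):
            if nxt in previous:
                continue
            previous[nxt] = lang
            if nxt == target:
                # Walk back through the predecessors to rebuild the chain.
                chain = [nxt]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(nxt)
    return None


print(find_chain("en", "ca"))  # ['en', 'es', 'ca']
print(find_chain("en", "nb"))  # None: nn/nb form a closed loop
```

Preferring the shortest chain is a deliberate choice. Each extra hop feeds one pair's output into another pair, so errors compound along the way.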

The engine's documentation is actually quite thorough, although that does not make it simple to dig in and start translating using the bundled tools. The translation process involves a number of discrete steps. Simply turning the input into the intermediate form requires several of them: parsing the input to strip out formatting, breaking the input text into segments, finding the base form and appropriate morphological tags for each word, and choosing the best-fit match for each of the ambiguous bits. That last step covers both words that could resolve to more than one match, such as homographs and heteronyms, and sentences and phrases whose meaning itself is unclear. After that, the engine looks up the corresponding base words in the target-language dictionary. It then rearranges and grammatically reconstructs the message into words, chunks, and sentences in the target language, and finally re-applies any original formatting.
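The following toy Python pipeline walks through those stages in order: deformatting, segmentation, morphological analysis, disambiguation, lexical transfer, structural transfer (reordering and agreement), generation, and reformatting. All of the language data is hypothetical and tiny. The disambiguation step is a single hand-written rule, where a real system would use something more robust such as a trained tagger. None of the function names correspond to actual Apertium modules.

```python
import re

# Hypothetical, tiny language data (not taken from any real Apertium pair).
# Analyser: English surface form -> possible (lemma, part of speech) readings.
EN_ANALYSER = {
    "the": [("the", "det")],
    "white": [("white", "adj")],
    "house": [("house", "vblex"), ("house", "n")],  # homograph: verb or noun
}
# Bilingual dictionary: English lemma -> (Spanish lemma, gender or None).
EN_ES = {"the": ("el", None), "white": ("blanco", None), "house": ("casa", "f")}
# Generator: (Spanish lemma, gender) -> inflected surface form (singular only).
ES_GENERATOR = {("el", "f"): "la", ("blanco", "f"): "blanca", ("casa", "f"): "casa"}


def deformat(text):
    """Split off leading and trailing markup so only plain text is translated."""
    m = re.match(r"((?:<[^>]+>)*)(.*?)((?:<[^>]+>)*)$", text, re.S)
    return m.group(1), m.group(2), m.group(3)


def analyse(tokens):
    """Attach every possible reading; unknown words get a placeholder reading."""
    return [EN_ANALYSER.get(tok, [(tok, "unknown")]) for tok in tokens]


def disambiguate(readings):
    """Toy rule: after a determiner or adjective, prefer a noun reading."""
    chosen = []
    for options in readings:
        prev_pos = chosen[-1][1] if chosen else None
        nouns = [r for r in options if r[1] == "n"]
        if nouns and prev_pos in ("det", "adj"):
            chosen.append(nouns[0])
        else:
            chosen.append(options[0])
    return chosen


def lexical_transfer(words):
    """Map each lemma through the bilingual dictionary; flag misses with '*'."""
    result = []
    for lemma, pos in words:
        es_lemma, gender = EN_ES.get(lemma, ("*" + lemma, None))
        result.append({"lemma": es_lemma, "pos": pos, "gender": gender})
    return result


def structural_transfer(words):
    """Move a single adjective after its noun and copy the noun's gender."""
    words = [dict(w) for w in words]
    for i in range(len(words) - 1):
        if words[i]["pos"] == "adj" and words[i + 1]["pos"] == "n":
            words[i], words[i + 1] = words[i + 1], words[i]
    nouns = [w for w in words if w["pos"] == "n"]
    if nouns:
        for w in words:
            if w["pos"] in ("det", "adj"):
                w["gender"] = nouns[0]["gender"]
    return words


def generate(words):
    """Inflect each word; fall back to the bare lemma if no form is known."""
    return " ".join(ES_GENERATOR.get((w["lemma"], w["gender"]), w["lemma"]) for w in words)


def translate(text):
    prefix, body, suffix = deformat(text)
    tokens = body.lower().split()  # segmentation: whitespace only
    words = disambiguate(analyse(tokens))
    target = generate(structural_transfer(lexical_transfer(words)))
    return prefix + target + suffix  # re-apply the original formatting


print(translate("<b>The white house</b>"))  # <b>la casa blanca</b>
```

Each stage consumes only the previous stage's output. As a result, a mistake early on (an unknown word, say, or the wrong reading of "house") carries through to everything downstream.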

Each step in the translation process is implemented as a separate module. Apertium uses finite-state transducers (FSTs), which are finite-state machines that have separate input and output "tapes." The online documentation does not go into as much detail about the various modules as the downloadable PDF manual does; if you are interested in learning more about Apertium's lexical processing capabilities, the manual is the place to start.
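The two-tape idea can be sketched in a few lines of Python. Each transition reads one symbol from the input tape and writes a (possibly empty) string to the output tape, so running a surface form through the machine yields its analyses. This is a minimal, unminimized illustration built on our own data structures, not the format lttoolbox actually uses. The dictionary entries are hypothetical, and the tag notation only loosely imitates Apertium's.

```python
from collections import defaultdict


class FST:
    """A tiny nondeterministic finite-state transducer (teaching sketch only).

    Each transition reads one symbol from the input tape and writes a
    (possibly empty) string to the output tape; final states may append a
    final output string.
    """

    def __init__(self):
        self.transitions = defaultdict(list)  # (state, symbol) -> [(next_state, output)]
        self.final = {}                        # final state -> output appended at the end
        self.n_states = 1                      # state 0 is the start state

    def _new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def add_entry(self, surface, lemma, tags):
        """Add one path that maps a surface form to lemma + tags."""
        state = 0
        for i, symbol in enumerate(surface):
            nxt = self._new_state()
            output = lemma[i] if i < len(lemma) else ""  # write lemma letters as we go
            self.transitions[(state, symbol)].append((nxt, output))
            state = nxt
        self.final[state] = lemma[len(surface):] + tags  # leftover lemma, then tags

    def transduce(self, word):
        """Return every output for paths that consume the whole input word."""
        paths = [(0, "")]
        for symbol in word:
            paths = [(nxt, out + piece)
                     for state, out in paths
                     for nxt, piece in self.transitions.get((state, symbol), [])]
        return sorted(out + self.final[s] for s, out in paths if s in self.final)


fst = FST()
fst.add_entry("house", "house", "<n><sg>")
fst.add_entry("house", "house", "<vblex><pres>")
fst.add_entry("houses", "house", "<n><pl>")
fst.add_entry("mice", "mouse", "<n><pl>")

print(fst.transduce("house"))  # ['house<n><sg>', 'house<vblex><pres>']
print(fst.transduce("mice"))   # ['mouse<n><pl>']
print(fst.transduce("cuyo"))   # []
```

The ambiguous form "house" comes out with two analyses, and choosing between them is left to a later disambiguation stage. A word missing from the dictionary produces no analysis at all.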

However, it seems clear that a substantial portion of Apertium's skill on any particular translation task is bound up in the quality of the language pairs rather than in the core modules themselves. Dictionary creation takes up more of the documentation than hacking on the code itself does, and with good reason: the quality of a dictionary sets lower and upper bounds on the quality of the output possible. Thus, the project encourages users to create new language pairs, contribute to existing pairs, and tackle tricky phenomena in languages. The manual, too, spends as much time discussing good language data as it does the functionality of the modules, and the project posts statistics on the quality of the language pairs rather than on the individual modules of the engine.

Your first translation(s)

The best way to assess Apertium's readiness for real-world use is to run it on a text of your own, of course. The code is available for Linux, Windows, and Mac OS X. The latest release of the engine is 3.2, from September 2010. It is packaged in Debian (and thus is automatically available for most Debian derivatives, including Ubuntu), and the project provides installation instructions for other popular distributions. The necessary pieces include the apertium package, libapertium, and auxiliary tools provided in lttoolbox and liblttoolbox. But the language pairs are provided as separate packages, each updated independently; they average around 1MB in size, so if space is available the safest bet is to install them all.

The code and language pairs are also available through Subversion. If building from source, you must compile and install the lttoolbox packages first, followed by Apertium itself, and the language modules last. Some language pairs also require that an additional Constraint Grammar module be installed separately. The wiki lists the Breton-French and Nynorsk-Bokmål pairs as examples, but there could be others.

The apertium package includes a command-line tool of the same name. To translate a single file, the syntax is:

    apertium path_to_language_pair_directory text_format translation_pair < infile > outfile
The translation_pair argument indicates the direction of the translation (e.g. en-es or es-en). The text format can be txt, html, or rtf. You can also omit the input and output redirections, type text directly, and end it with Ctrl-D.

The project also runs an online translation server on the Demo page of its web site — as well as a bleeding-edge demo bolstered with the unstable language pairs, found at http://xixona.dlsi.ua.es/testing/. The stable demo server can also translate web pages, perform dictionary lookups, and translate uploaded documents (including OpenOffice/LibreOffice ODTs in addition to the formats supported by the command line tool).

For those of us on Linux, an experimental D-Bus service is also available, which is intended to make Apertium more accessible to desktop application developers. The simplest demo application using the D-Bus option is Apertium-tolk, a GTK+ text translation tool written in Python. You simply type text into the top pane of the window, select the appropriate language pair from the drop-down selector, and Apertium generates translated text in the bottom pane. Apertium-tolk did not recognize accented characters in my tests (which appears to be a recurring issue with Python expecting a different character encoding), but for input text without accents, it produced results on par with the demo web application — with significantly less lag time.

Translation in practice

The demo server and Apertium-tolk are not designed for "serious" work, however. One of the recurring themes on MT sites is that an automatic translation can never replace a human translation. It may serve as a tool to bootstrap a translation or to save time on long texts, but an unattended translation is simply not feasible. Consequently, MT tools like Apertium are designed to be incorporated into human translators' workflows. Those workflows are often very task-specific: someone translating an ebook has different requirements than someone translating gettext strings using Transifex, and both have different requirements than someone translating subtitles in a video.

On that front, it looks like Apertium has been making inroads in recent years. There are two projects for video subtitle editing: Apertium Subtitles, which is developed entirely by the project, and a separate extension for the Gaupol editor. I tested both on some randomly selected .SRT subtitle files. As with Apertium-tolk, both ran into trouble with text encoding and were unable to recognize a good portion of the words.

Regrettably, that makes it virtually impossible to estimate the accuracy of a language pair. Dropping multiple words from a sentence derails the morphological analyzer, which is the module whose job it is to decide how to break sentences up into meaningful chunks. Because that step comes before translating individual words, when it fails, the stages of the process that follow it fail, too. This does not appear to be a bug in the Apertium core, just a problem some users hit with the GUI front-ends. You can always fall back on the command-line tool (which did not hiccup on accented characters in my tests), but for translation projects that do not revolve around one of the supported file formats, that is hardly a realistic option.
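This kind of encoding failure is easy to reproduce in miniature. In the Python sketch below, the same UTF-8 bytes are decoded twice: once correctly, and once with the wrong codec. Latin-1 is chosen only as an example, since we don't know which mismatch the front-ends actually hit. The lookup against a hypothetical dictionary succeeds for the correctly decoded text. For the mis-decoded text, the accented word no longer matches any entry and is flagged as unknown before any translation has happened. The asterisk marker loosely imitates how Apertium's output flags unknown words.

```python
# Hypothetical analyser entries, standing in for a real morphological dictionary.
DICTIONARY = {
    "rocín": "rocín<n><m><sg>",
    "flaco": "flaco<adj><m><sg>",
}


def analyse(text):
    """Look up each whitespace-separated token; flag unknown ones with '*'."""
    return [DICTIONARY.get(token, "*" + token) for token in text.split()]


raw = "rocín flaco".encode("utf-8")  # the bytes as stored in a UTF-8 file
good = raw.decode("utf-8")           # correct codec
bad = raw.decode("latin-1")          # wrong codec: 'í' becomes 'Ã' plus an invisible soft hyphen

print(analyse(good))  # ['rocín<n><m><sg>', 'flaco<adj><m><sg>']
print(analyse(bad))   # first token is flagged as unknown
```

Nothing in the lookup itself is broken. The dictionary simply never sees the word it contains, which fits the observation that the core engine behaved fine when fed text directly.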

But outside developers are using Apertium to build some other interesting front-end services. At least two are focused on online text; there is an AGPL-licensed "social translation" system called Tradubi, which focuses on building and customizing language pairs, and a WordPress translation plugin for posts called Transposh. Virtaal is a tool for translating .po text (and, according to the site, will soon tackle other things). It makes up for the highly technical vocabulary found in software strings by augmenting the MT engine with pre-loaded suggestions. Although it has not proven itself practical in the field yet, one of my particular favorites is a plugin for XChat that calls the Apertium D-Bus service to translate other users' comments.

In its current state, Apertium is not a free drop-in replacement for Google Translate, but its capabilities are impressive. The third-party tools built on top of it exhibit a degree of polish that the demo applications do not, and you can easily imagine them growing into truly useful services. The developer base is small but active: the mailing list traffic is steady, there is a roadmap, and the project has participated in Google's Summer of Code.

The one piece that still seems to be missing is a straightforward way for users to contribute useful data back into the language dictionaries. For example, one of the test texts I used on all of the Apertium front-ends was the opening paragraph of Don Quixote, and to my surprise, Apertium was unable to translate some very common words (including cuyo and hidalgo from the first sentence...). Most of the proprietary translation web services provide a simple "suggest a better translation" mechanism. Not only does Apertium not offer such a feature, but building and refining the language pairs is an arcane, manual, and closed-off process. Most language pairs have individual maintainers, which makes sense from a management perspective. Still, considering how much of Apertium's intelligence resides in its language pairs, better tools for improving them would go a long way toward improving the project as a whole.

An update on the Ada Initiative

December 13, 2011

This article was contributed by Valerie Aurora (formerly Henson)

The Ada Initiative is a non-profit dedicated to increasing the participation of women in open technology and culture. In other words, we want more women in open source, Wikipedia, and the rest of our brave new Internet world. A lot of people agree with that goal - at least that's what our first Ada Initiative survey told us. (Note that the Ada Initiative has absolutely nothing to do with the Ada programming language, other than sharing a namesake. Your author wrote Ada 95 for a living once and sincerely hopes to never touch another "bondage-and-discipline" language again.)

LWN readers might remember us from our launch announcement back in February, as well as our first "Seed 100" fundraising campaign. This article is an update on what the Ada Initiative has done since its founding, what we're doing next, and what you can do to help.

Accomplishments this year

The most surprising and visible change in open tech/culture communities over the last year was the widespread adoption of some form of code of conduct or anti-harassment policy by over 30 conferences and organizations. (The exact number is hard to determine, since some organizations adopted the policy for all their events and put on dozens of events per year.) Many major Linux and open source conferences have a policy: all Linux Foundation events (including Plumbers, LinuxCon, and Kernel Summit), linux.conf.au, all Ubuntu Developer Summits, several PyCons, OSBridge, all O'Reilly conferences (pledged), and many more. The idea that everyone should be able to attend a conference without expecting to be harassed or threatened is spreading to fan, science-fiction, and open culture events as well. Given the level of controversy a year ago, this shows a strong change in public opinion across a broad swath of the open technology and culture community.

We ran two surveys this year. The first was the Ada Initiative Census (part 1 and part 2), with over 2800 responses (about 1600 from women). We ran this survey to find out what people thought about women in open technology and culture, which communities had more women than others, and whether people felt that having more women was a good goal. A lot of people had told us that they wanted more women in open tech/culture, and that they felt many communities weren't very welcoming of women, but it was good to get statistical confirmation from several thousand people.

Our second survey asked attendees of the Grace Hopper Celebration (mainly young women in university computer science programs) about their attitudes towards careers in open source. It was an extremely simple and open-ended survey; nonetheless, two common themes appeared. First, most respondents believed either that you couldn't get paid to write open source software or that it paid much less than closed source. Second, they found the "personalities" and culture of open source intimidating and unpleasant. Efforts to recruit new college graduates into open source jobs will need to take this information into account if they are to succeed.

We also achieved our goal of being the "go-to" organization for advice on how to respond to incidents of harassment in a way that says women are a welcome and valued part of the community. Often it's merely an issue of raising awareness: most people simply don't know that harassment and bad behavior are happening. If you're a famous, well-known, influential member of the open source community, you're likely to be treated very well, and if you do run into any obnoxiousness, you have a lot of friends willing to come to your aid quickly. You won't have much of an idea of how newcomers are treated, or people who look different from you, or people who don't have as many friends as you, unless you go looking for their stories. One place to find these stories is the "Timeline of sexist incidents in geek communities" maintained on the Geek Feminism wiki.

We organized our first AdaCamp for January 14, 2012, in Melbourne, Australia (the Saturday before linux.conf.au 2012 in nearby Ballarat). AdaCamp is a small invitation-only unconference bringing together a variety of people to collaborate on ways to increase the participation of women in open technology and culture. We have people from open source, but are making a strong effort to bring in people from Wikipedia, fan culture, and other areas. Applications are currently open; if you know someone who should attend, encourage them to apply! For our North American AdaCampers, our next AdaCamp is tentatively planned to coincide with Wikimania in Washington D.C. in July.

We wrote a first draft of Ada's Advice, a guide to useful resources for people who want to help women in open tech/culture. It is organized by the role of the person looking for advice: parent of a young daughter, employer looking to hire more women, women in open tech/culture themselves. I'm constantly trying, and failing, to find the link to that one article that I vaguely remember being somewhere on the Geek Feminism Wiki; this is my solution. We are also planning to write short summaries of books and longer articles, as well as to create some original content and update older content (such as generalizing HOWTO Encourage Women in Linux to cover open tech/culture overall). We think that people shouldn't have to read ten books before they can start helping women effectively.

Ada's Careers is a project in the planning stage. This is our answer to the abandoned job postings mailing list - you know, the one you create after recruiters keep trying to post jobs to your development mailing list and then no one ever reads again? Well, we want to create a career development community: a place where women hang out all the time because it helps them at all stages of their careers, not just when they are looking for a new job. Finally, we'll have an answer to "Where do I put my job posting where qualified women will read it?"

Another project we'd like to run is First Patch Week. Often, experience writing open source is a prerequisite to getting a job in open source. At the same time, women face extra barriers to getting that unpaid experience. These range from local user group meetings that are often uncomfortable for women to attend, to IRC servers where users perceived to be female are 25 times more likely [PDF] to get a nasty private message than those perceived to be male. We want to partner with an open source company that would donate a week of its programmers' time to mentor women through the process of creating and submitting their first patch to an open source project. This will be an expensive and time-consuming project to run the first time, but it will get easier as we repeat it. It will also have a major, direct effect on the number of women available and qualified to be hired by open source companies.

We have some other project ideas but these are the ones we're most likely to do soon. What project do you want to see finished next? Leave us a comment telling us what your favorite is.

The not-so-fun stuff: Paperwork and government regulations

I'm a kernel programmer by training, so it's not that surprising that I found myself comparing the process of incorporating a U.S. non-profit with booting a kernel. You have to bootstrap from a couple of people with an idea for a non-profit to a legally registered corporation with strict oversight by a board of directors, with every step along the way properly authorized and recorded. It may not be the best analogy to explain how to found a non-profit, since most people don't know how the boot process works either, but since this is an article for LWN I can get away with it.

The non-profit/boot analogy goes thus: (1) file articles of incorporation (BIOS) and bylaws (bootloader), (2) take "action by incorporator" to appoint the board of directors (secondary bootloader), (3) board votes for standard "startup" motions (kernel initialization), then (4) board meets regularly to vote on new motions, elect new board members, and delegate tasks (servicing interrupts, running processes).

The "articles of incorporation" are paperwork you send to a state government declaring that you are a non-profit corporation. The articles of incorporation describe the ground rules of the corporation and don't change. The bylaws, which can change, are filed at the same time as the articles of incorporation and describe how the corporation is governed - stuff like how the board of directors is elected.

To me, the most obscure part of the bootstrapping process was the "action by incorporator." Sure, the bylaws say how your board of directors elects new directors, but how do you get your board in the first place? What happens is that the person who filed the articles of incorporation (me, in this case) writes down who they appoint to the board of directors, states they relinquish all rights as incorporator, and then signs and dates the document. Presto, the corporation now has a board of directors in complete control.

From there on out, everything is governed by votes by the board of directors. The board usually delegates a lot of stuff to the officers so it doesn't have to meet every time the hosting bill has to be paid. There is an initial set of standard motions that most corporations pass that is similar to kernel initialization, allowing the officers to do things like hire lawyers and buy liability insurance. After that, the board meets routinely and as-needed (which is like responding to timer ticks or servicing interrupts) to vote on new motions. We even have an equivalent of AppArmor or SELinux: We have to make detailed yearly reports to the U.S. tax service on our finances and management, beginning with filing an incredibly complex and expensive application for tax-exempt status.

The annoying stuff: Fundraising

Fundraising is a lot like funding a startup except that no one gets rich. We began in classic self-funded startup fashion: For 7 months we lived on our savings and part-time consulting work. We also had angel funders who trusted us enough to give us money on faith: Linux Australia, Puppet Labs, and the Ceph division of DreamHost. Next we raised a round of "seed funding": 100 donors of $512 or more in our Seed 100 round (actually, 103 because we couldn't close the drive fast enough). We've nearly used up our startup capital and have started our first general fundraising drive, open to both small individual donors and large corporate donors. If you like the work we're doing, and want to see things like Ada's Advice and First Patch Week become a reality, please donate now and tell your friends about us too!

We're still debating the long-term funding model for the Ada Initiative. Should companies who benefit financially from open source and open culture fund most of the Ada Initiative? Should we rely on lots of small individual donors like Wikimedia? Should we sell t-shirts? Tell us what you think in the comments!

2011 Linux and free software timeline - Q3

Here is LWN's fourteenth annual timeline of significant events in the Linux and free software world for the year.

We will be breaking the timeline up into quarters, and this is our report on July-September 2011. Next week, we will put out the timeline for the last quarter of 2011.


This is version 0.8 of the 2011 timeline. There are almost certainly some errors or omissions; if you find any, please send them to timeline@lwn.net.

LWN subscribers have paid for the development of this timeline, along with previous timelines and the weekly editions. If you like what you see here, or elsewhere on the site, please consider subscribing to LWN.

For those with a nostalgic bent, our timeline index page has links to the previous thirteen timelines and some other retrospective articles going all the way back to 1998.

July

A backdoor is found in the vsftpd source code (LWN blurb).

Most well-adjusted people would not stand up in a crowd of people and start calling people around them idiots. Just because there is a monitor and a network cable separating you from the crowd doesn't make it ok, and I am tired of it.

-- Rasmus Lerdorf

CERN releases version 1.1 of its Open Hardware License (announcement).

Project Harmony releases version 1.0 of its contributor agreements (LWN blurb, agreements).

Nortel sells a huge pile of patents covering networking and lots more to a consortium made up of Apple, EMC, Ericsson, Microsoft, Research In Motion, and Sony. Google also unsuccessfully bid on the patents (Reuters article).

The VLC media player reports that companies are bundling it with adware/spyware, which is an increasing problem for free software projects (announcement, LWN article).

I am quite at ease not participating in netfilter/iptables anymore while the discussion about IPv6 NAT becomes an issue again: I always indicated "over my dead body", and now that I am no longer in charge, nobody will have to kill me ;)

-- Harald Welte

CentOS 6.0 is released, eight months after RHEL 6 (announcement, release notes).

The realtime kernel tree moves to 3.0 after being based on 2.6.33 for a long time (3.0-rc7-rt0 announcement).

IBM promises to contribute the Symphony fork of OpenOffice.org (OOo) to the Apache OOo project (announcement).

Oracle acquires Ksplice, Inc., makers of the ksplice no-reboot kernel patching product (announcement, LWN article: Ksplice and CentOS).

As already mentioned several times, there are no special landmark features or incompatibilities related to the version number change, it's simply a way to drop an inconvenient numbering system in honor of twenty years of Linux.

-- Linus Torvalds announces 3.0

Linux 3.0 is released without any major changes that some might assume come with the move from 2.6.x (announcement, KernelNewbies summary, and Who wrote 3.0).

Mozilla announces the "Boot to Gecko" standalone operating system, which is based on Linux (announcement, LWN coverage).

Several versions of Emacs ship without all of the source code, which does not comply with the GPL, though the FSF itself is not violating the license (LWN coverage).

The digiKam software collection 2.0.0 is released; digiKam SC is a photo editor and related tools (announcement, LWN review).

KDE Software Compilation 4.7 is released (announcement).

DebConf 2011 is held July 24-30 in Banja Luka, Bosnia and Herzegovina.

August

The second Desktop Summit is held in Berlin, Germany, August 6-12; it is a combination of GNOME's GUADEC and KDE's Akademy conferences (LWN coverage: Companies and open source, Copyright assignments, Desktop crypto consolidation, Service design, Plasma Active).

Every time I get frustrated with doing paperwork, I simply imagine having the job of estimating how much time it takes to do paperwork, and I feel better immediately.

-- Valerie Aurora

Samba 3.6.0 is released (announcement).

Debian celebrates its 18th birthday, just two years younger than Linux itself (announcement).

Google announces its intent to acquire Motorola Mobility, mostly for its patents, it would seem (announcement).

The first release candidate of the Mozilla Public License 2.0 is released (announcement, an LWN look at the update process).

But if you want to be taken seriously as a researcher, you should publish your code! Without publication of your *code* research in your area cannot be reproduced by others, so it is not science.

-- Guido van Rossum

LinuxCon North America is held August 17-19 in Vancouver, Canada and celebrates 20 years of Linux (LWN coverage: Clay Shirky on collaboration, Largest desktop Linux deployment, FreedomBox, x86 platform drivers, MeeGo architecture update, ConnMan, and Mobile Linux patent landscape).

COSCUP 2011 is held in Taipei, Taiwan August 20-21 (LWN coverage: Year of the Linux tablet?).

A serious denial-of-service attack against Apache web servers is seen in the wild (announcement, LWN coverage).

HP announces it is dropping its webOS devices (press release).

The 20th anniversary of the first Linux post is August 25; that post is the now-famous "just a hobby" message to comp.os.minix.

The Certificate Authority system as it stands today is a house of cards and we're witnessing in public what many have known for years in private. The entire system is soaked in petrol and waiting for a light.

-- Jacob Appelbaum

DigiNotar issues fraudulent SSL/TLS certificates for several domains, including google.com, in July, but the problem is not discovered until August (LWN blurb and coverage).

The kernel.org server is found to be compromised. The compromise affects various Linux Foundation servers as well, and it will take some time for things to get back to normal (LWN coverage).

Mandriva 2011 ("Hydrogen") is released (announcement, release notes).

September

The Linux Plumbers Conference is held in Santa Rosa, California, September 7-9 (LWN coverage: Development model diversity, Booting and systemd, Making the net go faster, Coping with hardware diversity, Bufferbloat update, and Control groups).

No developer ever thinks their change is going to break anything for anyone. It's the QA Law of What Could Possibly Go Wrong.

-- Adam Williamson

The Linux Security Summit is held with Plumbers (LWN coverage: LSM roundtable and Kernel hardening roundtable).

PostgreSQL 9.1 is released (announcement, LWN article).

The Qt Project is announced for more open governance of the free software UI toolkit (announcement).

Coherent vision isn't something that the kernel community really values.

-- Neil Brown

The openSUSE conference is held in Nürnberg, Germany September 11-14 (Conference wrap-up).

The OpenShot video editor releases version 1.4 (announcement).

UEFI "secure boot" and Microsoft's mandate of it for Windows 8 hardware starts to concern free operating system developers (Matthew Garrett blog posts: Part 1, Part 2; LWN article).

Not spending as much time sitting in meetings and fighting with other vendors is one of the competitive advantages PostgreSQL development has vs. the "big guys". There needs to be a pretty serious problem with your process before adding bureaucracy to it is anything but a backwards move. And standardization tends to attract lots of paperwork. Last thing you want to be competing with a big company on is doing that sort of big company work.

-- Greg Smith

GNOME 3.2 is released (announcement, release notes).

PulseAudio 1.0 is released (announcement, release notes).

Tizen, the successor to MeeGo, is announced, which incorporates technology from the LiMo project; the announcement comes less than a month after Intel says it is "fully committed" to MeeGo (announcement, LWN coverage).

The Berlios code repository announces that it will shut down at the end of the year (announcement, LWN coverage).

Page editor: Jonathan Corbet

Security

A white paper on comparative browser security

December 14, 2011

This article was contributed by Nathan Willis

A paper was released in early December comparing the security designs of recent versions of Microsoft Internet Explorer, Mozilla Firefox, and Google Chrome, and concluded that Google Chrome was the "most secured against attack" — and Firefox the least. But Google sponsored the paper (by Denver-based security firm Accuvant), a fact that many in the trade press immediately latched onto as evidence that its contents were untrustworthy. It is always wise to take such reports with a heap of salt, but Google's funding alone does not mean that there is no interesting information in the report. Still, many of the headlines in recent days have glossed over some important details in the paper and its conclusions.

A careful reading of the paper shows it to be not a quantitative analysis of the various browsers' vulnerabilities (or lack thereof) to real-world attacks, but more of a feature-by-feature review of their respective security architectures. In other words, when the paper's conclusion calls Chrome the most secured, instead of the most secure, the distinction is important. The paper's premise is that the browser with the most "modern" security features is the best prepared to repel likely attacks, and it examines the three browsers against a list of specific features, namely sandboxing, just-in-time (JIT) compiler hardening, protection against malicious add-ons (plug-ins, extensions, and themes), and various low-level exploit-prevention measures (such as address space randomization).

The browsers scored equally well on the low-level exploit prevention measures, but Chrome's sandbox, add-on security, and JIT hardening were deemed "industry standards" while the other browsers' were not. Interestingly enough, the paper also includes sections on URL blacklisting and on browsers' vulnerability-report and patch statistics over an 18-month period (statistics that the authors take pains to insist should not be used to draw conclusions).

Approaches, blacklists, and statistics

The paper, dated December 6, 2011, is entitled Browser Security Comparison: A Quantitative Approach. A summary is posted on the Accuvant blog, and includes a link to a separate page on which the full, 140-page PDF is available, along with a ZIP archive of the raw data and supporting tools.

The paper begins by making a case for the approach used — comparing the security design of the browsers tested — and follows up with an overview of the browsers' architectures. For the security feature comparison, the paper considers Google Chrome versions 12 (12.0.724.122) and 13 (13.0.782.218), Internet Explorer 9 (9.0.8112.16421), and Firefox 5 (5.0.1), all of which were examined in July 2011, on Microsoft Windows 7 (32-bit).

Next is a survey of security vulnerability statistics, collected and collated between January 2009 and June 2011 (covering Firefox versions 2.0 through 5.0, IE6 through IE9, and all stable releases of Chrome). The paper makes four arguments that such statistics are unreliable. First, vendor advisories do not correspond one-to-one with vulnerabilities: a single advisory may roll up several vulnerabilities, and some vulnerabilities are never reported at all. Second, timelines gleaned from advisory and patch publication dates do not accurately reflect when a vulnerability was found or fixed, for reasons ranging from duplicate bug reports to vulnerabilities that Microsoft discovers internally and never publishes. Third, there are no generally agreed-upon criteria for classifying the severity of vulnerabilities. Finally, the browser vendors' differing development models make correlating vulnerability data across vendors difficult, if not impossible; examples include Windows patches that affect IE and idiosyncrasies in the bug trackers used by both Firefox and Chrome.

Nevertheless, the authors follow up by reporting statistics for update frequencies, public vulnerability reports, vulnerabilities sorted by severity, and the average time between a vulnerability report and a published fix. The section makes several comments discouraging readers from inferring browser quality from the numbers, such as "none of these pieces of information can be used to draw a security related conclusion" and "any conclusion drawn from the data is speculation and the data does not aid in discovering which browser is most secure." However, each of these comments comes immediately after a set of conclusions spelled out by the authors, such as Chrome being the most frequently updated browser and Firefox having the most "critical" vulnerabilities. Writing a conclusion and then immediately disavowing it is a puzzling approach, but since the paper deems the entire topic unreliable anyway, perhaps the point is moot.

The next section is a look at URL blacklist services, namely Microsoft's URL Reporting Service (URS) and Google's Safe Browsing List (SBL). The authors harvested active malware URLs from four web security sites, and queried both services. Over an eight-day stretch, they sampled a total of 47,682 URLs. Out of the 24,686 malware URLs which were still live when requested, URS and SBL each managed to block a scant 10%, with the remainder successfully slipping by.

Clearly, neither of the blacklist services performed well, but the data in this section of the paper is presented in a confusing manner. For example, in the pie chart that purports to show the portion of malware URLs blocked by the blacklist services (a graph reproduced in several news reports about the paper), the "unmatched URLs" slice, which takes up roughly 75% of the circle, is labeled with the number from the total row of the chart. The slices showing the numbers of URLs blocked by URS and by SBL are also separate from each other, which implies that the two services had no matches in common: a highly unlikely, albeit not impossible, outcome. Essentially, the slices seem to come from two or three separate pies.
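The slices are easier to reason about with a little inclusion-exclusion arithmetic. The sketch below is illustrative only: it assumes that each service blocked exactly 10% of the 24,686 live URLs (the rounded figure given above) and treats the overlap between the two block lists as an unknown parameter. Whatever that overlap is, the share of live URLs blocked by neither service has to land between roughly 80% (no overlap) and 90% (complete overlap), which is one way to check whether a chart's slices can belong to the same pie.

```python
LIVE_URLS = 24_686      # live malware URLs, as reported in the paper
BLOCK_RATE = 0.10       # assumption: each service blocked exactly 10% of them


def unmatched_share(blocked_a: int, blocked_b: int, overlap: int, total: int) -> float:
    """Fraction of URLs blocked by neither service, via inclusion-exclusion."""
    if not 0 <= overlap <= min(blocked_a, blocked_b):
        raise ValueError("overlap must be between 0 and the smaller block count")
    blocked_by_either = blocked_a + blocked_b - overlap
    return (total - blocked_by_either) / total


if __name__ == "__main__":
    urs = sbl = round(LIVE_URLS * BLOCK_RATE)      # hypothetical per-service counts
    for overlap in (0, urs // 2, urs):             # no, partial, and complete overlap
        share = unmatched_share(urs, sbl, overlap, LIVE_URLS)
        print(f"overlap {overlap:>5}: blocked by neither = {share:.1%}")
```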

Security!

The next section defines the security features examined in each browser; the approach taken to assess the quality of each feature varied. First are the low-level exploit-prevention measures: Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), stack cookies (a buffer-overflow prevention technique), and Structured Exception Handling (SEH) protections, which keep attackers from hijacking exception handlers to run malicious payloads. The authors examined all of the binaries loaded by the three browsers (including the browsers' own .EXEs and .DLLs as well as all of the Windows system .DLLs they call) and checked each for ASLR, DEP, stack cookie, and SEH compatibility.
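For readers curious what checking a binary for ASLR and DEP compatibility involves: on Windows, both are opt-in flags in the DllCharacteristics field of a PE image's optional header. The following Python sketch is a hypothetical illustration, not the tooling Accuvant shipped in its data archive, and it reads only those header flags. Stack cookies and SafeSEH tables live in the load-configuration directory and would need considerably more parsing, so they are left out here.

```python
import struct

# Bit values of the DllCharacteristics field, from the PE/COFF specification.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image can be relocated (ASLR)
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100     # image is DEP compatible
IMAGE_DLLCHARACTERISTICS_NO_SEH = 0x0400        # image uses no structured exception handlers


def pe_mitigation_flags(image: bytes) -> dict:
    """Report the ASLR/DEP/NO_SEH opt-in flags found in a PE image's optional header."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)     # e_lfanew in the DOS header
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20                     # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic not in (0x10B, 0x20B):                          # PE32 or PE32+
        raise ValueError(f"unknown optional-header magic {magic:#x}")
    # DllCharacteristics sits at offset 70 in both the PE32 and PE32+ layouts.
    (characteristics,) = struct.unpack_from("<H", image, optional_header + 70)
    return {
        "aslr": bool(characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "dep": bool(characteristics & IMAGE_DLLCHARACTERISTICS_NX_COMPAT),
        "no_seh": bool(characteristics & IMAGE_DLLCHARACTERISTICS_NO_SEH),
    }
```

To try it on a real file, pass it the contents of an .EXE or .DLL read in binary mode.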

Sandboxing receives the most attention. On Windows, Chrome and IE both take advantage of the operating system's sandboxing functionality to limit each process's access to the filesystem, the network, Windows Registry data, other processes and threads, and various other system resources. Chrome and IE are both multi-process, providing separate processes for the UI, the rendering engine, and most individual tabs. Chrome and Firefox also run plug-ins in separate processes (IE does not). Firefox, however, uses a single process for the rest of the browser and does not take advantage of Windows' sandboxing features, so that process runs with the default "Medium" integrity level in its Windows security token.

That distinction is responsible for the bulk of the paper's criticism of Firefox; the analysis section examines, in turn, each of the system components that is accessible to an unsandboxed Firefox process but walled off from a sandboxed Chrome process. The authors used the sandbox testing tool from the Chrome project to test each browser. Chrome does not hit every bullet point on the authors' checklist, however; it allows access to some system parameters and to "Windows Hooks". Firefox does not miss every point, either; rather, it receives mixed marks on many of the checklist items. IE falls somewhere in between.

JIT hardening is the next topic examined. The authors enumerate eight techniques for securing the JIT engine against malware: codebase alignment randomization, instruction alignment randomization, constant blinding, constant folding, memory page protection, resource constraints, executable memory allocation randomization, and memory guard pages. On this topic, the authors examined the source code, disassembled binaries, and ran test scripts against the JIT engines to check for each technique. IE received the most positive marks, with complete implementations of all the techniques except for additional randomization and guard pages, for which it was scored "technique was not necessary."

Chrome scored in the middle of the pack, without implementations for three of the eight techniques (codebase alignment randomization, instruction alignment randomization, and memory page protection). On the guard pages technique, though, Chrome received a check-mark with a footnote noting that the feature was implemented in Chrome 14 — which was not the version reported earlier as having been tested. Firefox did not receive any check-marks in this section, with the authors observing succinctly "Firefox does not implement any JIT hardening techniques."
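Of the eight techniques, constant blinding is perhaps the easiest to picture. A JIT-spraying attack relies on script constants (for example 0x90909090, a run of x86 NOP instructions) appearing verbatim in executable memory; a blinding JIT instead emits the constant XORed with a random key, followed by an instruction that undoes the XOR at run time. The Python sketch below models that idea with two made-up pseudo-instructions and a tiny interpreter; the instruction tuples and helper names are hypothetical and do not correspond to any browser's JIT.

```python
import secrets

MASK32 = 0xFFFFFFFF


def emit_load_constant(value: int) -> list:
    """Return pseudo-instructions that load a 32-bit constant in blinded form."""
    if not 0 <= value <= MASK32:
        raise ValueError("expected a 32-bit unsigned constant")
    key = 0
    while key in (0, value):            # either key would leave `value` verbatim in the code
        key = secrets.randbits(32)      # fresh random key for every emitted constant
    return [
        ("mov", "eax", value ^ key),    # the immediate actually stored in executable memory
        ("xor", "eax", key),            # undo the blinding when the code runs
    ]


def run(instructions: list) -> int:
    """Tiny interpreter for the two pseudo-instructions used above."""
    regs = {}
    for op, reg, imm in instructions:
        if op == "mov":
            regs[reg] = imm
        elif op == "xor":
            regs[reg] = (regs[reg] ^ imm) & MASK32
        else:
            raise ValueError(f"unsupported instruction {op!r}")
    return regs["eax"]


if __name__ == "__main__":
    spray_constant = 0x90909090
    code = emit_load_constant(spray_constant)
    print([hex(imm) for _, _, imm in code])   # neither immediate equals the constant
    print(hex(run(code)))                     # but the computed value does
```

Because the key changes on every emission, an attacker who controls the script constant no longer controls the bytes that land in executable memory.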

The final section of the paper addresses the security measures protecting each browser against malicious add-ons. The authors identify a list of 19 possible security measures — including whether add-ons are subject to many of the sandboxing protections measured for the browsers themselves. It also includes user-facing techniques, such as displaying pre-install warnings, allowing automatic updates, and providing a user-controlled permission set for each add-on. The authors examined each browser with a mix of manual inspection (for user-visible techniques such as installation warnings) and repetition of the earlier sandbox tests.

Here the results are surprising considering what has come before: all of the browsers scored virtually the same, with mediocre add-on security. Chrome picked up one more checkbox than Firefox for its add-on permission model, plus a "partially-functioning" mark for its incomplete sandboxing. Both browsers received failing marks on eleven of the other criteria, including many sandboxing techniques that Chrome passed when the browser itself was examined. IE, as always, scored in the middle, but it, too, failed to enforce for add-ons many of the sandboxing rules that it enforced for its own browser processes. Nevertheless, in the paper's Executive Summary section, Chrome is given an "industry standard" check-mark, IE an "implemented" dot, and Firefox an "unimplemented or ineffective" X.

Two appendices follow; the first is an exploration of Chrome Frame, a plug-in for IE that uses Chrome as the page rendering and JavaScript engine. The authors examine how Chrome Frame operates and assess its potential security impact, concluding that it increases the attack surface of IE just like any other browser add-on. The second appendix is a lengthy (22-page) table of the low-level exploit-prevention measure test results for the browsers. Detailed test results for the other features examined are not included in the paper itself, although they are available in the downloadable data archive at the Accuvant site.

Is the perspective of the paper slanted?

Skeptics and Mozilla fans have every right to doubt the results of any Google-funded "research" that shows Chrome superior to other browsers — just as any skeptic should with vendor-funded research. After all, such research could be designed from the start to ensure a victory for Chrome, by examining only those features where Chrome outscores the competition. In that case, there is no need to fudge any numbers; the victor emerges naturally. Such a set-up was alleged by several Slashdot commenters (and hinted at by the story submitter) in the site's December 10 discussion of the paper.

Certainly the sandbox analysis could have been chosen to showcase one of Chrome's flagship features, but I would not conclude the same thing about the JIT hardening or add-on analysis sections, which did not show Chrome in nearly as favorable a light. On the other hand, I simply do not buy the paper's premise that running a checklist examination of the browsers results in what the authors call "a more accurate window into the vulnerabilities of each browser." Under the "Methodology Delta" section, the authors say:

Accuvant LABS' analysis is based on the premise that all software of sufficient complexity and an evolving code base will always have vulnerabilities. Anti-exploitation technology can reduce or eliminate the severity of a single vulnerability or an entire class of exploits. Thus, the software with the best anti-exploitation technologies is likely to be the most resistant to attack and is the most crucial consideration in browser security.

Perhaps that is a defensible position in theory, but what the paper examines is essentially the existence of these anti-exploitation features in the code base; it is hardly the "quantitative" approach that the title suggests. After all, the paper spends several pages asserting that real-world quantitative data on vulnerability reporting and patching can be "misleading" and "misappropriated." One could argue that a single bug in the sandboxing code could undermine a dozen of the check-marks that Chrome or IE received for implementing the features examined. A test performed in the lab may or may not catch such a bug, while real-world vulnerability reports (or attacks) are more likely to.

Regardless of how one feels about its approach, the paper is worth a look because it measures application security differently than the bulk of other analyses do. We can all agree that vulnerability statistics are often open to interpretation, so relying on them to measure the security of different applications is suspect, yet many similarly targeted white papers do exactly that. Accuvant has made an effort to analyze the security of these browsers in a different way, which is useful in its own right.

What a browser-maker might learn

Of course, weaknesses in the paper do not mean that Firefox should not consider a sandbox and multi-process design on all of its desktop platforms. It would clearly be more secure if it migrated to a model that included both, and if it implemented JIT hardening techniques, but those are hardly overnight changes.

At least the paper provides a survey of the attack surface addressed by Windows sandboxing and JIT hardening, which is valuable both to browser vendors and to other developers. It is also interesting to note how many Windows system libraries are touched by each of the browsers, how ineffective URL blacklists are in practice, and how and where the security provided by the main browser breaks down when an add-on is installed. Skeptics may turn up their noses at Google's financing of the work, or at the methodology employed, but a detailed discussion of application security always makes for valuable reading.

Firefox's Johnathan Nightingale told InformationWeek that Mozilla regards sandboxing as just one tool among many used to reduce security threats, "from platform-level features like address space randomization to internal systems like our layout frame poisoning system." He added that the browser-maker emphasizes security in the development process as well as in the code itself, highlighting code reviews, testing and analysis, and rapid responses to security issues.

As for the specifics touched on in the paper's comparison to Chrome's security architecture, Mozilla has been exploring a multi-process design for some time, but primarily out of an interest in improving Firefox's responsiveness. That work appears to have been back-burnered in favor of a set of smaller changes, including optimizations to the Places database and garbage collector. There are also Bugzilla issues tracking JIT hardening work, which does not involve substantial architectural changes to Firefox.

The paper is a puzzling affair: parts of it contradict other parts, the URL blacklisting discussion is a tangent, and the conclusion seems to weigh some of the tests significantly more than others. But whatever else the paper may show, the public reaction to it since its release indicates that many Firefox users are interested in seeing that project push forward in these unaddressed areas.

Brief items

Security quotes of the week

I thought Download.com would try to behave for at least a few days while the heat is on, but they're already back to distributing malware. The trojan installer now tries to install something called Drop Down Deals on your computer (screen shot). This handy application spies on all of your web traffic in order to pop up ads when you visit certain sites.
-- Fyodor of Nmap fame is still unhappy with Download.com

Merely restricting a printer to installing or running software signed by the manufacturer deprives the owner of both security and freedom. It might end one specific threat, but only at the much greater cost of leaving the printer's security policies under the manufacturer's control. The way to give printer owners real security—security from rogue print jobs and manufacturer antifeatures alike—is for them to have the freedom to study, modify, and replace the software the printer runs.
-- Brett Smith on the Free Software Foundation blog

Download.com "apologises" for bundling (The H)

The H reports that Download.com has apologized for bundling the Nmap scanner with an installer that does a lot more than just install Nmap (it changes the default search to Bing, installs toolbars, ...). "'The bundling of this software was a mistake on our part and we apologize to the user and developer communities for the unrest it caused' said [Download.com's Sean] Murphy, adding that the company had 'reviewed all open source files in our catalog to ensure none are being bundled'. Nmap has been removed from the download manager on Download.com, according to Murphy, and attempts to download it from the site will now send the user what appears to be an unmodified setup file for the network scanner." Nmap's Fyodor is maintaining a web page covering the "unrest".

New vulnerabilities

acpid: multiple vulnerabilities

Package(s): acpid
CVE #(s): CVE-2011-2777, CVE-2011-4578
Created: December 9, 2011
Updated: August 17, 2012
Description:

From the Ubuntu advisory:

Oliver-Tobias Ripka discovered that an ACPI script incorrectly handled power button events. A local attacker could use this to execute arbitrary code, and possibly escalate privileges. (CVE-2011-2777)

Helmut Grohne and Michael Biebl discovered that ACPI scripts were executed with a permissive file mode creation mask (umask). A local attacker could read files and modify directories created by ACPI scripts that did not set a strict umask. (CVE-2011-4578)

Alerts:
Debian DSA-2362-1 2011-12-10
Ubuntu USN-1296-1 2011-12-08
Mageia MGASA-2012-0215 2012-08-12
Mageia MGASA-2012-0216 2012-08-12
Mandriva MDVSA-2012:137 2012-08-17
Mandriva MDVSA-2012:138 2012-08-17
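To see why the umask matters in the second acpid issue: a process's umask is inherited by the scripts it launches, and it clears permission bits from every file those scripts create. This Python sketch (the helper name is hypothetical, and it is not acpid's actual fix) creates a file requesting mode 0666 under a permissive and a strict umask; with a umask of 0 the file ends up readable and writable by everyone, while 077 leaves it accessible only to its owner.

```python
import os
import tempfile


def create_with_umask(path: str, umask: int, mode: int = 0o666) -> int:
    """Create a new file while `umask` is in effect and return its permission bits."""
    previous = os.umask(umask)          # the umask is process-wide and inherited by children
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
        os.close(fd)
    finally:
        os.umask(previous)              # restore whatever was in effect before
    return os.stat(path).st_mode & 0o777


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        loose = create_with_umask(os.path.join(workdir, "loose"), 0o000)
        strict = create_with_umask(os.path.join(workdir, "strict"), 0o077)
        print(f"umask 000 -> {loose:o}, umask 077 -> {strict:o}")
```

A daemon that runs helper scripts would typically call os.umask() (or umask(2) in C) once at startup, before spawning anything.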

arora: certificate spoof

Package(s): arora
CVE #(s): CVE-2011-3367
Created: December 13, 2011
Updated: August 20, 2012
Description: From the CVE entry:

Arora, possibly 0.11 and other versions, does not use a certain font when rendering certificate fields in a security dialog, which allows remote attackers to spoof the common name (CN) of a certificate via rich text.

Alerts:
Fedora FEDORA-2011-14756 2011-10-22
Fedora FEDORA-2011-14719 2011-10-22
Mageia MGASA-2012-0220 2012-08-18

chasen: code execution

Package(s): chasen
CVE #(s): CVE-2011-4000
Created: December 8, 2011
Updated: July 10, 2012
Description:

From the Debian advisory:

It was discovered that ChaSen, a Japanese morphological analysis system, contains a buffer overflow, potentially leading to arbitrary code execution in programs using the library.

Alerts:
Debian DSA-2361-1 2011-12-07
openSUSE openSUSE-SU-2012:0026-1 2012-01-05
Gentoo 201207-03 2012-07-09

dhcp: denial of service

Package(s): dhcp
CVE #(s): CVE-2011-4539
Created: December 8, 2011
Updated: January 3, 2012
Description:

From the Mandriva advisory:

dhcpd in ISC DHCP 4.x before 4.2.3-P1 and 4.1-ESV before 4.1-ESV-R4 does not properly handle regular expressions in dhcpd.conf, which allows remote attackers to cause a denial of service (daemon crash) via a crafted request packet (CVE-2011-4539).

Alerts:
CentOS CESA-2011:1819 2011-12-22
Oracle ELSA-2011-1819 2011-12-17
Scientific Linux SL-dhcp-20111214 2011-12-14
Ubuntu USN-1309-1 2011-12-15
Fedora FEDORA-2011-16981 2011-12-11
Red Hat RHSA-2011:1819-01 2011-12-14
openSUSE openSUSE-SU-2011:1318-1 2011-12-13
Mandriva MDVSA-2011:182 2011-12-08
Fedora FEDORA-2011-16976 2011-12-11
Debian DSA-2519-2 2012-08-04
Slackware SSA:2012-237-01 2012-08-24
Gentoo 201301-06 2013-01-09

dovecot: certificate validation flaw

Package(s): dovecot
CVE #(s): CVE-2011-4318
Created: December 9, 2011
Updated: February 21, 2013
Description:

From the Ubuntu advisory:

It was discovered that Dovecot incorrectly validated certificate hostnames when being used as a POP3 and IMAP proxy. If a remote attacker were able to perform a man-in-the-middle attack, this flaw could be exploited to view sensitive information.

Alerts:
Fedora FEDORA-2011-16234 2011-11-23
Fedora FEDORA-2011-16272 2011-11-23
Ubuntu USN-1295-1 2011-12-08
openSUSE openSUSE-SU-2012:0219-1 2012-02-09
Red Hat RHSA-2013:0520-02 2013-02-21
Oracle ELSA-2013-0520 2013-02-25
Scientific Linux SL-dove-20130304 2013-03-04
CentOS CESA-2013:0520 2013-03-09
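Dovecot is written in C, but the general rule behind this fix can be illustrated in Python: a proxy that opens TLS connections to a backend must check not only that the certificate chains to a trusted CA but also that it was issued for the host the proxy meant to reach. A minimal sketch using the standard ssl module follows; the function names are hypothetical, and it shows the client-side checks only, not how Dovecot implements them.

```python
import socket
import ssl
from typing import Optional


def make_backend_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context that verifies the certificate chain and the hostname."""
    ctx = ssl.create_default_context(cafile=cafile)
    # create_default_context() already enables both checks; they are set
    # explicitly here so that nobody quietly "simplifies" them away later.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    return ctx


def connect_backend(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Open a TLS connection to an upstream IMAP/POP3 server named `host`."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname is used for SNI and for the hostname check; the
        # handshake raises ssl.SSLCertVerificationError on a mismatch.
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A call such as connect_backend("imap.example.org", 993, make_backend_context()) would then refuse a man-in-the-middle presenting a valid certificate for some other name (the host name here is a placeholder).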

icu: code execution

Package(s): icu
CVE #(s): CVE-2011-4599
Created: December 14, 2011
Updated: September 25, 2012
Description: In ICU, opening a crafted locale representation can cause a crash or code execution.
Alerts:
Mandriva MDVSA-2011:194 2011-12-27
CentOS CESA-2011:1815 2011-12-22
Oracle ELSA-2011-1815 2011-12-17
Scientific Linux SL-icu-20111213 2011-12-13
CentOS CESA-2011:1815 2011-12-14
Oracle ELSA-2011-1815 2011-12-14
Red Hat RHSA-2011:1815-01 2011-12-13
Fedora FEDORA-2011-17101 2011-12-14
Fedora FEDORA-2011-17119 2011-12-14
openSUSE openSUSE-SU-2012:0100-1 2012-01-19
Ubuntu USN-1348-1 2012-01-26
Debian DSA-2397-1 2012-01-29
Gentoo 201209-07 2012-09-24

ipmitool: denial of service

Package(s): ipmitool
CVE #(s): CVE-2011-4339
Created: December 14, 2011
Updated: January 3, 2012
Description: The PID file used by ipmitool is world-writable, allowing a local attacker to kill arbitrary processes on the system.
Alerts:
Debian DSA-2376-1 2011-12-30
Mandriva MDVSA-2011:196 2011-12-28
CentOS CESA-2011:1814 2011-12-22
Oracle ELSA-2011-1814 2011-12-17
Scientific Linux SL-ipmi-20111213 2011-12-13
Red Hat RHSA-2011:1814-01 2011-12-13
Debian DSA-2376-2 2011-12-31
Fedora FEDORA-2011-17065 2011-12-13
Fedora FEDORA-2011-17071 2011-12-13

jasper: two code execution flaws

Package(s): jasper
CVE #(s): CVE-2011-4516, CVE-2011-4517
Created: December 9, 2011
Updated: January 24, 2012
Description:

From the Red Hat advisory:

Two heap-based buffer overflow flaws were found in the way JasPer decoded JPEG 2000 compressed image files. An attacker could create a malicious JPEG 2000 compressed image file that, when opened, would cause applications that use JasPer (such as Nautilus) to crash or, potentially, execute arbitrary code. (CVE-2011-4516, CVE-2011-4517)

Alerts:
CentOS CESA-2011:1807 2011-12-22
Ubuntu USN-1315-1 2011-12-20
Oracle ELSA-2011-1807 2011-12-17
Scientific Linux SL-jasp-20111209 2011-12-09
openSUSE openSUSE-SU-2011:1328-1 2011-12-16
Mandriva MDVSA-2011:189 2011-12-16
Oracle ELSA-2011-1811 2011-12-13
SUSE SUSE-SU-2011:1317-2 2011-12-14
Scientific Linux SL-netp-20111212 2011-12-12
Debian DSA-2371-1 2011-12-24
CentOS CESA-2011:1811 2011-12-12
Red Hat RHSA-2011:1811-01 2011-12-12
SUSE SUSE-SU-2011:1317-1 2011-12-12
Red Hat RHSA-2011:1807-01 2011-12-09
Fedora FEDORA-2011-16966 2011-12-11
Fedora FEDORA-2011-16955 2011-12-11
Ubuntu USN-1317-1 2012-01-04
Gentoo 201201-10 2012-01-23

nova: file overwrite

Package(s): nova
CVE #(s): CVE-2011-4596
Created: December 13, 2011
Updated: December 14, 2011
Description: From the Ubuntu advisory:

David Black discovered that Nova did not properly perform input validation during image registration. An attacker could exploit this by registering a crafted image using the EC2 API or S3/RegisterImage method and overwrite files as the nova user.

Alerts:
Ubuntu USN-1305-1 2011-12-13

opera: multiple vulnerabilities

Package(s): opera
CVE #(s): CVE-2011-4681, CVE-2011-4682, CVE-2011-4683, CVE-2011-4684, CVE-2011-4685, CVE-2011-4686, CVE-2011-4687
Created: December 9, 2011
Updated: December 14, 2011
Description: Evidently there are 7 flaws of one kind or another that were fixed in Opera. See the openSUSE advisory for more information.
Alerts:
openSUSE openSUSE-SU-2011:1314-1 2011-12-09
Gentoo 201206-03 2012-06-15

php: denial of service and information disclosure

Package(s): php5, php
CVE #(s): CVE-2011-4566
Created: December 14, 2011
Updated: April 13, 2012
Description: PHP incorrectly handles EXIF headers in JPEG files; an attacker could exploit this vulnerability to crash the PHP server or view (unspecified) sensitive information.
Alerts:
Ubuntu USN-1307-1 2011-12-14
Red Hat RHSA-2012:0019-01 2012-01-11
Mandriva MDVSA-2011:197 2011-12-30
CentOS CESA-2012:0019 2012-01-11
Oracle ELSA-2012-0019 2012-01-12
Scientific Linux SL-NotF-20120112 2012-01-12
Oracle ELSA-2012-0019 2012-01-13
Red Hat RHSA-2012:0033-01 2012-01-18
CentOS CESA-2012:0033 2012-01-18
Oracle ELSA-2012-0033 2012-01-18
Scientific Linux SL-php-20120119 2012-01-19
Fedora FEDORA-2012-0504 2012-01-19
Fedora FEDORA-2012-0420 2012-01-26
Red Hat RHSA-2012:0071-01 2012-01-30
CentOS CESA-2012:0071 2012-01-30
Debian DSA-2399-1 2012-01-31
Oracle ELSA-2012-0071 2012-01-31
Scientific Linux SL-php-20120130 2012-01-30
openSUSE openSUSE-SU-2012:0426-1 2012-03-29
SUSE SUSE-SU-2012:0496-1 2012-04-12
Mandriva MDVSA-2012:071 2012-05-10
Oracle ELSA-2012-1046 2012-06-30
Gentoo 201209-03 2012-09-23

pidgin: denial of service

Package(s): pidgin
CVE #(s): CVE-2011-4601
Created: December 12, 2011
Updated: January 9, 2012
Description: From the Mandriva advisory:

When receiving various messages related to requesting or receiving authorization for adding a buddy to a buddy list, the oscar protocol plugin failed to validate that a piece of text was UTF-8. In some cases invalid UTF-8 data would lead to a crash.

Alerts:
CentOS CESA-2011:1821 2011-12-22
Oracle ELSA-2011-1821 2011-12-17
Scientific Linux SL-pidg-20111214 2011-12-14
Oracle ELSA-2011-1820 2011-12-14
CentOS CESA-2011:1820 2011-12-14
Red Hat RHSA-2011:1821-01 2011-12-14
Red Hat RHSA-2011:1820-01 2011-12-14
Mandriva MDVSA-2011:183 2011-12-10
Fedora FEDORA-2011-17558 2011-12-30
Fedora FEDORA-2011-17546 2011-12-30
openSUSE openSUSE-SU-2012:0066-1 2012-01-09
Ubuntu USN-1500-1 2012-07-09

python-celery: privilege escalation

Package(s): python-celery
CVE #(s): CVE-2011-4356
Created: December 12, 2011
Updated: December 14, 2011
Description: From the CVE entry:

Celery 2.1 and 2.2 before 2.2.8, 2.3 before 2.3.4, and 2.4 before 2.4.4 changes the effective id but not the real id during processing of the --uid and --gid arguments to celerybeat, celeryd_detach, celeryd-multi, and celeryev, which allows local users to gain privileges via vectors involving crafted code that is executed by the worker process.

Alerts:
Fedora FEDORA-2011-16549 2011-11-28
Fedora FEDORA-2011-16539 2011-11-28

ykclient: authentication bypass via NULL password

Package(s): ykclient
CVE #(s): CVE-2011-4120
Created: December 13, 2011
Updated: December 14, 2011
Description: Pressing Ctrl-D when the Yubico PAM Module prompts for password will allow the user to log in without a password. See the Red Hat bugzilla for details.
Alerts:
Fedora FEDORA-2011-15580 2011-11-10

zabbix: remote SQL command execution

Package(s): zabbix
CVE #(s): CVE-2011-4674
Created: December 12, 2011
Updated: December 14, 2011
Description: From the CVE entry:

SQL injection vulnerability in popup.php in Zabbix 1.8.3 and 1.8.4, and possibly other versions before 1.8.9, allows remote attackers to execute arbitrary SQL commands via the only_hostid parameter.

Alerts:
Fedora FEDORA-2011-16712 2011-12-04
Fedora FEDORA-2011-16745 2011-12-04
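The Zabbix bug is a classic SQL injection, and the standard remedy is to pass request parameters to the database as bound values rather than pasting them into the SQL text. Zabbix is written in PHP; the Python/sqlite3 sketch below only illustrates the technique, and its table and column names are made up for the example.

```python
import sqlite3


def host_names_by_id(conn: sqlite3.Connection, only_hostid: str) -> list:
    """Look up hosts by ID, passing the request parameter as a bound value."""
    # Vulnerable alternative (do not do this):
    #   conn.execute("SELECT name FROM hosts WHERE hostid = " + only_hostid)
    # Binding keeps the parameter as data, so it cannot change the query's structure.
    rows = conn.execute("SELECT name FROM hosts WHERE hostid = ?", (only_hostid,))
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (hostid INTEGER, name TEXT)")
    conn.executemany("INSERT INTO hosts VALUES (?, ?)", [(1, "web"), (2, "db")])
    print(host_names_by_id(conn, "1"))
    print(host_names_by_id(conn, "1 OR 1=1"))
```

With binding, an input such as "1 OR 1=1" is compared as a single value and matches nothing, instead of turning the WHERE clause into one that matches every row.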

Page editor: Jake Edge

Kernel development

Brief items

Kernel release status

The current development kernel is 3.2-rc5, released on December 9. "It's been a bit over a week, and I'm sad to report that -rc5 is bigger (at least in number of commits - most of the commits are pretty small, so it's possible that the *diff* ends up being smaller, but I didn't check) than both -rc2 and -rc4 were. So much for 'calming down'." 355 changes have been merged since -rc4, indeed more than went into -rc2 (280) or -rc4 (207), but fewer than went into -rc3 (412). All told, there have been 1,254 changes since -rc1, which, at a bit over 10% of the total, is actually relatively small.

Stable updates: the 2.6.32.50, 3.0.13, and 3.1.5 stable updates were released on December 9. All three contain the usual long list of important fixes.

Quotes of the week

I've started using Gnome 3.2 on my main desktop 4 weeks ago, and while I came into it with prejudice and expected a rough ride, everything is surprisingly nice so far.

It's in fact the best (read: most usable, most intuitive) Linux desktop I've ever used for kernel development and maintenance work-flows. It gets out my way, tries to be there when I need it and takes usage ergonomy and UI consistency as seriously as Apple and Google does. Kudos.

-- Ingo Molnar

Your next patch series better come with perfectly spelled changelog entries that actually describe what the patches do, numbered properly (none of this 30/30 crap after a 00/29 series), not break the build (is that asking too much?) apply with no fuzz, and to help it all out, a home made holiday bread of your choosing, as long as it's fresh, and does not contain dried fruit bits (soaked in liquor can't hurt either.)
-- Greg Kroah-Hartman

Bernat: Tuning the Linux IPv4 route cache

Vincent Bernat has posted a lengthy description of how the IPv4 routing cache works and how to tune it for best results. "Once an entry has been added to the route cache, there are several ways to remove it. Most entries are removed by the garbage collector which will scan the route cache and remove invalid and older entries. It will be triggered when the route cache is full or at regular interval, once a certain threshold has been met." (Thanks to Paul Wise).
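The garbage-collection behavior Bernat describes (collection when the cache fills, plus periodic collection once a threshold is crossed) can be modeled in a few lines. The Python toy below is an assumption-laden sketch rather than the kernel's algorithm: its parameter names loosely echo the net.ipv4.route sysctls (gc_thresh, max_size, gc_timeout), but the expiry and least-recently-used eviction rules are simplifications chosen for illustration.

```python
from collections import OrderedDict


class ToyRouteCache:
    """Toy route cache with size-triggered and periodic garbage collection."""

    def __init__(self, max_size: int, gc_thresh: int, gc_timeout: float):
        if not 0 < gc_thresh <= max_size:
            raise ValueError("need 0 < gc_thresh <= max_size")
        self.max_size = max_size
        self.gc_thresh = gc_thresh
        self.gc_timeout = gc_timeout
        # destination -> (next_hop, last_used), kept in least-recently-used order
        self.entries = OrderedDict()

    def _expire(self, now: float) -> None:
        """Drop entries that have not been used for longer than gc_timeout."""
        stale = [dest for dest, (_, used) in self.entries.items()
                 if now - used > self.gc_timeout]
        for dest in stale:
            del self.entries[dest]

    def lookup(self, dest, now: float):
        """Return the cached next hop (refreshing its age), or None on a miss."""
        hit = self.entries.get(dest)
        if hit is None:
            return None
        next_hop, _ = hit
        self.entries[dest] = (next_hop, now)
        self.entries.move_to_end(dest)
        return next_hop

    def insert(self, dest, next_hop, now: float) -> None:
        """Add a route; a full cache triggers collection before the insert."""
        if dest not in self.entries and len(self.entries) >= self.max_size:
            self._expire(now)
            while len(self.entries) >= self.max_size:
                self.entries.popitem(last=False)      # evict the least recently used
        self.entries[dest] = (next_hop, now)
        self.entries.move_to_end(dest)

    def periodic(self, now: float) -> None:
        """Timer-driven collection, which only runs once gc_thresh is reached."""
        if len(self.entries) >= self.gc_thresh:
            self._expire(now)
```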

Kernel development news

Fixing the symlink race problem

By Jake Edge
December 14, 2011

Symbolic link race conditions have been a problem for decades; they have been well understood for nearly as long, and developers have been given clear guidelines on how to avoid them. Yet they still persist, with new vulnerabilities discovered regularly. There is also a known way to avoid most of the problems by changing the kernel, something that grsecurity and Openwall have done for many years, but it has never made its way into the mainline. While kernel hackers are understandably unenthusiastic about working around buggy user-space programs in the kernel, this particular problem is severe enough that it probably makes sense to do so. It would seem that we are seeing some movement on that front.

The basic problem is a time-of-check-to-time-of-use (TOCTTOU) flaw. Buggy applications will look for the existence and/or characteristics of temporary files before opening them. An attacker can exploit the flaw by changing the file (often by making a symlink) in between the check and the open(). If the program with the flaw has elevated privileges (e.g. setuid), and the attacker replaces the file with a symlink to a system file, serious problems can result.
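In code, the difference between the buggy pattern and a safe one is whether the check and the open happen in one step. The Python sketch below contrasts a check-then-open helper with one that creates the file atomically using O_CREAT | O_EXCL (plus O_NOFOLLOW where available), which makes open() fail if anything, including a planted symlink, already exists at that path. The function names are hypothetical; for temporary files, the standard library's tempfile.mkstemp() performs the same kind of exclusive creation.

```python
import os
import tempfile


def unsafe_create(path: str, data: bytes) -> None:
    """Buggy check-then-open pattern, shown only for contrast. Do not use."""
    if not os.path.exists(path):        # time of check (also False for a dangling symlink)
        with open(path, "wb") as f:     # time of use: follows whatever symlink is there now
            f.write(data)


def safe_create(path: str, data: bytes, mode: int = 0o600) -> None:
    """Create path exclusively; raise FileExistsError rather than follow a symlink."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags, mode)     # the existence check and creation are one system call
    with os.fdopen(fd, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        victim = os.path.join(workdir, "victim")       # stands in for a system file
        planted = os.path.join(workdir, "report.tmp")
        os.symlink(victim, planted)                     # the attacker's symlink
        try:
            safe_create(planted, b"report data")
        except FileExistsError:
            print("refused to write through the planted symlink")
        print("victim exists:", os.path.exists(victim))
```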

The bug generally happens in shared, world-writable directories that have the "sticky" bit set (like /tmp). The sticky bit on a directory is set to prevent users from deleting other users' files. So, the fix restricts the ability to follow symlinks in sticky directories. In particular, a symlink is only followed if it is owned by the follower or if the directory and symlink have the same owner. That solves much of the symlink race problem without breaking any known applications.
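
The rule can be written as a small predicate. The following is a minimal Python sketch of the policy as described, not the kernel patch itself; the Inode class and may_follow_link() are hypothetical names, and the assumption that the restriction applies only when a directory is both sticky and world-writable (as /tmp is) follows the /tmp case described above.

```python
import stat
from dataclasses import dataclass


@dataclass
class Inode:
    mode: int   # st_mode bits
    uid: int    # owner's UID


def may_follow_link(parent: Inode, link: Inode, follower_uid: int) -> bool:
    """May `follower_uid` follow symlink `link` found in directory `parent`?"""
    sticky_world_writable = stat.S_ISVTX | stat.S_IWOTH
    # The restriction only applies in shared directories such as /tmp.
    if (parent.mode & sticky_world_writable) != sticky_world_writable:
        return True
    # Allowed if the follower owns the symlink...
    if link.uid == follower_uid:
        return True
    # ...or if the directory and the symlink have the same owner.
    return parent.uid == link.uid
```

With /tmp modeled as a root-owned directory with mode 1777, a root process is refused when it tries to follow a symlink planted there by UID 1000, while the same link in an ordinary directory is followed as before.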

We looked at patches to restrict the following of symlinks in sticky directories in June 2010. Since that time, there has been a two-pronged approach, championed by Kees Cook, to try to get the code into the mainline. The first is the Yama LSM, which is meant to collect up extensions to the Linux discretionary access control (DAC) model. But it runs afoul of the usual problem for specialized LSMs: the inability to stack LSMs.

Cook and others would clearly prefer to see the symlink changes go into the core VFS code, rather than via an LSM, but there has been a push by some to keep it out of the core. There was discussion of Yama and its symlink protections at the Linux Security Summit LSM roundtable, where the plan to push Yama as a DAC enhancement LSM was hatched. That may well be a way forward, but Cook has also posted a patch set that would put the symlink restrictions into fs/namei.c.

The latter patch attracted some interesting comments that would seem to indicate that Ingo Molnar and Linus Torvalds, at least, see value in closing the hole. None of the VFS developers have weighed in on this iteration, but Cook notes that the patch reflects feedback from Al Viro, which could be seen as a sign that he's not completely opposed. Molnar was particularly unhappy that the hole still exists:

Ugh - and people continue to get exploited from a preventable, fixable and already fixed VFS design flaw.

Molnar also had some questions about the implementation, including whether the PROTECTED_STICKY_SYMLINKS kernel configuration parameter should default to 'yes', but was overall very interested in seeing the patch move forward. Torvalds had a somewhat different take ("Ugh. I really dislike the implementation."), but suggested solving the underlying problem with a different mechanism based on the permission bits of the symlink itself. His argument is that Cook's approach is not very "polite" because it is hidden away, so it turns into:

some kind of hacky run-time random behavior depending on some invisible config option that people aren't even aware of.

As Cook points out, though, Torvalds's approach has its own set of "weird hidden behaviors". Torvalds admittedly had not thought his proposal through completely, but it does show an interest in seeing the problem solved. From Cook's perspective, the changes he is proposing simply change the behavior of sticky directories with respect to symlinks, whereas Torvalds's would have wider-ranging effects on symlink creation. Either might do the job, but Cook's solution does have an advantage in that the proposed changes have been shaken out for years in grsecurity and Openwall, as well as in Ubuntu more recently.

Given that several high-profile kernel hackers seem to be in favor of fixing the problem—Ted Ts'o was also favorably disposed to a fix back in 2010—the winds may have shifted in favor of the core VFS approach. If Viro and the other VFS developers aren't completely unhappy with it, we could see it in 3.4 or so.

If that were to happen, there is another related patch that would presumably also be pushed for mainline inclusion: hard link restrictions. That, like the symlink change, currently lives in Yama, though the case can be made that it should also be done in the core VFS code. That patch would disallow the creation of hard links to files that are inaccessible (neither readable nor writable) to the user making the link. It also disallows hard links to setuid and setgid files. That would close some further holes in the symlink race vulnerability, as well as fix some other application vulnerabilities.
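
Those two hard link rules can be sketched in the same style. This is a simplified Python model following the description above, not the Yama code, whose exact rules may differ; the File class and helper names are hypothetical, and permission checking here ignores group membership and root's override.

```python
import stat
from dataclasses import dataclass


@dataclass
class File:
    mode: int   # st_mode, including the S_ISUID / S_ISGID bits
    uid: int    # owner's UID


def perm_bits(f: File, uid: int) -> int:
    """rwx bits that apply to `uid` (simplified: no groups, no root)."""
    if f.uid == uid:
        return (f.mode >> 6) & 0o7
    return f.mode & 0o7


def may_create_hardlink(f: File, uid: int) -> bool:
    # Never allow hard links to setuid or setgid files.
    if f.mode & (stat.S_ISUID | stat.S_ISGID):
        return False
    # Refuse files the linking user can neither read nor write.
    bits = perm_bits(f, uid)
    return bool(bits & 0o4) or bool(bits & 0o2)
```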

Should both the symlink and hard link restrictions make their way into the VFS core, that would only leave the ptrace() restrictions in Yama. Those restrictions allow administrators to disallow a process from using ptrace() on anything other than its descendants (unless it has the CAP_SYS_PTRACE capability). Currently, any process can trace any other running under the same UID, so a compromise in one running program could lead to disclosing credentials and other sensitive information from another running program. There may also be other DAC enhancements that Cook or others are interested in adding to Yama in the future.
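
The difference between the traditional same-UID rule and the descendant-only restriction can be modeled as follows. This is a simplified Python sketch, not Yama's implementation: the Task class, the has_cap_sys_ptrace flag, and the restricted switch are hypothetical, and real kernel checks consider more than UIDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    pid: int
    parent: Optional["Task"]
    uid: int
    has_cap_sys_ptrace: bool = False


def is_descendant(ancestor: Task, task: Task) -> bool:
    """Walk up the parent chain of `task` looking for `ancestor`."""
    node: Optional[Task] = task
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def may_ptrace(tracer: Task, target: Task, restricted: bool) -> bool:
    # In this model, CAP_SYS_PTRACE overrides both rules.
    if tracer.has_cap_sys_ptrace:
        return True
    # Traditional rule: any process may trace another with the same UID.
    if tracer.uid != target.uid:
        return False
    # Restricted mode: the target must also be a descendant of the tracer.
    return is_descendant(tracer, target) if restricted else True
```

Under the traditional rule, a compromised process can attach to any sibling running as the same user (a browser, say); in restricted mode it can only trace processes it started itself.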

One way or another, the problem is severe enough that there should, at least, be a way for distributors or administrators to thwart these common vulnerabilities. Whether the fix lives in VFS or an LSM, providing an option to turn off a whole class of application flaws—which can often lead to system compromise—seems worth doing. Hopefully we are seeing movement in that direction.

LTTng rejection, next generation

By Jonathan Corbet
December 14, 2011
The story of tracing in the Linux kernel sometimes seems to resemble a bad multi-season TV soap opera. We have no end of strong characters, plot twists, independent story lines, recurring themes, and conflicting agendas. The cast changes slowly over time, but things never seem to come to any sort of satisfying conclusion. Those watching the show might be forgiven for thinking that one of those story lines might be about to be wrapped up when the LTTng tracing system was pulled into the staging tree for the 3.3 merge window. But they should have known that they were just being set up for another sad twist in the plot.

LTTng descends from some of the earliest dynamic tracing work done for Linux. Its distinguishing characteristics include integrated kernel- and user-space tracing, performance sufficient to deal with high-bandwidth event streams, and a well-developed set of capture and analysis tools. LTTng has always been maintained out of the mainline kernel tree, but it is packaged by a number of distributors and has a base of dedicated users, some of whom have been happy to fund ongoing LTTng development work.

Had LTTng been merged years ago, the story might have been much simpler, but, for a number of reasons (including the simple fact that, for years, any sort of tracing capability was hard to sell to the kernel development community) that did not happen. So we have ended up with a number of projects in this area, including SystemTap (which also remains out-of-tree), and the in-tree ftrace and perf subsystems. Naturally, none of these solutions has proved entirely satisfactory so, while there has been a fair amount of pressure to consolidate the various tracing projects, that has tended not to happen.

That is not to say that there has been no progress at all. Some agreement has been reached on the format of tracepoints themselves; much of the work in that area was done by primary LTTng maintainer Mathieu Desnoyers. As a result, the number of tracepoints in the kernel has been growing rapidly, making kernel operations more visible to users in a number of ways. A lot of talk about merging more infrastructure has been heard over the years - said talk was often audible from a great distance at various conferences - but progress has been minimal. It seems to be easy for developers in this area to get bogged down on the details of ring buffers, event formats, and so on at the expense of producing an actual, usable solution.

To Mathieu, merging into the staging tree must have looked like an attractive way to push things forward. The relaxed rules for that tree would allow the code into the mainline where its visibility would increase, any remaining issues could be fixed up, and more users could be found. It all seemed to be working - some cleanup patches from new developers were posted - until Mathieu tried to add exports for some core kernel symbols so LTTng could access them. That attracted the attention of the core kernel developers who, to put it gently, were not impressed with what they saw.

In the end, Ingo Molnar vetoed the whole patch series and asked Greg Kroah-Hartman to remove the LTTng code from staging. Greg complied with that request, with the result that LTTng is, once again, no closer to merging into the mainline than it was before. This particular story line, it seems, has at least one more season to run yet.

What is it about LTTng that makes it unsuitable for merging into the mainline? It starts with a lot of duplicated infrastructure. Inevitably, LTTng brings in its own ring buffer to communicate events to user space, despite the fact that the two ring buffers used by perf and ftrace are already seen as being too many. There is a new instrumentation mechanism for system calls - something that the kernel already has. And, of course, there is a new user-space ABI to control all of this - again an unwelcome addition when there is strong pressure from some directions to merge the existing in-kernel tracing ABIs.

Duplicated infrastructure always tends to be hard to merge into the mainline; duplicated user-space ABIs, which must be supported forever, are even more so. It is thus not surprising that there is pushback against these patches, even without considering the highly contentious nature of the discussion around tracing work in general. Ingo claims to be receptive to merging the parts of LTTng that are better than what the kernel has now - after it has been unified with the existing infrastructure, of course - but, he says, Mathieu has been more interested in maintaining LTTng as a separate "brand" and has been unwilling to merge things in this way.

Mathieu's response has not done much to address those concerns. Duplicate infrastructure, he said, is fine as long as there is no agreement on how that infrastructure should work. Thus, he said, it is better to get his ring buffer into the mainline and to try to work out the differences there. He takes a similar approach to the ABI:

Interfaces to user-space: very much like filesystems, these ABIs don't need to be shared across projects that have different use-cases. Having multiple tracer ABIs, if self-contained, should not hurt anybody and just increase the rate of innovation.

The points that are missed here are that (1) filesystems do, in fact, share the same ABI, and (2) there is indeed a cost to multiple ABIs for tracing. Those ABIs have to be maintained indefinitely and they fragment the efforts of tool developers who find themselves forced to choose one or the other. Unless he can produce a convincing proof that the existing kernel interfaces cannot possibly be extended to meet LTTng's needs, Mathieu will almost certainly not succeed in getting a new tracing ABI into the mainline.

Two notable conclusions were reached at the 2011 Kernel Summit. One was that maintainers should say "no" more often and accept fewer new features into the mainline; that would argue that Ingo and others are right to block the addition of LTTng in its current form. But the other conclusion was that code that has been shipped for years and that has real users should be strongly considered for merging even if it has known technical shortcomings. That, of course, would argue for merging LTTng, which certainly meets those conditions. Given the players involved, that conflict seems almost certain to be resolved with LTTng remaining an active project out of the mainline. Tune in next year for another episode of "As the tracing world turns."

Vtunerc and software acceptance politics

By Jonathan Corbet
December 14, 2011
The kernel development process prides itself on being driven exclusively by technical concerns. Ideally, all decisions with regard to the merging of code would be based on whether that code makes technical sense or not; decisions based on "political" concerns are seen as being rather less ideal. But, as a recent discussion shows, even a seemingly "political" decision can have technical reasoning behind it.

In June 2011, Honza Petrous posted a patch to the linux-media list containing an implementation of a virtual DVB (digital video broadcasting) device driver. DVB drivers normally talk to devices that tune in and capture video streams - television tuners, in other words. Honza's "vtunerc" driver, by contrast, drives no physical hardware at all; instead, it serves as a sort of loopback device. One side looks like a normal DVB device; it handles all the usual DVB system calls. The other side, which shows up as /dev/vtunercN, passes a processed form of those DVB system calls back to user space. The intended use is for a user-space process to receive those operations and pass them to a remote peer elsewhere on the network; that peer would then perform the operations on a real DVB device. Using this mechanism, DVB devices could be hosted on a network in a manner that is entirely transparent to DVB applications. Honza has posted a diagram showing how the pieces fit together.

Virtual devices of this type are not unprecedented in the Linux (and Unix) tradition; the venerable virtual terminal devices work in much the same way. This type of mechanism is also sometimes used to make devices available within virtualized guest systems. But this patch was not accepted into the DVB subsystem for a number of reasons, one of which was that it would facilitate the creation of proprietary user-space drivers for DVB devices. That was the reason Honza picked up on when he went to the linux-kernel list in November in an attempt to gain support, saying that, while he didn't discount the possibility of "bad guys" abusing the interface to create closed-source drivers, he was not convinced that it justified the "aggressive NACK" the code received.

As the subsequent discussion made clear, some developers do, indeed, believe that the potential for abuse in this way is sufficient reason to keep an interface out of the mainline kernel. That is the same reasoning that has, for example, blocked the merging of graphics drivers that have proprietary user-space components. But it also turns out that there is rather more than that to this particular decision. Reasons for keeping vtunerc out include:

  • The same ABI that enables proprietary drivers also exposes a fair amount of internal information about the DVB layer. That ABI would have to remain unchanged even as DVB evolves, leading to maintenance burdens in the future.

  • There appears to be little advantage to routing all that video data through the kernel and immediately back to user space; it would make more sense for DVB applications to use a network video protocol directly and avoid the cost of routing data through the kernel.

  • DVB applications tend to work with tight timing constraints. Adding a network connection into the mix will create latencies that may well confuse these applications. Working across a network requires a different approach than talking to a device directly; operations that may be done synchronously on a local device may need to happen asynchronously with a remote device. By hiding the network link, vtunerc makes it impossible for applications to drive the device appropriately.

  • If the creation of this type of loopback device absolutely cannot be avoided, it can be done with the CUSE (char drivers in user space) interface instead of adding a new ABI.

In the discussion, it seems that much of the motivation for vtunerc comes from the fact that it would require no changes to applications at all, while a user-space approach might require such changes. In fact, it seems that there is a political problem at that level: the maintainer of the Video Disk Recorder (VDR) tool is evidently uninterested in adding real network client support. Needless to say, adding an interface to the kernel to get around an uncooperative application maintainer is not an idea that gains a lot of sympathy on the kernel side.

It is easy to see politics in decisions that do not go one's way. As the old saying goes: just because you're paranoid doesn't mean that they aren't out to get you; in some cases non-technical agendas almost certainly play a part. But it may also be that the proposed code simply is not acceptable in its current form and needs work. Going back to the mailing lists and crying "politics" is a reliable way to turn the matter into a political issue, though, and the result is unlikely to be the one the developer wanted.

Page editor: Jonathan Corbet

Distributions

WebOS reborn?

By Jonathan Corbet
December 12, 2011
On December 9, HP ended a long period of rumors and speculation with the announcement that it would release the code for its webOS platform under an open-source license. Very little information beyond that brief press release is available, so the net has duly responded with lots more speculation. To some, webOS is about to start a new and better life; to others, this announcement is the last gasp of a dying product. Never one to let a prime handwaving opportunity pass unexploited, your editor has written some thoughts of his own.

In many ways, the mobile device market at the end of 2011 is in far better shape than many of us would have ever dared to hope for. Powerful handsets running Linux are ubiquitous, relatively cheap, and, in many cases, mostly open to hacking by their users. A great deal of creativity has been unleashed on both the hardware and software sides, to everybody's benefit. Linux has become the system for the bulk of these devices, and the companies that make them are getting better at contributing their work upstream. In many parts of the kernel, the old problem of missing hardware support has been replaced by the problem of dealing with the massive amounts of code contributed by manufacturers. That is, as they say, a high-quality problem; in many ways, life is good.

Naturally, things could be better for everybody involved. The bulk of those devices are running Android, which falls somewhat short of what many of us would like to see in an open-source project. The direction of the project is closely controlled by Google, source releases have been delayed and withheld (is it truly "open source" if one cannot get the source for the code running on one's device?), and some companies have better access to the source than others. Manufacturers have reason to dislike depending on Google for access, and they worry about being relegated to the commodity side of the business. As has been written here before, Android is a huge and valuable gift, but we can acknowledge that gift while still wishing for something a little better.

The dominant players in this market (handsets and tablets, mainly) are Apple's iOS and Android. The former is not available to other manufacturers, leaving them with a single choice for their operating software. In such a situation, there should certainly be room for another contender. Microsoft might yet fill that space with Windows 8 Mobile, but it does not have to be that way; Microsoft has always struggled in this market. Wouldn't it be nice if another Linux-based system could establish itself as a major mobile platform instead?

There is no shortage of alternatives in this area. Tizen announced itself as a sort of successor to MeeGo in September, but almost nothing has been heard from this project since. Various developers are trying to keep a MeeGo derivative alive as a community-driven project; the resulting "Nemo" project has made a few releases for the N900, but does not appear to be progressing quickly. The GNOME project has its eyes on this market with its "GNOME OS" concept based on GNOME 3; KDE's "Plasma Active" has very similar goals. Canonical, too, has ambitions in the mobile arena. Most of these projects have actively been looking for manufacturers to ship their software, but there are not a lot of high-profile results to point to thus far. Will webOS do better?

Those who have expressed pessimistic views certainly have their reasons for doing so. As noted above, Linux-based mobile platforms are not in especially short supply; webOS is a late addition to a crowded field. Given the time that has passed since HP abandoned its webOS plans and the rumors that went around, it seems certain that HP tried, unsuccessfully, to find a buyer for webOS before deciding to open-source it. If nobody wanted to own the system before, what are the chances that they will want to use an open-source version in the future?

That said, webOS is a system with a history of shipping in real products and with a core of enthusiastic users; the alternatives have neither of those. Given code, developers, users, and space in the market, it should be possible for a system like webOS to establish itself and prosper. Getting there, though, will depend on a number of things.

One of those, obviously, is the code itself. Is the quality of the code such that the community can pick it up and carry it forward without a huge amount of cleanup work? Will all of the code be released, or will it be necessary to find or create alternatives for pieces that have been withheld? And, crucially, how long will it take for the code to appear? Every day that passes between now and the code release will decrease the relevance of the whole exercise. If HP wants webOS to succeed, it needs to get the code out there quickly.

Then, there is the quality of HP's management of the project. The press release promises "good, transparent and inclusive governance to avoid fragmentation," which can mean almost anything. "Avoid fragmentation," alas, is often a euphemism for "maintain a firm grip on the project and where it can go." If, instead, HP were to create a structure that gave up some control and showed faith in the community it hopes to build, it could find itself with a crowd of enthusiastic contributors. That said, HP needs to remain at the forefront of that crowd for some time; it will be hard to convince others to contribute to webOS if HP stops doing so. Licensing, too, is a clear concern; some licenses are rather more attractive to contributors than others. HP has not yet said which license it will use, or whether copyright assignments will be required to contribute to the project.

Finally, the code is of limited interest without useful devices to run it on. Google has made a point of ensuring the existence of unlocked devices and making it easy for developers to get their hands on those devices. HP would be wise to emulate this example if it wants developers to hack on - and improve - the code.

In summary: webOS has a real chance as an open-source project if HP manages things correctly and gets the code out there quickly. There is an existing code base, room in the market, a desire for alternatives, and a group of ready customers. That is far more than most projects have at their launch. The open-source version of webOS has a hard road ahead of it with many challenges to overcome but, with some luck and careful management, there is a real possibility for interesting things to happen.

Brief items

CentOS 6.1 released

The CentOS 6.1 release is now available; there have also been announcements for minimal, live CD, and live DVD variants. Information on 6.1 can be found in the release notes.

HP to open-source webOS

HP has announced that it will contribute webOS to the open source community. "HP will engage the open source community to help define the charter of the open source project under a set of operating principles: The goal of the project is to accelerate the open development of the webOS platform; HP will be an active participant and investor in the project; Good, transparent and inclusive governance to avoid fragmentation; Software will be provided as a pure open source project." Details beyond that are scarce at the moment.

Linux Mint switches Banshee's Amazon MP3 store referral code

In something of a reprise of the February incident where Canonical switched Banshee's Amazon MP3 store referral code so that it could collect the revenue (and share 25% of that with Banshee), Linux Mint has now done much the same thing. First reported by OMG! Ubuntu!, it has since been confirmed by Linux Mint lead Clement Lefebvre. So far, the revenue ($3.41) has been negligible, but he seems willing to negotiate a revenue share should that change: "Now, should we share the $3.41/month with Banshee? We could. With Ubuntu? Why not. They're both upstream to us and they're both important to us. If we agree with them on how to share, then it might happen, whether they keep control and share with us, or we keep control and share with them. What's for sure though, is that for this kind of revenue, not a lot of time is going to be spent in negotiations."

Distribution News

Debian GNU/Linux

Bits from the DPL for November 2011

In the November bits from the Debian Project Leader, Stefano "Zack" Zacchiroli covers a call for help for the press/publicity team, various interviews, some financial information, some legal advice from SPI lawyers, Debian's relationship with other organizations, and a few other topics.

Fedora

Appointments to the Fedora Board

Jared K. Smith has announced that David Nalley has been appointed to serve another one-year term on the Fedora Board.

Fedora 14 End of Life

Fedora 14 has reached its end-of-life. No updates, including security updates, will be available for Fedora 14.

Page editor: Rebecca Sobol

Development

Xxxterm: Surfing like a Vim pro

December 14, 2011

This article was contributed by Koen Vervloesem

Modern web browsers provide more and more functionality, so it should come as no surprise that new lightweight web browsers crop up from time to time to please users who prefer a "back to the basics" approach. In April 2009, Arch Linux release engineer Dieter Plaetinck announced Uzbl, a refreshingly minimalist web browser that prides itself on following the UNIX philosophy (LWN looked at Uzbl in July 2009). In August 2010, OpenBSD developer Marco Peereboom published the initial release of xxxterm, a lightweight and secure web browser with a vi-like command-line interface for heavy keyboard users.

The name xxxterm comes from xterm but with a triple "x" as a reference to www. Xxxterm was initially developed for OpenBSD, but it was later ported to Linux, and it's available in the repositories of Debian Sid, Gentoo, Arch Linux, and FreeBSD. It uses the WebKit browser engine and its source code is published under the ISC license, which is a permissive free software license written by the Internet Systems Consortium. It's equivalent to the two-clause BSD license and is the preferred license for OpenBSD.

In a wiki page entitled "XXXTerm Rationale", Peereboom explains why he wrote xxxterm. First and foremost, he noticed that Firefox was becoming slower and slower; second, he was an avid Vim user and wanted the same level of keyboard control in his web browser. He tried a number of Vim-like web browsers, but none of them had the right mix of features, so he began tinkering with WebKit; after a few hours he had a working minimal web browser, which eventually became xxxterm.

At first sight, xxxterm looks like a regular web browser, although with a somewhat boring layout: at the top of the window it has an address bar with back, forward, stop, and go buttons to the left and a search bar to the right. You can use xxxterm like any browser: Ctrl-t opens a new tab, and you can use the mouse to switch between tabs, follow links, and so on. Thanks to its WebKit engine, it has no problem rendering modern web sites, including HTML video and Flash (using the Adobe Flash plugin). However, the real beauty is that xxxterm allows fully mouse-less browsing by offering its Vim-like command-line mode.

Browser commands

Before you begin with the commands, you first have to know something about focus: F6 focuses on the address bar, F7 on the search bar, "i" on the default page input, and Esc removes the focus. The latter two shortcut keys are, not coincidentally, the same as the keys for entering insert mode and command mode in Vim. In command mode, you can use search commands like "/" (search forward), "?" (search backward), "n" (next match) and "N" (previous match). "0" (zero) scrolls the page to the far left and "$" to the far right, while "gg" goes to the top of the page and "G" to the end. A Vim user will probably know by now what actions are performed by the shortcut keys "j", "k", "h", and "l" ...

[Xxxterm tabs]

If you want to use xxxterm without touching the mouse at all, just press "f", after which the browser highlights all links and prefixes them with a number. Entering a number will follow the corresponding link. Switching to another tab without the mouse is equally easy: type :ls in command mode, which lists all the tabs in a drop down menu, and type the tab number or navigate the menu with the arrow keys.

The number of available commands is quite large, and fortunately the command mode has tab completion, so you can discover a lot of these commands yourself. For a full list, have a look at the man page. There are commands for session saving, plug-ins, tabs, and so on. A convenient, and for Vim users quite natural, command is :wq, which saves all open tabs and quits the web browser. All tabs will be reopened next time you start xxxterm.

Another interesting feature is that you can execute arbitrary commands in your running xxxterm instance by running xxxterm -e <command> in your terminal. This requires the enable_socket option to be enabled in the configuration file, but after this, you are able to control your browser session from outside the browser. For instance, xxxterm -e tabnew example.com opens example.com in a new tab and xxxterm -e wq closes your running xxxterm instance. So if your terminal emulator, email reader, or RSS feed reader supports custom commands to open links in a web browser, this is how you could configure them to use xxxterm.

Security and privacy

Xxxterm is also meant to be a secure web browser, and this is visible in features like the ability to control cookies, plug-ins, and JavaScript policies on a per-website basis. For each of these security risks, the user can define whitelists of which trusted web sites are allowed to use them. For instance, you can permanently whitelist the use of cookies on your current site with the command :cookie save or permanently whitelist the use of JavaScript with the command :js save. However, by default xxxterm behaves like any other browser, so to be able to use the whitelists, you have to place "browser_mode = whitelist" at the top of the .xxxterm.conf configuration file. The man page explains the details of what you can whitelist and how.

[Xxxterm cookie jar]

The xxxterm wiki page also mentions that many web sites track visitors not only with cookies, but also by embedding links with host names that require a DNS lookup. Because many web browsers have DNS prefetching enabled by default, the browser performs all of these DNS lookups whether or not you visit the linked sites. Link prefetching makes this even worse: with that feature enabled, the browser downloads the pages referenced by links with the rel="prefetch" attribute on the current page. Xxxterm disables both DNS and link prefetching by default to thwart these tracking techniques; the threat may be a bit far-fetched, but xxxterm prefers to be on the safe side.

At first sight, the prospective xxxterm user will search in vain for an ad blocker, which is strange for a web browser that prides itself on security and privacy. However, this feature is intentionally missing: the developers recommend using AdSuck, a special-purpose DNS server that can blacklist addresses belonging to advertisers, thus preventing the browser from ever connecting to the advertisers' sites. AdSuck, too, was created by Peereboom. This approach actually makes sense: it makes sure that ads and other unwanted content never make it into the browser, and as a side effect the browser becomes a bit more responsive.
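
The idea behind a blacklisting DNS server can be sketched in a few lines. This is not AdSuck's code; it is a hypothetical illustration of the check such a resolver performs. It assumes that blacklisting a domain also blocks all of its subdomains, that blacklist entries are lowercase without a trailing dot, and that upstream stands in for a real resolver.

```python
import socket
from typing import Callable, Optional


def is_blacklisted(hostname: str, blacklist: set[str]) -> bool:
    """True if hostname or any parent domain is blacklisted.

    For "ad.tracker.example" this checks "ad.tracker.example",
    "tracker.example" and "example" in turn.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def resolve(
    hostname: str,
    blacklist: set[str],
    upstream: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Answer like a blacklisting DNS server: no address for blocked names."""
    if is_blacklisted(hostname, blacklist):
        return None  # the browser never learns the advertiser's address
    return upstream(hostname)
```

Because the lookup fails before any connection is attempted, no browser-side ad blocker is needed for the blocked hosts.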

The focus on security is also visible in some small things. For instance, many other web browsers treat non-URLs typed into the address bar as search strings. This is quite a convenient feature, but the developers of xxxterm have intentionally disabled it by default, because otherwise accidentally pasting a password or other private information into the address bar would send it to a search engine. Another nice touch for the privacy-conscious is that the default search engine in the search bar is Scroogle, a web site that relays searches to Google while hiding your IP address, so Google can't track your search terms.
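
The difference between the two behaviors can be shown with a small sketch. The heuristic below is hypothetical and is not xxxterm's actual logic. Input that does not look like a URL (it contains whitespace, or its host has no dot and is not localhost) is rejected instead of being forwarded to a search engine.

```python
from typing import Optional
from urllib.parse import urlsplit


def address_bar_target(text: str) -> Optional[str]:
    """Return a URL to load, or None instead of falling back to a web search.

    Hypothetical heuristic: no whitespace, an http(s) scheme (added if
    missing), and a host that contains a dot or is "localhost".
    """
    candidate = text.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return None
    if "://" not in candidate:
        candidate = "http://" + candidate
    try:
        parts = urlsplit(candidate)
    except ValueError:  # e.g. a malformed IPv6 literal
        return None
    host = parts.hostname or ""
    if parts.scheme in ("http", "https") and ("." in host or host == "localhost"):
        return candidate
    return None  # never forwarded to a search engine
```

A pasted password such as hunter2 is simply rejected. A password that happens to contain a dot would still be treated as a host name, which shows why such heuristics remain imperfect.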

Comparison

Compared with other minimalist web browsers, xxxterm has clearly found its own niche. Given its focus on security, it's not surprising that xxxterm is quite popular in the OpenBSD community. However, it's not the only minimalist web browser. We already mentioned Uzbl, which is much more flexible than xxxterm, but requires the user to write a lot of external shell scripts. Uzbl is actually more of a personal web browser building kit. A project with a similar approach is Luakit, which the developers call a "browser from scratch", because the user creates the entire interface in an rc.lua configuration file. There's also surf, an extremely minimalist web browser from the suckless project which just displays web pages and follows links, but doesn't even support tabbed browsing. The philosophy behind it is that tabs are not meant to be a feature of the browser but of the window manager, and hence it's a natural companion of a window manager like dwm (another project under the suckless umbrella) or awesome. However, compared with Uzbl, Luakit, and surf (which are all based on WebKit), xxxterm seems to have found a sweet spot between minimalism and usability.

It takes some time to get used to xxxterm, because the interface is really minimalist: there are no menus, and the context menu only has back, forward, stop, and reload actions. Settings like the HTTP proxy or the CA file to validate SSL certificates have to be changed in the configuration file, and the user has to memorize commands for a lot of the non-browsing tasks. However, if you are a Vim user, you already have a head start for a lot of the default shortcut keys, and under the surface of this minimalist web browser lies a surprising amount of functionality. If you're a heavy keyboard user and looking for a web browser focused on security, xxxterm is definitely something to try out.

Brief items

Quotes of the week

The overhead of formatting a patch properly is trivial. Getting a patch set into thunderbird or the web so totally dwarfs the tedium of actually creating the patch, it's unbelievable.
-- Dave Täht

For me, if I had to design a new language today, I would probably use braces, not because they're better than whitespace, but because pretty much every other language uses them, and there are more interesting concepts to distinguish a new language.
-- Guido van Rossum

I've mentioned this before, and I keep getting back to it: With all the great work that has been put into OsmocomBB, we are "at an arm's length" away from being able to create a true Free Software mobile phone.

We already have the hardware drivers, protocol stack and even the 'mobile' program which can be used for making and receiving voice calls and sending/receiving SMS text messages on real GSM networks!

While the journey has been a lot of fun and everyone involved has learned a lot, we have so far been catering mostly about "scratching our own itch", i.e. implementing what we needed in order to satisfy our ego and/or to implement the ideas we had regarding cellular security.

I believe we cannot miss the bigger opportunity here to put our code into bigger use: To create something like a very simple GSM feature phone.

-- Harald Welte

Facebook's "HipHop Virtual Machine" released

Facebook has announced the release of the HipHop Virtual Machine. "So, early last year, we put together a small team to experiment with dynamic translation of PHP code into native machine code. What resulted is a new PHP execution engine based on the HipHop language runtime that we call the HipHop Virtual Machine (hhvm). We're excited to report that Facebook is now using hhvm as a faster replacement for hphpi, with plans to eventually use hhvm for all PHP execution." They claim some significant speed improvements; the announcement has a fair amount of detail about how it works. The source is available from GitHub.

KDE Plasma Active Two released

The KDE project has announced the release of Plasma Active Two, the second iteration of its mobile device environment. Changes include a lot of user interface improvements, better performance, and "recommendations": "Plasma Active is now able to learn as you use your device. It uses that information to make recommendations as to what content, web sites and applications are likely to be related to what you are doing right now. This technology uses the power of the 'semantic desktop' efforts from KDE Nepomuk to make your device a more valuable adviser and helper. Future releases will build on predictive power as well as the breadth of recommendations."

ODB 1.7.0

ODB is an object-relational mapping library for C++. The 1.7.0 release includes a new "optimistic concurrency" mechanism, SQL statement tracing, Oracle database support, read-only data members, and more; see this posting for more information.
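
Optimistic concurrency is a general technique: each row carries a version number, and an update only succeeds if the version is still the one the writer originally read. The sketch below shows the idea with Python's built-in sqlite3 module. It is not ODB's C++ API, and the person table, StaleObjectError, and the function names are hypothetical.

```python
import sqlite3


class StaleObjectError(Exception):
    """Another writer changed the row after we read it."""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one versioned row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO person VALUES (1, 'Alice', 1)")
    return conn


def update_name(conn: sqlite3.Connection, person_id: int,
                new_name: str, expected_version: int) -> int:
    """Update the name only if the row is still at expected_version.

    Returns the new version number.
    """
    cur = conn.execute(
        "UPDATE person SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, person_id, expected_version),
    )
    if cur.rowcount == 0:
        # Either the version moved on or the row is gone; treat both as a conflict.
        raise StaleObjectError(
            f"person {person_id} is no longer at version {expected_version}"
        )
    return expected_version + 1
```

If two writers both read version 1, the first update succeeds and the second raises StaleObjectError; that writer would typically reload the row and retry, rather than holding a lock for the whole read-modify-write cycle.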

Open Dylan 2011.1

Open Dylan is an implementation of Dylan, "an advanced, object-oriented, dynamic language which supports rapid program development." The 2011.1 release - the first since beta4 in 2007 - is out. This release includes some relicensing (to the MIT license) and a sizeable reduction in code size among other things.

Rockbox 3.10 released

On its tenth anniversary, December 7, Rockbox released version 3.10 of the free alternative firmware for a wide variety of music players. Version 3.10 is considered stable on more than two dozen different players, as can be seen in the release notes. Notable features in the release include better catalog handling, theming improvements, a fully functional audio mixer, support for a bunch of gaming audio formats, additional embedded album art support, Ogg Vorbis decoding performance improvements, and more. More information can be found on the Rockbox home page.

Upstart 1.4 released

Version 1.4 of the upstart system init daemon is out. New features include the ability to capture standard error and output streams from system jobs to a log file, the ability to run system jobs under specific user and group IDs, and more.

WordPress 3.3 released

The WordPress 3.3 release (code-named "Sonny") is available. "Experienced users will appreciate the new drag-and-drop uploader, hover menus for the navigation, the new toolbar, improved co-editing support, and the new Tumblr importer. We've also been thinking a ton about what the WordPress experience is like for people completely new to the software. Version 3.3 has significant improvements there with pointer tips for new features included in each update, a friendly welcome message for first-time users, and revamped help tabs throughout the interface. Finally we've improved the dashboard experience on the iPad and other tablets with better touch support."

On a related note, the LWN site (which is not based on WordPress) is seeing a flood of attempts to exploit the TimThumb vulnerability; anybody running a WordPress site who has not closed this hole should do so immediately.

Newsletters and articles

Development newsletters from the last week

Hamano: GitTogether 2011

Git maintainer Junio C. Hamano reports on GitTogether 2011 on the Google Open Source blog. A two-day "unconference" event was held at Google's Mountain View headquarters to discuss various Git features, including: "Support for large blobs that would not fit in the memory has been always lacking in Git. There recently has been a lot of work in the native support (e.g. storing them straight to the object store without having to read and hold the whole thing in core, checking out from the object store to the working tree without having to hold the whole thing in core, etc.). There are a few third-party tools and approaches with their own pros-and-cons, but it was generally agreed that adding a split-object encoding like Avery Pennarun's "bup" tools uses would be the right way to help support object transfer between repositories to advance the native support of large objects in Git further."
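
To give an idea of what a split-object encoding involves: bup splits large files at content-defined boundaries found with a rolling checksum, so an edit near the start of a file only changes nearby chunks instead of shifting every later chunk. The Python sketch below illustrates that general technique with an rsync-style rolling sum. The window size, the 13-bit mask, and the function name are assumptions for illustration, not bup's exact algorithm or any format Git has adopted.

```python
import random

WINDOW = 64           # bytes covered by the rolling checksum
MASK = (1 << 13) - 1  # boundary when the low 13 bits are all ones (~8 KiB chunks on random data)


def split_chunks(data: bytes, window: int = WINDOW, mask: int = MASK) -> list[bytes]:
    """Split data at content-defined boundaries using a rolling checksum.

    a is the sum of the bytes in the window and b is a position-weighted sum
    (oldest byte weighted most); both are updated in O(1) as the window slides.
    Bytes before the start of data count as zeros.
    """
    chunks: list[bytes] = []
    start = 0
    a = b = 0
    for i, x_in in enumerate(data):
        x_out = data[i - window] if i >= window else 0
        a += x_in - x_out
        b += a - window * x_out
        if (b & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    data = random.Random(1).randbytes(200_000)
    before = split_chunks(data)
    after = split_chunks(b"X" + data)  # insert one byte at the front
    shared = set(before) & set(after)
    print(f"{len(before)} chunks, {len(shared)} unchanged after the edit")
```

Because each boundary depends only on the last 64 bytes, inserting a byte at the front shifts every later boundary by one position, and the chunks after the first boundary stay identical; only those need no new storage or transfer.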

Page editor: Jonathan Corbet

Announcements

Brief items

Creative Commons 4.0 process starts

The Creative Commons project has announced the beginning of the process leading to version 4.0 of its license suite. They have some specific issues to solve and are looking for input on other improvements that should be made to the licenses at the same time. "The treatment of sui generis database rights in the 3.0 licenses continues to be a show-stopper for many, including governments in Europe. This fosters an environment in which custom licenses proliferate, inevitably resulting in silos of incompatibly-licensed content that cannot be maximally shared and remixed. But there exist still other reasons for pursuing 4.0 at this time, including the desire to adjust the licenses to more fully support adoption by intergovernmental organizations and those looking for a more internationally-oriented license suite."

The Linaro Community Contributor Process

The Linaro project has announced the creation of its "community contributor process." "Linaro itself is now an organisation of around 120 engineers, but as we continue to grow the community around us is also growing fast. We're grateful to the many people who are participating in our success, and so we're introducing the Community Contributor process to recognise those community members who have sustained contributions over a significant period of time." Benefits to being named a "Community Contributor" include a Linaro email address and "IRC cloak"; all one has to do is to sign an agreement giving a broad copyright and patent license to Linaro for all contributions made from that email address.

Articles of interest

2011: The Year of Linux Disappointments (Datamation)

According to this Datamation article by Bruce Byfield, 2011 was "a kidney stone of a year" for free software. "Not that any great disaster struck in the last twelve months. For many -- even most -- businesses and community projects, the year was routine, with new products and releases rolling out like any other year. However, at the same time, opposition to free software continued to build in 2011. Nor was the year a lucky one for anyone taking a new direction. In fact, when you look back at 2011, most of the major events were disappointments, only occasionally softened by unexpected secondary results."

FSFE: Helsinki city officials report high satisfaction with Free Software

The Free Software Foundation Europe reports on a program to move government employees in Helsinki, Finland, from MS Office to OpenOffice. "During year 2011 a number of projects have been started to increase of use of Free Software in the public administration in Finland. Besides Helsinki, similar initiatives have been undertaken in the city councils of Tampere, Turku, Paimio and Salo, usually started by the council members. In the spring of 2011 71 % of members of parliament responded "yes" to the claim that the state should prefer Free Software (such as GNU/Linux and OpenOffice) in its ICT acquisitions."

Calls for Presentations

Embedded Linux Conference Call for Presentations

The Embedded Linux Conference (ELC 2012) will take place February 15-17 in Redwood Shores, California. The Call For Participation is open until January 6, 2012.

GNU Tools Cauldron 2012 - 2nd Call for Abstracts and Participation

GNU Tools Cauldron 2012 is scheduled for July 9-11 in Prague, Czech Republic. Abstract submissions must be in by the end of January 2012. "The purpose of this workshop is to gather all GNU tools developers, discuss current/future work, coordinate efforts, exchange reports on ongoing efforts, discuss development plans for the next 12 months, developer tutorials and any other related discussions." The event is still in the planning stage so the exact date in July is subject to change.

Upcoming Events

LibrePlanet 2012 conference announced

The Free Software Foundation (FSF) has announced that the 2012 LibrePlanet conference will take place in Boston, Massachusetts on March 24-25, 2012. The call for papers is open until January 14.

SCALE prep continues

Early bird registration for the Southern California Linux Expo (SCALE) closes December 21. The conference takes place January 20-22, 2012 in Los Angeles, California.

Events: December 15, 2011 to February 13, 2012

The following event listing is taken from the LWN.net Calendar.

Date(s) | Event | Location
December 27-30 | 28th Chaos Communication Congress | Berlin, Germany
January 12-13 | Open Source World Conference 2012 | Granada, Spain
January 13-15 | Fedora User and Developer Conference, North America | Blacksburg, VA, USA
January 16-20 | linux.conf.au 2012 | Ballarat, Australia
January 20-22 | Wikipedia & MediaWiki hackathon & workshops | San Francisco, CA, USA
January 20-22 | SCALE 10x - Southern California Linux Expo | Los Angeles, CA, USA
January 27-29 | DebianMed Meeting Southport2012 | Southport, UK
January 31-February 2 | Ubuntu Developer Week | #ubuntu-classroom, irc.freenode.net
February 4-5 | Free and Open Source Developers Meeting | Brussels, Belgium
February 6-10 | Linux on ARM: Linaro Connect Q1.12 | San Francisco, CA, USA
February 7-8 | Open Source Now 2012 | Geneva, Switzerland
February 10-12 | Skolelinux/Debian Edu developer gathering | Oslo, Norway
February 10-12 | Linux Vacation / Eastern Europe Winter session 2012 | Minsk, Belarus
Linux Vacation / Eastern Europe Winter session 2012 Minsk, Belarus

If your event does not appear here, please tell us about it.

Page editor: Rebecca Sobol

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds