LWN.net Weekly Edition for March 18, 2010

Applications and bundled libraries

By Jake Edge
March 17, 2010

Package installation for Linux distributions has traditionally separated libraries and application binaries into different packages, so that only one version of a library would be installed and it would be shared by applications that use it. Other operating systems (e.g. Windows, Mac OS X) often bundle a particular version of a library with each application, which can lead to many copies and versions of the same library co-existing on the system. While each model has its advocates, the Linux method is seen by many as superior because a security fix in a particular commonly-used library doesn't require updating multiple different applications, not to mention the space savings. But it would seem that both Mozilla and Google may be causing distributions to switch to library-bundling mode in order to support the Firefox and Chromium web browsers.

One of the problems that distributions have run into when packaging Chromium—the free software version of Google's Chrome browser—is that it includes code for multiple, forked libraries. As Fedora engineering manager Tom "spot" Callaway put it: "Google is forking existing FOSS code bits for Chromium like a rabbit makes babies: frequently, and usually, without much thought." For distributions like Fedora, with a "No Bundled Libraries" policy, that makes it very difficult to include Chromium. But it's not just Chromium.

Mozilla is moving to a different release model, which may necessitate distribution changes. The idea is to include feature upgrades as part of minor releases—many of which are done to fix security flaws—which would come out every 4-6 weeks or so. Major releases would be done at roughly six-month intervals and older major releases would stop being supported soon after a subsequent release. Though the plan is controversial—particularly merging security and features into the minor releases—it may work well for Mozilla, and the bulk of Mozilla's users who are on Windows.

Linux distributions often extend support well beyond six months or a year, though. While Mozilla is still supporting a particular release, that's easy to do, but once Mozilla stops that support, it becomes more difficult. Distributions have typically backported security fixes from newer Firefox versions into the versions that they shipped, but as Mozilla moves to a shorter support window, that gets harder to do. Backporting may also run afoul of the Mozilla trademark guidelines, something that led Debian to create "Iceweasel". The alternative, updating Firefox to the most recent version, has its own set of problems.

A new version of Mozilla is likely to use updated libraries, different from those that the other packages in the distribution use. Depending on the library change, it may be fairly straightforward to use it for those other applications, but there is a testing burden. Multiple changed libraries have a ripple effect as well. Then there is the problem of xulrunner.

Xulrunner is meant to isolate applications that want to embed Mozilla components (e.g. the Gecko renderer) from changes in the Mozilla platform. But xulrunner hasn't really committed to a stable API, so updates to xulrunner can result in a cascade of other updates. There are many different packages (e.g. Miro, epiphany, liferea, yelp, etc.) that use xulrunner, so changes to that package may require updates to those dependencies, which may require other updated libraries, and so on.

The Windows/Mac solution has the advantage that updates to Firefox do not require any coordination with other applications, but it has its own set of downsides as well. Each application needs some way to alert users that there are important security fixes available and some mechanism for users to update the application. Rather than checking a central repository for pending security issues, users have to run each of their installed applications to update their system. Furthermore, a flaw in a widely used library may require updating tens or hundreds of applications, whereas, in the Linux model, just upgrading the one library may be sufficient.
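
The difference in update burden between the two models can be sketched in a few lines of Python. This is only an illustration: the application and library names below are hypothetical, and real packaging involves versions and dependencies that this ignores.

```python
def packages_to_update(apps, flawed_lib, bundled):
    """Return the packages that need a new release to fix a flaw in flawed_lib.

    apps maps an application name to the set of libraries it uses.
    """
    # Applications that use the flawed library at all.
    users = sorted(name for name, libs in apps.items() if flawed_lib in libs)
    if not users:
        return []
    # Bundled model: every application carrying its own copy must be updated.
    # Shared model: only the single library package is updated.
    return users if bundled else [flawed_lib]


# Hypothetical system with three applications.
apps = {
    "browser": {"libpng", "zlib"},
    "mail-client": {"zlib"},
    "editor": {"libpng"},
}
print(packages_to_update(apps, "zlib", bundled=True))   # ['browser', 'mail-client']
print(packages_to_update(apps, "zlib", bundled=False))  # ['zlib']
```

With tens or hundreds of applications using the same library, the first list grows with the number of applications while the second stays at one entry.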

It would appear that Ubuntu is preparing to move to the bundled library approach for Firefox in its upcoming 10.04 (Lucid Lynx) release. That is a "long-term support" (LTS) release that Ubuntu commits to supporting for three years on the desktop. One can imagine that it will be rather difficult to support Firefox 3.6 in 2013, so the move makes sense from that perspective. But there are some other implications of that change.

For one thing, the spec mentions the need to "eliminate embedders" because they could make it difficult to update Firefox: "non-trivial gecko embedders must be eliminated in stable ubuntu releases; this needs to happen by moving them to an existing webkit variant; if no webkit port exists, porting them to next xulrunner branch needs to be done." Further action items make it clear that finding WebKit alternatives for Gecko-embedders is the priority, with removal from Ubuntu (presumably to "universe") being the likely outcome for most of the xulrunner-using packages.

In addition, Ubuntu plans to use the libraries that are bundled with Firefox, rather than those that the rest of the system uses, at least partially because of user experience issues: "enabling system libs is not officially supported upstream and supporting this caused notable work in the past while sometimes leading to a suboptimal user experience due to version variants in the ubuntu released compared to the optimize version shipped in the firefox upstream tarballs." While it may be more in keeping with Mozilla's wishes, it certainly violates a basic principle of Linux distributions. It doesn't necessarily seem too dangerous for one package, but it is something of a slippery slope.

The release model for Chromium is even more constricting, as each new version is meant to supplant the previous version. As Callaway described, it contains various modified versions of libraries, which makes it difficult for distributions to officially package in any way other than with bundled libraries. If that happens in Ubuntu, for example, it would double the number of applications shipped with bundled libraries. Going from one to two may seem like a fairly small thing, but will other upstreams start heading down that path?

The Fedora policy linked above is worth reading for some good reasons not to bundle libraries, but there are some interesting possibilities in a system where that was the norm. Sandboxing applications for security purposes would be much more easily done if all the code lives in one place and could be put into some kind of restrictive container or jail. Supporting multiple different versions of an application also becomes easier.

It is fundamentally different from the way Linux distributions have generally operated, but some of that is historical. While bandwidth may not be free, it is, in general, dropping in price fairly quickly. Disk space is cheap, and getting cheaper; maybe there is room to try a different approach. The distribution could still serve as a central repository for packages and, perhaps more importantly, as a clearinghouse for security advisories on those packages.

Taking it one step further and sandboxing those applications, so that any damage caused by an exploit is limited, might be a very interesting experiment. The free software world is an excellent candidate for that kind of trial; in fact, it is hard to imagine it being done any other way, since the proprietary operating systems don't have as free a hand to repackage the applications that they run. It seems likely that the negatives will outweigh the advantages, but we won't really know until someone gives it a try.


Archiveopteryx

By Jonathan Corbet
March 16, 2010

Your editor, like many LWN readers, deals in large quantities of electronic mail. As a result, tools which can help with the mail flood are always of interest. One tool which has been on the radar for some time is Archiveopteryx, a database-backed mail store which is meant to deal with high mail volumes. Archiveopteryx does not seem to have a hugely high profile, but it does have a dedicated user base and a steady development pace; Archiveopteryx 3.1.3 was released on March 10.

The idea behind Archiveopteryx is simple enough: build a mail store around the PostgreSQL database, then provide access to it through the usual protocols. Installation is relatively easy for a site which already has PostgreSQL in place; a simple "make install" does the bulk of the work. A straightforward configuration file allows for control over protocols, ports, etc., and there is an administrative program which can be used to set up users within the mail store.

On the protocol side, Archiveopteryx supports POP and IMAP for access to email. It can handle mail receipt directly through SMTP, but that is not normally how one would do things; there is still value in having a real mail transfer agent in the process. The preferred mode is to use the LMTP protocol to accept mail from the MTA; there is also a command-line utility which can be used for that purpose if need be. The installation instructions include straightforward recipes for configuring Archiveopteryx to work with a number of MTAs. Archiveopteryx also supports the Sieve filtering standard and the associated protocol for managing scripts.
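
For readers who have not seen LMTP in action, here is a rough sketch of what the MTA's side of a delivery looks like, using the LMTP client in Python's standard smtplib module. It is not taken from the Archiveopteryx documentation; the addresses are placeholders, and the host and port must match whatever the mail store's configuration file specifies.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a minimal message to hand to the mail store."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def deliver_lmtp(msg, host, port):
    """Deliver msg over LMTP; host and port are site-specific assumptions."""
    with smtplib.LMTP(host, port) as conn:
        conn.send_message(msg)


# Placeholder addresses; nothing is sent until deliver_lmtp() is called.
msg = build_message("mta@example.org", "user@example.org",
                    "LMTP test", "Hello from the MTA side.")
print(msg["To"])
```

In a real installation the MTA does this work itself; the sketch is only meant to show that LMTP delivery is an SMTP-like conversation with the mail store.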

Those who set up a large-scale mail store can be expected to have some archived mail sitting around. Archiveopteryx provides an aoximport tool for importing this email into the system. Your editor found it to be overly simple and inflexible, though. It is unable to create subfolders when importing an entire folder tree (they must already be in place or the import fails), and it failed to import the bulk of the messages when working with a Dovecot-managed maildir mailbox. The importer, perhaps, is like the Debian installer: users tend to only need it once, so it gets relatively little work once the basic functionality is in place.

Archiveopteryx works well as an IMAP server, and it is indeed fast when dealing with folders containing many messages. Operations like deleting or refiling groups of messages go notably faster than with Dovecot on the same server. On the other hand, your editor was unable to get the Sieve script functionality to work at all; this is probably more a matter of incomplete configuration than fundamental problems with Archiveopteryx itself, but it was still a discouraging development.

That ties into the biggest disappointment with Archiveopteryx, though, which is probably totally unjustified: your editor would like this tool to be something that it is not. If one is going to go to the trouble of storing all of one's email into a complex database, it would be nice to be able to do fast, complex searches on that email. That way, the next time it becomes necessary to, say, collect linux-kernel zombie posts, a quick search will do. Archiveopteryx seems to have a search feature built into it, but actually using that feature appears to be limited to exporting messages with the aoxexport tool. The IMAP protocol is not particularly friendly toward the implementation of fast, server-side searching, but it still seems like something better should be possible.

All that should not detract from what Archiveopteryx does well: store and serve email in large volumes using standard protocols. As a tool for ISPs and for others needing to make email available to lots of users, it seems highly useful; it is clearly meant to scale in ways that servers like Dovecot are not.

There is one remaining problem, though: the future of Archiveopteryx is not entirely assured. For years, this program has been developed by a company called Oryx, which offered commercial support for it. In June, 2009, though, the developers behind Oryx announced that the company was shutting down, with the final closure expected in October of this year. They say:

So we're gradually closing down Oryx, BUT NOT ARCHIVEOPTERYX. We'll relicense it using either the BSD or Apache 2 licenses and continue making new releases for years to come. We both feel obliged to keep the existing archives viable.

(The code is currently licensed under OSLv3).

A sense of obligation may keep Archiveopteryx going for a while, but if it's going to be something that people can count on for years into the future, it will have to develop a more active development community. Archiveopteryx has the look of a solidly company-controlled project - the project's git repository is overwhelmingly dominated by commits from the two principal developers. Such projects are always at a bit of risk if the backing company runs into trouble. But Archiveopteryx is free software, and highly useful free software at that; it seems like its user community should be able to carry it forward.


OpenTaxSolver solves taxes, openly

March 17, 2010

This article was contributed by Nathan Willis

OpenTaxSolver (OTS) takes on one of the long-standing criticisms of open source software: the lack of a simple-to-use tax return preparation application on the level of TaxCut or TurboTax. Although OTS does not feature the step-by-step, question-driven interface popular in the proprietary products, it includes an optional graphical front-end, and enables the user to systematically fill out the most popular US federal income tax forms: 1040, Schedules A, B, C, and D, and eight US state income tax returns.

Over the years there have been several other open source tax preparation projects, but most tend to produce working solutions for only a few years, then fall out of maintainership or disappear altogether. Because the income tax code changes every year, the math and the interface must change every year — in unpredictable and sometimes complicated ways. Consequently, the fact that OTS has been making stable releases since 2004 makes it a stand-out. The team is composed largely of individuals who choose to support specific tax solvers, thus explaining the specific state returns — in past years a few different forms were supported, but not continued in subsequent annual releases. Understandably, work on the code is cyclical, with discussion picking up each year as tax time in the US draws near.

OTS is written in C, and at its heart is a text-driven utility that reads input data from an external file, "solves" the tax calculation, and writes its output to a separate file. Experienced users may still prefer this approach, but the project's site says most choose to use the bundled GUI instead. The GUI version reads in data from an example or template file, but allows the user to input the correct numbers, then performs the back-end calculations. Using OTS will not fill out your return for you; it will just perform the calculations you need to fill it out correctly yourself.

The latest release is version 7.05, updated March 9, 2010. It includes support for 2009 US Form 1040 (individual federal tax return), plus Schedule A (Itemized Deductions), Schedule B (Interest and Ordinary Dividends), Schedule C (Profit or Loss From Business), and Schedule D (Capital Gains and Losses), and state income tax returns for 8 of the 41 US states with a state income tax: California, Massachusetts, North Carolina, New Jersey, New York, Ohio, Pennsylvania, and Virginia. Packages are provided for Linux, Windows, and (new this year) Mac OS X, each of which contains the appropriate binaries as well as the GPLv2-licensed source code.

The Linux package is a 421KB tarball, containing the command line and GUI versions of the program, example data files, and a build script that can be used to rebuild the binaries. The GUI is implemented in Open Tool Kit (Otk), a tiny cross-platform widget library that is entirely self-contained. There is no installation process required; one needs only to unpack the tarball to an appropriate directory and run the binaries.

Command-line usage

There is a separate binary for each state return, a binary for Schedule C, and a single binary that handles US 1040 and Schedules A, B, and D. To generate a return from the command line, first open up a template or example file for the appropriate form from the examples_and_templates directory. The only difference between the two is that "examples" are completely filled-in with test data, while the "templates" contain all zeroes in the numeric entries and blanks in the text entries.

The site's instructions say to create a copy of the template for each individual return being prepared, a helpful tip for those who do taxes for friends and family members. The templates use the .dat extension, but are plain text, and line-oriented. Each field from the official IRS form which you are expected to fill in is represented by a labeled line in the file, and comments both expand on the purpose of the line and give valid input, such as:

    Dependents     ??       {Number of Dependents, self=1, spouse, etc.}
    [...]
    D4		;	{ Short-term gain from 6252, gain or loss from Forms 4684, 6781, 8824. }

The input file does not include every line in the final form, of course; the idea is that the user fills in the basic data, and the solver calculates all of the intermediate lines with the relevant formula. This is where the examples come in handy. While the template provides a line for every required field, the filled-out example input is more helpful because it gives clues as to how to enter data for specific situations. For example, more than one source of interest income is entered for L8a:

    L8a                     {Interest 1099-INT}
              37.71           {Bank Savings}
              12.65           {Credit Union}
              16.85           {Savings Bank}
                    ;
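
How several amounts accumulate under one label can be sketched in Python. This is a simplified reading of the format as shown in the excerpt above (a label, any number of values with {comments}, and a closing semicolon); it is not OTS's actual parser, which is written in C and may handle cases this sketch does not.

```python
import re
from decimal import Decimal


def sum_entries(text):
    """Total the amounts listed under each label, up to the closing ';'."""
    # Drop {...} comments, isolate semicolons, and split into tokens.
    cleaned = re.sub(r"\{[^}]*\}", " ", text).replace(";", " ; ")
    totals = {}
    label = None
    for token in cleaned.split():
        if token == ";":
            label = None                    # ';' closes the current entry
        elif label is None:
            label = token                   # first token of an entry is its label
            totals[label] = Decimal("0")
        else:
            totals[label] += Decimal(token)  # Decimal avoids float rounding on cents
    return totals


example = """
L8a                     {Interest 1099-INT}
          37.71           {Bank Savings}
          12.65           {Credit Union}
          16.85           {Savings Bank}
                ;
"""
print(sum_entries(example)["L8a"])  # 67.21
```

Only semicolon-terminated numeric entries are handled here; single-value lines such as Dependents would need separate treatment.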

When the input file is complete, simply execute the appropriate binary from the shell prompt, passing the input file as an argument, such as: ./bin/taxsolve_US1040_2009 my_2009_1040.dat. The solver will generate an output file named my_2009_1040.out, containing the correct number for every line of the form, including the final amount owed or to be refunded. For the 1040 solver, numbers for the various Schedules are included in the output in tab-offset blocks at the point just before the line where they are referenced in the main form.
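
For those preparing several returns, the invocation is easy to script. The sketch below assumes, based only on the example above, that the solver takes the input file as its sole argument and writes its results next to it with the extension changed from .dat to .out; the helper names are made up for illustration.

```python
import subprocess
from pathlib import Path


def expected_output(input_file):
    """Name of the file the solver is assumed to write for input_file."""
    return Path(input_file).with_suffix(".out")


def run_solver(binary, input_file):
    """Run one solver binary on one input file and return the output path."""
    # check=True raises CalledProcessError if the solver exits with an error.
    subprocess.run([str(binary), str(input_file)], check=True)
    return expected_output(input_file)


print(expected_output("my_2009_1040.dat"))  # my_2009_1040.out
```

Calling run_solver("./bin/taxsolve_US1040_2009", "my_2009_1040.dat") from the unpacked directory would then mirror the shell command shown above.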

From there, filling out the final forms (whether on paper or PDF) is as simple as copying the data from the output file. There has been talk in past years of adding additional output techniques, including automatically filling in the editable PDF forms provided by the IRS, or of transforming OTS's output into TurboTax's Tax Exchange Format (TXF) files, but thus far neither technique has made it into a release. A discussion thread on the project's SourceForge site mentions several methods that an enterprising hacker can use to transform OTS text output into a format that can be imported to a PDF form directly.

OTS_GUI

The final binary in the package is the OTS GUI application. Unlike the command-line solvers, though, it must be launched with the provided shell script, Run_taxsolve_GUI_Unix.sh. At launch, it presents a menu of the available tax solvers, and a button with which to select an input file. The input file is loaded in, pre-filling the form fields in the GUI. If the input file is already correct, hitting the "Compute Tax" button generates the output file automatically.

But the advantage of the GUI, of course, is that browsing through the fields and editing the input numbers is easier than editing the text file beforehand. The GUI breaks up the long list of fields into convenient, page-sized chunks, including the line numbers and editable comments.

Otk is far from being a "flashy" user interface toolkit; it is very limited in layout and text options, and incorporates a look-and-feel that might even elicit sneers from Motif and Tcl/Tk scripters. Aesthetics aside, though, in practical usage the bare-bones text rendering can be difficult to read — horizontal and vertical scaling seem to be calculated as a percentage of the window dimensions, causing some fields and comments to be overly compressed, and others stretched out. Still, with a little trial and error, it is easy enough to step through all of the pages and produce an accurate output file, and that is ultimately the only goal.

Speaking of Tcl/Tk, there is an alternate, Tcl/Tk-based GUI available for download in a separate package. The timestamp on the latest release is from February of 2010. However, it is source code only, and depends on several external Tcl libraries; building it is not for the faint-of-Tcl-heart.

Technology and taxes

OTS keeps it simple, which is probably the key to its survival over this many years: TaxGeek (the second-most active project) has not been updated since early 2008, and the once-promising Tax Code Software Foundation site now redirects to a holding page. The results are not much better for countries other than the US; a few dormant projects exist for UK, German, and Australian returns, but nothing is active.

A combination of factors is proposed to explain the lack of open source tax preparation software whenever the discussion comes up, including the level of legal expertise required to keep up with the ever-changing tax code, and the fact that most geeks do not find the arithmetic of filling out the paperwork difficult enough to warrant writing an application to do it for them.

Furthermore, the "correctness guarantee" question comes up in any discussion, despite the fact that other tax preparation services and programs only offer guarantees subject to their own list of restrictions and limitations. The OTS site argues that the only way to know for sure that a tax preparation program's numbers are correct is to examine the formulas it uses — something impossible to do with proprietary code. Searching on the web for "mistakes" and "TurboTax" indeed turns up a massive number of hits, many coming from the professional web sites of human tax preparers. OTS at least makes its math accessible, and it does log its steps to stdout and mark them carefully in the output file to assist in double-checking.

To some, the answer is that the government should provide free software enabling its citizens to prepare and file their returns electronically. In the US, there is at least a partial solution, which may also detract from the willpower of the open source community to produce its own alternative. The IRS provides all of its forms, instructions, and ancillary publications for free in PDF format. It has also started allowing individuals and small businesses to file returns electronically, at no charge — through the use of approved, regulated third-party companies.

But this option creates another set of problems for some people with free software leanings. As the OTS site observes, the third-party electronic filing services require entrusting a stranger with highly sensitive personal information, but they also impose other arbitrary restrictions: the return must be prepared and filed all in one session, there are income-limit and business-size restrictions, and each individual must file his or her own return only. To the OTS team, those are reasons enough to take the (tax) law into their own hands.

Any tax preparer will tell you that nothing trumps personal experience when it comes to getting the most deductions and advantages when filing your return. OTS does not attempt to do the professional tax preparer's job; it merely attempts to speed up the process of composing a normal return for a user who already knows more or less what that return should include. Then again, especially if you are the resident tax preparer for your family or friends, a tool to crank out those returns rapidly and systematically is still a win. Time is money.


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

Security: Linux adds router denial-of-service prevention; New vulnerabilities in drupal, kernel, moin, tar/cpio,...
Kernel: Big reader locks; Who let the hogs out?; Huge pages part 4: benchmarking; A critical look at sysfs attribute values.
Distributions: Elive 2.0: Where Debian meets Enlightenment; Mandriva Enterprise Server 5.1; openSUSE Build Service 1.7.2; openSUSE 11.3 Milestone 3; Fedora's "stable release updates vision"; Shuttleworth: 2 year cadence for major releases; interviews with Canonical CEO and COO...
Development: Fun with free maps on the free desktop; Amarok 2.3.0; RE2; passwdqc 1.2.0; ...
Announcements: Ingres DB now available within SUSE; Building an open source business; Hackable Linux clamshell; Simon Phipps elected as OSI director; Eben Moglen interview; Mozilla Add-On Challenge; LinuxCon Japan; Debconf 10; EuroPython 2010; Embedded Linux Conference 2010...
