Leading items

In search of a home for Thunderbird

By Nathan Willis
May 4, 2016

After nearly a decade of trying, Mozilla is finally moving to formally spin off ownership of the Thunderbird email client to a third party. The identity of the new owner is still up for debate; Simon Phipps prepared a report analyzing several possible options. But Mozilla does seem intent on divesting itself of the project for real this time. Whoever does take over Thunderbird development, though, will likely face a considerable technical challenge, since much of the application is built on frameworks and components that Mozilla will soon stop developing.

Bird versus fox

To say that Mozilla has had a difficult relationship with Thunderbird would be putting things mildly. The first release was in 2003, with version 1.0 following in late 2004. As early as 2007, though, Mozilla's Mitchell Baker announced that Mozilla wished to rid itself of Thunderbird and find a new home for the project. Instead, Mozilla ended up spinning Thunderbird off into a distinct unit (Mozilla Messaging) under the Mozilla Foundation umbrella. It then reabsorbed that unit in 2011, with Baker noting:

The Thunderbird team has re-made Thunderbird into a modern email client. Thunderbird now has a more modular architecture, vastly modernized codebase, effective add-on mechanisms, a vastly improved user interface, and incremental innovations that continue to evolve and move the product forward. We intend to continue our work with the Thunderbird email product to meet this need.

But, in July 2012, Mozilla began pulling paid developers from Thunderbird and left its development primarily in the hands of community volunteers, with a few Mozilla employees performing QA and build duties to support the Extended Support Release (ESR) program. At the time, Baker offered this justification:

We’ve tried for years to build Thunderbird as a highly innovative offering, where it plays a role in moving modern Internet messaging to a more open, innovative space, and where there is a growing, more active contributor base. To date, we haven’t achieved this.

By 2014, Mozilla had ramped down its involvement to the point where the Thunderbird team lacked any clear leadership, so the developer community voted to establish a Thunderbird Council made up of volunteers.

Most recently, Baker announced in December 2015 that Thunderbird would be formally separated from Mozilla. Phipps was engaged to research the options that he later published in the aforementioned report. In April 2016, Gervase Markham announced that the search for a new home for the project was underway, with Phipps's recommendations serving as a guide.

Lizard tech

For fans of Thunderbird, the repeated back-and-forth from Mozilla leadership can be a source of frustration on its own, but it probably does not help that Mozilla has started multiple other non-browser projects (such as ChatZilla, Raindrop, Grendel, and Firefox Hello) over the years while insisting that Thunderbird was a distraction from Firefox. Although it might seem like Mozilla management displays an inconsistent attitude toward messaging and other non-web application projects, each call for Mozilla to rid itself of Thunderbird has also highlighted the difficulty of maintaining Thunderbird and Firefox in the same engineering and release infrastructure.

In recent years, due in no small part to pressure coming from the rapid release schedule of Google's Chrome, the Firefox development process has shifted considerably. There are new stable releases made approximately every six weeks, and development builds are provided for the next two releases in separate release channels.

In addition, the Firefox codebase itself is changing. The XUL and XPCOM frameworks are on their way out, to be replaced with components and add-ons written in JavaScript. The Gecko rendering engine is also marked for replacement by Servo, and the entire Firefox architecture may be replaced with the multi-process Electrolysis model.

While these changes are exciting news for Firefox, none of them have made their way into Thunderbird. In April, Mozilla's Mark Surman highlighted the divergence issue in a blog post, noting:

Many people who work on Firefox care about Thunderbird and do everything they can to accommodate Thunderbird as they evolve the code base, which slows down Firefox development when it needs to be speeding up. People in the Thunderbird community also remain committed to building on the Firefox codebase. This puts pressure on a small, dedicated group of volunteer coders who struggle to keep up.

Surman also pointed to a new job listing posted by Mozilla for a contractor who would oversee the transition. The posting describes two key responsibilities: listing all significant technical issues facing Thunderbird (including impact assessments) and outlining the options available for addressing those issues and moving Thunderbird forward.

Former Mozilla developer Daniel Glazman responded to Surman's post on his own blog with a blunter assessment of the technical challenges facing Thunderbird developers. He pointed to the job posting's mention of XUL and XPCOM deprecation and said:

In practice, the last line above means for Thunderbird:

rewrite the whole UI and the whole JS layer with it
most probably rewrite the whole SMTP/MIME/POP/IMAP/LDAP/... layer
most probably have a new Add-on layer or, far worse, no more Add-ons

Glazman concluded that it is too soon to select a new host for the Thunderbird project, given that a decision has yet to be made about how to rewrite the application. Furthermore, he pointed out, Mozilla has not yet begun the transition away from XUL and XPCOM in the Firefox codebase. Only when that process starts, he said, will it be possible to assess the complexity of such a move for Thunderbird.

As far as the build infrastructure goes, Markham sent a proposal to the Thunderbird Council in March suggesting a path forward for separating Thunderbird from the Firefox engineering infrastructure. It did not spawn much discussion, but there did not seem to be any objection either.

Out of the nest

For now, Mozilla seems set on finding a new fiscal and organizational sponsor for Thunderbird, with The Document Foundation and the Software Freedom Conservancy (both highlighted in Phipps's report) currently the leading candidates. But the discussion has only just begun on the technical aspects of maintaining and evolving Thunderbird as a standalone application.

Surman contended that the needs of Firefox and Thunderbird are simply too different today for them to be tied to the same codebase and release process. Essentially, the web changes rapidly, while email changes slowly. It is hard to argue with that assertion (setting aside discussions of how email should change), but Thunderbird fans might contend that Mozilla's decision not to contribute developer time to the Thunderbird codebase only exacerbates any inherent differences between the browser and the email client.

Whether one thinks Mozilla has not adequately supported Thunderbird over the years or has done its level best, the Thunderbird and Firefox projects today are moving in different directions. Given their shared history, it may seem sad to watch them part ways, but perhaps the Thunderbird community can make the most of the opportunity and drive the application forward where Mozilla could (or would) not.


Caravel data visualization

By Nathan Willis
May 4, 2016

One aspect of the heavily hyped Internet of Things (IoT) that is easily overlooked is that each of the Things one hooks up to the Internet invariably spews out a near-non-stop stream of data. While commercial IoT users, such as utility companies, generally have a well-established grasp of what data interests them and how to process it, the DIY crowd is better served by flexible tools that make exploring and transforming data easy. Airbnb maintains an open-source Python utility called Caravel that provides such tools. There are many alternatives, of course, but Caravel does a good job of ingesting data and smoothly molding it into nice-looking interactive graphs, with a few exceptions.

My own interest in data-visualization tools stems from IoT projects (namely home-automation and automotive sensors), but Caravel itself is in no way limited to such uses. Like most contemporary web-based service providers, Airbnb collects a lot of data about its users and their transactions (in this case, short-term housing rentals, renters, and property owners). The company also prides itself on having a slick-looking web interface, and Caravel reflects that: it sports modern charts and graphs—no crusty old PNGs with jagged lines generated by Graphviz here; everything is done in JavaScript.

In a nutshell, what Caravel provides is a connection layer supporting a variety of database types, the tools to configure the metrics of interest for any tables one wishes to explore, and an interactive utility for creating data visualizations. Several dozen visualization options are built in, and all of the charts the user creates can be saved and put into convenient "dashboards" for regular usage.

On top of all that, Caravel's interface is web-based and is almost entirely point-and-click. Perhaps the closest parallel would be a tool like Orange, where the goal is to hide the complexities of SQL and statistics. Caravel does not quite walk the user through adding new data sources or defining metrics, but it does take care of as many of the repetitive steps as it can.

For example, when you add a database table to your Caravel work space, there are rows of checkboxes by every field. If you want to track the minima, maxima, or sums for certain fields, you check them at load time, and those metrics are automatically available on the relevant pages of the application from then on. Similar checkboxes are available for selecting which fields should be used as categorical groups and which should be available for filtering the data set.
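To make the checkbox workflow concrete, here is a minimal sketch of what such selections amount to in SQL terms: each ticked box becomes a named aggregate expression, and the group-by choices become a GROUP BY clause. This is not Caravel's code; the dictionary layout, the `metric_expressions()` and `build_query()` helpers, and the `bookings` table are all hypothetical, and SQLite stands in for whatever database is actually connected.

```python
import sqlite3

# Hypothetical checkbox state for one table: which aggregates to expose per column.
METRIC_CHECKBOXES = {
    "price": {"sum", "min", "max"},
    "nights": {"sum"},
}
# Hypothetical "groupable" checkbox selections.
GROUPBY_COLUMNS = ["city"]


def metric_expressions(checkboxes):
    """Turn per-column checkbox selections into named SQL aggregate expressions."""
    exprs = {"row_count": "COUNT(*)"}  # a sample count is always available
    for column, aggs in sorted(checkboxes.items()):
        for agg in sorted(aggs):
            exprs[f"{agg}__{column}"] = f"{agg.upper()}({column})"
    return exprs


def build_query(table, metrics, groupby):
    """Assemble a grouped SELECT; table and column names must be trusted input."""
    select = groupby + [f"{expr} AS {name}" for name, expr in metrics.items()]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if groupby:
        sql += f" GROUP BY {', '.join(groupby)}"
    return sql


def run_demo():
    """Load a tiny invented table into SQLite and run the generated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookings (city TEXT, price REAL, nights INTEGER)")
    conn.executemany(
        "INSERT INTO bookings VALUES (?, ?, ?)",
        [("Lisbon", 80.0, 3), ("Lisbon", 120.0, 2), ("Oslo", 150.0, 1)],
    )
    query = build_query("bookings", metric_expressions(METRIC_CHECKBOXES), GROUPBY_COLUMNS)
    return query, conn.execute(query + " ORDER BY city").fetchall()


if __name__ == "__main__":
    query, rows = run_demo()
    print(query)
    for row in rows:
        print(row)
```

The point of the sketch is that the checkboxes save the user from writing this boilerplate SQL by hand; the metrics are defined once at load time and reused by every chart built on the table afterward.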

The first public release of Caravel was in September 2015. The most recent is version 0.8.9, from April 2016. The code is hosted at GitHub and packages are also available on the Python Package Index (PyPI). For the moment, only Python 2.7 is supported. On Linux, installation also requires the development packages for libssl and libffi. Once Caravel is installed, one only needs to initialize the database and create an administrator account to get started.

A Caravel instance is multi-user, and the system supports an array of permissions and access controls. For testing, though, that is not necessary. Out of the box, the system provides a local web UI and comes pre-loaded with a demo data set. SQLite support is built in, and any other database (local or remote) with SQLAlchemy support can be used as well. Druid database clusters are also supported, and users can define a custom schema for any database that requires one. For those working with large data sets, the good news is that Caravel also supports a number of open-source caching layers, although none of them are required. All of these configuration options are presented in the web UI's "add a database" screen.
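Because any database with SQLAlchemy support can be used, the connection string one supplies should follow SQLAlchemy's URL convention, `dialect+driver://user:password@host:port/database`, with SQLite files written as `sqlite:///relative.db` or `sqlite:////absolute/path.db`. The helper below is a hypothetical convenience, not part of Caravel; the host names, credentials, and database names in it are invented. It percent-encodes credentials with `quote_plus`, as the SQLAlchemy documentation recommends for passwords containing characters such as `@`.

```python
from urllib.parse import quote_plus


def sqlalchemy_uri(dialect, database, driver=None, user=None, password=None,
                   host=None, port=None):
    """Build a SQLAlchemy-style database URL (hypothetical helper)."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    if dialect == "sqlite":
        # SQLite URLs have an empty host part, so a relative file gets three
        # slashes and an absolute path (which starts with "/") gets four.
        return f"{scheme}:///{database}"
    auth = ""
    if user:
        # Percent-encode credentials so characters like "@" or ":" are safe.
        auth = quote_plus(user)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"
    netloc = host or "localhost"
    if port:
        netloc += f":{port}"
    return f"{scheme}://{auth}{netloc}/{database}"


if __name__ == "__main__":
    print(sqlalchemy_uri("sqlite", "/home/me/sensors.db"))
    print(sqlalchemy_uri("postgresql", "iot", driver="psycopg2", user="caravel",
                         password="p@ss word", host="db.example.com", port=5432))
```

The PostgreSQL call produces `postgresql+psycopg2://caravel:p%40ss+word@db.example.com:5432/iot`. The matching driver package (here psycopg2) would also need to be installed alongside Caravel.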

The bird's-eye view of Caravel usage is that the user adds a new database, then selects and adds each table of interest. From then on, working with Caravel is a matter of using the visualization builder to home in on a chart or graph that presents some meaningful information. The visualizations include everything from line charts to bubble graphs, box plots to directed graphs, and heatmaps to Sankey diagrams. There are also less scientific options, such as word clouds.

A visualization can be saved as a "slice," and any number of slices can be collected onto the same page as a "dashboard." Dashboards are updated regularly as the database is refreshed, so they can be deployed for internal or public consumption. Finally, although dashboard graphics are interactive JavaScript (with additional information shown where the mouse hovers), all charts and graphs can also be exported as image files.

This set of features is fairly complete, but one might well ask whether the implementation is up to snuff. For the most part, the answer to that question is yes.

Adding new databases and choosing which tables to use borders on trivial, thanks to the well-optimized add-and-edit pages. There are a few caveats, such as the fact that the user cannot simply add all of the tables of interest from a database at once—each table requires a separate round trip through the "add a table" page. And when Caravel does not like something about a table, it is hard to debug.

For example, Caravel gives special treatment to time-series data: the user can mark any field in a table as being of the datetime type, and it will automatically be plugged into the various time-series charts in the visualization tool. But Caravel could not make sense of the timestamps in one example data set I downloaded from datahub.io, and there is no easy way to inspect the data directly. Nor does there seem to be any way in the UI to transform the timestamps into an acceptable datetime format, or even to see what Caravel thinks is wrong with them.

Clearly, this issue falls under a "you must know your data" warning, which is a fair expectation. But the error reporting that Caravel presents yanks the user right out of the UI, displaying a generic, low-level exception warning and a traceback from the Python interpreter.
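Since the UI offers no way to transform timestamps, one practical approach is to normalize them before pointing Caravel at the table. The sketch below rewrites a text column in SQLite into ISO 8601 `YYYY-MM-DD HH:MM:SS` strings and, unlike Caravel's traceback, reports exactly which rows it could not parse. The input formats, the epoch-seconds rule, and the `readings` table are assumptions to adapt to the actual data; that Caravel then accepts the normalized column as a datetime field is also an assumption worth checking for a given backend.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical input formats; replace these with whatever the data set really uses.
# Note that "%d/%m/%Y" versus "%m/%d/%Y" is exactly the kind of ambiguity to check.
RAW_FORMATS = ["%d/%m/%Y %H:%M", "%Y%m%d%H%M%S"]
ISO_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_timestamp(raw):
    """Convert one raw timestamp to 'YYYY-MM-DD HH:MM:SS', or raise ValueError."""
    raw = raw.strip()
    if raw.isdigit() and len(raw) == 10:
        # Assumption: ten-digit integers are Unix epoch seconds, interpreted as UTC.
        return datetime.fromtimestamp(int(raw), tz=timezone.utc).strftime(ISO_FORMAT)
    for fmt in RAW_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime(ISO_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {raw!r}")


def normalize_column(conn, table, column):
    """Rewrite a timestamp column in place; return (rowid, value) pairs that failed.

    Table and column names are interpolated directly, so they must be trusted.
    """
    bad_rows = []
    rows = conn.execute(f"SELECT rowid, {column} FROM {table}").fetchall()
    for rowid, raw in rows:
        try:
            fixed = normalize_timestamp(str(raw))
        except ValueError:
            bad_rows.append((rowid, raw))
            continue
        conn.execute(f"UPDATE {table} SET {column} = ? WHERE rowid = ?", (fixed, rowid))
    conn.commit()
    return bad_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     [("04/05/2016 13:30", 1.0), ("1462368600", 2.0), ("garbage", 3.0)])
    print("unparseable:", normalize_column(conn, "readings", "ts"))
    print(conn.execute("SELECT ts FROM readings").fetchall())
```

Leaving unparseable rows untouched and listing them, rather than aborting, keeps the "know your data" step inspectable.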

And this sometimes happens through no fault of the user, such as when the user selects a new graph type from the drop-down menu in the visualization builder and the newly selected graph takes a different number of parameters. By and large, the visualization tool is quite handy: the point-and-click settings and controls are not merely a coat of "UI paint" on top; they help the user play around with their data sets to find the visualization settings that work best. That makes it all the more disappointing when that friendly interactivity breaks down.

There are a couple of troubling technical limitations to mention. First, users must construct any new metrics of interest (other than sample counts, sums, and minima/maxima) by entering raw SQL expressions. Some additional statistical tools would be handy. Perhaps more fundamental is the fact that Caravel cannot join or query multiple tables; all of the visualizations are therefore limited to what information one can extract from a single table.
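Both limitations can be partly worked around in the database itself. The sketch below, using SQLite and invented `sensors`/`readings` tables, shows two things: custom metrics written as raw SQL aggregate expressions of the sort one would enter for anything beyond counts, sums, and minima/maxima, and a view that pre-joins two tables so a single-table tool sees one flat relation. Whether Caravel's "add a table" page will accept a view depends on SQLAlchemy reflecting it like a table; that is an assumption not verified here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sensors (id INTEGER PRIMARY KEY, room TEXT);
CREATE TABLE readings (sensor_id INTEGER REFERENCES sensors(id), celsius REAL);
-- Pre-join the tables so that a single-table tool sees one flat relation.
CREATE VIEW readings_by_room AS
    SELECT s.room AS room, r.celsius AS celsius
    FROM readings AS r JOIN sensors AS s ON s.id = r.sensor_id;
"""

# Custom metrics as raw SQL aggregate expressions (hypothetical names).
# SQLite has no built-in variance aggregate, so sample variance is spelled out;
# with a single reading the denominator is zero and SQLite returns NULL.
CUSTOM_METRICS = {
    "avg_fahrenheit": "AVG(celsius * 9.0 / 5.0 + 32.0)",
    "var_celsius": "(SUM(celsius * celsius) - SUM(celsius) * SUM(celsius) / COUNT(celsius))"
                   " / (COUNT(celsius) - 1)",
}


def run_demo():
    """Build the invented schema, load sample rows, and query the joined view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sensors VALUES (?, ?)", [(1, "kitchen"), (2, "garage")])
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     [(1, 20.0), (1, 22.0), (1, 24.0), (2, 10.0), (2, 12.0)])
    metrics = ", ".join(f"{expr} AS {name}" for name, expr in CUSTOM_METRICS.items())
    query = f"SELECT room, {metrics} FROM readings_by_room GROUP BY room ORDER BY room"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Doing the join in a view keeps the transformation in the database, where it can be reviewed and reused, rather than requiring a separate preprocessing step each time the data is refreshed.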

It might be interesting to pair Caravel with a tool like OpenRefine that specializes in data transformation, but I suspect that for a great many users, what Caravel can do already will serve them well. It handles the database connectivity in the background, putting the emphasis on exploring and manipulating visualizations. The visualizations and dashboards it provides are top-notch by modern standards, but the fact that they are easy for the user to create is Caravel's real advantage.


Page editor: Jonathan Corbet