Development

A year with Notmuch mail

November 9, 2016

This article was contributed by Neil Brown

For a little longer than a year now, I have been using Notmuch as my primary means of reading email. Though the experience has not been without some annoyances, I feel that it has been a net improvement and expect to keep using Notmuch for quite some time. Undoubtedly it is not a tool suitable for everyone though; managing email is a task where individual needs and preferences play an important role and different tools will each fill a different niche. So before I discuss what I have learned about Notmuch, it will be necessary to describe my own peculiar preferences and practices.

Notmuch context

I can identify three driving forces in my attitude to email. First there is a desire to be in control. I want my email to be stored primarily on my hardware, preferably in my home. For this reason, various popular hosted email services are of little interest to me. Second is the difficulty I have with throwing things away; I'm sure I'll never want to look at 99.999% of my email a second time, but I don't know which 0.001% will interest me again, and I don't want to risk deleting something I may someday want. Finally, I am somewhat obsessive about categorization. "A place for everything, and everything in its place" is a goal, but rarely a reality, for me. Email is one area where that goal seems achievable, so I have a broad collection of categories for filing incoming mail, possibly more than I need.

My most recent experience before committing to Notmuch was to use Claws Mail as an IMAP client that accessed email using the Dovecot IMAP server running on a machine in my home. This was sufficient for many years, mostly because I work at home with good network connectivity between client and server. On those rare occasions when I traveled to the other side of the world, latency worsened and the upstream bandwidth over my home ADSL connection was not sufficient to provide acceptable service; it was bearable, but that is all. Using procmail to filter my email into different folders met most of my obsessive need to categorize, but when an email sensibly belonged in multiple categories, there wasn't really any good solution.

I think the frustration that finally pushed me to commit to the rather large step of transitioning to Notmuch was the difficulty of searching. Claws doesn't have a very good search interface and Dovecot (at least in the default configuration) doesn't have very good performance. I knew Notmuch could do better, so I made the break.

A close second in the frustration stakes was that I had to use a different editor for composing email than I used for writing code. Prior to choosing Claws, I used the Emacs View Mail mode; it had been difficult giving up that seamless integration between code editor and email editor. Notmuch offered a chance to recover that uniformity.

Notmuch of a mail system

Notmuch has been introduced and reviewed (twice) in these pages previously so only a brief recap will be provided here. Notmuch describes itself as "not much of an email program"; it doesn't aim to provide a friendly user interface, just a back-end engine that indexes and retrieves email messages. Most of the user-interface tasks are left to a separate tool such as an Emacs mode that I use. In this vein of self-deprecation, the web site states that even "for what it does do, that work is provided by an external library, Xapian". This is a little unfair as Notmuch does contain other functionality. It decodes MIME messages in order to index the decoded text with the help of libgmime. It manages a configuration file with the help of g_key_file from GLib. And it will decrypt encrypted messages, using GnuPG. It even has some built-in functionality for managing tags and tracking message threads.

The genius of Notmuch is really the way it combines all these various libraries together into a useful whole that can then be used to build a user interface. That interface can run the Notmuch tool separately, or can link with the libnotmuch library to perform searches and access email messages.

Notmuch need for initial tagging

Notmuch provides powerful functionality but, quite appropriately, does not impose any particular policy for how this functionality should be used. It quickly became clear to me that there is a tension between using tags and using saved searches as the primary means of categorizing incoming email. Tags are simple words, such as "unread", "inbox", "spam", or "list-lkml", that can be associated with individual messages. Saved searches were not natively supported by Notmuch before version 0.23, which was released in early October (and which calls them "named queries"), but are easily supported by user-interface implementations.

Using tags as a primary categorization is the idea behind the "Approaches to initial tagging" section of the Notmuch documentation. This page provides some examples of how a "hook" can be run when new mail arrives to test each message against a number of rules and then to possibly add a selection of tags to that message. The user interface can then be asked to display all messages with a particular tag.

I chose not to pursue this approach, primarily because I want to be able to change the rules and have the new rules apply equally to old emails, which doesn't work when rules are applied at the moment of mail delivery. The alternative is to use fairly complex saved searches. This ran into a problem when I wanted one saved search to make reference to another, as neither the Emacs interface nor the Notmuch backend had a syntax for including one saved search in another search. For example, I have one saved search to identify email from businesses (that I am happy to receive email from) whose mail otherwise looks like spam. So my "spam" saved search is something like:

    tag:spam and not saved:commercial

The new "named queries" support should make this easy to handle but, until I upgrade my Notmuch installation, I have a wrapper script around the "notmuch" tool that performs simple substitutions to interpolate saved searches as required.
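The wrapper script itself is not shown here. What follows is a minimal Python sketch of the kind of substitution it performs, assuming saved searches are kept in a dictionary (the names and queries in SAVED are hypothetical) and referenced with the saved:NAME syntax from the example above. Each reference is replaced by its own definition, expanded recursively and wrapped in parentheses so that operator precedence in the outer query is unchanged.

```python
import re

# Hypothetical saved searches; the names and queries are illustrative only.
SAVED = {
    "commercial": "from:shop.example.com or from:airline.example.org",
    "spam": "tag:spam and not saved:commercial",
}

# Matches "saved:NAME" where NAME is made of word characters or '-'.
SAVED_REF = re.compile(r"saved:([\w-]+)")


def expand(query: str, saved: dict[str, str], _seen: tuple[str, ...] = ()) -> str:
    """Replace each saved:NAME with its parenthesized, recursively expanded query."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in _seen:
            raise ValueError(f"saved search {name!r} refers to itself")
        if name not in saved:
            raise KeyError(f"unknown saved search {name!r}")
        # Parenthesize so 'and'/'or' precedence in the outer query is preserved.
        return "(" + expand(saved[name], saved, _seen + (name,)) + ")"

    return SAVED_REF.sub(substitute, query)
```

A wrapper could then hand the expanded string to the real tool, for example as the query argument of notmuch search. The self-reference check guards against a saved search that includes itself, directly or indirectly.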

This approach also causes a minor problem: several of my saved searches are intermediaries that I'm not directly interested in, but they still appear in my list of saved searches, where they tend to clutter up the main screen in the Emacs interface.

Unfortunately, the indexing that Notmuch performs is not quite complete, so some information is not directly accessible to saved searches, resulting in the need for some limited handling at mail delivery time. Notmuch does not index all headers; two missed headers that are of interest to me are "X-Bogosity" and "References".

I use bogofilter to detect spam, which adds the "X-Bogosity" header to messages to indicate their status. Further, when someone replies to an email that I sent out, I like that reply to be treated differently from regular email, and particularly to get a free pass through my spam filter. I can detect replies by simple pattern matching on the References or In-Reply-To headers. While Notmuch does include these values in the index so that threads can be tracked, it does not index them in a way that allows pattern matching, so there is no way for Notmuch to directly find replies to my emails.

To address this need, I have a small procmail filter that runs bogofilter and then files email in one of the folders "spam", "reply", or "recv" depending on which headers are found. Notmuch supports "folder:" queries for searches, so that my saved search can now differentiate based on these headers that Notmuch cannot directly see.
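The filter itself is written for procmail; the decision it makes can be sketched in Python as below. This assumes bogofilter has already added its X-Bogosity header, and MY_MSGID is a hypothetical pattern standing in for whatever distinguishes the Message-IDs on outgoing mail. Replies are checked first so that they get their free pass through the spam filter.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical pattern for Message-IDs generated on outgoing mail.
MY_MSGID = re.compile(r"<[^>]+@mail\.example\.net>")


def classify(msg: Message) -> str:
    """Return the delivery folder: 'reply', 'spam', or 'recv'."""
    refs = " ".join(msg.get_all("References", []) + msg.get_all("In-Reply-To", []))
    if MY_MSGID.search(refs):
        # A reply to one of my own messages bypasses the spam check.
        return "reply"
    # bogofilter's header looks like "Spam, tests=bogofilter, ..." for spam.
    if msg.get("X-Bogosity", "").lower().startswith("spam"):
        return "spam"
    return "recv"
```

The returned name would be used as the delivery folder, which is what lets folder:spam, folder:reply, and folder:recv appear in saved searches.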

I find that tags still are useful, but that use is largely orthogonal to classification based on content. When new mail arrives, it is automatically tagged as both "unread" and "inbox". When I read a message, the "unread" tag is cleared; when I archive it, the "inbox" tag is cleared. I would like an extra tag, "new", which would be cleared as soon as I see the subject in a list of new email, but the Emacs interface I use doesn't yet support that.
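The tag life cycle described here amounts to a small set of +/- edits. The sketch below models it in Python; the "new" tag and the "listed" action are the hypothetical extension that the Emacs interface does not yet support, and tag_command only builds the corresponding notmuch tag command line rather than running it.

```python
# Tags given to newly arrived mail; "new" is the hypothetical extra tag.
INITIAL_TAGS = {"new", "unread", "inbox"}

# Tag edits per user action, in the "+tag"/"-tag" form used by "notmuch tag".
ACTIONS = {
    "listed": ["-new"],     # subject seen in a list of new mail (hypothetical)
    "read": ["-unread"],    # message opened
    "archive": ["-inbox"],  # message archived
}


def apply_action(tags: set[str], action: str) -> set[str]:
    """Return the tag set after applying one action's +/- edits."""
    result = set(tags)
    for edit in ACTIONS[action]:
        if edit.startswith("+"):
            result.add(edit[1:])
        else:
            result.discard(edit[1:])
    return result


def tag_command(action: str, message_id: str) -> list[str]:
    """Build the equivalent 'notmuch tag' command line for a single message."""
    return ["notmuch", "tag", *ACTIONS[action], "--", f"id:{message_id}"]
```

Keeping each action as a list of edits makes it easy to add tags that should change together without touching the logic.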

There are other uses for tags, such as marking emails that I should review when submitting my tax return or that need to be reported to bogofilter because it guessed wrongly about their spam status, but they all reflect decisions that I consciously make rather than decisions that are made automatically.

Notmuch remote access

Remote access via IMAP can be slow, but that is still faster than not having remote access at all, which is the default situation when the mail store only provides local access. I have two mechanisms for remote access that work well enough for me.

When I am in my home city, I only need occasional remote access; this is easily achieved by logging in remotely with SSH and running "emacsclient -t" in a terminal window. This connects to my running Emacs instance and gives me a new window through which I can access Notmuch almost as easily as on my desktop. A few things don't work transparently, viewing PDF files and other non-text attachments in particular, but as this is only an occasional need, lack of access to non-text content is not a real barrier. Here we see again the genius of Notmuch in making use of existing technology rather than inventing everything itself. Notmuch isn't contributing at all to this remote access but, since it supports Emacs as a user-interface, all the power of Emacs is immediately available.

For times when I am away from home and need more regular and complete remote access, there is muchsync, a tool that synchronizes two Notmuch mail stores. All email messages are stored one per file, so synchronizing those simply requires determining which files have been added or removed since the last synchronization and copying or deleting them. Tags are stored in the Xapian database, so a little more effort is required there but, again, muchsync just looks to see what has changed since the last sync and copies the relevant tags. I don't know yet if muchsync will synchronize the named queries and other configuration that can be stored in the database in the latest Notmuch release. Confirming that is a major prerequisite to upgrading.

Before discovering muchsync, I had used rsync to synchronize mail stores; I was happy to find that muchsync was significantly faster. While rsync is efficient when there are small changes to large files, it is not so efficient when there are small changes to a large list of files. The first step in an rsync transaction is to exchange a complete list of file names, which can be slow when there are tens of thousands of them. Muchsync doesn't waste time on this step as it remembers what is known to be on the replica, so it can deduce changes locally.
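The core idea, deducing changes locally from a remembered snapshot rather than exchanging a full file list, can be sketched as follows. This is not muchsync's actual implementation; the snapshot here is simply a set of file names plus a map from message ID to tags, both assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Changes:
    added: set[str]                      # files present now but not at the last sync
    removed: set[str]                    # files present at the last sync but gone now
    retagged: dict[str, frozenset[str]]  # message ID -> its new tag set


def diff_since_last_sync(
    last_files: set[str],
    current_files: set[str],
    last_tags: dict[str, frozenset[str]],
    current_tags: dict[str, frozenset[str]],
) -> Changes:
    """Work out what changed using only local data; no file listing is exchanged."""
    return Changes(
        added=current_files - last_files,
        removed=last_files - current_files,
        retagged={mid: tags for mid, tags in current_tags.items() if last_tags.get(mid) != tags},
    )
```

Only the contents of Changes would need to cross the network, so the traffic for a synchronization grows with the number of changes rather than with the number of files in the store.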

With muchsync, reading email on my notebook is much like reading email on my desktop. Unfortunately, I cannot yet read email on my phone, though I don't personally find that to be a big cost. There is a web interface for Notmuch written in Haskell, but I have not put enough effort into that to get it working so I don't know if it would be a usable interface for me.

When Notmuch mail is too much

As noted above, I don't like deleting email because I'm never quite sure what I want to keep. Notmuch allows me to simply clear the inbox flag; thereafter I'll never see the message again unless I explicitly search for older messages, as my saved searches all include that tag. As a result, I haven't deleted email since I started using Notmuch and have over 600,000 messages at present (528,000 in the last year, over half of that total from the linux-kernel mailing list). The mail store and associated index consume nearly ten gigabytes. I'm hoping that Moore's law will save me from ever having to delete any of this. This large store allows me to see whether very large amounts of email are too much or whether, as the program claims, "that's not much mail".

As far as I can tell, the total number of messages has no effect on operations that don't try to access all of those messages, so extracting a message by message ID, listing messages with a particular tag, or adding or clearing a tag, for example, are just as fast in a mail store with 100,000 messages as in one with 100 messages. A lot of mail only seems to be too much when a search matches thousands of messages or more. There are two particular times when I find this noticeable.

As you might imagine, given my need for categorization, I have quite a few saved searches. The Emacs front end for Notmuch has a "hello" page that helpfully lists all the saved searches together with the number of matching messages. Some of these searches are quite complex and, while the complexity doesn't seem to be a particular problem, the number of matches does. Counting the 217,952 linux-kernel messages still marked as in my inbox takes four to eight seconds, depending on the hardware. It only takes a few saved searches that take more than a couple of seconds for there to be an irritating lag when Emacs wants to update the "hello" page. Similarly, generating the list of matches for a large search can take a couple of seconds just to start producing the list, and much longer to create the whole list.

None of these delays need to be a problem. Having precise up-to-the-moment counts for each search is not really necessary, so updating those counts asynchronously would be perfectly satisfactory and rarely noticeable. Unfortunately, the Notmuch Emacs mode updates them all synchronously and (in the default configuration) does so every time the "hello" window is displayed. This delay can become tiresome.
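One way to make the counts asynchronous is to run them concurrently in the background and report each one as it arrives. This Python sketch is not how the Emacs mode works; it shells out to notmuch count for each query by default (the count parameter can be swapped for a cached value or a test stub), and the on_result callback stands in for whatever updates the display.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def notmuch_count(query: str) -> int:
    """Run 'notmuch count QUERY' and return the number it prints."""
    result = subprocess.run(
        ["notmuch", "count", query], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def refresh_counts(
    searches: dict[str, str],
    on_result: Callable[[str, int], None],
    count: Callable[[str], int] = notmuch_count,
    workers: int = 4,
) -> None:
    """Count each saved search concurrently, reporting results in completion order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the name of the saved search it is counting.
        futures = {pool.submit(count, query): name for name, query in searches.items()}
        for future in as_completed(futures):
            on_result(futures[future], future.result())
```

A user interface would call refresh_counts from a background thread and keep showing the previous counts until the new ones arrive, so a slow search delays only its own entry.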

When displaying the summary lines for a saved search, the Emacs interface is not synchronous, so there is no need to wait for the full list to be generated, but one still needs to wait the second or two for the first few entries in a large list to be displayed. If the condition "date:-1month.." is added to a search, only messages that arrived in the last month will be displayed, but they will normally be displayed without any noticeable delay as there are far fewer of them. The user interface could then collect earlier months asynchronously so they can be displayed quickly if the user scrolls down. The Emacs interface doesn't yet support this approach.
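The month-by-month approach could be driven by a generator of successively older date windows. This sketch uses calendar months rather than the rolling -1month range, and assumes Notmuch's date:START..END syntax treats a date-only END as covering that whole day. The first window could be displayed at once while later ones are fetched in the background.

```python
from collections.abc import Iterator
from datetime import date, timedelta


def month_windows(base_query: str, today: date) -> Iterator[str]:
    """Yield base_query limited to the current month so far, then each earlier month."""
    start = today.replace(day=1)
    end = today
    while True:
        yield f"({base_query}) and date:{start.isoformat()}..{end.isoformat()}"
        end = start - timedelta(days=1)  # last day of the previous month
        start = end.replace(day=1)       # first day of that month
```

The generator is unbounded; a caller would stop pulling windows once it has filled the screen or reached the date of the oldest message.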

Notmuch locking

As a general rule, those Notmuch operations that have the potential to be slow can usually be run asynchronously, thus removing much of the cost of the slowness. Putting this principle into practice causes one to quickly run up against the somewhat interesting approach to locking that Xapian uses for the indexing database.

When Xapian tries to open the database for write access and finds that it is already being written to, its response is to return an error. As I run "notmuch new" periodically in the background to incorporate new mail, attempts to, for example, clear the "inbox" flag sometimes fail because the database cannot be updated, and I have to wait a moment and try again. I'd much rather Notmuch did the waiting for me transparently.

If one process has the database open for read access and another process wants write access, the writer gets the access it wants and the reader will get an error the next time that it tries to retrieve data. This may be an appropriate approach for the original use case for Xapian but seems poorly suited for email access. It was sufficient to drive me to extend my wrapper script to take a lock on a file before calling the real Notmuch program, so that it would never be confronted with unsupported concurrency.
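A lock-taking wrapper of this kind can be sketched in Python with fcntl.flock, which blocks until the lock is free rather than failing. REAL_NOTMUCH and LOCK_PATH are hypothetical paths, and every invocation takes the same exclusive lock, so readers and writers are fully serialized; that is simpler than, and not necessarily identical to, what the original wrapper does.

```python
import fcntl
import os
import subprocess
import sys

# Hypothetical locations of the real binary and the lock file.
REAL_NOTMUCH = "/usr/bin/notmuch"
LOCK_PATH = os.path.expanduser("~/.notmuch-wrapper.lock")


def run_locked(argv: list[str], lock_path: str = LOCK_PATH) -> int:
    """Run argv while holding an exclusive lock so wrapped calls never overlap."""
    with open(lock_path, "a") as lock_file:
        # Blocks until any other wrapped invocation releases the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(argv).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Installed earlier in $PATH than the real binary, the wrapper's entry point would be sys.exit(run_locked([REAL_NOTMUCH, *sys.argv[1:]])), so every caller, including a background "notmuch new", waits its turn instead of getting an error.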

The most recent version of Xapian, the 1.4 series released in recent months, adds support for blocking locks, and Notmuch 0.23 makes use of these to provide a more acceptable experience when running Notmuch asynchronously.

Working with threads

One feature of Notmuch that I cannot quite make my mind up about is the behavior of threads. In a clear contrast to my finding with JMAP, the problem is not that the threads are too simplistic, but that they are rich and I'm not sure how best to tame them.

As I never delete email, every message in a thread remains in the mail store indefinitely. When Notmuch performs a search against the mail store it will normally list all the threads in which any message matches the search criteria. The information about the thread includes the parent/child relationship between messages, flags indicating which messages matched the search query, and what tags each individual message has.

The Emacs interface uses the parent/child information to display a tree structure using indentation. It uses the "matched" flag to de-emphasize the non-matching messages, either greying them out in the message summary list or collapsing them to a single line in the default thread display, which concatenates all messages in a thread into a single text buffer. It uses some of the tags to adjust the color or font, for example to highlight unread messages.
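The display logic amounts to a walk over the reply tree. In this Python sketch, Msg is a hypothetical, simplified stand-in for the per-message data Notmuch reports (matched flag, tags, replies); brackets stand in for greying out non-matching messages and an asterisk marks unread ones.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    subject: str
    matched: bool
    tags: set[str] = field(default_factory=set)
    replies: list["Msg"] = field(default_factory=list)


def render(msg: Msg, depth: int = 0) -> list[str]:
    """Return one line per message, indented by its depth in the reply tree."""
    marker = "*" if "unread" in msg.tags else " "
    # Brackets stand in for the greyed-out style used for non-matching messages.
    text = msg.subject if msg.matched else f"[{msg.subject}]"
    lines = [f"{marker} {'  ' * depth}{text}"]
    for reply in msg.replies:
        lines.extend(render(reply, depth + 1))
    return lines
```

Because the matched flag and tags travel with each message, the same tree can be rendered as a summary list or as a concatenated thread buffer just by changing how each line is formatted.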

This all makes perfect sense and I cannot logically fault it, yet working with threads sometimes feels a little clumsy and I cannot say why. The most probable answer is that I haven't made the effort to learn all the navigation commands that are available; a rich structure will naturally require more subtle navigation and I'm too lazy to learn more than the basics until they prove insufficient. Maybe a focus on some self-education will go a long way here. Certainly I like the power provided by Notmuch threads, I just don't feel that I personally have tamed that power yet.

Notmuch of a wish list

Though I am sufficiently happy with Notmuch to continue using it, I always seem to want more. The need for sensible locking and for native saved searches should be addressed once I upgrade to the latest release, so I expect to be able to cross them off my wish list soon.

Asynchronous updating of the match counts for saved searches and of the message lists in a summary is the wish at the top of my list, but my familiarity with Emacs Lisp is not sufficient to even begin to address that, so I expect to have to live without it for a while yet.

One feature that is close to the sweet spot for being both desirable and achievable is support for an outgoing mail queue. Usually when I send email it is delivered quite promptly, though not instantly, to my email provider's server. Sometimes it takes longer, possibly due to a network outage, or possibly due to a configuration problem. I would like outgoing email to be immediately stored in the Notmuch database with a tag to say that it is queued. Then some Notmuch hook could periodically try to send any queued messages, and update the tag once the transmission was successful. This would mean that I never have to wait while mail is sent, but can easily see if there is anything in the outgoing queue, and can investigate at my leisure.
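The proposed queue could work roughly as sketched below. Everything here is hypothetical: a dictionary from message ID to tags stands in for the Notmuch database, "queued" and "sent" are illustrative tag names, and send is whatever actually transmits a message, raising an exception on failure. Failures caught as OSError (which includes smtplib's SMTPException) leave the message queued for the next attempt.

```python
from typing import Callable


def queue_message(tags_by_id: dict[str, set[str]], message_id: str) -> None:
    """Store an outgoing message as queued instead of waiting for delivery."""
    tags_by_id.setdefault(message_id, set()).add("queued")


def flush_queue(tags_by_id: dict[str, set[str]], send: Callable[[str], None]) -> list[str]:
    """Try to send every queued message; return the IDs that are still queued."""
    still_queued = []
    for message_id, tags in tags_by_id.items():
        if "queued" not in tags:
            continue
        try:
            send(message_id)
        except OSError:
            # Network outage or server error: leave it queued for the next run.
            still_queued.append(message_id)
            continue
        tags.discard("queued")
        tags.add("sent")
    return still_queued
```

A periodic hook would call flush_queue; anything it returns is exactly what a search for the queued tag would show, which is what makes the queue easy to inspect.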

There are plenty of other little changes I would like to see in the user interface, but none really interesting enough to discuss here. The important aspect of Notmuch is that the underlying indexing model is sound and efficient and suits my needs. It is a good basis on which to experiment with different possibilities in the user interface.


Brief items

Development quotes of the week

In a world in which free software wins, are locked-down cloud architectures dominant? Would most hand-held devices be proprietary and difficult to change? Would it be difficult to use any service on any platform? Would we so easily hand over our privacy to media companies? Why, then, in a world in which open source is hyper-successful, are all of the above true? If we declare that open source has won—and I believe it's safe to do so—how could we possibly declare that free software has also won? This is where the conflation of terms is actively toxic. By using them interchangeably, you are taking the air out of the sails of free software advocates everywhere who want to ensure sharing in the cloud, freedom on the web, equal access to technology, and improved privacy for everyone.

— John Mark Walker

It was a hobby, in and of itself. And, you know, I got 1990s computers. And early 2000s computers. I was jacked in and surfing the wave, maaan. It was gonna be the year of the Linux desktop real soon now.

Somewhere along the way, in the last OH GOD TWENTY YEARS, we – along with a bunch of vulture capitalists and wacky Valley libertarians and government spooks and whoever else – built this whole big crazy thing out of that 1990s Internet and…I don’t like it any more.

— Adam Williamson


digiKam 5.3.0 is published

The digiKam Software Collection 5.3.0 has been released. This version is available as an AppImage bundle. "AppImage is an open-source project dedicated to provide a simple way to distribute portable software as compressed binary file, that standard user can run as well, without to install special dependencies. All is included into the bundle, as last Qt5 and KF5 frameworks. AppImage use Fuse file-system, which is de-compressed into a temporary directory to start the application. You don't need to install digiKam on your system to be able to use it. Better, you can use the official digiKam from your Linux distribution in parallel, and test the new version without any conflict with one used in production. This permit to quickly test a new release without to wait an official package dedicated for your Linux box. Another AppImage advantage is to be able to provide quickly a pre-release bundle to test last patches applied to source code, outside the releases plan."


Paperwork 1.0

Paperwork 1.0, code-named "it's about time !", has been released. Some of the main changes include a switch to Python 3, the inclusion of the OCR text in generated PDFs, the replacement of 'paperwork-chkdeps' by 'paperwork-shell', new options to automatically simplify the content for export and to automatically adjust the colors, support for scrolling using the middle click, and more. LWN covered Paperwork last March. (Thanks to Martin Michlmayr)


RPM 4.13.0 released

RPM 4.13.0 has been released. Notable changes include support for file triggers and boolean dependency expressions.


systemd 232

Systemd 232 has been released with "many new features and even more fixes". There is a new RemoveIPC= option that can be used to remove IPC objects owned by the user or group of a service when that service exits, the new ProtectKernelModules= option can be used to disable explicit load and unload operations of kernel modules by a service, the ProtectSystem= option gained a new value "strict", and much more.


Trac 1.2 Released

Trac 1.2 has been released. It is the first major release of Trac in more than four years. Highlights from the release include an extensible notification system, a notification preference panel, the replacement of usernames with full names, a restyled ticket changelog, workflow controls on the New Ticket page, editable wiki page version comments, and datetime custom fields.


Newsletters and articles

Development newsletters

Emacs News (November 7)
This week in GTK+ (November 7)
OCaml Weekly News (November 8)
OpenStack Developer Mailing List Digest (November 4)
Perl Weekly (November 7)
Python Weekly (November 3)
Ruby Weekly (November 3)
This Week in Rust (November 8)
Wikimedia Tech News (November 7)


First 64-bit Orange Pi slips in under $20 (HackerBoards.com)

HackerBoards takes a look at the 64-bit Orange Pi. "Shenzhen Xunlong is keeping up its prolific pace in spinning off new Allwinner SoCs into open source SBCs, and now it has released its first 64-bit ARM model, and one of the cheapest quad-core -A53 boards around. The Orange Pi PC 2 runs Linux or Android on a new Allwinner H5 SoC featuring four Cortex-A53 cores and a more powerful Mali-450 GPU."


The iconic text editor Vim celebrates 25 years (Opensource.com)

Opensource.com celebrates 25 years of Vim. "Vim is a flexible, extensible text editor with a powerful plugin system, rock-solid integration with many development tools, and support for hundreds of programming languages and file formats. Twenty-five years after its creation, Bram Moolenaar still leads development and maintenance of the project—a feat in itself! Vim had been chugging along in maintenance mode for more than a decade, but in September 2016 version 8.0 was released, adding new features to the editor of use to modern programmers."


Move over Raspberry Pi, here is a $4, coin-sized, open-source Linux computer (ZDNet)

ZDNet takes a look at the VoCore2, a coin-sized computer. "VoCore2 is an open source Linux computer and a fully-functional wireless router that is smaller than a coin. It can also act as a VPN gateway for a network, an AirPlay station to play lossless music, a private cloud to store your photos, video, and code, and much more. The Lite version of the VoCore2 features a 580MHz MT7688AN MediaTek system on chip (SoC), 64MB of DDR2 RAM, 8MB of NOR storage, and a single antenna slot for Wi-Fi that supports 150Mbps."


Page editor: Rebecca Sobol