Akonadi – still alive and rocking

Posted Jan 12, 2016 5:02 UTC (Tue) by drag (guest, #31333)
In reply to: Akonadi – still alive and rocking by pboddie
Parent article: Akonadi – still alive and rocking

> Then again, if the IMAP stuff sits below some kind of data mapper, maybe the necessary tools just aren't exposed at the right level.

I think so.

My reasoning:

IMAP itself didn't provide enough advantages, or at least provide the right type of advantages, over POP to make it really useful. People with heavy technical focus on email generally just kept treating IMAP servers like POP servers with a added bonus they could easily have multiple email clients on different computers. It was still a advantage to having email database locally on the machine for management. IMAP didn't provide the necessary features to win over that.

And, nowadays, it should be obvious that server-side processing was still the right way to go as evidenced by the dominance of webmail.

My fantasies about a solution:

IMAP + Seive is closer to the right thing, but I still don't think it's enough. Lack of client support is a major problem, but I think even if that was not the biggest problem Seive/IMAP isn't the correct solution. I see things like people dealing with duplicate emails, missing emails, having to craft special rules to filter email into different folders.. etc etc.. this are symptoms that the general mentality of management email is wrong. That mentality being that you copy around and delete and move email from one folder to another.

What I would love to see (other then everybody world-wide deciding email is so abhorrent that we should develop a secure replacement together) is a solution based around stuffing email into a single database and kept 'raw'.

No editing of the email, changing of the headers, adding tags, basing anything on file system dates, or moving it around folders, or anything like that.

That is get rid of the 'mdir' format, or shoving everything into a SQL database concept... but a return to something more closely resembling the original mbox storage format, but optimized. A Log-stuctured Mbox format for lack of a better term/concept. Get rid of all the file-based locking and use a single-purpose service for managing reads/writes to that 'log-structured mbox'. One for each user. Make it as trivial as possible to trigger a robust backup through that service.

Journalctl, for as terrible as it is sometimes, still has no problem reading the 3,500,000+ messages my system has logged on my desktop in less then a minute. Mail applications should have, at the very least, the same level of performance.

Then integrate something like 'notmuch' into it somewhere so every interaction a client has with the service is nothing but the result of searches performed on the original data. Maybe a separate mail-manage service that talks to the lmbox-service, but maybe the same. I would like to make it as trivial as possible for somebody to self-host so simpler the better, dealing with large number of users is a specialized problem. Be able to use just a cheap Linux server at home, or rasberry pi-level machine, or a single cloud instance.

'True' edits and deletes should be special operations and it shouldn't matter much if they are expensive. Due to spam and such things some pre-proccessing of the mail before it gets added to the main database, but it should be kept to a minimum.

Normal operations should just be performed by 'live filters' or 'views' or 'search folders'. You should be able to do things like 'Emails sent directly to me with low spam ratings shall be my inbox'. Then you can say 'emails sent to mailing list X is in Y folder', and then layer it further and have a folder that says 'emails sent to mailing list X and addressed to me directly is in Y+W folder'. So then the original data and format of each mail is preserved and you interact with the mail through various 'views'. Thus things like duplicate emails are not a sign of corruption, but just because they happen to meet multiple different criteria.

Thus you end up with a 'do no harm' approach to managing these messages. No action will normally be performed that risks corrupting the original data. No matter what sort of insane or batshit crazy filters and folders or whatever you place on the data and even if clients conflict with each other and trigger bugs and crashes in your services... the worst thing that can happen is that cause a self-inflicted denial of service attack. Then the only data loss you can suffer is a few missed messages or something corrupt tagged on the end of your 'log structured mbox'. Sure the indexes and metadata databases can be jacked up and unrecoverable, but that is something that can be recreated since all the original data it was derived from is still present. To recover from a disaster you blow away all the files, except the main one, and then selectively re-apply your filters until they are back to where you want them.

Of course optimizations will be necessary. You can't expect a 'view/filter/search folder' to be active instantly after you create it. Indexes of email should be kept so that you don't have to go through your entire history of email every time you open your application. Results from searches should be available 'on demand' as much as possible. Try to set it up so that the mail-manage service just has to look through anything relatively recent and add it to the data it's already derived from your backend store. So to make it possible to have a nice UI some sort of timing information on operations should be accessible, I suppose, so people can have a reasonably accurate idea of how long expensive/long running processing is going to take.

I doubt IMAP could provide a rich enough API to access the 'mail-manage' service, but a close approximation may be able to made for current generation clients. Probably through some sort of IMAP gateway service.