Akonadi – still alive and rocking

Posted Jan 11, 2016 19:16 UTC (Mon) by pboddie (guest, #50784)
In reply to: Akonadi – still alive and rocking by drag
Parent article: Akonadi – still alive and rocking

> If the problem is that clients are required to do too much (filtering client-side), then it seems a better move is to put the filtering on the server side rather than just move 100% of the functionality to the clients.

I was going to write something about that, actually. Some performance problems might be a consequence of the need for lots of service calls to get individual items of data - I seem to recall that things like DCOP and D-Bus were motivated by some apparent need for making lots of calls - but in the traditional database development realm, this is typically a bad thing exhibited by programs that have a loop doing "select something from table where item = ?" over and over again for a list of values that has been obtained from a previous query. (Yes, code like this really does exist in the real world.)
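To make that anti-pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names and helper functions are made up for illustration and have nothing to do with Akonadi's actual schema; the point is only the difference between one query per value and a single query for the whole list.

```python
import sqlite3

def make_db():
    """Build a small in-memory table standing in for any item store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, subject TEXT)")
    conn.executemany(
        "INSERT INTO items (id, subject) VALUES (?, ?)",
        [(i, f"subject {i}") for i in range(1, 101)],
    )
    return conn

def fetch_one_by_one(conn, ids):
    """The anti-pattern: one query per value obtained from an earlier query."""
    rows = []
    for item_id in ids:
        cur = conn.execute(
            "SELECT id, subject FROM items WHERE id = ?", (item_id,)
        )
        rows.extend(cur.fetchall())
    return rows

def fetch_in_one_query(conn, ids):
    """The batched form: a single statement covering the whole list."""
    ids = list(ids)
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    cur = conn.execute(
        f"SELECT id, subject FROM items WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    )
    return cur.fetchall()

conn = make_db()
wanted = [3, 14, 15, 92]
slow = fetch_one_by_one(conn, wanted)   # four separate queries
fast = fetch_in_one_query(conn, wanted)  # one query, same rows
```

With an in-process database the difference is small, but when every call crosses a process or network boundary, the per-item version pays that cost once per value.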

The mention of IMAP got my attention. IMAP - again, if I remember correctly - is also supposed to allow querying, or maybe some extension of it does. So, in principle, the tools would be there to do this efficiently. Then again, if the IMAP stuff sits below some kind of data mapper, maybe the necessary tools just aren't exposed at the right level.

Akonadi – still alive and rocking

Posted Jan 11, 2016 20:16 UTC (Mon) by anselm (subscriber, #2796)

The IMAP functionality in question is called “SIEVE” and is optional. This makes it difficult to stipulate to its existence in a lowest-common-denominator framework such as Akonadi, which also accepts files in the local file system as a source of e-mail messages. You could probably put equivalent functionality into the local Akonadi component for the benefit of local files, but once you're done you'll have implemented most of an IMAP server, and there are very good free IMAP servers around already.

Akonadi – still alive and rocking

Posted Jan 11, 2016 20:50 UTC (Mon) by pboddie (guest, #50784)

I guess this is why solutions like Kolab try and put everything in an IMAP/SIEVE-accessible message store, then. (Kolab relies on various KDE-related libraries, too.)

I'll accept that if you want to provide some kind of protocol for accessing messages and similar things, then POP and IMAP are obvious things that potentially leverage compatibility, or at least familiarity, if the protocol resembles them. But ultimately, there may be no escaping a full message store, even though I do also read about people's performance issues with their IMAP infrastructure every now and again (but that's more likely to be related to having lots of users and to potentially misconfigured components).

Akonadi – still alive and rocking

Posted Jan 11, 2016 21:09 UTC (Mon) by anselm (subscriber, #2796)

POP3 isn't really very useful as a message store protocol (for one, it doesn't have anything resembling SIEVE), so if you want to build on something you're pretty much stuck with IMAP. Dovecot is a very good free high-performance IMAP server that does SIEVE (among other useful things) and sits comfortably on Maildirs (and its own more efficient message store format), so you're basically covered. The downside is that as layer-7 protocols go, IMAP is a very nasty specimen, and it is difficult to fault programmers for not wanting to have anything to do with it unless they really can't avoid it.

Akonadi – still alive and rocking

Posted Jan 12, 2016 11:57 UTC (Tue) by Wol (subscriber, #4433)

> You could probably put equivalent functionality into the local Akonadi component for the benefit of local files, but once you're done you'll have implemented most of an IMAP server, and there are very good free IMAP servers around already.

I get the feeling they are trying to index EVERYTHING, without bothering to ask the question "Is this WORTH indexing?". Which is where users get so frustrated - the system insists on spending a large chunk of its (and by extension the user's) time doing stuff the user considers *counter*productive. The classic example is pre-loading the office suite into RAM for when the user wants it - I used to have three office suites, all of which I rarely used but sometimes needed, and on a system that's short of RAM ...

Cheers,
Wol

Akonadi – still alive and rocking

Posted Jan 12, 2016 5:02 UTC (Tue) by drag (guest, #31333)

> Then again, if the IMAP stuff sits below some kind of data mapper, maybe the necessary tools just aren't exposed at the right level.

I think so.

My reasoning:

IMAP itself didn't provide enough advantages, or at least the right type of advantages, over POP to make it really useful. People with a heavy technical focus on email generally just kept treating IMAP servers like POP servers, with the added bonus that they could easily have multiple email clients on different computers. It was still an advantage to have the email database locally on the machine for management. IMAP didn't provide the necessary features to win over that.

And, nowadays, it should be obvious that server-side processing was still the right way to go as evidenced by the dominance of webmail.

My fantasies about a solution:

IMAP + Sieve is closer to the right thing, but I still don't think it's enough. Lack of client support is a major problem, but I think even if that were not the biggest problem, Sieve/IMAP isn't the correct solution. I see things like people dealing with duplicate emails, missing emails, having to craft special rules to filter email into different folders, etc. These are symptoms that the general mentality of email management is wrong. That mentality being that you copy around and delete and move email from one folder to another.

What I would love to see (other than everybody world-wide deciding email is so abhorrent that we should develop a secure replacement together) is a solution based around stuffing email into a single database and keeping it 'raw'.

No editing of the email, changing of the headers, adding tags, basing anything on file system dates, or moving it around folders, or anything like that.

That is, get rid of the 'mdir' format and of the shove-everything-into-a-SQL-database concept, and return to something more closely resembling the original mbox storage format, but optimized. A log-structured mbox format, for lack of a better term/concept. Get rid of all the file-based locking and use a single-purpose service for managing reads/writes to that 'log-structured mbox'. One for each user. Make it as trivial as possible to trigger a robust backup through that service.
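A rough sketch of what such a 'log-structured mbox' could look like, purely as an illustration: the LogMbox class, the 4-byte length prefix and the file name are all assumptions invented here, not an existing format or tool. Messages are only ever appended, and a reader skips an incomplete record at the end of the file.

```python
import os
import struct
import tempfile

# Assumed framing: each record is a 4-byte big-endian length, then the raw bytes.
HEADER = struct.Struct(">I")

class LogMbox:
    """Hypothetical append-only message log, one file per user."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist yet

    def append(self, raw_message: bytes) -> int:
        """Append one unmodified message; return the offset it was written at."""
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(HEADER.pack(len(raw_message)))
            f.write(raw_message)
            f.flush()
            os.fsync(f.fileno())
        return offset

    def read_at(self, offset: int) -> bytes:
        """Return the raw message stored at a known offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            (length,) = HEADER.unpack(f.read(HEADER.size))
            return f.read(length)

    def scan(self, start: int = 0):
        """Yield (offset, raw_message) for every complete record from start on."""
        with open(self.path, "rb") as f:
            f.seek(start)
            while True:
                offset = f.tell()
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    return  # end of log, or a truncated trailing header
                (length,) = HEADER.unpack(header)
                body = f.read(length)
                if len(body) < length:
                    return  # truncated trailing record: ignore it
                yield offset, body

path = os.path.join(tempfile.mkdtemp(), "user.lmbox")
log = LogMbox(path)
msg_one = b"From: a@example.org\r\nSubject: one\r\n\r\nhello\r\n"
msg_two = b"From: b@example.org\r\nSubject: two\r\n\r\nworld\r\n"
first = log.append(msg_one)
second = log.append(msg_two)
records = list(log.scan())
```

In the design described above only the single service would ever call append(), which is what makes file-based locking unnecessary; this sketch does not enforce that itself.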

Journalctl, as terrible as it is sometimes, still has no problem reading the 3,500,000+ messages my system has logged on my desktop in less than a minute. Mail applications should have, at the very least, the same level of performance.

Then integrate something like 'notmuch' into it somewhere so every interaction a client has with the service is nothing but the result of searches performed on the original data. Maybe a separate mail-manage service that talks to the lmbox-service, but maybe the same. I would like to make it as trivial as possible for somebody to self-host, so the simpler the better; dealing with large numbers of users is a specialized problem. It should be possible to use just a cheap Linux server at home, a Raspberry Pi-level machine, or a single cloud instance.

'True' edits and deletes should be special operations and it shouldn't matter much if they are expensive. Due to spam and such things, some pre-processing of the mail will be needed before it gets added to the main database, but it should be kept to a minimum.

Normal operations should just be performed by 'live filters' or 'views' or 'search folders'. You should be able to do things like 'Emails sent directly to me with low spam ratings shall be my inbox'. Then you can say 'emails sent to mailing list X are in Y folder', and then layer it further and have a folder that says 'emails sent to mailing list X and addressed to me directly are in Y+W folder'. So then the original data and format of each mail is preserved and you interact with the mail through various 'views'. Thus things like duplicate emails are not a sign of corruption; they appear only because a message happens to meet multiple different criteria.
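As a toy illustration of the idea, the following sketch treats each 'view' as nothing more than a predicate evaluated over unmodified messages. The sample messages, the X-Spam-Score header, the spam threshold and all function names are assumptions made up for this example; a real system would use whatever its spam filter and list software provide.

```python
from email import message_from_string

# Made-up raw messages; they are parsed for reading but never modified.
RAW = [
    "To: me@example.org\nSubject: lunch\nX-Spam-Score: 0.1\n\nNoon?\n",
    "To: list-x@example.org\nList-Id: <list-x.example.org>\nSubject: patch\n\nv2\n",
    "To: list-x@example.org, me@example.org\nList-Id: <list-x.example.org>\n"
    "Subject: re: patch\n\nping\n",
    "To: me@example.org\nSubject: WIN NOW\nX-Spam-Score: 9.5\n\nclick\n",
]
MESSAGES = [message_from_string(raw) for raw in RAW]
ME = "me@example.org"

def directly_to_me(msg):
    """True if my address appears in the To header."""
    return ME in msg.get("To", "").lower()

def low_spam(msg):
    """True if the (assumed) spam score header is below an arbitrary threshold."""
    return float(msg.get("X-Spam-Score", "0")) < 5.0

def on_list_x(msg):
    """True if the message came through mailing list X."""
    return "list-x.example.org" in msg.get("List-Id", "")

def all_of(*predicates):
    """Layer views: a message matches only if every predicate matches."""
    return lambda msg: all(p(msg) for p in predicates)

VIEWS = {
    "inbox": all_of(directly_to_me, low_spam),
    "Y": on_list_x,
    "Y+W": all_of(on_list_x, directly_to_me),
}

def view(name):
    """Return the subjects of all messages currently matching a view."""
    return [msg["Subject"] for msg in MESSAGES if VIEWS[name](msg)]

inbox = view("inbox")
folder_y = view("Y")
folder_yw = view("Y+W")
```

The 're: patch' message shows up in all three views without ever being copied or moved, which is the 'duplicates are not corruption' point in miniature.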

Thus you end up with a 'do no harm' approach to managing these messages. No action will normally be performed that risks corrupting the original data. No matter what sort of insane or batshit crazy filters and folders or whatever you place on the data and even if clients conflict with each other and trigger bugs and crashes in your services... the worst thing that can happen is that you cause a self-inflicted denial of service attack. Then the only data loss you can suffer is a few missed messages or something corrupt tagged on the end of your 'log structured mbox'. Sure the indexes and metadata databases can be jacked up and unrecoverable, but that is something that can be recreated since all the original data it was derived from is still present. To recover from a disaster you blow away all the files, except the main one, and then selectively re-apply your filters until they are back to where you want them.

Of course optimizations will be necessary. You can't expect a 'view/filter/search folder' to be active instantly after you create it. Indexes of email should be kept so that you don't have to go through your entire history of email every time you open your application. Results from searches should be available 'on demand' as much as possible. Try to set it up so that the mail-manage service just has to look through anything relatively recent and add it to the data it's already derived from your backend store. So to make it possible to have a nice UI some sort of timing information on operations should be accessible, I suppose, so people can have a reasonably accurate idea of how long expensive/long running processing is going to take.
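A small sketch of the 'only look at what is new' idea, again with invented names: the IncrementalIndex class and the plain Python list standing in for the backend store are assumptions for illustration only. The index remembers how far it has read, processes just the newer entries on each pass, and can be thrown away and rebuilt from the original data at any time.

```python
from collections import defaultdict

class IncrementalIndex:
    """Hypothetical derived index: word -> positions of messages containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.next_position = 0  # high-water mark: first unindexed log position

    def catch_up(self, log):
        """Index only entries appended since the last call; return how many."""
        new_entries = log[self.next_position:]
        for position, text in enumerate(new_entries, start=self.next_position):
            for word in text.lower().split():
                self.postings[word].add(position)
        self.next_position += len(new_entries)
        return len(new_entries)

    def search(self, word):
        """Return the positions of all indexed messages containing the word."""
        return sorted(self.postings.get(word.lower(), ()))

    @classmethod
    def rebuild(cls, log):
        """Disaster recovery: discard everything derived and re-read the log."""
        index = cls()
        index.catch_up(log)
        return index

# A list of message texts stands in for the append-only backend store.
log = ["akonadi imap sieve", "imap is nasty"]
index = IncrementalIndex()
first_pass = index.catch_up(log)   # indexes both existing messages
log.append("notmuch search imap")
second_pass = index.catch_up(log)  # indexes only the newly appended message
hits = index.search("IMAP")
rebuilt = IncrementalIndex.rebuild(log)
```

The count returned by catch_up() is also the kind of figure a UI could use to give a rough idea of how much work a long-running pass still has to do.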

I doubt IMAP could provide a rich enough API to access the 'mail-manage' service, but a close approximation may be possible for current generation clients. Probably through some sort of IMAP gateway service.