Akonadi – still alive and rocking

Posted Jan 11, 2016 12:32 UTC (Mon) by jospoortvliet (guest, #33164)
In reply to: Akonadi – still alive and rocking by pboddie
Parent article: Akonadi – still alive and rocking

Well, I'm sure more than that was contributing to poor performance; they've fixed dozens of small and large things, and after every optimization something else became the biggest problem, of course. That's how it always works, isn't it?

One of the biggest issues remains, and it is fundamental to the Akonadi design: it is entirely data-agnostic and leaves type-specific handling to the client. One consequence is that filtering has to happen client-side: if you want to show today's calendar items, you must retrieve ALL OF THEM and throw away what you don't want. Welcome, massive overhead...
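To make the overhead concrete, here is a purely hypothetical sketch in Python (this is not Akonadi client code; the store and its methods are invented for illustration):

    from datetime import date

    # Hypothetical, data-agnostic store: all it can do is hand back every item,
    # because it knows nothing about calendars or dates.
    def todays_events_client_side(store):
        today = date.today().isoformat()
        all_items = store.fetch_all()          # every event crosses the wire
        return [e for e in all_items if e["date"] == today]   # most are thrown away

    # What a type-aware, server-side filter would allow instead: the store
    # itself narrows the result to a single day's worth of items.
    def todays_events_server_side(store):
        return store.query(kind="event", day=date.today().isoformat())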

I've seen some presentations on the design of Akonadi-Next, which should fix this in two ways. First, it does away with the server in between altogether, instead letting client apps do everything themselves by loading a library. Concurrency is handled by the database/storage used, which can be specific to each resource (e.g. SQLite or NoSQL solutions, even flat text or a binary format where it makes sense). Interestingly, this design is similar to the direction Baloo took after learning the lessons from Nepomuk: discarding the single large database and letting apps manage things themselves in specific, data-type-optimized databases.

Note that, back when Akonadi was designed (2001-ish!), much tech simply wasn't around. I guess it was overengineered, but it made sense at the time, just like Nepomuk. A better understanding of requirements and real-world needs, as well as the emergence of new technology, have resulted in the need for a new design...

Anyway, you can find more info if you want by googling.



Akonadi – still alive and rocking

Posted Jan 11, 2016 12:58 UTC (Mon) by aleXXX (subscriber, #2742) [Link] (2 responses)

Wasn't it more like 2005? I remember that at the KDE 4 Core meeting in Trysil in 2006, Akonadi already existed but was still very fresh.

Akonadi – still alive and rocking

Posted Jan 11, 2016 22:42 UTC (Mon) by jospoortvliet (guest, #33164) [Link] (1 responses)

Well, I said designed; it was probably more that the first ideas were sketched around the 2001-2002 timeframe, if I am right. Code and the name came later, but the problems it was to solve, and how, came early that decade.

Akonadi – still alive and rocking

Posted Jan 14, 2016 22:42 UTC (Thu) by cornelius (subscriber, #72264) [Link]

Akonadi was born pretty much exactly ten years ago at the fourth Osnabrück meeting in January 2006. That's where we came up with the name, the original architecture, and a plan for the implementation.

Akonadi – still alive and rocking

Posted Jan 11, 2016 14:35 UTC (Mon) by pboddie (guest, #50784) [Link] (1 responses)

Thanks for the overview!

Note that, back when Akonadi was designed (2001-ish!), much tech simply wasn't around. I guess it was overengineered, but it made sense at the time, just like Nepomuk. A better understanding of requirements and real-world needs, as well as the emergence of new technology, have resulted in the need for a new design...

Oh, lots of us dabbled with RDF back in the day, that's for sure, but for quite a bit of this kind of thing I'd dispute that "much tech simply wasn't around", although I'd agree that a more widespread understanding of the problems probably wasn't around back then.

Another thing that really needs fixing is the multiple levels of indirection on display in the user interface. When I last looked at doing calendar-related things with Kontact, alongside the laundry list of obsolete groupware solutions there was some magic Akonadi thing, if I remember correctly (honestly, I'd rather not look at it again). Although this might showcase some technical wizardry, it is just confusing, even to people who have the patience, motivation and general background to hunt down the appropriate dialogue choice.

Akonadi – still alive and rocking

Posted Jan 11, 2016 17:49 UTC (Mon) by Wol (subscriber, #4433) [Link]

> Oh, lots of us dabbled with RDF back in the day, that's for sure, but for quite a bit of this kind of thing I'd dispute that "much tech simply wasn't around", although I'd agree that a more widespread understanding of the problems probably wasn't around back then.

The problem is that there aren't any old greybeards like us involved in the design - you know, people who were used to machines where CPU speeds were single-digit megahurtz OR SLOWER. Where RAM was measured in kilobytes and cost 100s or even 1000s of <insert currency unit here>.

People whose instinctive reaction to bloat is "is it necessary?". So much software today *needs* an overspecified, super-powered machine simply to provide reasonable performance. My triple-core Athlon with 16GB of RAM struggles sometimes (well, I am a geek, I do try to run a high-spec machine). But at work I ran 30 or 40 online users on an R3000 machine (that's equivalent to a 386). And even more on an even older machine a few years before that. User response hasn't improved with faster machines...

(And it doesn't help when the response to "my machine is so slow it takes over 24 hours to get to the login screen" is "you should be grateful to us. If you want us to help you, you need to provide loads of debugging info". Hello? I can't even get to a prompt to try and get that info!!! And yes, I *did* have a system where I had to disable KDE for precisely that reason!)

Cheers,
Wol

Akonadi – still alive and rocking

Posted Jan 11, 2016 17:39 UTC (Mon) by drag (guest, #31333) [Link] (9 responses)

> First, it does away with the server in between altogether, instead letting client apps do everything themselves by loading a library.

That seems backwards. I never liked an approach that takes functionality that makes sense in a daemon and stuffs it into a library.

> One of the biggest issues remains, and it is fundamental to the Akonadi design: it is entirely data-agnostic and leaves type-specific handling to the client. One consequence is that filtering has to happen client-side: if you want to show today's calendar items, you must retrieve ALL OF THEM and throw away what you don't want. Welcome, massive overhead...

If the problem is that clients are required to do too much (filtering client-side), then it seems a better move is to put the filtering on the server side rather than just move 100% of the functionality to the clients.

I would feel a lot more comfortable with a daemon with a 'RESTful' API and a simple key-value store back-end to take care of things. That way you can not only use Akonadi as a basis for desktop clients, but also have something that actively manages the data, backs it up, and deals with differences between major revisions, among other things, instead of leaving it up to individual app authors to figure it out.

That also makes it useful for building other services on top of it. For example, create a 'sister daemon' that takes the binary representation of data from the Akonadi API, textualizes it and encapsulates it in JSON (or whatever) for consumption by remote clients over HTTPS. Thus the KDE desktop could serve as a basis for synchronizing many devices and/or have an easier time integrating with other services. When it comes time to introduce more PIM functionality, the bulk of the work can be done on the server side, so client changes are minimal, with a much easier time handling backwards compatibility.
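Something along these lines, as a toy sketch using only Python's standard library (the endpoints, data shapes and addresses are all invented; TLS termination for the HTTPS part would sit in front of it):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Stand-in for the binary PIM store the daemon would actually query.
    FAKE_STORE = {
        "events": [{"uid": "1", "summary": "Dentist", "date": "2016-01-12"}],
        "contacts": [{"uid": "7", "name": "Ada Lovelace"}],
    }

    class PimGateway(BaseHTTPRequestHandler):
        def do_GET(self):
            # e.g. GET /events or GET /contacts
            collection = self.path.strip("/")
            items = FAKE_STORE.get(collection)
            if items is None:
                self.send_error(404, "unknown collection")
                return
            body = json.dumps(items).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), PimGateway).serve_forever()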

Akonadi – still alive and rocking

Posted Jan 11, 2016 19:16 UTC (Mon) by pboddie (guest, #50784) [Link] (5 responses)

If the problem is that clients are required to do too much (filtering client-side), then it seems a better move is to put the filtering on the server side rather than just move 100% of the functionality to the clients.

I was going to write something about that, actually. Some performance problems might be a consequence of the need for lots of service calls to get individual items of data - I seem to recall that things like DCOP and D-Bus were motivated by some apparent need for making lots of calls - but in the traditional database development realm, this is typically a bad thing exhibited by programs that have a loop doing "select something from table where item = ?" over and over again for a list of values that has been obtained from a previous query. (Yes, code like this really does exist in the real world.)
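A minimal sqlite3 sketch of that antipattern and the obvious fix (table and column names invented for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, day TEXT, summary TEXT)")
    conn.executemany("INSERT INTO events (day, summary) VALUES (?, ?)",
                     [("2016-01-11", "LWN article"), ("2016-01-12", "Follow-up")])

    # The antipattern: one query per id obtained from a previous query.
    ids = [row[0] for row in conn.execute("SELECT id FROM events WHERE day = '2016-01-11'")]
    slow = [conn.execute("SELECT summary FROM events WHERE id = ?", (i,)).fetchone()
            for i in ids]

    # The same result in a single round trip.
    fast = conn.execute("SELECT summary FROM events WHERE day = '2016-01-11'").fetchall()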

The mention of IMAP got my attention. IMAP - again, if I remember correctly - is also supposed to allow querying, or maybe some extension of it does. So, in principle, the tools would be there to do this efficiently. Then again, if the IMAP stuff sits below some kind of data mapper, maybe the necessary tools just aren't exposed at the right level.

Akonadi – still alive and rocking

Posted Jan 11, 2016 20:16 UTC (Mon) by anselm (subscriber, #2796) [Link] (3 responses)

The IMAP functionality in question is called “SIEVE” and is optional. That makes it difficult to rely on its existence in a lowest-common-denominator framework such as Akonadi, which also accepts files in the local file system as a source of e-mail messages. You could probably put equivalent functionality into the local Akonadi component for the benefit of local files, but once you're done you'll have implemented most of an IMAP server, and there are very good free IMAP servers around already.

Akonadi – still alive and rocking

Posted Jan 11, 2016 20:50 UTC (Mon) by pboddie (guest, #50784) [Link] (1 responses)

I guess this is why solutions like Kolab try and put everything in an IMAP/SIEVE-accessible message store, then. (Kolab relies on various KDE-related libraries, too.)

I'll accept that if you want to provide some kind of protocol for accessing messages and similar things, then POP and IMAP are obvious things that potentially leverage compatibility, or at least familiarity, if the protocol resembles them. But ultimately, there may be no escaping a full message store, even though I do also read about people's performance issues with their IMAP infrastructure every now and again (but that's more likely to be related to having lots of users and to potentially misconfigured components).

Akonadi – still alive and rocking

Posted Jan 11, 2016 21:09 UTC (Mon) by anselm (subscriber, #2796) [Link]

POP3 isn't really very useful as a message store protocol (for one, it doesn't have anything resembling SIEVE), so if you want to build on something you're pretty much stuck with IMAP. Dovecot is a very good free high-performance IMAP server that does SIEVE (among other useful things) and sits comfortably on Maildirs (and its own more efficient message store format), so you're basically covered. The downside is that as layer-7 protocols go, IMAP is a very nasty specimen, and it is difficult to fault programmers for not wanting to have anything to do with it unless they really can't avoid it.

Akonadi – still alive and rocking

Posted Jan 12, 2016 11:57 UTC (Tue) by Wol (subscriber, #4433) [Link]

> You could probably put equivalent functionality into the local Akonadi component for the benefit of local files, but once you're done you'll have implemented most of an IMAP server, and there are very good free IMAP servers around already.

I get the feeling they are trying to index EVERYTHING, without bothering to ask the question "Is this WORTH indexing?". Which is where users get so frustrated: the system insists on spending a large chunk of its (and by extension the user's) time doing stuff the user considers *counterproductive*. The classic example is pre-loading the office suite into RAM for when the user wants it. I used to have three office suites, all of which I rarely used but sometimes needed, and on a system that's short of RAM...

Cheers,
Wol

Akonadi – still alive and rocking

Posted Jan 12, 2016 5:02 UTC (Tue) by drag (guest, #31333) [Link]

> Then again, if the IMAP stuff sits below some kind of data mapper, maybe the necessary tools just aren't exposed at the right level.

I think so.

My reasoning:

IMAP itself didn't provide enough advantages, or at least not the right type of advantages, over POP to make it really useful. People with a heavy technical focus on email generally just kept treating IMAP servers like POP servers, with the added bonus that they could easily have multiple email clients on different computers. It was still an advantage to have the email database locally on the machine for management; IMAP didn't provide the features necessary to win over that.

And, nowadays, it should be obvious that server-side processing was still the right way to go as evidenced by the dominance of webmail.

My fantasies about a solution:

IMAP + Sieve is closer to the right thing, but I still don't think it's enough. Lack of client support is a major problem, but even if that were not the biggest problem, I think Sieve/IMAP isn't the correct solution. I see people dealing with duplicate emails, missing emails, having to craft special rules to filter email into different folders, etc. These are symptoms that the general mentality of managing email is wrong: the mentality that you copy around, delete and move email from one folder to another.

What I would love to see (other than everybody worldwide deciding email is so abhorrent that we should develop a secure replacement together) is a solution based around stuffing email into a single database and keeping it 'raw'.

No editing of the email, changing of the headers, adding tags, basing anything on file system dates, or moving it around folders, or anything like that.

That is, get rid of the 'maildir' format, or the shove-everything-into-a-SQL-database concept, and return to something more closely resembling the original mbox storage format, but optimized: a log-structured mbox format, for lack of a better term. Get rid of all the file-based locking and use a single-purpose service for managing reads/writes to that 'log-structured mbox', one per user. Make it as trivial as possible to trigger a robust backup through that service.
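A toy sketch of what that single-writer, append-only 'log-structured mbox' could look like (purely illustrative; all names are made up):

    import os

    # One append-only file, a single writer, and an index of (offset, length)
    # records so readers never have to scan or lock the whole file.
    class LogMbox:
        def __init__(self, path):
            self.path = path
            self.index = []                     # (offset, length) per message
            open(path, "ab").close()            # create the log if missing

        def append(self, raw_message: bytes) -> int:
            """Append one raw message; return its message number."""
            with open(self.path, "ab") as log:
                log.seek(0, os.SEEK_END)
                offset = log.tell()
                log.write(raw_message)
                log.flush()
                os.fsync(log.fileno())          # the log is the source of truth
            self.index.append((offset, len(raw_message)))
            return len(self.index) - 1

        def read(self, msgno: int) -> bytes:
            """Read a message back without touching anything else in the log."""
            offset, length = self.index[msgno]
            with open(self.path, "rb") as log:
                log.seek(offset)
                return log.read(length)

    # box = LogMbox("/tmp/user.lmbox")
    # n = box.append(b"From: a@example.org\r\n\r\nhello\r\n")
    # print(box.read(n))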

journalctl, as terrible as it sometimes is, still has no problem reading the 3,500,000+ messages my system has logged on my desktop in less than a minute. Mail applications should have, at the very least, the same level of performance.

Then integrate something like 'notmuch' into it somewhere, so that every interaction a client has with the service is nothing but the result of searches performed on the original data. Maybe a separate mail-management service that talks to the lmbox service, maybe the same one. I would like to make it as trivial as possible for somebody to self-host, so the simpler the better; dealing with large numbers of users is a specialized problem. You should be able to use just a cheap Linux server at home, a Raspberry Pi-level machine, or a single cloud instance.

'True' edits and deletes should be special operations, and it shouldn't matter much if they are expensive. Due to spam and such things, some pre-processing of the mail before it gets added to the main database is unavoidable, but it should be kept to a minimum.

Normal operations should just be performed by 'live filters' or 'views' or 'search folders'. You should be able to do things like 'emails sent directly to me with low spam ratings shall be my inbox'. Then you can say 'emails sent to mailing list X go in folder Y', and layer it further with a folder that says 'emails sent to mailing list X and addressed to me directly go in folder Y+W'. The original data and format of each mail is preserved, and you interact with the mail through the various 'views'. Things like duplicate emails are then not a sign of corruption; a message simply happens to meet multiple different criteria.
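A rough sketch of folders as saved searches on top of that, continuing the toy LogMbox sketch above (addresses and view names invented):

    from email import message_from_bytes

    # A 'view' is just a saved predicate over the untouched log; nothing moves.
    VIEWS = {
        "inbox": lambda m: "me@example.org" in m.get("To", ""),
        "lwn-list": lambda m: "lwn@example.org" in (m.get("To", "") + m.get("Cc", "")),
    }

    def folder(box, view_name):
        """Message numbers matching a view; one message may appear in many views."""
        match = VIEWS[view_name]
        return [n for n in range(len(box.index))
                if match(message_from_bytes(box.read(n)))]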

Thus you end up with a 'do no harm' approach to managing these messages. No action will normally be performed that risks corrupting the original data. No matter what sort of insane or batshit-crazy filters and folders you place on the data, and even if clients conflict with each other and trigger bugs and crashes in your services... the worst that can happen is a self-inflicted denial of service. The only data loss you can then suffer is a few missed messages, or something corrupt tacked on the end of your 'log-structured mbox'. Sure, the indexes and metadata databases can get jacked up and be unrecoverable, but those can be recreated, since all the original data they were derived from is still present. To recover from a disaster you blow away all the files except the main one, and then selectively re-apply your filters until they are back to where you want them.

Of course optimizations will be necessary. You can't expect a 'view/filter/search folder' to be active instantly after you create it. Indexes of email should be kept so that you don't have to go through your entire history of email every time you open your application. Results from searches should be available 'on demand' as much as possible. Try to set it up so that the mail-management service just has to look through anything relatively recent and add it to the data it has already derived from your backend store. To make a nice UI possible, some sort of timing information on operations should be accessible, I suppose, so people can have a reasonably accurate idea of how long expensive, long-running processing is going to take.

I doubt IMAP could provide a rich enough API to access the 'mail-management' service, but a close approximation could probably be made available to current-generation clients through some sort of IMAP gateway service.

Akonadi – still alive and rocking

Posted Jan 12, 2016 10:47 UTC (Tue) by jospoortvliet (guest, #33164) [Link] (2 responses)

Note that we're on the edge of what I understand about this, technically.

But from my understanding, there's no more central server, yet not ALL of the work is done by the clients; only the reading is, and that is implemented in a library. Resources (e.g. an IMAP client) are independent processes which retrieve and cache data.

Let me quote:
> Basically the new design does away with the central server in favor of
> per-resource storage. Each resource would maintain its own key-value store
> (allowing each resource to store data in a way that is efficient for it
> to work with) and would provide an implementation of a standardized interface
> to access this storage directly. Clients would be allowed direct read-only
> access to these storages, while resources are the only ones with write
> access. Internally, flatbuffers would be used to provide access to the data
> in a super-efficient way (lazy-loading, memory mapping, all the fancy stuff).

> Resources would implement pipelines allowing some pre-processing of newly
> incoming data before storing it persistently (think mail filtering,
> indexing, new mail notifications etc.). Inter-resource communication (for
> example to perform an inter-resource move or copy) and client->resource
> communication would be done through a binary protocol based on what we have
> now in Akonadi. This design also gives us on-demand start/stop of resources
> for free, something that requires a ridiculous amount of work with the
> current design.

> API-wise, while we can't completely get rid of the "imperative" API of
> having jobs, the core method of providing access to data would be through
> models. Making use of storage data versioning, on update the model simply
> requests the changes between the current and last revision of the stored
> data. This should prevent us from ending up with overcomplicated
> beast-models like ETM.

See
https://community.kde.org/KDE_PIM/Akonadi_Next/Design
https://community.kde.org/KDE_PIM/Akonadi_Next#Design

https://cmollekopf.wordpress.com/2015/02/08/progress-on-t...
https://cmollekopf.wordpress.com/2015/08/29/bringing-akon...
https://kolab.org/blog/mollekopf/2015/10/22/progress-prot...
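To make the revision-based model update idea concrete, here is a rough, purely illustrative sketch in Python (this is not the actual Akonadi Next API; the real design is in the links above):

    # Toy per-resource store: the resource is the only writer, clients only read.
    class ResourceStore:
        def __init__(self):
            self.revision = 0
            self.items = {}          # key -> value
            self.changelog = []      # (revision, key) pairs

        def put(self, key, value):   # called by the resource only
            self.revision += 1
            self.items[key] = value
            self.changelog.append((self.revision, key))

        def changes_since(self, revision):
            """What a client model asks for on update: only the delta."""
            return [(rev, key, self.items[key])
                    for rev, key in self.changelog if rev > revision]

    # A client-side model tracks the last revision it has seen and fetches
    # only the difference, instead of re-reading everything.
    class ClientModel:
        def __init__(self, store):
            self.store = store
            self.seen = 0
            self.data = {}

        def update(self):
            for rev, key, value in self.store.changes_since(self.seen):
                self.data[key] = value
                self.seen = max(self.seen, rev)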

Akonadi – still alive and rocking

Posted Jan 12, 2016 16:06 UTC (Tue) by drag (guest, #31333) [Link]

thank you.

Akonadi – still alive and rocking

Posted Feb 9, 2016 4:49 UTC (Tue) by daniel (guest, #3181) [Link]

"from my understanding, there's no more central server..."

And, blessedly, no more relational database. It looks like the new maintainer has a sensible attitude, and the experimental refactoring that is Akonadi might yet prove to be useful. Too bad about sacrificing the entire KMail user community in the process, myself included. The big lesson here is that there is never a valid reason to perform on the trapeze without a safety net: kmail2 should have been deployed in parallel with kmail1 until such time as it proved to be a complete and viable replacement, just as Apache 2 lived beside Apache 1 for years.

