
Spam filtering with Rspamd

By Jonathan Corbet
September 1, 2017
Running one's own mail system on the Internet has become an increasingly difficult thing to do, to the point that many people don't bother, even if they have the necessary skills. Among the challenges is spam; without effective spam filtering, an email account will quickly drown under a deluge of vile offers, phishing attempts, malware, and alternative facts. Many of us turn to SpamAssassin for this task, but it's not the only alternative; Rspamd is increasingly worth considering in this role. Your editor gave Rspamd a spin to get a sense for whether switching would be a good thing to do.

SpamAssassin is a highly effective tool; its developers could be forgiven for thinking that they have solved the spam problem and can move on. Which is good, because they would appear to have concluded exactly that. The "latest news" on the project's page reveals that the last release was 3.4.1, which came out in April 2015. Stability in a core communications tool is good but, still, it is worth asking whether there is really nothing more to be done in the area of spam filtering.

The Rspamd developers appear to believe that there is; this project is moving quickly with several releases over the past year, the last being 1.6.3 at the end of July. The project's repository shows 2,545 commits since the 1.3.5 release on September 1, 2016; 32 developers contributed to the project in that time, though one of them (Vsevolod Stakhov) was the source of 71% of the commits. The project is distributed under the Apache License v2.

The Rspamd developers clearly see processing speed as one of their selling points. SpamAssassin, written in Perl, is known to be a bit of a resource hog. Rspamd is written in C (with rules and extensions in Lua), and claims to be able to "process up to 100 emails per second using a single CPU core". That should be sufficiently fast for most small-to-medium sites, though it is probably advisable to dedicate another CPU to the task if there are any linux-kernel subscribers in the mix.

One of the nice things about SpamAssassin is that it's relatively easy to set up; in an extreme, it can be run from a nonprivileged account using a procmail incantation with no daemon process required. Rspamd is not so simple; it really wants to run as a separate daemon that is tightly tied into the mail transport agent (MTA). That means, for example, configuring Postfix to pass messages to the Rspamd server; the configuration of Rspamd itself can also be fairly involved. As a result, experimenting with Rspamd is not quite so simple. But, in return, one gets a number of useful features.

Perhaps foremost, the direct integration with the MTA means that spam filtering takes place while the SMTP conversation is ongoing. That makes techniques like greylisting possible. It also enables the rejection of overt spam outright, before it has been accepted from the remote server; this has a couple of advantages: there is no need to store the spam locally, and the sender will get a bounce — assuming there is a real sender who cares about such things. Yes, one can configure things to use SpamAssassin in this way, but it involves a rather larger amount of duct tape.
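To make the greylisting idea concrete, here is a minimal sketch in Python. It is not Rspamd's implementation; the class name, the five-minute delay, and the in-memory dictionary are all assumptions made for illustration. The idea is simply that a first delivery attempt from an unknown (client, sender, recipient) triplet is deferred with a temporary failure, and a retry after the delay is accepted.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient).

    Hypothetical illustration only; a real filter would persist this
    state (Rspamd uses Redis for the purpose) and expire old entries.
    """

    def __init__(self, delay_seconds=300, clock=time.time):
        self.delay_seconds = delay_seconds  # assumed delay, not a default
        self.clock = clock                  # injectable for testing
        self.first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return 'defer' (an SMTP 4xx reply) or 'accept' for this attempt."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet was first seen; keep the old time if known.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.delay_seconds:
            return "defer"
        return "accept"
```

A legitimate MTA will retry after the temporary failure and get through; much spamware never retries, which is what makes the technique useful.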

Rspamd offers many of the same filtering mechanisms that SpamAssassin supports, including regular-expression matching, DKIM and SPF checks, and online blacklists. It has a bayesian engine that, the project claims, is more sophisticated and effective than SpamAssassin's; it looks at groups of words, rather than just single words. There is a "fuzzy hash" mechanism that is meant to catch messages that look like previous spam with trivial changes. As with SpamAssassin, each classification mechanism has a score associated with it; the sum of all the scores gives the overall spam score for a given message.
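The difference between single-word and word-group tokens can be illustrated with a short Python sketch. This is a deliberately simplified stand-in and not Rspamd's actual tokenizer; the function name and the choice of adjacent pairs are assumptions for the example.

```python
import re


def tokens(text, window=2):
    """Return single-word tokens followed by adjacent word-group tokens.

    Simplified illustration: a classifier fed only single words sees
    'free' and 'prize' separately, while one fed groups also sees
    'free prize' as a feature of its own.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    groups = [
        " ".join(words[i:i + window])
        for i in range(len(words) - window + 1)
    ]
    return words + groups
```

For the input "Claim your FREE prize", the function yields the four single words plus "claim your", "your free", and "free prize"; the extra tokens give the classifier more context at the cost of a larger token database.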

While it doesn't have to be this way, SpamAssassin is normally used in a binary mode: a message is either determined to be spam or it is not. Rspamd classifies messages into several groups, depending on how obvious its nature is. At different scores, a message might be greylisted, have its subject line marked, have an X-Spam header added, or be rejected outright. Implementing all of these actions requires cooperation from the MTA, of course.
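The combination of summed scores and graded actions might be sketched as follows. The symbol names, scores, and thresholds here are invented for illustration and should not be read as Rspamd's defaults.

```python
# (threshold, action) pairs, highest first. Placeholder values only.
ACTIONS = [
    (15.0, "reject"),
    (8.0, "rewrite subject"),
    (6.0, "add header"),
    (4.0, "greylist"),
]


def total_score(symbols):
    """Sum the scores of every rule (symbol) that matched the message."""
    return sum(symbols.values())


def action_for(score, actions=ACTIONS):
    """Pick the most severe action whose threshold the score reaches."""
    for threshold, action in actions:
        if score >= threshold:
            return action
    return "no action"


# Hypothetical symbols that fired for one message.
matched = {"BAYES_SPAM": 3.5, "SPF_FAIL": 1.0, "FUZZY_MATCH": 4.0}
verdict = action_for(total_score(matched))
```

With these made-up numbers the message scores 8.5, which is enough to have its subject rewritten but not enough to be rejected during the SMTP conversation.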

Rspamd comes with its own built-in web server which, by default, is only available through the loopback interface. It can present various types of plots describing the traffic it has processed, as can be seen in the accompanying screenshot [Rspamd]. The server can also be used to alter the configuration on the fly, changing the scores associated with various tests, and more. These changes do not appear to be saved permanently, though, so the system administrator still has to edit the (numerous) configuration files to make a change that will stick.

Your editor set up and ran Rspamd with a copy of his email stream. What followed was an unpleasant exercise in going carefully through the spam folder to see what the results were — a task that resembles cleaning up after the family pet with one's bare hands and which quickly reduces one's faith in humanity as a whole. The initial results were a little discouraging, in that Rspamd filtered spam less effectively than SpamAssassin. More discouraging was a fair number of false positives. When the number of incoming spam messages reaches into the thousands per day, one tends not to spend much time looking for messages that were erroneously classified as spam, especially as confidence in the filter grows. So false positives are legitimate email that will probably never be seen; avoiding false positives thus tends to be a high priority for developers of spam filters.

At this point, though, the comparison was somewhat unfair: a fresh Rspamd was pitted against a SpamAssassin with a well-trained bayesian filter. Like SpamAssassin, Rspamd provides a tool that can be used to feed messages for filter training. Your editor happened to have both a mail archive and a massive folder full of spam sitting around. Training the filter with both of those yielded considerably better results and, in particular, an apparent end to false positives, with one exception. And yes, the rspamc tool, used to train the filter, runs far more quickly than sa-learn does.
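For readers who want to script the training step, a small Python wrapper could look like the sketch below. It relies on the rspamc subcommands learn_spam and learn_ham; the function names and the idea of passing a list of message files are assumptions of this example, and nothing here reflects how your editor actually ran the training.

```python
import subprocess


def learn_command(kind, paths):
    """Build an rspamc command line that trains the filter.

    kind  -- "spam" or "ham"
    paths -- message files to learn from (hypothetical locations)
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["rspamc", f"learn_{kind}", *paths]


def run_learning(kind, paths):
    """Run the training command; requires a reachable Rspamd instance."""
    return subprocess.run(learn_command(kind, paths), check=True)
```

Calling run_learning("spam", ["spam/0001.eml"]) would then feed one known-spam message to the classifier, and run_learning("ham", ...) does the same for legitimate mail.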

The one exception regarding false positives is significant. The documentation of Rspamd's pattern-matching rules is poor relative to SpamAssassin, so it took a while to find out what MULTIPLE_UNIQUE_HEADERS is looking for. In short, it is checking the message for multiple instances of headers that should appear only once (References: or In-Reply-to:, for example). The penalty for this infraction is severe: ten points, enough to condemn a message on its own, even if, say, the bayesian filter gives a 100% probability that the message is legitimate. Unfortunately, git send-email is prone to duplicating just those headers at times, with the result that patches end up in the spam folder.
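The kind of check that MULTIPLE_UNIQUE_HEADERS performs can be approximated in a few lines of Python. This is a sketch of the idea rather than Rspamd's rule; in particular, the list of headers treated as "unique" below is an assumption and may not match the set Rspamd examines.

```python
from collections import Counter
from email import message_from_string

# Headers assumed to be allowed at most once; illustrative list only.
SINGLETON_HEADERS = (
    "date", "from", "in-reply-to", "message-id",
    "references", "subject", "to",
)


def repeated_unique_headers(raw_message):
    """Return the should-be-unique headers that appear more than once."""
    msg = message_from_string(raw_message)
    # Message.keys() lists every header field, duplicates included.
    counts = Counter(name.lower() for name in msg.keys())
    return sorted(h for h in SINGLETON_HEADERS if counts[h] > 1)
```

Running a suspect patch email through a function like this is a quick way to see whether duplicated In-Reply-To: or References: headers are what pushed it over the threshold.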

SpamAssassin has an interesting mechanism for automatically computing what the score for each rule should be. Rspamd does not appear to have anything equivalent; how its scores have been determined is not entirely clear. The overall feel of the results suggests a relative lack of maturity that has the potential to create the occasional surprise.

After a few days of use, the overall subjective impression is that Rspamd is nearly — but not quite — as effective as SpamAssassin. It seems especially likely to miss the current crop of "your receipt" spams containing nothing but a hostile attachment. That said, training has improved its performance quickly and may well continue to do so. The experiment will be allowed to run for a while yet.

So is moving from SpamAssassin to Rspamd a reasonable thing to do? A site with a working SpamAssassin setup may well want to stay with it if the users are happy with the results. There might also be value in staying put for anybody who fears the security implications of a program written in C that is fully exposed to a steady stream of hostile input. The project does not appear to have ever called out an update with security implications; it seems unlikely that there have never been any security-relevant bugs fixed in a tool of this complexity.

But, for anybody who sees the benefit of a more active development community, better performance, better MTA integration, newer filtering mechanisms, and a web interface with cute pie charts, changing over might make sense. There is even a module to import custom SpamAssassin rules to make the task easier (but there is no way to import an existing SpamAssassin bayesian database). In any case, it is good to see that development on spam filters continues, even if the SpamAssassin community has mostly moved on to other things.



Spam filtering with Rspamd

Posted Sep 1, 2017 18:32 UTC (Fri) by post-factum (subscriber, #53836) [Link]

I have been using rspamd for quite a long time; for me it replaces spamassassin, which I simply do not understand, and dspam, which is dead, unfortunately. So far so good.

SpamAssassin development is still active

Posted Sep 1, 2017 19:49 UTC (Fri) by dskoll (subscriber, #1630) [Link] (2 responses)

SpamAssassin may not have had a release in a while, but its SVN repository is still active, with the most recent commits less than a day ago. Most of SpamAssassin's development happens around the rule sets, which are updated outside the normal release cycle.

SpamAssassin development is still active

Posted Sep 8, 2017 8:09 UTC (Fri) by madhatter (subscriber, #4665) [Link] (1 responses)

Thanks, that was the point I thought to make, also. You having kindly made it, I went back and looked at the output from my nightly sa-update job to see how often the rulesets were getting updated; there have been 52 updates so far in 2017, the most recent of which was on 24/6.

They seem to come in waves - no updates for several weeks, then updates every night (or nearly so) for about a couple of weeks; there have been four such waves this year so far. I'm guessing it's part of an incremental development model where they wait until there's good reason to release new rulesets, then iterate towards a maximally-effective version over several nights, but that's just a guess.

At any rate, I think you're very right to make the point that SA's decoupling of the rule-running framework from the rules themselves means that looking only to updates of the framework to determine whether the project is still evolving will give an inaccurate answer. 52 releases this calendar year doesn't look moribund to me.

SpamAssassin development is still active

Posted Sep 8, 2017 21:13 UTC (Fri) by nix (subscriber, #2304) [Link]

> They seem to come in waves - no updates for several weeks, then updates every night (or nearly so) for about a couple of weeks; there have been four such waves this year so far. I'm guessing it's part of an incremental development model where they wait until there's good reason to release new rulesets, then iterate towards a maximally-effective version over several nights, but that's just a guess.

It's actually part of a model where Apache-side changes kept breaking the score generation, and it was mostly undocumented so people had to figure out how it worked again and fix it. Also, there are only just enough emails in the corpus, so it keeps on dipping below the threshold and throttling. More volunteers needed!

Spam filtering with Rspamd

Posted Sep 1, 2017 20:01 UTC (Fri) by lkraav (subscriber, #76113) [Link]

A comparison with bogofilter, my current choice, would be interesting, too.

Spam filtering with Rspamd

Posted Sep 1, 2017 20:26 UTC (Fri) by jkingweb (subscriber, #113039) [Link]

I've been using Rspamd for several months now. Overall its accuracy is significantly better than SpamAssassin's, and I found it much simpler to configure once I could wrap my head around it (which, admittedly, was -not- easy).

Learning is fast, but kind of depends on how you use it. Unlike sa-learn, rspamc doesn't (or didn't last I looked) learn messages in batches, and if you're using SQLite, you incur the performance penalty of a sync after every message (which is probably why Redis is recommended). I have managed to tune it to my particular circumstances, though, and I haven't had to fiddle with it since. The only time I had difficulty was after v1.5 was released: it had huge changes, not all of them terribly obvious.

Spam filtering with Rspamd

Posted Sep 1, 2017 22:35 UTC (Fri) by Sesse (subscriber, #53779) [Link]

Lots of duct tape needed to reject spam SMTP-time? Three lines in an Exim ACL:

    deny message = Rejected by spam filter (score=$spam_score, req=8.0)
         spam = nobody
         condition = ${if >={$spam_score_int}{80}{1}{0}}

All the world's not Postfix…

Spam filtering with Rspamd

Posted Sep 2, 2017 2:43 UTC (Sat) by josh (subscriber, #17465) [Link] (5 responses)

> SpamAssassin, written in Perl, is known to be a bit of a resource hog. Rspamd is written in C

For something that will process actively hostile data from the network, this is...not confidence-inspiring.

Spam filtering with Rspamd

Posted Sep 2, 2017 18:46 UTC (Sat) by pwfxq (subscriber, #84695) [Link] (3 responses)

> For something that will process actively hostile data from the network, this is...not confidence-inspiring.

Good job HTTP, SMTP, SSH, et al servers aren't written in C....

Spam filtering with Rspamd

Posted Sep 2, 2017 18:58 UTC (Sat) by jhoblitt (subscriber, #77733) [Link] (2 responses)

Great examples of software that never has CVEs....

Spam filtering with Rspamd

Posted Sep 4, 2017 3:53 UTC (Mon) by marcH (subscriber, #57642) [Link]

as in: "C Vulnerabilities and Exposures"?

Spam filtering with Rspamd

Posted Sep 4, 2017 15:39 UTC (Mon) by dsommers (subscriber, #55274) [Link]

Spam filtering with Rspamd

Posted Sep 11, 2017 21:15 UTC (Mon) by ncm (guest, #165) [Link]

Yes, it's kind of stupid nowadays to start a new user-space project in C, or to continue development in C. Even C programs that have been in the works for a long time are trivially switched over to building with a C++ compiler, and then new work can be done with more modern, less failure-prone, and usually faster facilities.

That's not to claim C++ is perfect -- like many other languages, it remains vulnerable to C's integer overflow / wraparound bugs, with their security implications -- but the overwhelming majority of C pitfalls just disappear, and at negative cost in both performance and programmer time.

Spam filtering with Rspamd

Posted Sep 3, 2017 15:03 UTC (Sun) by mb (subscriber, #50428) [Link]

This looks very promising. I really like the integration of Lua.
And it can't be a lot worse than my current ancient crm114 based setup. :)

But I'm also a bit worried about the attack surface this piece of software has.
The source code looks very tidy at first glance, but it seems to use things like libmagic, which I'm not sure I would prefer to use on an untrusted datastream.
There also seems to be a good deal of low level data processing on char buffers in C. Let's hope it gets the array boundaries correct.

All in all, I'm certainly going to test it.
But not on the production machine, yet.

Spam filtering with Rspamd

Posted Sep 4, 2017 3:51 UTC (Mon) by marcH (subscriber, #57642) [Link] (14 responses)

> Running one's own mail system on the Internet has become an increasingly difficult thing to do, to the point that many people don't bother, even if they have the necessary skills.

A bit off-topic sorry: why would one want to run his or her own mail system?
I understand webmail isn't good enough for many use cases, but don't many providers still provide good old SMTP/POP/IMAP services as in the old days? What's wrong with them?

> Among the challenges is spam; without effective spam filtering, an email account will quickly drown under a deluge of vile offers, phishing attempts, malware, and alternative facts.

I suspect spam is the challenge not just for receiving but also for... sending. Given email from some unusual origin, how do gmail and other big providers make the difference between the rare and honest SMTP senders versus the spammers? Simple answer: they don't and rank everything down. Problem 99.99% solved.

Spam filtering with Rspamd

Posted Sep 4, 2017 3:57 UTC (Mon) by marcH (subscriber, #57642) [Link] (2 responses)

BTW: content-based filtering is to email what antiviruses are to filesystems and firewalls to networking. Funny how some can be popular and others not - in the same communities.

Spam filtering with Rspamd

Posted Sep 4, 2017 17:48 UTC (Mon) by mb (subscriber, #50428) [Link] (1 responses)

>BTW: content-based filtering is to email what antiviruses are to filesystems and firewalls to networking.

Not really.
A single virus on your machine or a single intruder in your network means that you are completely roasted.
A single spam mail means you are a little bit annoyed.

The impact differs a _lot_.

But that does not mean antivirus snake oil does anything to lower the risk or the impact. Quite the reverse.

Spam filtering with Rspamd

Posted Sep 22, 2017 18:58 UTC (Fri) by marcH (subscriber, #57642) [Link]

I was looking at it from a pure and (too?) narrow implementation perspective.

Spam filtering with Rspamd

Posted Sep 4, 2017 12:13 UTC (Mon) by pizza (subscriber, #46) [Link]

> A bit off-topic sorry: why would one want to run his or her own mail system?

Security, privacy, and/or legal concerns?

Unified account/login systems?

Special integration with other stuff?

Etc etc..

Spam filtering with Rspamd

Posted Sep 4, 2017 12:42 UTC (Mon) by copsewood (subscriber, #199) [Link]

"A bit off-topic sorry: why would one want to run his or her own mail system?
I understand webmail isn't good enough for many use cases, but don't many providers still provide good old SMTP/POP/IMAP services as in the old days? What's wrong with them?"

I operate my own mail server partly because I can and partly because I want to know how, as knowhow is what I sell. It also minimises the cost of having your own email identity and per-correspondent aliasing while maximising your ability to create web services which use transactional email. Concerning the 2nd question, Roundcube webmail is easier to set up than getting the right IMAP/POP configuration working and stable on any of many mobile devices, while old school MUAs which use IMAP etc. seem to work better on traditional desktop/laptop operating systems, so if operating your own mail server you'll probably want to support both approaches.

Spam filtering with Rspamd

Posted Sep 4, 2017 22:22 UTC (Mon) by raegis (subscriber, #19594) [Link] (8 responses)

> A bit off-topic sorry: why would one want to run his or her own mail system?

Gmail, for example, has limits on attachment size and the number of mails sent and received daily. Also, getting Gmail to work with a course management system is not as easy as setting up a basic Postfix configuration on Debian. Google scans your mail too.

These are the reasons I run my own mail server. The thousands of others may have thousands more reasons.

Incidentally, on my 17 year old email address I've been getting less than 10 spams per day, and they are almost all obviously spam ("C1alis"). Things used to be much, much worse. I used to use greylisting, but nowadays I think it's overkill. A simple bash script (using spamc) triggered by Direvent watching my maildir is pretty simple stuff. Do it yourself spam filtering will never be better than Gmail, but it really is already good enough.

Spam filtering with Rspamd

Posted Sep 6, 2017 0:14 UTC (Wed) by davidstrauss (guest, #85867) [Link] (1 responses)

> Google scans your mail too.

Assuming you're referring to scans to target advertising, they claim to have ceased that [1]. Of course, they're still scanning email to provide the features of Gmail and Google Inbox, including thread auto-classification (Updates, Finance, Low Priority, etc.), automatic calendar event creation, trip itinerary detection, spam detection, filter/rule processing, and search indexing.

[1] https://www.bloomberg.com/news/articles/2017-06-23/google...

Spam filtering with Rspamd

Posted Sep 6, 2017 20:44 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Apparently that promise has a fuse of unknown length on it.

https://arstechnica.com/tech-policy/2017/09/google-promis...

dream: personal/home email servers an option for all

Posted Sep 14, 2017 5:20 UTC (Thu) by Garak (guest, #99377) [Link] (5 responses)

And some of us, or at least me, have an unkillable dream that decent communication software under total control of the individual should be available for all. It's hard to express to the non-obsessed, but consider it like this: imagine a world where Trump privatizes the post-office. In such a world, many would be chilled from taking hard political and journalistic stances against their dominant/monopoly service provider. The same way many people self-censor themselves to avoid being perceived as a political enemy of Google. Or at least, I believe I personally witnessed a fair amount of that in the pre-Snowden timeframe. Less so lately. But the point is that communication should IMHO be practically considered a sacred activity. Having to agree to terms and conditions of communications service providers that make creative outlier type people always have to second guess whether or not what they say might impact their future career and lifetime earnings potential in the business world because Google is such a practically unavoidable lifelong business partner to all of us...

The dream will never die. Of course it will take a DNS system that is better secured, and less than one tenth the current registration prices for things to fall into place. But that, along with spam filtering, seem like entirely solvable problems by the FOSS community. The problem that seems to be taking too long to solve however is getting governments to tell ISPs that they may not discriminate against home email server traffic. That seems consistent with the language of "Network Neutrality" (FCC-10-201) to me. But unfortunately I don't see a right to operate home email/communication servers being enforced well anyday soon. Maybe next decade sometime. It's gonna f'n rock when it finally happens though. Or maybe humanity will just suck in that regard for a good long while past my lifetime. Best thing about dreams- they don't have to come true in your lifetime to bring you some satisfaction.

dream: personal/home email servers an option for all

Posted Sep 22, 2017 19:03 UTC (Fri) by marcH (subscriber, #57642) [Link] (4 responses)

> The problem that seems to be taking too long to solve however is getting governments to tell ISPs that they may not discriminate against home email server traffic.

Can you please explain what these current "discriminations" are? I have no idea, sorry. If they're like "You can't send thousands of messages per second to thousands of different recipients from your so-called 'home'" then I'm afraid I agree with them. That wouldn't threaten political and journalistic stances much.

dream: personal/home email servers an option for all

Posted Sep 25, 2017 11:06 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

Unfortunately that stance is tantamount to saying "you can't run a mailing list". That's the problem with this sort of technical quick-fix: it eliminates the good (opt-in mailing lists of the sort we all use all the time) with the bad (spammers infecting your machine to send spam).

dream: personal/home email servers an option for all

Posted Sep 28, 2017 18:02 UTC (Thu) by marcH (subscriber, #57642) [Link] (2 responses)

> Unfortunately that stance is tantamount to saying "you can't run a mailing list".

... from your home.

I'm fine with that, I don't consider running a mailing-list *from home* part of the first amendment.

(Plus mailing-lists suck anyway, it's fortunate there's NNTP to make them vaguely usable)

dream: personal/home email servers an option for all

Posted Oct 17, 2017 21:14 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

> I'm fine with that, I don't consider running a mailing-list *from home* part of the first amendment.

And I don't consider "does not violate the first amendment" the only constraint proposed public policy should pass. (Given that I live in a country that doesn't have anything like it, though I wish it did...)

dream: personal/home email servers an option for all

Posted Oct 17, 2017 23:58 UTC (Tue) by marcH (subscriber, #57642) [Link]

It would be great if policy makers were helping goals like "decent communication software under total control of the individual available for all". Not holding my breath either.

However if/when they do, I hope they'll find a higher level and more efficient way than trying to fix and/or regulate broken-by-design SMTP email.

Spam filtering with Rspamd

Posted Sep 4, 2017 6:55 UTC (Mon) by gibbon_ (guest, #106637) [Link] (3 responses)

Working for a global hosting company until recently, we had some problems with customers that were using rspamd and causing a massive amount of DNS queries in the process. One customer claiming to process 10-20 mails per second caused about 80.000 queries per second and could not find the problem. Not a huge problem for modern DNS servers but if you use rspamd, perhaps be on the lookout for this issue. I never found out if it was a configuration issue of the customer or some problem with how DNS is done in rspamd. The bug report was not so well received by the developers though.

While I am writing this I realize that we saw this kind of behavior with weirdly configured redis instances too. I don't know if it was redis or rspamd in the end...

Just saying. The project and features look promising. I would like an easier integration for Exim though. SpamAssassin is working great so far with Exim, as someone here has stated before, even with checking mail after end-of-DATA before the mail is accepted.

Spam filtering with Rspamd

Posted Sep 8, 2017 3:11 UTC (Fri) by hmh (subscriber, #3838) [Link]

Hmm, you should almost always have a node-local caching DNS resolver when doing content-filtering that uses DNS lists, be it rspamd, or SpamAssassin, or even just a bunch of DNS-based blocklists in the MTA. Maybe the rspamd docs should emphasize that detail...

Spam filtering with Rspamd

Posted Sep 8, 2017 10:22 UTC (Fri) by vstakhov (guest, #118379) [Link] (1 responses)

This issue was introduced in 1.6.0 and has been fixed in 1.6.2. The current stable version is 1.6.3. I don't quite understand your comment about Exim integration: what's wrong with it? It works out of the box and provides even more advanced functionality than SA integration.

Spam filtering with Rspamd

Posted Sep 8, 2017 11:49 UTC (Fri) by gibbon_ (guest, #106637) [Link]

Hi,

I think the DNS issue is fine then. Thank you for the fix. :)

About the Exim integration: I obviously mixed up the Exim versions quoted in the documentation. I wrote a reply to my comment to clarify that but probably never sent it. Sorry for the confusion.

All that remains valid from my comment is then the following part:

> The project and features look promising.

I will probably try it out when I find the time, even though I am currently doing OK with spam-assassin.

Security of Rspamd

Posted Sep 8, 2017 11:19 UTC (Fri) by jnareb (subscriber, #46500) [Link] (4 responses)

You can avoid many problems of C by using the standard library appropriately, or by switching to an internal or external safer library (like e.g. Git did with moving to its own managed string microlibrary: strbuf).

I wonder if Rspamd follows OSS best practices, like those described in https://bestpractices.coreinfrastructure.org/

Security of Rspamd

Posted Sep 8, 2017 12:29 UTC (Fri) by kpfleming (subscriber, #23250) [Link] (2 responses)

A quick search in the CII Badge page doesn't show anything, so maybe someone who is an rspamd community member should encourage the project to apply for a badge.

Security of Rspamd

Posted Sep 9, 2017 20:49 UTC (Sat) by gerdesj (subscriber, #5446) [Link] (1 responses)

This makes my teeth itch:

"Best Practices badge is a way for Free/Libre and Open Source Software (FLOSS) projects to show that they follow best practices."

Beware of anyone touting "best practices". There is good practice and there is bad practice. You had better be some form of deity if you think you are good enough to define best practice.

Security of Rspamd

Posted Sep 11, 2017 12:03 UTC (Mon) by hkario (subscriber, #94864) [Link]

Those are best practices we know of, and that implicitly means they *are* best practices.

They don't claim that those are only practices or that in the future we won't be able to find better ones - it's a living document.

Security of Rspamd

Posted Sep 22, 2017 19:09 UTC (Fri) by marcH (subscriber, #57642) [Link]

> You can avoid many problems of C by using standard library appropriately, or by switching to internal or external safer library

There are no bugs in an ideal world where every developer does a perfect job. The real world is different. In the real world, the sharper the tools, the deadlier the bugs.

Spam filtering with Rspamd

Posted Sep 8, 2017 23:16 UTC (Fri) by gerdesj (subscriber, #5446) [Link]

I've been running many smallish MTAs for some time now - about 20 years for customers as well as my own company. My weapon of choice is Exim. Until about a year ago I always deployed SpamAssassin with sa-exim (so you can rewrite subject lines etc). That was fine but spam gets through. The main problem with relying on a Bayesian Classifier is actually getting someone to hand sort ham/spam but for SA, the BC is the most efficient way to get at spam. The other rules are not the best (IMO) and as is pointed out repeatedly here, development has slowed somewhat. I have spent a *lot* of time with SA.

I came across rspamd and decided to give it a whirl. Out of the box it simply came across as a bit good. I threw it onto an Ubuntu minimal, read the docs, enabled all the redis stuff, made a few tweaks to the stock config and let it loose on my company mail feed. The effects were immediate - less spam and more importantly, far more insight into what the heck is going on. The symbol mechanism and the logging in the web interface is extremely good. The config is perhaps a little too flexible to the uninitiated and it can be a bit tricky to decide whether to put something into /etc/rspamd/rspamd.conf.local or a file in /etc/rspamd/ or /etc/rspamd/local.d/ or /etc/rspamd/override.d/ etc. However when you get the hang of it, it is awesome. You can expose lists etc to the web interface so that they can be edited on the fly, for example whitelists and blacklists. You can also amend the symbol scores on the fly via the web interface and there are a lot of them, all in conveniently named groups that are set to 0 by default that might help in your use case.

The pace of development is excellent and the devs (especially vstakhov and fatalbanana - cool handle) are very responsive to issues on GitHub and indeed anywhere. I note vstakhov appears in this house. The documentation is pretty decent as well.

SA has had a place in my heart for many years but I forget how many times I've been nearly blinded by trawling through the output from spamassassin -D lint. rspamd is simply better.

Spam filtering with Rspamd

Posted Sep 15, 2017 7:08 UTC (Fri) by vstakhov (guest, #118379) [Link]

This article misses a lot of important things about running Rspamd, in my opinion. First of all, even in a small setup, a Redis instance is *highly* recommended for all stuff: greylisting, ratelimits, IP reputation, statistics and so on.

Secondly, Rspamd has been created mainly for large email systems, not for personal ones. Specifically, it has a lot of features to process millions of messages: IP reputation which is extremely useful in large scale systems, neural network classifier, adaptive rate limits and so on. Even Bayes is targeted more at avoiding false positives than at learning fast, as it uses a more advanced hidden Markov model inside which allows it to consider not only individual words but their combinations.

It provides the ability to use per-user settings, perform DKIM/ARC signing, and split inbound and outbound scanning. In terms of customisation there is also a very significant difference: Rspamd provides a Lua API to all aspects of message processing (e.g. HTML parser internals or Redis/HTTP async API). It is also possible to use an (almost) unlimited number of regular expressions using hyperscan: I have some examples of rulesets with 4k regular expressions which are processed in less than 100ms.

Finally, after a GSoC 2017 project, Rspamd will soon have an automatic scoring system: https://github.com/cpragadeesh/rspamd/tree/rescore/src/re... The next major release will include it as well as support for more advanced Machine Learning techniques (thanks to lua-torch).


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds