A look at Rspamd

By Nathan Willis
July 1, 2015

SpamAssassin has been the dominant spam-filtering program in the free-software community for at least a decade. But the relative newcomer Rspamd is making significant gains, and may be worth serious consideration by anyone running an email server. It is designed to be easier to extend with new functionality, and its architecture supports running multiple processes in parallel, even across a cluster.

Vsevolod Stakhov started work on Rspamd in 2010 with the goal of building a more modular alternative to SpamAssassin, both in the program's architecture and in how its spam-filtering tests are designed. The latest major release was version 0.9.0 in May, which added support for native SpamAssassin rules and for Domain-based Message Authentication, Reporting and Conformance (DMARC), along with many other improvements. A series of minor updates has followed, the latest being 0.9.8 on June 25. The program is packaged for Debian, Ubuntu, Fedora, and openSUSE, and is available in the FreeBSD ports collection.

Regarding its modular architecture, Rspamd is designed to run in multiple processes, with one controller process coordinating several workers that may be running locally or on other machines. Each worker can handle a separate task, such as performing statistical analysis of message content or conducting traditional rule-based filtering. The various processes communicate over HTTPS. The system is event-driven and is intended never to block anywhere in the code, so that it can process many incoming messages simultaneously.
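The benefit of a non-blocking design can be illustrated with a toy model. The following minimal sketch uses Python's asyncio rather than anything from Rspamd's C code; `slow_check()` is a hypothetical stand-in for a network-bound test such as a DNS query, and the delays are invented. Because each check yields to the event loop while it waits, ten messages that each spend 0.2 seconds waiting should finish in roughly 0.2 seconds total rather than two.

```python
import asyncio
import time

# Hypothetical stand-in for a network-bound check (DNS, blacklist, etc.).
# asyncio.sleep() hands control back to the event loop while "waiting",
# so other messages make progress instead of the whole process blocking.
async def slow_check(msg_id: int, delay: float) -> tuple[int, str]:
    await asyncio.sleep(delay)
    return msg_id, "checked"

async def process_all(delays: list[float]) -> list[tuple[int, str]]:
    # Schedule one check per message; gather() runs them concurrently
    # and returns the results in the same order as the inputs.
    return await asyncio.gather(*(slow_check(i, d) for i, d in enumerate(delays)))

def run_timed(delays: list[float]) -> tuple[list[tuple[int, str]], float]:
    # Run the event loop to completion and report wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(process_all(delays))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_timed([0.2] * 10)
    print(f"{len(results)} messages in {elapsed:.2f}s")
```

The same idea scales from coroutines in one process to Rspamd's model of several worker processes, each running its own event loop.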

As for the actual spam-filtering tests, Rspamd's core modules are written in C, but the project also supports a flexible system for writing new filters as Lua plugins. The documentation lists eight Lua modules that ship with the default release. They include support for real-time blacklists (RBLs), recognizing common mailing-list signatures, detecting phishing URLs, and more.
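RBL checks of this kind generally follow the DNS blacklist convention documented in RFC 5782: the octets of the sender's IPv4 address are reversed, the list's zone is appended, and a successful lookup means the address is listed. The sketch below is a minimal Python illustration of that convention, not Rspamd's Lua module; `dnsbl.example` is a placeholder zone, and the lookup uses a blocking resolver call that an event-driven filter would avoid.

```python
import ipaddress
import socket

def rbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL client queries for an IPv4 address.

    The octets are reversed and the blacklist zone is appended, so
    192.0.2.99 checked against dnsbl.example becomes 99.2.0.192.dnsbl.example.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone publishes an A record for the address.

    A listed address resolves (conventionally to 127.0.0.x); an unlisted
    one yields NXDOMAIN, which Python surfaces as socket.gaierror.
    Note: gethostbyname() blocks until the resolver answers.
    """
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Per RFC 5782, a list conventionally lists 127.0.0.2 as a test entry, which is a handy way to confirm that a real zone is reachable before relying on it.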

The C modules, though, constitute what the project considers the core filtering functionality. They include support for Sender Policy Framework (SPF) validation, SURBL blacklists, DomainKeys Identified Mail (DKIM), and more. Like SpamAssassin, Rspamd uses the output of its various filters to assign a score to each message, indicating the likelihood that it is spam. The central filtering tool is a regular-expression engine that scans message body and header content, much like the one found in SpamAssassin.
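The score-and-action model can be sketched in a few lines. This is a minimal Python illustration loosely modeled on the description above, not Rspamd's rule syntax: each hypothetical rule pairs a regular expression with a message part and a weight, the weights of matching rules are summed, and the total is compared against action thresholds. The rule names, weights, and thresholds are invented for the example; the action names follow ones found in Rspamd's configuration, such as "reject" and "add header".

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    symbol: str          # name reported when the rule fires
    pattern: re.Pattern  # regular expression to search for
    part: str            # which part of the message to scan: "header" or "body"
    weight: float        # score added when the rule matches

# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("SUBJ_ALL_CAPS", re.compile(r"^Subject: [A-Z0-9 !]+$", re.M), "header", 2.5),
    Rule("BODY_FREE_MONEY", re.compile(r"\bfree money\b", re.I), "body", 3.0),
    Rule("BODY_CLICK_HERE", re.compile(r"\bclick here\b", re.I), "body", 1.5),
]

# Hypothetical thresholds, checked from highest to lowest.
ACTIONS = [(15.0, "reject"), (6.0, "add header"), (4.0, "greylist")]

def score_message(headers: str, body: str) -> tuple[float, list[str], str]:
    parts = {"header": headers, "body": body}
    fired = [r for r in RULES if r.pattern.search(parts[r.part])]
    total = sum(r.weight for r in fired)
    # Pick the first (highest) threshold the total reaches, if any.
    action = next((name for limit, name in ACTIONS if total >= limit), "no action")
    return total, [r.symbol for r in fired], action

if __name__ == "__main__":
    print(score_message("Subject: WIN FREE MONEY NOW!!!\n", "Click here for free money."))
```

Here all three rules fire for the sample message, giving a total of 7.0 and the "add header" action; tuning a filter largely comes down to adjusting weights and thresholds like these.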

In fact, there is a separate plugin available that lets one use unaltered SpamAssassin rules with Rspamd. But one difference that the Rspamd project takes particular pride in is that it can support more sophisticated rules. Its expression-matching engine uses a more complex classifier than the traditional, single-word Bayesian algorithm used in SpamAssassin. Rspamd's algorithm, known as OSB-Bayes (for "Orthogonal Sparse Bigrams"), was originally described in a 2006 paper.

In essence, OSB-Bayes adjusts the "spam probability" factor calculated for a message by de-emphasizing those trigger features (such as suspicious words) that seem to be evenly distributed in a message and emphasizing those trigger features that appear in clusters. The purported advantage is that this approach detects trigger features that are found in statistically unlikely groups, minimizing the "noise" caused by features that one would expect to find, randomly, spread out over the whole message. Rspamd's implementation uses five-word chunks, skipping small words. The process of tuning the implementation has been an ongoing one; the 0.9.0 release notes mention that the latest revisions significantly reduced the false-positive rate.
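To make the idea of sparse bigrams more concrete, here is a minimal Python sketch of OSB feature extraction as the technique is generally described, not Rspamd's C implementation. Each token is paired with each of the four tokens before it (a five-token window, matching the five-word chunks mentioned above), and the distance between the two is part of the feature, so "free" directly before "money" and "free" three words before "money" count as different features. The three-character cutoff for "small words" is an assumption. A Bayesian classifier would then count these features per class in place of single words.

```python
import re

def tokenize(text: str, min_len: int = 3) -> list[str]:
    # Lowercase the text and drop words shorter than min_len
    # (an assumed stand-in for "skipping small words").
    return [w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len]

def osb_features(tokens: list[str], window: int = 5) -> list[tuple[str, str, int]]:
    """Orthogonal sparse bigrams: pair each token with each of the
    window - 1 tokens preceding it, keeping the distance between them."""
    features = []
    for i, token in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # fewer than `distance` tokens precede this one
            features.append((tokens[j], token, distance))
    return features

if __name__ == "__main__":
    for feature in osb_features(tokenize("Get your free money and more money now")):
        print(feature)
```

Because only words within the same window are paired, trigger words that occur close together generate more shared features than the same words scattered across a long message.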

One of the other core modules implements the fuzzy hash-checking filter. This filter uses the Shingles algorithm (which chops up the text into chunks and searches for matches) to detect phrases that are similar to, but not exact matches for, suspected spam content. Rspamd provides separate fuzzy storage workers to save the hashes calculated on incoming message contents to a local database. That allows the fuzzy-checking module to learn based on a mail server's particular message traffic. Storing the hashes rather than plain text, of course, has privacy benefits.
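The general shingling approach can be sketched as follows. This is a minimal Python illustration of word shingles combined with MinHash, a common way to estimate how much two texts overlap; it is not taken from Rspamd, and the shingle length (three words), the number of hash functions (32), and the use of salted BLAKE2b hashes are all assumptions. The signature contains only hash values, so it can be stored and compared without keeping the original text, which is the privacy property mentioned above.

```python
import hashlib
import re

def shingles(text: str, k: int = 3) -> set[str]:
    # Overlapping k-word chunks of the lowercased text; a text shorter
    # than k words yields a single (possibly empty) shingle.
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def seeded_hash(seed: int, s: str) -> int:
    # A different salt per seed gives a family of independent 64-bit hashes.
    digest = hashlib.blake2b(s.encode(), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")

def signature(text: str, n_hashes: int = 32, k: int = 3) -> list[int]:
    # MinHash: for each hash function, keep the smallest hash of any shingle.
    sh = shingles(text, k)
    return [min(seeded_hash(seed, s) for s in sh) for seed in range(n_hashes)]

def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of positions where the minima agree estimates the
    # Jaccard similarity of the two shingle sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

if __name__ == "__main__":
    a = signature("Buy cheap watches now at our online store today")
    b = signature("Buy cheap watches now at our web store today")
    c = signature("Minutes of the weekly project meeting are attached")
    print(similarity(a, b), similarity(a, c))
```

The first two sample texts share 4 of their 10 distinct shingles, so their estimate should land near 0.4, while the unrelated third text should score close to zero; a fuzzy checker would flag a message whose estimate against a stored spam signature exceeds some chosen threshold.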

All the filtering features in the world mean nothing if the system cannot be used in the real world, however. On the deployment front, Rspamd offers an SMTP proxy mode that can be used as an intermediary between any Mail Transfer Agent (MTA) and Mail User Agent (MUA)—although most users will probably prefer to take advantage of the direct MTA integration features. Currently, Rspamd supports integration with Exim, Postfix, Sendmail, and Haraka.

Postfix and Sendmail integration relies on Stakhov's rmilter mail filter. Haraka includes an Rspamd plugin, while Exim integration requires patching the Exim source in several places.

One other distinction between Rspamd and SpamAssassin is that Rspamd includes a web-based administration interface. It allows the user to monitor the status of worker processes, see statistics about the incoming mail traffic, and adjust several of the configuration knobs for calculating filter scores and triggering the resultant actions. There are several competing web interfaces for SpamAssassin, although they are developed by third parties.

Rspamd has certainly not supplanted SpamAssassin for many system administrators, although the project's documentation and release notes frequently mention that it is in use in real-world, high-volume deployments. In 2013, Stakhov told the SpamAssassin mailing list that he had originally written Rspamd for a single client, and has been in the process of growing the project since.

Whatever its origins, one does increasingly see it mentioned in discussions and blog posts about spam filtering, particularly in the context of running a personal mail server, which bodes well for the project's long-term future. Smaller deployments are often less risky, but they are frequently how a new project gains a foothold with administrators. The project is also a 2015 Google Summer of Code mentoring organization. Between those factors and the rapid pace of development over the past year or two, Rspamd appears to be a healthy project on firm footing, one that mail server administrators might be wise to keep an eye on.

Index entries for this article
Security	Email/Spam prevention
Security	Spam

A look at Rspamd

Posted Jul 2, 2015 7:21 UTC (Thu) by ametlwn (subscriber, #10544)

Exim integration requires patching the Exim source in several places
The respective changes have been integrated upstream (see exim bug 1573) and are included in the current beta versions (4.86RC4).

Speaking of spam...

Posted Jul 2, 2015 8:01 UTC (Thu) by jezuch (subscriber, #52988)

Any progress on fulfilling Peter Watts' prophecy (in "Maelstrom") about putting neural networks ("smart gels") in the job of fighting spam (and malware)?

Speaking of spam...

Posted Jul 2, 2015 14:15 UTC (Thu) by nix (subscriber, #2304)

God, I hope not -- not if that means the *rest* of that book is prophetic too.

I like my biosphere to be not imploding.

Speaking of spam...

Posted Jul 4, 2015 14:49 UTC (Sat) by ghane (guest, #1805)

> I like my biosphere to be not imploding.

Chicken!

:-)

A look at Rspamd

Posted Jul 2, 2015 18:38 UTC (Thu) by osma (subscriber, #6912)

I've used SpamAssassin on a small-to-medium mail system (hundreds of users) for many years and I've been quite happy with it. But in the last two years or so, the amount of spam getting through has increased and despite some effort to fine-tune the setup, I haven't been able to improve the situation much.

I'm wondering if there's any experience about the accuracy of Rspamd vs. SpamAssassin. Of course a lot depends on the circumstances, but it should still be possible to perform a study comparing relative accuracy on a sufficiently large classified corpus of spam and non-spam. With some quick searching I couldn't find any, but maybe I didn't look hard enough.

Any experience of how well Rspamd detects spam? I'm not so concerned about server resources and such, they have never been a problem with SpamAssassin for me.

A look at Rspamd

Posted Jul 4, 2015 0:04 UTC (Sat) by HenrikH (subscriber, #31152)

The main problem is that the spammers route their spam through SpamAssassin themselves and make small changes here and there until the score is low enough and then they send it.

A look at Rspamd

Posted Jul 4, 2015 19:23 UTC (Sat) by smckay (guest, #103253)

If you own the incoming gateway, maybe you could mirror the mail to Rspamd while still using SpamAssassin and look at the cases where SpamAssassin and Rspamd disagree. This is basically what AOL Mail does to evaluate spam filter changes, but I also don't know how much work it was to set up.