E-mail filters not fooled by signed spam (News.com)

Posted Oct 11, 2003 21:55 UTC (Sat) by proski (subscriber, #104)
Parent article: E-mail filters not fooled by signed spam (News.com)

Openness of the filters certainly helps spammers foil them quickly. But the same openness also makes upgrades easier (the software costs nothing and can be inspected for changes), so the new filters are deployed faster.

At the end of the day, it's a good thing because it speeds up the evolution of the filters. It makes filters stronger and selects the best of them. The only effective filters are those that prevent spammers from getting their message to the user in a form that can generate sales.

All other heuristics are ineffective. The signature affects the spammers' profits very little, if at all: it's easy to include and it doesn't distract the recipient. But if spammers cannot include large images or spell "penis" correctly and still pass the filter, then sales will suffer.

I believe we should combat HTML e-mail because it's primarily used by spammers. If your friends write you in HTML, please explain to them that they are helping spammers.


E-mail filters not fooled by signed spam (News.com)

Posted Oct 11, 2003 22:13 UTC (Sat) by arcticwolf (guest, #8341)

Actually, if you really want a good spam filter, try bogofilter (http://bogofilter.sf.net), and train it a lot, with lots of ham and spam. I've been using it for almost a year, and I can count both the false positives and the false negatives I had in the last three months on one hand each, while getting around 40 real mails and 70 pieces of spam each day.

No problem with people sending html mail, either (even those who send *only* html and no plain text), no need for kludges like "if it contains words like penis, it's (probably) spam", it also sorts out worm emails and the like, and it trains itself while categorizing mail (so I only have to interact when it does something wrong).

I can only recommend giving it a try. It needs an initial training period to give good results, and will take a while until it gives great results, but it's worth it.
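
For readers who wonder what "training with ham and spam" amounts to, here is a minimal sketch of a token-based Bayesian classifier. It is a simplified naive Bayes illustration, not bogofilter's actual algorithm; the class name `TinyBayes`, the tokenizer, and the smoothing constants are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class TinyBayes:
    """Toy per-user token database; illustrative only."""

    def __init__(self):
        # Number of messages of each class in which a token appeared.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message as 'spam' or 'ham'."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Combine per-token evidence as log-odds, then map to [0, 1]."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in set(tokenize(text)):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep exp() well inside floating-point range.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

The more ham and spam such a database has seen, the more tokens carry real evidence, which is why the initial training period matters so much.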

E-mail filters not fooled by signed spam (News.com)

Posted Oct 12, 2003 17:40 UTC (Sun) by RobSeace (subscriber, #4435)

Indeed, Bayesian filters are definitely the way to go... I use bogofilter at work, and SpamBayes at home, and both work wonderfully... I see almost NO spam at all anymore... And, the only things that have ever gotten tagged as spam that I actually wanted to see have been spammy-looking messages from businesses which I've ordered things from, or certain spammy-looking mailing list posts, etc... And, after a bit of retraining, all is well even there... (In the case of a couple of mailing lists, I set up explicit pass-throughs for the addresses in my ".procmailrc", to just skip the filtering completely, and always let them through... But, I'm sure if I retrained enough, I wouldn't have even needed to bother with that...)

That's the only downside to Bayesian filters: the initial training time you have to put in before they become fully effective... But, it's definitely more than worth it... Because, once you're over that initial hump, they just train themselves, and you don't have to do much of anything, other than correct the very rare mistakes they make...

The main problem I've found is people who don't save all their legit E-mail, so they don't have a good-sized corpus of legit mail to train it on... (You can get large amounts of spam from several sources, and all spam is pretty much alike, so no need for that to be personalized... But, legit mail really does differ quite a bit from person to person...) In that case, they have to just train as they go for a while, until enough messages have been received... (Or, you CAN start them off with an initial database trained from someone else's legit E-mail, which I've found does work relatively OK, for the most part... But, it definitely requires a bit of tweaking work, and is far more likely to lead to some false positives, at least early on... But, it's probably better than starting from scratch, at least from the user's perspective, as it'll wipe out most of the spam, right from the start...)
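
The pass-through idea mentioned above (skip filtering for known mailing-list addresses) lives in a ".procmailrc" in the original setup; the following is only a rough Python sketch of the same decision, not the procmail recipe itself. The address in `PASS_THROUGH` and the function name `should_skip_filter` are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical list address that should always bypass the spam filter.
PASS_THROUGH = {"some-list@lists.example.org"}


def should_skip_filter(raw_message, pass_through=PASS_THROUGH):
    """Return True if any To/Cc address is on the pass-through list."""
    msg = message_from_string(raw_message)
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() splits header values into (display name, address) pairs.
    addresses = {addr.lower() for _name, addr in getaddresses(recipients)}
    return bool(addresses & pass_through)
```

A delivery script would call this first and only hand the message to the Bayesian filter when it returns False.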

E-mail filters not fooled by signed spam (News.com)

Posted Oct 16, 2003 4:59 UTC (Thu) by arcticwolf (guest, #8341)

You're right, the initial training period needed is a bit of a problem with bayesian approaches. However, I think it hardly can be avoided; the reason why a bayesian filter actually works well, after all, is that it learns to distinguish between what the *user* considers spam and what he/she/shi considers legitimate email. And that - obviously - means that pretraining is not possible; if you started distributing tools like bogofilter, for example, with premade token databases, then you'd just create another weak link in the chain that spammers could attack, similar to SpamAssassin rules etc.

Getting an initial database from a friend might work; however, I, personally, would be reluctant to give anyone my token databases. Maybe it's just paranoia, but I prefer to keep them just as "secret" as my email.

It might be an idea, maybe, to use a distributed token database instead of per-user ones (P2P-based?), but I personally do not think this would work: it not only would allow spammers to pollute the database, it would also take away the individuality of users' databases that actually makes the bayesian filtering approach more effective.

The best way to train a bayesian filter is probably to just grit one's teeth and do it the hard way - put up with the spam and manually classify it until the filter starts working reasonably well, or - if you get too much spam to do this - use a tool like SpamAssassin to create an initial token database.

As far as setting up procmail rules to bypass filtering for messages you know will get misclassified is concerned - that works, of course, but the more elegant approach is still to train the filter, and I am happy to be able to say that it has worked in my case, too. I have one friend whose emails were notorious for being classified as spam; since he rarely ever sends one, the filter didn't get much exposure to them, either, so it wouldn't learn much about them, but by now, it classifies them correctly and leaves me with no known false positives.

It's amazing, really.

E-mail filters not fooled by signed spam (News.com)

Posted Oct 12, 2003 19:32 UTC (Sun) by nix (subscriber, #2304)

Er, SA's Bayesian algorithm is (an enhancement of) bogofilter's.

I think that any single-method attack is likely to fail; to catch things you really need every method you can find: so body-content heuristics and statistical methods and network checks and header analysis combined will be stronger than any one on its own.

(The immune system uses the same approach; strength in depth.)
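
The layered approach argued for above can be sketched as a weighted sum of independent checks compared against a threshold. This is only an illustration under stated assumptions: the three check functions, their weights, the threshold, and the dict-based message format are all invented for this example and do not reproduce any real tool's rules.

```python
# All checks, weights and the threshold below are illustrative assumptions.

def body_heuristic(message):
    """Crude content rule: flag a few typical sales phrases."""
    body = message["body"].lower()
    return 1.0 if any(p in body for p in ("buy now", "limited offer")) else 0.0


def header_check(message):
    """Crude header rule: treat a missing Message-ID as suspicious."""
    return 0.0 if "Message-ID" in message["headers"] else 1.0


def statistical_score(message):
    """Stand-in for a trained Bayesian filter's spam probability."""
    return message.get("bayes_probability", 0.5)


# (check function, weight) pairs; each check returns a value in [0, 1].
CHECKS = [(body_heuristic, 2.0), (header_check, 1.5), (statistical_score, 3.0)]
THRESHOLD = 3.0


def total_score(message):
    return sum(weight * check(message) for check, weight in CHECKS)


def is_spam(message):
    return total_score(message) >= THRESHOLD
```

The point of the design is that a message the statistical check alone would let through (say, a probability of 0.4) can still be caught when the body and header checks also fire.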

E-mail filters not fooled by signed spam (News.com)

Posted Oct 16, 2003 5:14 UTC (Thu) by arcticwolf (guest, #8341)

Actually, I think that body-content heuristics and header analysis can be viewed as being included in statistical analysis, at least as far as bayesian filtering is concerned. Outside of that, I agree that having both depth and breadth in your approach to spam is a good thing; but for now, bayesian filtering (as implemented by bogofilter - I don't have experience with other tools) seems to do the job so well that there's no need to worry, and with the filter training itself automatically as it classifies messages, only requiring user interaction for false positives or negatives, it seems that there is little that spammers can do, either.

In fact, more or less the only approach I can think of right now would be to change spam characteristics so drastically that the (bayesian) filters wouldn't catch them anymore; however, this would require not only a concerted action in which most spammers participate (otherwise, only a few pieces of spam would get through), it would also be effective only for a very short amount of time, until the filters' token databases have been updated.

What else could a spammer do? Try to make messages look as much like legitimate email as possible, I assume, but then again, this likely won't be effective - spam is, after all, ultimately about advertising, and a message that does not advertise products anymore in any way does not justify being sent. The filters *will* catch on, and the fact that they are generated completely dynamically (no static rules) and are specific to each user means you can't just attack them.

Or at least that's what common sense tells me. Maybe the future will show that there is a fundamental flaw not only in the existing tools, but in the bayesian approach in general, but I can't see it right now; and even if there is, a better technique will follow. Ultimately, the war against spam can only be won.

(and I probably shouldn't post comments this early in the morning - or, rather, this late at night -; I seem to get a bit overdramatic. oh well.)

HTML mail...

Posted Oct 13, 2003 10:50 UTC (Mon) by eru (subscriber, #2753)

I believe we should combat HTML e-mail because it's primarily used by spammers. If your friends write you in HTML, please explain to them that they are helping spammers.

I agree HTML mail is a bad idea, but getting rid of it is probably hopeless now, thanks to feature-happy e-mail client implementors. In too many mail clients it is the default, and sometimes the default cannot be easily changed, especially by technically unsophisticated friends or relatives... And I am not talking only about Outlook in its various incarnations. Some open-source mailers have the same flaw. For example, I have yet to find out how to tell Mozilla 1.0 "no, I don't ever want to send HTML mail to anyone, and if someone sends me one, I want to reply with plain text". Now it seems I have to use "options->format..." every time.

HTML mail...

Posted Oct 13, 2003 15:30 UTC (Mon) by kfiles (subscriber, #11628)

In addition to setting mozilla's send format to convert to plaintext, I use the following prefs. This combo gets rid of pretty much every case of rich-formatted email, and makes the result as close to Mutt+Emacs as possible.

user_pref("mail.quoted_graphical", false);
// To get rid of the sending window
user_pref("mailnews.show_send_progress", false);
// Change the reply header
// 0 - No Reply-Text
// 1 - <Author> wrote:   - Netscape 3.xx/4.xx style
// 2 - On <date> <author> wrote:
// 3 - user-defined string. Use the prefs below in conjunction with this.
user_pref("mailnews.reply_header_type", 3);
// If you set 3 for the pref above then you may set the following prefs.
// The end result will be <authorwrote><separator><ondate><colon>
user_pref("mailnews.reply_header_authorwrote", "%s wrote");
user_pref("mailnews.reply_header_ondate", "on %s");
user_pref("mailnews.reply_header_separator", " ");
user_pref("mailnews.reply_header_colon", ":");
// This should change attached image and text files from inline to attachment.
user_pref("mail.content_disposition_type", 1);
// To change the color of the quote bar
// Replace #0000A0 with the colour of your choice.
user_pref("mail.citation_color", "#0000A0");
// Format=flowed prefs, RFC 2646
user_pref("mailnews.send_plaintext_flowed", false);
user_pref("mailnews.display.disable_format_flowed_support", false);
user_pref("mail.display_struct", true);
user_pref("mail.send_struct", false);

--kirby

HTML mail...

Posted Oct 13, 2003 15:37 UTC (Mon) by proski (subscriber, #104)

If you are using Mozilla Mail, perhaps you should try Thunderbird. In Thunderbird, the setting is under Tools->Account Settings->Composition & Addressing->Compose Messages in HTML Format; uncheck it to send plain text. It should be similar in Mozilla.

As for the default, see bug #115439. I'm sure there are other bugs filed for this issue; it's just the first one I could find.

HTML based mail

Posted Oct 13, 2003 12:57 UTC (Mon) by Duncan (guest, #6647)

> I believe we should combat HTML e-mail because it's primarily used
> by spammers. If your friends write you in HTML, please explain
> to them that they are helping spammers.

Add crackers to that list as well. Anyone using HTML mail is helping spammers
AND CRACKERS. When this comes up, I simply ask folks to consider how many
exploits OE and Outlook proper have had, and how many they WOULD have had,
if they'd stuck to plain text. That should persuade virtually ANYONE (with any
tech knowledge, or who knows how to look it up, anyway).

My top priority rule deletes HTML formatted mail, whether or not it includes a
plain text version also. Yes, that's prioritized ABOVE the whitelist rules. Even if
I'm not affected by HTML vulns, I wouldn't accept an offer to shake hands if I'd
seen someone slime their hand with snot before they offered it to me, even if I was
wearing gloves to protect myself. I'm not going to accept HTML mail,
regardless of my client's vulnerability to it, for the same reason. If they want to
shake hands, they can learn not to be disrespectful of my health in the process. If
they want to exchange messages, they need to respect the health of my computer as
well. To do otherwise is simply rude, and they can just go be rude to someone else.
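
A rule of the kind described above (reject anything with an HTML part, even when a plain text alternative is present) could look roughly like this in Python. This is a sketch using the standard `email` module, not the poster's actual filter rule; the function name `contains_html` is an assumption.

```python
from email import message_from_string


def contains_html(raw_message):
    """Return True if the message, or any MIME part of it, is text/html."""
    msg = message_from_string(raw_message)
    # walk() yields the message itself and then every nested MIME part,
    # so multipart/alternative mail with a plain text twin is caught too.
    return any(part.get_content_type() == "text/html" for part in msg.walk())
```

To match the priority described above, a delivery script would run this test before consulting any whitelist.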

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds