On comment spam

By Jonathan Corbet
July 28, 2010

There are both good and bad things that come from LWN's use of its own content management system; one strong "good" point has always been our relative freedom from comment spam problems. Many comment spammers seem to rely on automated tools written for commonly-used publication platforms; these tools don't work on LWN, so spammers have to do their work by hand. That said, some readers may have noticed that spammers have been making occasional appearances here.

The biggest offender appears to be associated with a shady-looking apparel store. Shady-looking or not, we know it's a legitimate business, because the site's FAQ tells us so:

Is this a legit website? Yes.We are selling the items displayed on our website. We have sent many packages to different countries.This is James,a real Person,working for you now,not machine.Thank you.

However, we would like it to be known that even businesses as proper, upstanding, and trustworthy as this one are not welcome to post their spam on LWN. We have spent years building this site and even convincing people that it is something worth paying for. How these people might think that we would allow them to destroy it is beyond imagining. Comment spam, for us, is truly a security issue.

Our recent discovery that nearly 3,000 LWN accounts had been created from a single site known as the origin of much comment spam has also helped to focus our minds on this issue. We don't know what the intended use of all those accounts was, but we doubt it was anything good.

Thus far, we have responded to spam by deleting it immediately on discovery and blocking the accounts and site it came from. The problem appears to be growing, though, to the point that the manual deletion approach will eventually run into scalability problems. Besides, we would rather be writing useful stuff than scrubbing graffiti from the site. But options for dealing with comment spam appear to be somewhat limited.

We could, of course, moderate all comments, but that approach, too, scales poorly; it also delays and distorts conversations. Full-scale moderation is just not a business we want to get into. There are blacklists out there which identify known sources of spam, but they are far from complete. One could try content-based filtering approaches, but they have their own hazards.

What we are likely to do, in the plausible scenario that this problem persists, is to impose some sort of moderation on comments from new accounts. After a legitimate comment or two, the moderation block will be removed and comments will be posted immediately; existing accounts would not be affected. We might also automatically remove the block if a subscription is purchased - spammers have shown a surprising reluctance to support LWN, for some reason.
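As a rough illustration of the rule described above, here is a minimal Python sketch. It is not LWN's actual code; the `Account` fields, the `needs_moderation` function, and the threshold of two approved comments are all assumptions chosen to match "a legitimate comment or two".

```python
from dataclasses import dataclass

# Hypothetical threshold: the article only says "a legitimate comment or two".
APPROVED_COMMENTS_NEEDED = 2


@dataclass
class Account:
    is_new: bool = True          # created after the policy took effect
    approved_comments: int = 0   # comments a moderator has accepted
    is_subscriber: bool = False


def needs_moderation(account: Account) -> bool:
    """Return True if this account's next comment should be held for review."""
    if not account.is_new:
        return False             # existing accounts are not affected
    if account.is_subscriber:
        return False             # buying a subscription lifts the block
    return account.approved_comments < APPROVED_COMMENTS_NEEDED
```

With these assumptions, a brand-new guest account is held for review, while an older account or a new subscriber posts immediately.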

Nothing is decided yet, so plans could change. We'd be more than interested in any ideas that readers might have; please post them as (non-spam) comments on this article. One thing that won't change, though, is our absolute determination that we will not allow LWN to be used as a platform for the spamming of our readers.


Why not force people to pay to be able to comment?

Posted Jul 29, 2010 1:40 UTC (Thu) by dskoll (subscriber, #1630) [Link] (6 responses)

Perhaps LWN could sell two levels of subscription: For some very low amount (let's say $10/year), you get the right to comment. For the full fee, you get the right to comment and also access to subscriber-only content.

And if you spam, your account is deleted immediately and there's no refund.

Why not force people to pay to be able to comment?

Posted Jul 29, 2010 1:49 UTC (Thu) by davecb (subscriber, #1574) [Link]

You can at least congratulate yourselves that you're going at it in the right way, subsuming both the "transmission" and "reception"

One of my Smarter Colleagues[tm] is of the opinion that neither transmission nor reception filtering alone is sufficient, but that both are necessary and sufficient. I don't grok his math, but his arm-waving seems persuasive...

And yes, at the current price, I'll subscribe to be able to comment, or have a free account auto-canceled for spamming.

--dave

Why not force people to pay to be able to comment?

Posted Jul 29, 2010 2:54 UTC (Thu) by vapier (guest, #15768) [Link] (3 responses)

i think that's a terrible idea. people should be able to fully participate in the discussions here without needing to pay for the access. walled gardens like that are destined to die, not flourish.

Why not force people to pay to be able to comment?

Posted Jul 29, 2010 4:13 UTC (Thu) by jhhaller (guest, #56103) [Link] (2 responses)

To a large extent, the non-paid subscribers don't get seen in the discussion because of the week delay between release to paid vs non-paid users, at least for articles available only to paid subscribers initially. While there are a few articles which I may revisit, most are read-once.

Why not force people to pay to be able to comment?

Posted Jul 29, 2010 7:43 UTC (Thu) by dlang (guest, #313) [Link]

the solution to this is to check the http://lwn.net/Comments/unread page periodically, that way if an old post becomes active again you see the comments to it.

Why not force people to pay to be able to comment?

Posted Jul 31, 2010 19:40 UTC (Sat) by vapier (guest, #15768) [Link]

... and the only reason i'm here is because i used lwn for a while unpaid, including a few comments

Why not force people to pay to be able to comment?

Posted Jul 29, 2010 3:15 UTC (Thu) by cesarb (subscriber, #6266) [Link]

This would only reduce the value of the site (by losing insightful "guest" comments), and as an effect the number of people who have an interest in subscribing to it.

Not to mention people who do not have credit cards.

On comment spam

Posted Jul 29, 2010 1:59 UTC (Thu) by cesarb (subscriber, #6266) [Link] (3 responses)

If moderation for new accounts is chosen, you could allow users of old accounts in good standing to see the not-yet-moderated comments (with a different-colored border to make it obvious that they are currently hidden), and also allow them to approve the comment.

This would lighten a lot of the burden on the moderators, and also allow for faster response to legitimate comments (they would not have to wait for a moderator to become available; there should be a lot more legitimate users in good standing than moderators). The fast approval in particular should lessen a lot of the friction of "your comment has been held for moderation, and will be posted whenever someone on a different time zone feels like looking through the backlog instead of watching a movie".

To avoid discussions going into a black hole, posting a reply should also automatically approve the parent comment.

This idea only helps with the legitimate comments "caught in the net"; but I feel causing the least amount of friction to legitimate comments is more important for the health of the LWN community. I will braindump ideas on what to do to non-legitimate comments on another comment.

On comment spam

Posted Jul 29, 2010 2:40 UTC (Thu) by rgmoore (✭ supporter ✭, #75) [Link] (2 responses)

You might want to be careful about both of those suggestions, since they're potentially open to abuse. A spammer could set up an apparently legitimate account, post valid comments with it for long enough to get moderation privileges, and then use it to approve spamming sockpuppets. You could probably come up with some way of removing the spamming accounts in bulk, but a dedicated spammer could make it hard. And making any reply to a message in moderation count as a positive review would be problematic because there are some people who will reflexively respond negatively to spam/troll messages, which would have a perverse effect in this case.

If you want to take advantage of the existing users to find spam, it would probably be better to give paid subscribers a "report as spam" button by each message. The site software could filter the reports so operators wouldn't get multiple reports for a single message. You could even add a layer of sophistication so that messages that got multiple spam reports- or reports from known reliable reporters- would generate a higher priority message.
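A minimal sketch of that reporting idea, in Python, might look like the following. The class name, the weights, and the notion of a fixed set of "reliable" reporters are assumptions made for illustration; the comment itself does not specify any of them.

```python
from collections import defaultdict

# Hypothetical weights: reports from known reliable reporters count for more.
NORMAL_WEIGHT = 1
RELIABLE_WEIGHT = 3


class SpamReports:
    def __init__(self, reliable_reporters=()):
        self.reliable = set(reliable_reporters)
        self.reporters = defaultdict(set)   # comment id -> names of reporters

    def report(self, comment_id, reporter):
        """Record a report; a repeat report from the same reader is ignored."""
        self.reporters[comment_id].add(reporter)

    def priority(self, comment_id):
        """Sum the weights of the distinct readers who reported this comment."""
        return sum(
            RELIABLE_WEIGHT if name in self.reliable else NORMAL_WEIGHT
            for name in self.reporters.get(comment_id, ())
        )

    def queue(self):
        """One entry per reported comment, highest priority first."""
        return sorted(self.reporters, key=self.priority, reverse=True)
```

The operators see each reported comment once in `queue()`, however many readers pressed the button, and comments flagged by several readers or by a reliable reporter rise to the top.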

On comment spam

Posted Jul 29, 2010 3:09 UTC (Thu) by cesarb (subscriber, #6266) [Link]

> A spammer could set up an apparently legitimate account, post valid comments with it for long enough to get moderation privileges, and then use it to approve spamming sockpuppets.

In that case, it reverts back to what we have now: the comment appears, people report it, and some moderator removes it. The "wait for moderator approval on posts from a new account" is just an extra layer. Which is why my suggestion makes it very biased towards allowing the comment to appear.

All this is not intended for troll comments, which are a different sort of creature. And the people who see the spam comments and thus reply to them could be the same people who could see a "do not allow this to appear, it is spam" button, so they would not feel a need to answer (the negative reaction would be mostly directed towards obliterating the spam comment with that button).

Approving a comment when a child comment is posted is a way to avoid "restricted" conversations that only some people can see. If a conversation starts, it should become visible. If the original post turns out to be spam, again the moderators can remove it as they already do.

I believe the spam problem is not yet bad enough to risk damaging the community with overly aggressive filtering.

Finally, I posted my "report as spam" ideas in a separate (and less organized) comment.

On comment spam

Posted Jul 29, 2010 8:17 UTC (Thu) by mjthayer (guest, #39183) [Link]

> A spammer could set up an apparently legitimate account, post valid comments with it for long enough to get moderation privileges, and then use it to approve spamming sockpuppets.

If only subscribers are allowed to approve comments (and spamming subscriptions can be cancelled without a refund) then that would quickly become expensive for the spammers. It might be a good way to support the site!

On comment spam

Posted Jul 29, 2010 2:29 UTC (Thu) by smoogen (subscriber, #97) [Link]

Hmmm I would go with paying people get to post without a captcha and people who don't pay must go through a captcha. Though for making sure that you guys have good health insurance and a house over your heads.. I would just go with you pay you comment. You don't pay.. well too bad.

Random ideas braindump

Posted Jul 29, 2010 2:31 UTC (Thu) by cesarb (subscriber, #6266) [Link] (2 responses)

A first thing could be to give subscribers a "report spam" button on recent comments (for instance, those written in the last few hours). This could put the comment on a "to review" queue, which the moderators could then look at. Most of the content of this queue would be spam (and for people who persistently report non-spam as spam, you could disable this feature for them, to reduce the noise in that queue).

Another option would be to allow subscribers with old enough accounts (something like members of the site for X years who posted at least Y comments) to directly hide spam comments (again they would go in a queue for a moderator to review). A related option would be to allow for these subscribers to vote for hiding the comment; after a number of votes it is hidden (the number of votes would have to be hidden for non-moderators to prevent abuse).

You could go even crazier and do a bit of "meta-moderation" (a bit like on slashdot): allow people to see the queue of comments flagged as spam, and allow people to mark whether the flagging was correct or not. You could also allow people to mark comments as not spam (but this could be abused by the spammers themselves).

And you could even go the full crazy, and start computing trust metrics on each poster, IP address, and network, based on past history (even things like "user X has been often flagged as non-spam by user Y, which has a high trust metric, so we also increase the trust metric for user X"). You could adapt naive Bayesian filters from email anti-spam solutions (not to block comments, but just to flag them for moderation). You could combine both and use trust metrics as a feedback to the Bayesian filter. You could show the "score" from the filter as a border color for the comments, updating them based on subscriber feedback. And so on and on.
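To make the Bayesian part of this concrete, here is a toy naive Bayes classifier in Python that only flags comments for moderation rather than blocking them. It is a sketch under simplifying assumptions: whitespace tokenization, add-one smoothing, and at least one training example of each kind. The class and method names are invented for this example, and the trust-metric feedback is left out entirely.

```python
import math
from collections import Counter


class NaiveBayesFlagger:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from one comment already judged "spam" or "ham"."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior (needs at least one training comment per label).
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # Log likelihood with add-one (Laplace) smoothing for unseen words.
        for word in words:
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    def should_flag(self, text):
        """Return True if the comment looks more like spam than ham."""
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam = self._log_score(words, "spam", len(vocab))
        ham = self._log_score(words, "ham", len(vocab))
        return spam > ham
```

Moderator decisions and subscriber feedback would feed `train()`, so the filter improves as the community uses the "report spam" button.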

The common thread of all these ideas is to use the community itself to help with the filtering, and do so in an unobtrusive way (if I am reading a thread and see a spam comment, it is natural to reach for the "kill it with fire" button, especially if there is one nearby). It might use only a few seconds for each subscriber, and in a natural way since we are already reading the comment thread (and not having to go out of our way to use a separate system only to fight spam). A "recent comments" feature (if there isn't one already) can help even more (and be a useful feature on its own).

Random ideas braindump

Posted Jul 29, 2010 3:16 UTC (Thu) by freemars (subscriber, #4235) [Link] (1 responses)

I favor letting newly registered members post comments, but paid members with enough seniority (maybe a year at the paid level) can mark comments as spam. The stuff marked as spam gets moved to a queue for a real editor to review. If the editor decides the message is legit the member who marked it as spam no longer is allowed to mark spam (but doesn't suffer in any other way).

Random ideas braindump

Posted Jul 29, 2010 3:26 UTC (Thu) by cesarb (subscriber, #6266) [Link]

> If the editor decides the message is legit the member who marked it as spam no longer is allowed to mark spam

It is best if this happens only after a few incorrect markings; after all, mouse slips happen.

I also think that what is relevant for seniority is time since the first non-spam comment from that account, not time with a paid subscription. If the user was a subscriber for a long time, but dropped to the non-paid level due to financial difficulties, plain forgetting to pay, or the account was a gift subscription and the user did not have a credit card at the time it expired, and the user subscribed again last week, that user should still be considered as having a lot of seniority.

On comment spam

Posted Jul 29, 2010 3:23 UTC (Thu) by jmranger (guest, #43784) [Link] (2 responses)

What you propose seems very reasonable to me.

As proposed by others, community-based spam elimination (like what wikis do) seems good also, although this opens the door to other abuses.

I know that bugs.debian.org also has a mechanism in place, but I don't know the details or whether it'd work in this context.

On comment spam

Posted Jul 30, 2010 13:38 UTC (Fri) by jzbiciak (guest, #5246) [Link] (1 responses)

Of course, it's also a question of effort vs. reward.

For example, if someone needs to set up sock-puppet accounts and post a few innocuous posts before they can spam, will they really go to the trouble to do so? At some point you have to tightly tailor your attack to a given site, and it may not be worth bothering.

A number of the proposals I saw have this property--that is, it makes successful spamming too involved to be worthwhile. There's more leverage attacking widely deployed blog software than something custom that exists only at one site.

On comment spam

Posted Jul 30, 2010 15:45 UTC (Fri) by dlang (guest, #313) [Link]

they already have to tailor their attack to lwn.

Akismet.

Posted Jul 29, 2010 5:27 UTC (Thu) by elanthis (guest, #6227) [Link] (2 responses)

Akismet works just as well for forum comments and even bug report forms as it does for blog comments. You will probably have to pay their commercial fee (which isn't too heinous at all).

I can count the number of spams that have gotten through my sites in the last year with the fingers on a single hand. I don't even require a log in at all for things like my bug reporting tool (asking people to spend 10 minutes creating an account and waiting for emails just to report a single bug is the absolute stupidest thing you can do, IMO) and I allow anonymous posting on every news article, blog posting, and most of my forums. Akismet catches pretty much all of it.

Akismet.

Posted Jul 30, 2010 22:01 UTC (Fri) by hingo (guest, #14792) [Link] (1 responses)

+1 for Akismet. I use it too and spam is not a problem. (Reliability is comparable to GMail for email.)

So is it cheaper to subscribe to LWN (to be able to spam) or to actually pay for advertising space?

Akismet.

Posted Jul 31, 2010 10:08 UTC (Sat) by johill (subscriber, #25196) [Link]

I think all of this is forgetting that those spams would never be accepted as ads to start with.

Some ideas....

Posted Jul 29, 2010 5:36 UTC (Thu) by jmorris42 (guest, #2203) [Link] (14 responses)

First one must know the enemy.

The spammer is a volume guy and is cheap. He wants to spew as many copies of his ad as possible at the lowest possible cost.

So first off, any post by a subscriber can be assumed good. After all, if the spammer is willing to pay to put his ad here he could just write sales@lwn.net and if he isn't pitching something totally bogus/illegal he could just buy a real ad. This means that while in theory a spammer could post spam with paid accounts or use paid accounts to approve posts in a moderation scenario, etc. in reality it isn't likely to happen unless the spam scene drastically changes in the future.

While they really like automated tools, some do pay third world labor to manually post because humans can still get past filters better than current bots. They can grind out accounts and posts and a captcha just slows them down a little bit. So what are their weaknesses? How about leveraging the fact that this site caters to a very specific demographic? Instead of a typical captcha use a small set of multiple choice questions that anyone who should be posting here could answer but spammers would have to go google. Or display five distro mascots/logos and require matching them to their names. That would waste a lot more of their time than a typical captcha, thus encouraging them to go somewhere they can get more bang for their buck.

And again, after one or two successful ontopic posts it can be assumed that the user is legit. Again, this is a specialty site and a lowest possible labor rate third worlder (who can't graduate to outsourced call center work or something more legit) probably isn't likely to be able to make a couple of cogent posts in a place like this just to get to spew comment spam for a few hours until the account gets closed and every post they made gets rubbed out. The return on the labor is bad.

Don't try to stop the spammers. Realize you can't. What you can do is make it too expensive for most to bother. You will always get a few who try it as they figure this out.

Now know thyself (and the users). You are short on labor but have a highly technical and generally spam-hostile readership. Only new accounts need to be suspected so put a spam mallet icon beside those users' posts and let the readers bang on it. Three strikes and it is out. Two posts go out and the account goes dead and after a quick moderation double check by a staffer all posts from that account go away with a single click. Add one final feature to limit how many posts a new account can make in a day and the spam problem should be under control.

Only one problem remains... the spammers who creep in and post in old threads might survive the probationary period. So just don't allow a newly created account to post in a thread over an age threshold, perhaps a month?
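The probation rules above (reader strikes, a daily post limit, and no posting in old threads) could be sketched in Python roughly as follows. All three constants are placeholders: the comment says "three strikes" and "perhaps a month" but names no daily limit, and the function names are invented for this example.

```python
from datetime import datetime, timedelta

# Placeholder values; only the first and last are hinted at in the comment.
STRIKES_TO_HIDE = 3
MAX_POSTS_PER_DAY = 5
MAX_THREAD_AGE = timedelta(days=30)


def may_post(is_new_account, posts_today, thread_started, now):
    """Decide whether an account may post in a thread right now."""
    if not is_new_account:
        return True                       # only new accounts are restricted
    if posts_today >= MAX_POSTS_PER_DAY:
        return False                      # daily limit for new accounts
    return now - thread_started <= MAX_THREAD_AGE


def is_hidden(spam_strikes):
    """A new account's post is hidden once enough readers hit the mallet."""
    return spam_strikes >= STRIKES_TO_HIDE
```

The final deletion of the account's posts would still be left to a staffer, as the comment suggests.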

Some ideas....

Posted Jul 29, 2010 7:00 UTC (Thu) by PaulWay (guest, #45600) [Link] (1 responses)

> The spammer is a volume guy and is cheap. He wants to spew as many copies
> of his ad as possible at the lowest possible cost.

I suspect that this, like many generalisations, is not always true. I suspect, for instance, that some companies would see the cost of buying a subscriber account as cheap compared to the cost of getting an ad placed legitimately on LWN. There are plenty of dumb comment spammers that are already blocked by the site mechanics, others that are smart enough to bypass those but get blocked by general moderation, and fewer still that are prepared to invest the time (and perhaps the money) for what they see as advertising to a select, high-reputation community.

In this regard I think LWN is already filtering out the vast majority of unwanted comments.

Sadly, it sounds like these people aren't selling a service that is in LWN's sphere of interest, so even regular advertising isn't an option to them...

FWIW I'm in favour of greylisting new accounts, and perhaps giving meta-moderator status to subscribers of high standing. A 'web of trust' element might work as well - put in your GPG key ID and if it's been signed by one of the editors (tested by decrypting a URL in a message from them) then you're in :-) (for example).

Have fun,

Paul

P.S. A friend runs a website that allows people to post comments, and if you don't have the cookie you have to supply the name of a particular celebrity known in that community. Maybe asking new posters for the surname of the inventor of Linux would be a good filter to check the knowledge credentials of the poster... :-)

Some ideas....

Posted Jul 29, 2010 10:22 UTC (Thu) by dunlapg (guest, #57764) [Link]

>I suspect, for instance, that some companies would see the cost of buying a
>subscriber account as cheap compared to the cost of getting an ad placed
>legitimately on LWN.

Cost of ads on LWN: 10000 views for $1. Views before a post is marked as spam (if normal users can click the "spam" button): probably 10, maybe 100 max. Say you get away with 3 before your account is canceled; that's still a whole lot less effective than just paying for advertisement.
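Spelling out that arithmetic in a few lines of Python, using only the figures quoted above and taking the upper estimate of 100 views per spam post:

```python
AD_VIEWS_PER_DOLLAR = 10_000   # "10000 views for $1"
VIEWS_PER_SPAM_POST = 100      # the upper estimate given above
POSTS_BEFORE_BAN = 3           # "say you get away with 3"

spam_views = VIEWS_PER_SPAM_POST * POSTS_BEFORE_BAN
ad_equivalent_dollars = spam_views / AD_VIEWS_PER_DOLLAR

print(spam_views)              # 300
print(ad_equivalent_dollars)   # 0.03
```

So even in the best case for the spammer, a cancelled account buys about three cents' worth of ad views.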

Some ideas....

Posted Jul 29, 2010 10:32 UTC (Thu) by zmi (guest, #4829) [Link]

> Or display five distro mascots/logos and require matching them to their names.

Let's make that an intellectual game: Anybody who finds Austria on a world map is allowed to spam. You can write a hint that there are "no Kangaroos in Austria". But I guess even then not a lot of people can comment anymore ;-)

Some ideas....

Posted Jul 29, 2010 16:47 UTC (Thu) by mrshiny (guest, #4266) [Link]

I think I should point out that on the site where I work we have constant problems of spammers using stolen credit cards to buy accounts in order to send spam. So even paid membership is not necessarily an indicator of goodness.

Personally I favour user-moderation such as Slashdot or StackOverflow. Users with enough karma/reputation can perform certain actions without requiring site admin oversight. This seems to work well on those sites, but I suspect they have more traffic.

Some ideas....

Posted Jul 29, 2010 21:46 UTC (Thu) by nix (subscriber, #2304) [Link]

If you demand cogency in posts you would eliminate a number of current posters, like petegn... oh, wait. Actually that seems like a very good idea. ;}

Some ideas....

Posted Jul 30, 2010 2:10 UTC (Fri) by vonbrand (subscriber, #4458) [Link] (8 responses)

What a spammer looks for is eyes on their stuff. Posting in old threads, which few people see, is a waste of time for them.

Old threads

Posted Jul 30, 2010 13:24 UTC (Fri) by corbet (editor, #1) [Link] (7 responses)

Actually, spammers are quite happy to throw their crap into old threads. Much of the time, it seems that being seen by Google is all they actually care about.

Old threads

Posted Jul 30, 2010 13:45 UTC (Fri) by jzbiciak (guest, #5246) [Link]

Is this part of why you limited email notifications to a month, max?

Old threads

Posted Jul 30, 2010 14:43 UTC (Fri) by ortalo (guest, #4654) [Link] (4 responses)

Have you considered simply hiding comments from Google indexing then?
That's not realistic?

Old threads

Posted Jul 30, 2010 15:07 UTC (Fri) by corbet (editor, #1) [Link] (3 responses)

I guess that never really crossed our minds. Comments are content too, and some of them are very much worthwhile. I'd prefer not to hide them from the net.

That said, we do put rel=nofollow onto links in comments in some situations.

LWN quiz?

Posted Jul 30, 2010 16:56 UTC (Fri) by dmarti (subscriber, #11625) [Link]

How about just making new non-subscriber comment posters answer a few basic questions?

When would you run the "make oldconfig" command?

If a manufacturer installs Linux on mobile phones and sells them, which of the following actions does the GNU GPL require?

Which of these is _not_ a Linux filesystem?

Old threads

Posted Aug 3, 2010 23:48 UTC (Tue) by PaXTeam (guest, #24616) [Link] (1 responses)

> That said, we do put rel=nofollow onto links in comments in some situations.

what we do on the grsec forums is that for 'new' users (registered for less than X days and/or posted less than Y times) we disable the rendering of the url tag (i.e., the url is rendered as plain text, and not lost). this doesn't prevent spamming but is an annoyance for those semi-automated drive-by spammers who want to lure readers to their own sites with a click of a button. and for targeted spams it's hand-to-hand combat as usual ;).

Old threads

Posted Aug 5, 2010 11:19 UTC (Thu) by yodermk (subscriber, #3803) [Link]

I was going to suggest something like that. Actually I was thinking more like banning new users' posts with URLs, but close. :)

Obviously, virtually every spam message contains a URL. Most legitimate comments do not.

Preventing those with fewer than 5 legit comments from posting messages with URLs seems like a small price to pay.
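A rough Python sketch of such a rule follows. The regular expression is deliberately crude (it only catches `http://`, `https://`, and `www.` forms, so bare domain names slip through), and the function names are made up for this example.

```python
import re

MIN_LEGIT_COMMENTS = 5   # the figure suggested above

# Crude URL detector; an assumption for illustration, not a complete matcher.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def contains_url(text):
    return URL_PATTERN.search(text) is not None


def may_post_comment(text, legit_comment_count):
    """Reject URL-bearing comments from accounts without enough history."""
    if legit_comment_count >= MIN_LEGIT_COMMENTS:
        return True
    return not contains_url(text)
```

The same `contains_url` check could instead drive the softer approach of rendering the URL as plain text rather than rejecting the comment.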

Old threads

Posted Jul 30, 2010 16:45 UTC (Fri) by james (subscriber, #1325) [Link]

It would be quite in character for spammers to send out millions of spams, each containing little more than a generic tease (hard for spam filters to filter) and a link to a LWN comment.

Regular LWN readers would not be the target of the spam, except that the spammers might hope LWN-reading sysadmins would be less likely to block lwn.net and more likely to unblock it, thinking that a LWN block was a mistake by the filtering software...

On comment spam

Posted Jul 29, 2010 9:34 UTC (Thu) by ortalo (guest, #4654) [Link] (6 responses)

First, note that I second the preceding post starting with "Know the enemy". It sounded sensible and pragmatic to me.

Maybe I am opening a can of worms, but I am tempted to suggest that you also explore the other side: "Know your allies".
Currently, opening an account on LWN is pretty anonymous. This is nice too and some level of anonymity between readers is certainly desirable. However, maybe a stronger and better identification of the owner of an account could be set up. Honestly, I would not mind giving you more personal information about myself if it can help you spot abuses and build a more trusted environment [1].
Technically, note that I have always been an admirer of PGP-like decentralized webs of trust (via cross peer-to-peer key signing/certification) and, well, wary of X.509-like hierarchical certification organizations (which probably only work for organizations that can afford the administrative burden and the associated workforce, like a public service).

Both from the technical point of view and from the users side, it comes to me now that I may just be suggesting to extend LWN in the direction of a *secure* social networking site. For blocking spam comments.
I wonder if it may be interesting on its own too ;-) or if I am simply dreaming too much...

Anyway, just my 0.02

Rodolphe

[1] As a side note, this is absolutely the first time I have told that to a web-based business. And maybe this is due to the fact that, historically, this site is also the first web-based business I trusted enough to pay for online (well, not exactly true: first payment was more like clearing an old debt). And probably also because it is certainly the only one that *never* asked me to provide personal information even though I have used it regularly for more than a decade.

On comment spam

Posted Jul 29, 2010 11:22 UTC (Thu) by cesarb (subscriber, #6266) [Link] (1 responses)

Some people are just shy. Some people cannot use their real name as it would cause problems with their employers. And you could imagine several other reasons for not wanting to use their real name.

Not to mention how would one be identified: by the name on the credit card used to pay for the subscription? You have things like gift subscriptions, people who do not have credit cards yet using a friend's credit card, and so on.

Not to mention the "barrier" effect: the more hoops you have to jump through to make a comment, the greater amount of people who will give up before commenting. The less insightful comments, the less the utility of the site (the articles are very good, but the comments are also very good; I do not know how much of the value of the site comes from each, but the comments do contribute to its value).

On comment spam

Posted Jul 30, 2010 10:22 UTC (Fri) by ortalo (guest, #4654) [Link]

I know this would not be easy, and I first seconded the proposal to set up obstruction measures targeted at spammers.
I am simply saying that it would be nice to further complement these by mechanisms allowing us to increase the overall trust level between legitimate users. It seems to me some form of better identification is needed for that (not necessarily withdrawing anonymity).
Furthermore, if successful, I wonder if these mechanisms would not be a real plus for the site. But the editors may have other more urgent work to do of course (including fighting comment spammers in the first place).

On comment spam

Posted Jul 29, 2010 21:49 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

> Honestly, I would not mind giving you more personal information about myself if it can help you spot abuses and build a more trusted environment

You are forgetting the Second Rule of Antispam: 'spammers lie'. They'll hand you lots of personal information happily: someone else's.

(Unfortunately with the advent of organized crime, the First Rule, 'spammers are stupid', no longer appears to be so true, if it ever was. They're not stupid, just evil.)

On comment spam

Posted Jul 30, 2010 10:26 UTC (Fri) by ortalo (guest, #4654) [Link] (2 responses)

Cross-certification à la PGP sounds interesting to explore as a way to defeat such malicious information, while simultaneously building a network of trust.
Spammers may try to cross recommend themselves but, first it raises the bar for them; and second I suppose legitimate users would not be so easy to trick into recommending a comment spammer. (Unless someone recommends everyone without thinking - which is certainly a problem to solve too - like in key-signing.)

On comment spam

Posted Jul 30, 2010 13:58 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

On that note, the Advogato trust metric is interesting. In the LWN case, you could treat the staff as the seeds, and have the staff certify all subscribers at a low level.

On comment spam

Posted Jul 30, 2010 15:18 UTC (Fri) by ortalo (guest, #4654) [Link]

Thanks for the link!

Note that I was probably thinking even further; possibly to extend to other aspects than simply the integrity of the identifier [1] - a sort of multi-dimensional trust graph.

Well, maybe I unconsciously try to reach the highest level in the "computer dreamer" metric dimension. More useful dimensions I can think of may be e.g.: kernel programmer, security tester, company X employee, non-spam commenter, etc. The kind of things that social networking sites like to display albeit weakly verified and without user control. A further difficulty is to draw a line between public facts - suitable for certification - and opinions or privacy-related facts - things you must leave out of scope...

[1] Not so simple of course, but still leaves room for improvement: see comment starting by "User Foo is certified but..." on the FAQ page of Advogato.

On comment spam

Posted Jul 29, 2010 11:57 UTC (Thu) by hjb (subscriber, #25523) [Link]

A captcha is quite useful; it can reduce spam to a very low level. However, it is not always necessary: it is needed only if the content is classified as "possible spam". This can easily be checked against a database table containing typical spam keywords. We implemented this on www.pro-linux.de.
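
As a rough sketch of that idea (not the pro-linux.de code, which is not shown here), the check could look like the following. The keyword list, function names, and whole-word matching rule are assumptions; a real site would load the keywords from its database table.

```python
import re

# Hypothetical keyword list; stands in for the database table of spam keywords.
SPAM_KEYWORDS = {"cheap", "discount", "replica", "free shipping"}

def is_possible_spam(text, keywords=SPAM_KEYWORDS):
    """Return True if any keyword appears in the text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
        for keyword in keywords
    )

def needs_captcha(text):
    """Only comments classified as possible spam are sent to the captcha step."""
    return is_possible_spam(text)
```

Matching on word boundaries is a design choice to keep ordinary words such as "cheaper" from triggering the captcha; a looser substring match would catch more spam at the cost of more false positives.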

On comment spam

Posted Jul 29, 2010 16:10 UTC (Thu) by ssam (guest, #46587) [Link] (1 responses)

i'd vote for a 'report spam' button. you have many eyes viewing pages, so you may as well take advantage. you could then put these in a queue for you to check, or trust some members, or have a threshold (if 3 people think it is spam then it probably is).
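
A minimal sketch of the threshold variant, assuming reports are counted per distinct user so that one person clicking repeatedly does not count three times. The class and method names are made up for illustration; the default of 3 is the figure suggested above.

```python
from collections import defaultdict

REPORT_THRESHOLD = 3  # "if 3 people think it is spam then it probably is"

class SpamReports:
    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self._reporters = defaultdict(set)  # comment id -> ids of users who reported it

    def report(self, comment_id, user_id):
        """Record a report; return True once enough distinct users have reported."""
        self._reporters[comment_id].add(user_id)
        return self.is_flagged(comment_id)

    def is_flagged(self, comment_id):
        return len(self._reporters.get(comment_id, ())) >= self.threshold
```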

one thing to watch out for is spam on very old posts. i often come across old blog posts that have lots of comment spam, i guess because nobody is keeping an eye on them.

one good thing about LWN is that often developers of software in an article will reply to comments and questions. it would be a shame to lose that by forcing money from them.

On comment spam

Posted Jul 29, 2010 16:20 UTC (Thu) by farnz (subscriber, #17727) [Link]

I read the comments via http://lwn.net/Comments/unread, so I see comments on older posts - there is the occasional outbreak of spam, but LWN keeps on top of it.

A "Report Spam" button to flag comments on there for investigation would work well - even if (at first) it just flags comments for LWN to look at. Longer term, you might want "Report Spam" by trusted users (ones with a track record of only reporting spam that LWN agrees is spam) on guest comments to hide them until LWN can investigate - as other people have said, subscribers tend not to spam.
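
The longer-term rule could be sketched roughly as below. Everything here is an assumption for illustration: the `Reporter` record, the minimum of five confirmed reports, and the two action names.

```python
from dataclasses import dataclass

@dataclass
class Reporter:
    confirmed: int = 0  # reports the editors agreed were spam
    rejected: int = 0   # reports the editors judged not to be spam

    def is_trusted(self, min_confirmed=5):
        # "Only reporting spam that LWN agrees is spam": no rejected reports at all,
        # plus an assumed minimum track record.
        return self.rejected == 0 and self.confirmed >= min_confirmed

def action_for_report(reporter, author_is_guest):
    """Return 'hide' for a trusted report on a guest comment, else 'queue'."""
    if author_is_guest and reporter.is_trusted():
        return "hide"   # hidden until the editors can investigate
    return "queue"      # merely flagged for the editors to look at
```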

Captcha

Posted Jul 30, 2010 4:41 UTC (Fri) by avik (guest, #704) [Link] (1 responses)

Tried and true. But instead of typing in a couple of words scanned from a book, point commenters at a random bug from bugzilla.kernel.org and require it to be resolved before posting. This will increase both the quality of lwn.net comments and the kernel it is covering.

Captcha

Posted Jul 30, 2010 11:03 UTC (Fri) by Oddscurity (guest, #46851) [Link]

Alternatively: "Before you can post, please write a driver for [remaining piece of closed hardware, from e.g. an Android phone] and get it accepted to mainline." or "Before you can post, please persuade [company x] to release full documentation on [chipset y] without the requirement of an NDA. Bonus: Free 1 year subscription if you can get NVIDIA to release docs for all their hardware."

But seriously, I think the best way to handle this is via a 'report spam' button, as outlined in other comments.

On comment spam

Posted Aug 2, 2010 3:23 UTC (Mon) by vogelke (guest, #4271) [Link]

> One could try content-based filtering approaches, but they have their own hazards.

Two filters I'd definitely recommend are POPfile and Nilsimsa. Both use fuzzy matching to assign a probability that two documents were not independently created, and they both work very well at catching this type of crap.

I've had the same mail address for around a decade so I'm on every spam list on the planet, but POPfile keeps that down to around one or two per day out of 150-200 messages. Site: http://getpopfile.org/

The main Nilsimsa site seems to be down from a disk failure, but there's a Perl module on CPAN.
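
For a feel of what fuzzy matching between two documents means, here is a minimal sketch. It is neither POPfile nor Nilsimsa and does not use their algorithms; it is a plain character-trigram Jaccard similarity, chosen as a simple stand-in, and the function names are made up.

```python
def trigrams(text):
    """Return the set of 3-character substrings after case and whitespace folding."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}

def similarity(a, b):
    """Jaccard similarity of trigram sets: 0.0 (nothing shared) to 1.0 (same set)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

spam_a = "Buy cheap jerseys at our store today"
spam_b = "Buy  cheap jerseys at our shop today!"
comment = "The scheduler patch needs more review"
```

A lightly edited copy of a known spam message (`spam_b` versus `spam_a`) scores much closer to the original than an unrelated comment does, which is the property a near-duplicate filter relies on.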

On comment spam

Posted Aug 4, 2010 16:39 UTC (Wed) by a9db0 (subscriber, #2181) [Link]

Perhaps a system like the one Slashdot introduced, yet focused only on spam, would be appropriate. Give subscribers a 'Report as spam' button, and then let other subscribers check their work. Those who misuse the button, as evaluated by their fellow subscribers, would lose access to the button. Appeals to be handled by Jon. That would:

a) provide another benefit to subscribers
b) bring plenty of eyeballs to the spam problem
c) bring eyeballs to the reporters
d) not require hours of Jon's time, which is better spent on Grumpy Editor articles.

Flagging a comment as spam should take more than one report (a low percentage of subscriber views, perhaps?), and revocation of privileges should be based on bad reports over a specific timeframe.

The rules for using the button would be simple: Use to report comment advertising. Use of the button to flag comments for other reasons will result in loss of privileges.
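
The two numeric rules above could be sketched as follows. All of the thresholds (two reports, 2% of subscriber views, three bad reports in 30 days) are placeholder assumptions, as are the function names.

```python
from datetime import datetime, timedelta

def is_flagged(report_count, subscriber_views, min_reports=2, min_fraction=0.02):
    """Flag only with more than one report AND a minimum share of subscriber views."""
    return report_count >= min_reports and report_count >= min_fraction * subscriber_views

def should_revoke(bad_report_times, now, window=timedelta(days=30), max_bad=3):
    """True if at least max_bad bad reports fall within the window ending at now."""
    cutoff = now - window
    recent = [t for t in bad_report_times if cutoff <= t <= now]
    return len(recent) >= max_bad

now = datetime(2010, 8, 4)
old_offender = [now - timedelta(days=d) for d in (1, 5, 40)]
repeat_offender = [now - timedelta(days=d) for d in (1, 5, 20)]
```

Counting only bad reports inside a sliding window means an occasional mistake ages out, while a burst of misuse costs the reporter the button.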

On comment spam

Posted Aug 5, 2010 11:43 UTC (Thu) by eduperez (guest, #11232) [Link]

As the admin at a low-traffic forum, my sincere congratulations to the people who have to cope with spam here.

We tried almost every known solution to stop spam in our forum; each one provided some "resting time", until spammers found a workaround and began spamming us again. In the end, the only solution that has worked for us is as simple as limiting the number of links that can be included in each post. It is a minor nuisance for users, but most of them do not need to include ten links in each post; spammers, however, soon lost all interest in our forum once it became useless to them.
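
A minimal sketch of such a limit, assuming links are counted crudely by occurrences of `http://` or `https://` in the post body. The limit of two and the function names are placeholders, not our forum's actual settings.

```python
import re

MAX_LINKS = 2  # hypothetical per-post limit

URL_SCHEME = re.compile(r"https?://", re.IGNORECASE)

def count_links(text):
    """Count URL scheme occurrences; crude, but hard to evade with markup tricks."""
    return len(URL_SCHEME.findall(text))

def allow_post(text, max_links=MAX_LINKS):
    """Accept the post only if it stays within the link limit."""
    return count_links(text) <= max_links
```

Note that a link whose visible text is also a URL is counted twice by this rule, which errs on the strict side.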