Honeypots
Posted Feb 14, 2025 21:59 UTC (Fri) by malmedal (subscriber, #56172) In reply to: Honeypots by mb
Parent article: Fighting the AI scraperbot scourge
Typically because it followed a honeypot link. At that point you give it a web page consisting only of such links.
The idea is that the bot will spread these links to other members of the botnet, so subsequent bots from other IPs will be immediately recognised and get the same treatment. Hopefully, over time, this should direct most of the botnet over to the sacrificial server and leave the real one alone.
Posted Feb 14, 2025 22:05 UTC (Fri)
by mb (subscriber, #50428)
But it's already over. You served the request and you spent the resources.
Posted Feb 14, 2025 22:27 UTC (Fri)
by malmedal (subscriber, #56172)
It's not. The bot will report the links it found back to the rest of the botnet and then other bots will come for those links.
> multi terabyte timeout-less database
No database is needed.
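One way this could work without a database, as a minimal sketch: make the honeypot links self-identifying, so the server can recognise one from the URL alone. The `/trap/` prefix, the `SECRET` key, and the HMAC-tagged path layout below are all assumptions made for illustration, not something described in this thread.

```python
import hashlib
import hmac
import secrets

# Hypothetical values: the key and the URL layout are illustrative only.
SECRET = b"change-me"
TRAP_PREFIX = "/trap/"


def _tag(nonce: str, secret: bytes) -> str:
    """Short HMAC tag binding a nonce to the server-side secret."""
    return hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()[:16]


def make_trap_path(secret: bytes = SECRET) -> str:
    """Return a fresh honeypot path of the form /trap/<nonce>-<tag>."""
    nonce = secrets.token_hex(8)
    return f"{TRAP_PREFIX}{nonce}-{_tag(nonce, secret)}"


def is_trap_path(path: str, secret: bytes = SECRET) -> bool:
    """Recognise a honeypot link from the path alone; nothing is stored."""
    if not path.startswith(TRAP_PREFIX):
        return False
    nonce, sep, tag = path[len(TRAP_PREFIX):].partition("-")
    if not sep:
        return False
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(tag.encode(), _tag(nonce, secret).encode())


def trap_page(n_links: int = 20) -> str:
    """Build an HTML page consisting only of further honeypot links."""
    items = "\n".join(
        f'<li><a href="{make_trap_path()}">more</a></li>' for _ in range(n_links)
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

Any client, from any IP, that requests a path for which `is_trap_path()` returns true can be given `trap_page()` or redirected to the sacrificial server, with no per-IP state kept anywhere.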
Posted Feb 14, 2025 22:28 UTC (Fri)
by mb (subscriber, #50428)
And consume traffic and CPU.
Posted Feb 14, 2025 23:01 UTC (Fri)
by malmedal (subscriber, #56172)
From the sacrificial server, yes. So the real one gets less load.
Posted Feb 14, 2025 23:15 UTC (Fri)
by mb (subscriber, #50428)
Which costs real, non-sacrificial money. Why would it cost less money than the "real" server?
This is a real problem.
This is a real threat to users. I am currently selecting b), because I think I can't win a).
Posted Feb 15, 2025 0:32 UTC (Sat)
by dskoll (subscriber, #1630)
The sacrificial server can be less beefy than the real server because it doesn't have to generate real content that might involve DB lookups and such. And it can dribble out responses very slowly (like 10 bytes per second) to keep the bots connected but not tie up a whole lot of bandwidth, using something like this.
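A rough sketch of the dribbling idea, assuming Python's `asyncio` streams; this is not the tool referred to above, and the names `PAGE`, `dribble`, `handle`, and `main` are hypothetical. It sends a static page about 10 bytes at a time with a one-second pause between chunks, so each connected bot costs very little bandwidth and no database work.

```python
import asyncio

# Hypothetical static payload; a real tarpit might serve a page of honeypot links.
PAGE = b"<html><body>" + b"<p>nothing to see here</p>" * 50 + b"</body></html>"
HEADERS = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n"


async def dribble(writer, body: bytes, chunk_size: int = 10, delay: float = 1.0) -> int:
    """Write body in chunk_size pieces, sleeping delay seconds after each piece.

    Returns the number of bytes written.
    """
    sent = 0
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        writer.write(chunk)
        await writer.drain()
        sent += len(chunk)
        await asyncio.sleep(delay)
    return sent


async def handle(reader, writer) -> None:
    """Serve one connection: ignore the request details, answer very slowly."""
    try:
        await reader.readline()  # consume the request line; its content is not used
        writer.write(HEADERS)
        await dribble(writer, PAGE)
    except ConnectionError:
        pass  # the bot gave up, which is fine
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the tarpit until cancelled."""
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Starting it would be `asyncio.run(main())`. Because each connection is a coroutine that is asleep most of the time, a small machine can hold many bots open at once, which is the reason the sacrificial server can be cheaper than the real one.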
Posted Feb 15, 2025 0:49 UTC (Sat)
by mb (subscriber, #50428)
Posted Feb 15, 2025 1:58 UTC (Sat)
by dskoll (subscriber, #1630)
Yes, sure, but you might be able to tie some of them up in the tar pit for a while. Ultimately, a site cannot defend against a DDoS on its own; it has to rely on its upstream provider(s) to do their part.
My reply was for the OP who asked how the sacrificial server could be run more cheaply than the real server.
Posted Feb 15, 2025 10:28 UTC (Sat)
by malmedal (subscriber, #56172)
LWN does not want to use things like CAPTCHAs, JS challenges, or putting everything behind a login. Can you think of a better approach that adheres to the stated constraints?
Posted Feb 15, 2025 10:35 UTC (Sat)
by mb (subscriber, #50428)
No. That was my original point.
Posted Feb 15, 2025 10:53 UTC (Sat)
by malmedal (subscriber, #56172)
Honeypots
That is the problem.
The CPU/traffic load has already happened by the time you identify the bot. And then it will basically never hit again, unless you keep a multi terabyte timeout-less database, with the risk of putting your users into the terabyte ban database.
Honeypots
Lost.
Honeypots
It is a real problem for my machines, too.
And I really don't see a solution that is not
a) buy more resources or
b) potentially punish real users
Honeypots (and tarpits, oh my!)
Bot administrators are not stupid. Bots are optimized for maximal throughput, no matter what.
Honeypots (and tarpits, oh my!)
Yes, obviously. That's why I called this a "mitigation", not a "cure".