Basic cra
Posted Feb 15, 2025 17:15 UTC (Sat) by Tobu (subscriber, #24111)
In reply to: Licencing pipe-dream by kleptog
Parent article: Fighting the AI scraperbot scourge
Here's an example of an image download tool, img2dataset, that has resisted implementing robots.txt support. Its README asks site owners to send nonstandard headers to opt out instead. PRs as simple as defining the User-Agent were not merged either.
Posted Feb 15, 2025 18:18 UTC (Sat) by dskoll (subscriber, #1630)
Wow, the maintainer of img2dataset is a piece of work...
I searched my logs for the default user-agent he uses to pretend to be something else, and it only ever hits images... never any real pages. So I've blocked that user-agent. It now gets 403 Forbidden.
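For anyone wanting to do the same at the application level rather than in the web server config, here is a minimal sketch of a WSGI middleware that answers 403 Forbidden when the User-Agent contains a blocked substring. The string "ExampleImageFetcher/1.0" is a made-up placeholder, not the tool's real default user-agent, and the names `is_blocked` and `BlockUserAgent` are only illustrative.

```python
# Hypothetical placeholder; substitute the string you actually see in your logs.
BLOCKED_UA_SUBSTRINGS = ["ExampleImageFetcher/1.0"]


def is_blocked(user_agent, blocked=BLOCKED_UA_SUBSTRINGS):
    """Return True if the User-Agent contains any blocked substring (case-insensitive)."""
    ua = user_agent.lower()
    return any(pattern.lower() in ua for pattern in blocked)


class BlockUserAgent:
    """WSGI middleware that rejects blocked user-agents with 403 Forbidden."""

    def __init__(self, app, blocked=BLOCKED_UA_SUBSTRINGS):
        self.app = app
        self.blocked = list(blocked)

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(user_agent, self.blocked):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everything else passes through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application as `BlockUserAgent(app)` is enough to apply it. This only stops clients that keep sending the same string, so it is a stopgap rather than a real defence.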