
Basic cra


Posted Feb 15, 2025 17:15 UTC (Sat) by Tobu (subscriber, #24111)
In reply to: Licencing pipe-dream by kleptog
Parent article: Fighting the AI scraperbot scourge

Here's an example of an image download tool, img2dataset, that has resisted honoring robots.txt. Its README instead asks sites to send nonstandard headers to opt out. Even PRs as simple as setting an identifying User-Agent were not merged.



Basic cra

Posted Feb 15, 2025 18:18 UTC (Sat) by dskoll (subscriber, #1630)

Wow, the maintainer of img2dataset is a piece of work...

I searched my logs for the default user-agent he uses to pretend to be something else, and it only ever hits images... never any real pages. So I've blocked that user-agent. It now gets 403 Forbidden.
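
For anyone wanting to do the same at the application layer, here is a minimal sketch of a user-agent block as WSGI middleware in Python. It is only an illustration, not the configuration described above: the `ExampleImageBot` string is a hypothetical placeholder (substitute whatever string shows up in your own logs), and `is_blocked`, `block_user_agents`, and `demo_app` are names made up for this example.

```python
# Hypothetical placeholder: replace with the user-agent substring found in your logs.
BLOCKED_UA_SUBSTRINGS = ("ExampleImageBot",)


def is_blocked(user_agent):
    """Return True if the user-agent contains a blocked substring (case-insensitive)."""
    ua = user_agent.lower()
    return any(s.lower() in ua for s in BLOCKED_UA_SUBSTRINGS)


def block_user_agents(app):
    """Wrap a WSGI app so that requests from blocked user-agents get 403 Forbidden."""
    def middleware(environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application unchanged.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_user_agents(demo_app)
```

Matching on a substring rather than the full header keeps the rule working if the tool appends a version number, at the cost of possibly catching unrelated clients that share the substring. Most sites would do this in the web server or reverse proxy instead; the middleware form is just the easiest to show in a self-contained way.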


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds