Mozilla on the coming version-100 apocalypse

Posted Feb 16, 2022 19:52 UTC (Wed) by nybble41 (subscriber, #55106)
In reply to: Mozilla on the coming version-100 apocalypse by flussence
Parent article: Mozilla on the coming version-100 apocalypse

> When they provide some other way to differentiate between this week's browser and a really badly coded web scraper.

A *really* badly coded web scraper would just send a User-Agent header matching a popular web browser, in which case the header doesn't add any value. (And the more you rely on the User-Agent header to determine your response, the more likely this scenario becomes, as scrapers are forced to make themselves look as much like regular browsers as possible.)

I can see a compatibility argument against removing the header entirely, but IMHO the actual agent string should be locked to a single value matching one of the popular browsers and never updated again. The same goes for JS APIs to probe the user agent. Servers and client-side code should treat all user agents equally.


Posted Feb 17, 2022 7:52 UTC (Thu) by taladar (subscriber, #68407)

As a sysadmin I strongly disagree. The UA of bots is very rarely that of an actual, recent desktop browser. Yes, the bad bots do often copy it, but they then never update it again, so they usually look like severely outdated versions of the browser whose UA string they copied. Freezing it would just serve to create the situation you think is already there.
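
For illustration, the kind of "outdated browser" heuristic described above might look roughly like the sketch below. The names `MIN_MAJOR` and `looks_stale` and the version thresholds are assumptions made up for this example, not anything from an actual server configuration.

```python
import re

# Hypothetical thresholds: any claimed major version below these counts as stale.
MIN_MAJOR = {"Firefox": 90, "Chrome": 90}

# Match "Firefox/<major>" or "Chrome/<major>"; \d+ also copes with three-digit versions.
_UA_VERSION = re.compile(r"\b(Firefox|Chrome)/(\d+)")


def looks_stale(user_agent: str) -> bool:
    """Return True if the UA claims a known browser older than the threshold."""
    match = _UA_VERSION.search(user_agent)
    if match is None:
        return False  # unknown agent: this sketch makes no judgement
    browser, major = match.group(1), int(match.group(2))
    return major < MIN_MAJOR[browser]
```

A check like this only sees what the client chooses to send, and it has to be kept up to date as browsers release new versions.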


Posted Feb 17, 2022 15:31 UTC (Thu) by nybble41 (subscriber, #55106)

> The UA of bots is very rarely that of an actual, recent desktop browser. … Freezing it would just serve to create that situation you think is already there.

That isn't actually a disagreement; you're just not seeing a lot of "*really* badly coded web scrapers". I never said that *most* bots do this today. The point was just that you can't rely on a client-selected User-Agent string to filter out bots reliably. User-Agent filtering is easy to implement and works so long as it's not over-used, because scraper authors then have no reason to work around it. But if identifying as a bot (or an old browser) gets a scraper blocked or throttled, then correcting the problem will take a few minutes of the scraper developer's time at most. And in the meantime, for non-scrapers, we ought to be targeting web standards and not implementing workarounds for specific browsers. *That* is the point of freezing the User-Agent string: force sites to serve the same versions of their resources to everyone so that they don't break or degrade when someone comes along with a standards-compliant user agent the site simply can't identify.
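
To make the "client-selected" part concrete, here is a minimal sketch using Python's standard `urllib.request`. The `BROWSER_UA` value and the `build_request` helper are made up for illustration; no request is actually sent.

```python
from urllib.request import Request

# Hypothetical value: any string copied from a current browser would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0"


def build_request(url: str, user_agent: str = BROWSER_UA) -> Request:
    """Build (but do not send) a request that claims to come from a browser."""
    # The header is whatever the client says it is; the server cannot verify it.
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
```

Keeping such a scraper "current" amounts to editing one string constant, which is why the header alone is a weak signal.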