LWN: Comments on "LWN site tour 2025" https://lwn.net/Articles/1006001/ This is a special feed containing comments posted to the individual LWN article titled "LWN site tour 2025". en-us Tue, 16 Sep 2025 14:10:59 +0000 Tue, 16 Sep 2025 14:10:59 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Comments https://lwn.net/Articles/1013165/ https://lwn.net/Articles/1013165/ Fowl <div class="FormattedComment"> ... and I can now see this has been implemented! Thanks!<br> </div> Thu, 06 Mar 2025 09:34:15 +0000 Comments https://lwn.net/Articles/1009652/ https://lwn.net/Articles/1009652/ Fowl <div class="FormattedComment"> I agree that comment thread collapsing is very useful! <br> <p> One relatively minor enhancement I wish for is the ability to collapse (all the comments under) entire articles on the unread comments page. Occasionally the comments for an article get... numerous. <br> </div> Fri, 14 Feb 2025 03:32:42 +0000 No JS is a big plus https://lwn.net/Articles/1009304/ https://lwn.net/Articles/1009304/ PeeWee <div class="FormattedComment"> As a new subscriber I have to say that I like your policy on the JavaScript front. And it is also very refreshing to see this kind of "old school", no nonsense design. It is very functional, to the point and focused on what actually matters: the content. As the saying goes, if you are too concerned with optics you are trying to distract from the (lack of) content.<br> </div> Wed, 12 Feb 2025 22:40:45 +0000 Comments https://lwn.net/Articles/1008696/ https://lwn.net/Articles/1008696/ lamawithonel <div class="FormattedComment"> I've really liked the thread collapse feature you added last year. I'm hopping that's another area of focus, comments, but I'm happy if it's slow. I like the conservative approach you've taken over the years. 
Thanks for all the efforts in both directions, improvements and holding fast!<br> </div> Mon, 10 Feb 2025 02:49:25 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008622/ https://lwn.net/Articles/1008622/ mb <div class="FormattedComment"> For me most of the bot load comes from bots "indexing" the cgit web interface. There's basically an infinite amount of data in there, so it never ends. Even though cgit is completely blocked in robots.txt.<br> <p> Currently something that identifies itself as "openai.com/gptbot" ignores my Crawl-delay and also the cgit blocking. Therefore I'm forced to block it completely either on IP level or via User-Agent.<br> It downloaded almost 40k files today.<br> </div> Sat, 08 Feb 2025 16:18:44 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008621/ https://lwn.net/Articles/1008621/ apoelstra <div class="FormattedComment"> <span class="QuotedText">&gt; - No limit for link-following depth. It goes in circles without end.</span><br> <p> You can sometimes crash such bots by just having an infinitely deep directory tree. If you are serving a directory-indexed folder from nginx (or, I assume, Apache), you can do this by simply doing `mkdir do-not-follow; cd do-not-follow; ln -s .. 
do-not-follow`.<br> <p> And yes, on my site there are also only a small number of IPs that do this sort of thing, so it's sufficient to manually block them (in fact, it's sufficient to just totally ignore them since they're not wasting a lot of resources in total :)).<br> <p> I imagine the situation is pretty bad for something like LWN with hundreds of thousands of links widely spread across mailing lists and the wider Internet.<br> </div> Sat, 08 Feb 2025 15:47:27 +0000 Strip trailing spaces https://lwn.net/Articles/1008418/ https://lwn.net/Articles/1008418/ hrw <div class="FormattedComment"> KSDB looks nice, so I went to check for my commits (28 in total).<br> <p> Please make search trim trailing spaces. On Android there is a space added after autocomplete, so KSDB searched for "Juszkiewicz " and failed. <br> </div> Fri, 07 Feb 2025 05:55:59 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008406/ https://lwn.net/Articles/1008406/ ejr <div class="FormattedComment"> Y'all rock.<br> <p> And, yes, the epub version definitely works via free software on a PineNote. ;) The site does as well when on a network, also via free software as much as any of these can be.<br> </div> Fri, 07 Feb 2025 00:44:22 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008381/ https://lwn.net/Articles/1008381/ corbet Just to be clear: I know of no instances where ordinary, human readers have been mistaken for bots; we do go out of our way to avoid that. Thu, 06 Feb 2025 20:35:48 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008380/ https://lwn.net/Articles/1008380/ daroc <div class="FormattedComment"> Oh, yes. The polite bots read our robots.txt and respect the directives there. 
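The symlink trap mentioned in the thread above can be reproduced in a few commands. This is a minimal filesystem-level sketch (the directory name `do-not-follow` is just illustrative); it only bites a crawler if the web server actually serves directory indexes for that folder, and a well-behaved bot with a depth limit will never notice it:

```shell
# Create a directory that looks infinitely deep to a naive crawler:
# the symlink points back at its own parent, so the path
# do-not-follow/do-not-follow/do-not-follow/... resolves forever.
mkdir do-not-follow
cd do-not-follow
ln -s .. do-not-follow
cd ..

# Any depth of the looped path resolves back to the same directory,
# which contains just the one symlink.
ls do-not-follow/do-not-follow/do-not-follow/
```

A bot with no depth limit will keep queueing ever-longer URLs under this path; excluding the directory in robots.txt as well ensures that only crawlers that ignore the rules ever fall in.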
It's not exactly great how many of them there are, but they're not causing problems for us because we set the crawl delay to something the site can handle.<br> <p> It's the bots that never look at robots.txt, and keep hitting the site even when they get a 'Rate Limit Exceeded' error that are the real problem. But — possibly because we _do_ serve 'Rate Limit Exceeded' errors with impunity, when a non-logged in user tries to load more than a handful of pages per second — we see the same bots coming from a large variety of IP addresses. Part of the challenge is that each IP address only makes a small number of requests, so we need to be able to identify them quickly.<br> <p> Incidentally — if anyone without a LWN.net account is reading this, one of the benefits of getting a free account, even without a subscription, is that the site code is less likely to classify you as a bot. Not that this should come up very often, since we try hard not to hit our human readers with false positives, but if it does for some reason you can fix it by signing in.<br> </div> Thu, 06 Feb 2025 20:31:11 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008377/ https://lwn.net/Articles/1008377/ mb <div class="FormattedComment"> On my site these bots show a completely anti-social behavior that is against all bot standards:<br> <p> - No rate limiting at all.<br> - Completely ignoring robots.txt<br> - Fake User-Agent that mimics Safari browser.<br> - No limit for link-following depth. 
It goes in circles without end.<br> - Plus all the things you said.<br> <p> There are only a handful of source IP addresses for me, so I block them.<br> But I guess I'm just lucky with that.<br> <p> I don't know who they are and what they want to do.<br> But if I had not taken action, my machine would be completely overloaded.<br> </div> Thu, 06 Feb 2025 20:09:26 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008374/ https://lwn.net/Articles/1008374/ daroc <div class="FormattedComment"> We get a surprising number of requests to our HTTP site that immediately get redirected to HTTPS; usually, you might expect this to happen approximately once per client before they pick up on the permanent redirect. In actuality, it's about 20% of requests made to the site, often by the same client multiple times in a row. Usually, they're requests for old articles from years ago — which human readers certainly read too, but someone coming in repeatedly on port 80 for old, unpopular articles is probably a robot.<br> <p> Other "fun" behaviors include: requesting an article, and then requesting to view each comment individually, when they were just rendered on the article's page; requesting "dead" URLs, like the HTTP version of the site or old URLs that have other redirects from the site being reorganized over time; etc.<br> <p> All of these have picked up since the start of January, to the point where we've had a handful (4-5, I think) of instances where the site has had a lag spike. It's nothing that we can't deal with, necessarily, but we don't want to have to deal with it because it takes time away from all the other parts of keeping LWN.net running.<br> <p> That's not getting into the non-AI bots, which mostly try to figure out if we have any shell or SQL injection vulnerabilities by repeatedly trying to log in with nonsense usernames. 
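For the "handful of source IP addresses" case described above, the offenders usually stand out immediately in the access log. A minimal sketch, assuming a combined-format nginx or Apache log; the file name, addresses, and request paths here are made up for illustration:

```shell
# Hypothetical sample of a combined-format access log; in practice this
# would be something like /var/log/nginx/access.log.
cat > access.log <<'EOF'
203.0.113.7 - - [06/Feb/2025:20:09:26 +0000] "GET /cgit/ HTTP/1.1" 200 512
203.0.113.7 - - [06/Feb/2025:20:09:27 +0000] "GET /cgit/tree/ HTTP/1.1" 200 512
198.51.100.2 - - [06/Feb/2025:20:10:00 +0000] "GET /Articles/ HTTP/1.1" 200 512
EOF

# Count requests per client IP and list the heaviest hitters first.
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -5

# An address far above the rest is a candidate for a firewall rule, e.g.:
#   iptables -A INPUT -s 203.0.113.7 -j DROP
```

This only helps against the small-scale scrapers; as noted above, the harder problem is bots spread across a large variety of IPs where each address makes only a few requests.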
Or the bots that try to do credit-card fraud and get shot down for not having a valid transaction in their request.<br> </div> Thu, 06 Feb 2025 19:27:47 +0000 Site performance https://lwn.net/Articles/1008350/ https://lwn.net/Articles/1008350/ corbet No PHP at all, happily. <p> There is quite a bit of caching built into the LWN site code. But when you have tens of thousands of AI-scraper sites all hitting the server, there is only so much that caching will help. We've put in some countermeasures that seem to have stabilized the situation for now... but this is a net-wide problem, and I would expect it to get worse. Thu, 06 Feb 2025 17:21:24 +0000 Site performance https://lwn.net/Articles/1008345/ https://lwn.net/Articles/1008345/ jengelh <div class="FormattedComment"> <span class="QuotedText">&gt;we are facing [...] site scrapers trying to feed AI models. While we look for ways to block them to preserve site performance, we are avoiding adding any counter-measures that inconvenience our human readers.</span><br> <p> Is this some kind of {overly dynamic webpages cluttered with altmode media—images, font files, huge stylesheets, you name it} problem for which LWN is too plaintext to be overly concerned with?<br> <p> I get that LWN is running *some* form of dynamic page generation somewhere, but how much ‘PHP’ is there really to worry about, and could a static page cache not also address that?<br> </div> Thu, 06 Feb 2025 17:18:23 +0000 Nonsense requests from content-gobbling site scrapers https://lwn.net/Articles/1008241/ https://lwn.net/Articles/1008241/ amw <div class="FormattedComment"> Just out of curiosity, would it be possible to elaborate a bit on "nonsense requests from content-gobbling site scrapers trying to feed AI models"? 
What form do these requests take?<br> </div> Thu, 06 Feb 2025 13:59:34 +0000 Kernel Index https://lwn.net/Articles/1008238/ https://lwn.net/Articles/1008238/ ncultra <div class="FormattedComment"> The Kernel Index is much appreciated!<br> <p> <a href="https://lwn.net/Kernel/Index/">https://lwn.net/Kernel/Index/</a><br> </div> Thu, 06 Feb 2025 13:18:43 +0000 RSS with content? https://lwn.net/Articles/1008162/ https://lwn.net/Articles/1008162/ legoktm <div class="FormattedComment"> In case it's useful, 404media recently rolled out full-content RSS feeds for subscribers; each URL has a ?key=... for authentication.<br> <p> <a href="https://www.404media.co/404-media-now-has-a-full-text-rss-feed/">https://www.404media.co/404-media-now-has-a-full-text-rss...</a><br> </div> Wed, 05 Feb 2025 22:42:50 +0000 RSS with content? https://lwn.net/Articles/1008135/ https://lwn.net/Articles/1008135/ corbet True, we could implement a solution along those lines. Stay tuned (but I can't promise when). Wed, 05 Feb 2025 19:51:51 +0000 RSS with content? https://lwn.net/Articles/1008126/ https://lwn.net/Articles/1008126/ Cyberax <div class="FormattedComment"> A typical solution is to create a personal link with a long random token that encodes the authentication. <br> </div> Wed, 05 Feb 2025 19:42:36 +0000 RSS with content? https://lwn.net/Articles/1008129/ https://lwn.net/Articles/1008129/ mb <div class="FormattedComment"> An API-token can be used that (if leaked) only affects the ability to read the RSS feed.<br> Such tokens can typically be generated from the user's account menu.<br> <p> In this case it would basically be a personalized and maybe re-generatable RSS URL retrieved by the user from the account menu.<br> </div> Wed, 05 Feb 2025 19:40:35 +0000 Dark mode https://lwn.net/Articles/1008125/ https://lwn.net/Articles/1008125/ npws <div class="FormattedComment"> Thanks for the hint. 
So far I have used a Chrome plugin for LWN, but it breaks quite a few things.<br> </div> Wed, 05 Feb 2025 19:23:52 +0000 RSS with content? https://lwn.net/Articles/1008124/ https://lwn.net/Articles/1008124/ corbet That has been occasionally requested, and I've looked into it. Authentication and RSS don't really go well together; about the only way to do it in most readers seems to be to put the username and password, in plain text, in the fetch URL. That seems ... not entirely elegant. I wish there were a better solution. Wed, 05 Feb 2025 19:22:55 +0000 RSS with content? https://lwn.net/Articles/1008123/ https://lwn.net/Articles/1008123/ Cyberax <div class="FormattedComment"> Do you have an RSS feed with the contents of the articles, obviously for subscribers only?<br> <p> For now, I'm using FreshRSS's capability to download the linked RSS article with the supplied cookies for auth. But it'd be nice to get the full content.<br> <p> Perhaps via a personal URL?<br> </div> Wed, 05 Feb 2025 19:19:31 +0000
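The personal-URL scheme discussed in this thread — a long random token standing in for credentials, along the lines of 404media's `?key=...` feeds — can be sketched in a couple of commands. The host and path below are hypothetical, not an existing LWN endpoint:

```shell
# Mint a 256-bit random token. If it leaks, only feed access is exposed
# (not the account password), and it can be regenerated at any time.
token=$(openssl rand -hex 32)

# Hand the subscriber a personalized feed URL embedding the token;
# the server maps the token back to the account when serving the feed.
echo "https://example.org/feeds/full-text.rss?key=${token}"
```

Since the reader only ever stores an opaque URL, nothing resembling a password sits in its configuration, which sidesteps the plain-text-credentials problem corbet describes.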