
Updating the kernel's web links

By Jake Edge
September 22, 2010

Justin P. Mattock has set out a rather large clean-up task for himself: updating all of the web links in the kernel comments. As might be guessed, many of those links have link-rotted over the last ten or more years, so Mattock is trying to update the kernel to point to the proper place—if it can be found. That effort resulted in a monster patch that covered all of the references to "http" that he could find.

Many of the new links pointed off to archive.org as the only location that Mattock was able to find, but that caused a formatting problem. When adding those, he used links like:

    http://web.archive.org/web/*/http://oldsite/oldlink
which shows all of the different versions of a page that the Wayback Machine has stored. Putting "*/" into a C-language comment is not a good plan, however, since that sequence terminates the comment, as Matt Turner pointed out. The proper solution is to use "%2A" as that is the HTML entity for "*". But there is a bigger issue with those archive links.
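A C block comment ends at the first "*/", so pasting the archive.org URL above into a kernel comment would close the comment early and leave the rest of the URL to be parsed as code. The sketch below shows one way to make such a link comment-safe by percent-encoding every "*" as "%2A" (0x2A is the ASCII code for "*"). The helper name encode_asterisks is hypothetical: it is not taken from Mattock's patch or from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not part of the kernel tree: copy url into out,
// replacing each '*' with its percent-encoding "%2A". With no asterisks
// left, the result cannot contain the star-slash sequence that ends a
// C block comment.
// Returns 0 on success, or -1 if out is too small (out is then unspecified).
static int encode_asterisks(const char *url, char *out, size_t outlen)
{
	size_t n = 0;

	assert(url != NULL && out != NULL);
	if (outlen == 0)
		return -1;

	for (const char *p = url; *p != '\0'; p++) {
		size_t len = (*p == '*') ? 3 : 1;

		if (n + len >= outlen)	// leave room for the terminating NUL
			return -1;
		if (*p == '*')
			memcpy(out + n, "%2A", 3);
		else
			out[n] = *p;
		n += len;
	}
	out[n] = '\0';
	return 0;
}
```

Applied to the example URL, the helper produces http://web.archive.org/web/%2A/http://oldsite/oldlink, which can sit inside a /* ... */ comment without ending it. Only the asterisk needs encoding, because the comment terminator cannot form without one.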

Finn Thain suggested that any of those links could just be left alone and that people should already know about archive.org, so adding it to the old links is just "bloat". Furthermore, there is a question of which version of the stored page is the one that the original comment referred to. Basically, Thain's point was that web pages which are maintained and updated are likely to be more useful, and that those who want to refer to pages that have dropped off the net should know (or learn) how to go about it.

Eventually Mattock split the patch into two parts: one that updated links to their newer locations and another that added archive.org links for lost sites. He is soliciting more feedback on whether to include the archive links or not.

It is not clear, so far at least, whether these changes will be accepted. It is, in a sense, churn, and likely to lead to more churn down the road as link-rot is an endemic web problem. It is probably frustrating for developers and others to come across broken links in the kernel code, but is it worth the never-ending—hopefully fairly infrequent—stream of update patches? There are undoubtedly copyright, logistical, and other issues, but it would certainly be a lot nicer if these documents could be permanently stored in some location at kernel.org.



Updating the kernel's web links

Posted Sep 23, 2010 9:04 UTC (Thu) by rswarbrick (subscriber, #47560)

Quote:
...but it would certainly be a lot nicer if these documents could be permanently stored in some location at kernel.org.

Isn't that exactly what linking to an archive.org address would achieve?

Updating the kernel's web links

Posted Sep 23, 2010 9:14 UTC (Thu) by johill (subscriber, #25196)

archive.org documents can still be removed via a robots.txt file on the domain name, which may now belong to somebody else: http://www.archive.org/about/exclude.php

Updating the kernel's web links

Posted Sep 23, 2010 17:49 UTC (Thu) by intgr (subscriber, #39733)

> "%2A" as that is the HTML entity for "*".

Technically it's URL encoding, not an HTML entity.
Not that it makes any difference, just nitpicking here. :)

keywords

Posted Sep 24, 2010 8:18 UTC (Fri) by marcH (subscriber, #57642)

For most documents, keywords offer a much more permanent alternative than URLs. Example:
http://www.google.com/search?q=simtec+StrongARM+110+Evalu...
(replace google with your favourite engine)

Sometimes it works even for documents that are not available any more:
http://www.google.com/search?q=AXPpci+33+Information+for+...

I think URLs and keywords should both be used side by side.

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds