
Archiving web sites

Posted Sep 25, 2018 14:48 UTC (Tue) by anarcat (subscriber, #66354)
Parent article: Archiving web sites

As usual, here's the list of issues and patches generated while researching this article:

I also want to personally thank the folks in the #archivebot channel for their assistance and letting me play with their toys.


Archiving web sites

Posted Oct 4, 2018 17:58 UTC (Thu) by anarcat (subscriber, #66354)

As it turns out, I couldn't stop working on this topic, and I opened two more PRs upstream after submitting WARC files to the Internet Archive:

The Pamplemousse crawl is now available on the Internet Archive; it might end up in the Wayback Machine at some point if the Archive curators think it is worth it.

Another example of a crawl is this archive of two Bloomberg articles, which the "Save Page Now" feature of the Internet Archive wasn't able to save correctly (but webrecorder.io could!). Those pages can be viewed in the Webrecorder player to get a better feel for how faithful a WARC file really is.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds