Archiving web sites

Posted Oct 4, 2018 17:58 UTC (Thu) by anarcat (subscriber, #66354)
In reply to: Archiving web sites by anarcat
Parent article: Archiving web sites

As it turns out, I couldn't stop working on this topic and opened two more PRs upstream after submitting WARC files to the Internet Archive:

The Pamplemousse crawl is now available on the Internet Archive; it might end up in the Wayback Machine at some point if the Archive curators think it is worth it.

Another example of a crawl is this archive of two Bloomberg articles, which the "Save Page Now" feature of the Internet Archive wasn't able to save correctly (but webrecorder.io could!). Those pages can be viewed in the Webrecorder player to get a better feel for how faithful a WARC file really is.



