Archiving web sites
Posted Oct 4, 2018 17:58 UTC (Thu) by anarcat (subscriber, #66354)
In reply to: Archiving web sites by anarcat
Parent article: Archiving web sites
As it turns out, I couldn't stop working on this topic and opened two more PRs upstream after submitting WARC files to the Internet Archive:
- mention collections in the ia documentation
- fix warnings in docs builds of ia
The Pamplemousse crawl is now available on the Internet Archive; it might end up in the Wayback Machine at some point if the Archive curators think it is worth it.
Another example of a crawl is this archive of two Bloomberg articles, which the "save page now" feature of the Internet Archive wasn't able to save correctly (but webrecorder.io could!). Those pages can be viewed in the web recorder player to get a better feel for how faithful a WARC file really is.