
ArchiveTeam, Software Heritage & archive.today

Posted Aug 27, 2025 15:59 UTC (Wed) by pabs (subscriber, #43278)
Parent article: The need to reliably preserve our community history

ArchiveTeam is a group that saves web resources to web.archive.org and to archive.org items. It focuses on resources that are in immediate or potential danger of being deleted, or that are important in some way, but it also proactively saves many websites that aren't in danger yet. If any LWN readers know of FOSS folks who have died, old projects that are dying slowly, or technologies that are out of fashion or otherwise in danger, they can help preserve the related web resources. Their main IRC channel, #archiveteam-bs on the hackint IRC network, is the place to mention such resources. ArchiveTeam has ArchiveBot for recursive crawls of individual websites. It also has a distributed archiving mechanism (DPoS) based on volunteers running a virtual machine (called a Warrior), with the archiving work spread across all the VMs. One of the DPoS projects continuously archives new content on many different sites (including LWN and many FOSS blog aggregator/Planet sites). There are other projects for archiving code repositories and for archiving MediaWiki/DokuWiki wikis as re-importable data. Some members also work on archiving FOSS-adjacent things, like MoinMoin wikis, IRC logs, Trac, Bugzilla, mailing lists in general (and Mailman2 lists in particular), pastebins, and dying hosting sites like TuxFamily. Developers are welcome to work on the software that these activities are based on; most of it is under FOSS licenses. Unfortunately, it looks like ArchiveBot was never set loose on Groklaw; otherwise more of it could have been preserved.

https://wiki.archiveteam.org/
https://archive.fart.website/archivebot/viewer/?q=groklaw

Software Heritage is a group that aims to proactively archive all source code; it has been featured on LWN before. It also needs developers to help support archiving more VCS/forge types, and people to submit forge instances and individual repos.

https://www.softwareheritage.org/
https://lwn.net/Articles/693471/

I'd also like to mention archive.today. It is good for archiving single pages that are otherwise difficult to save because they rely on JavaScript, are shielded by anti-scraper tech, or are behind paywalls. It is also separate from archive.org.

https://archive.today/

Back to the article topic: I think it would be great if there were an EU mirror of IA, and also an EU web archiving project that is separate from IA.



Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds