Posted Oct 13, 2009 21:45 UTC (Tue) by nybble41 (subscriber, #55106)
Wikipedia is *huge*; just the raw English articles in pure HTML really do take up some 200GB in uncompressed form. I actually had to create a loopback filesystem image to hold it, as my normal root filesystem, created with the default settings, didn't even have enough inodes for that many files.
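For anyone wanting to check this before unpacking a dump, here is a minimal Python sketch that reads a filesystem's inode counts. It assumes a Unix-like system where `os.statvfs` is available; the helper names `inode_headroom` and `fits` are made up for illustration, and some filesystems report zero for these fields.

```python
import os


def inode_headroom(path):
    """Return (total_inodes, free_inodes) for the filesystem holding path.

    f_files is the total inode count and f_favail is the number of free
    inodes available to unprivileged users.
    """
    st = os.statvfs(path)
    return st.f_files, st.f_favail


def fits(free_inodes, files_needed):
    """True if the filesystem has at least one free inode per file needed."""
    return files_needed <= free_inodes


if __name__ == "__main__":
    total, free = inode_headroom(".")
    # The figure of three million files is a placeholder, not a measured count.
    print(total, free, fits(free, 3_000_000))
```

If the check fails, the inode count has to be raised when the filesystem is created, which is why a separate loopback image is a convenient workaround.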
Posted Oct 13, 2009 22:36 UTC (Tue) by cjb (guest, #40354)
The technique they're using, which is also the technique we used for our offline Wikipedia snapshot at OLPC, is to have a single compressed archive containing all of the content, an index mapping each article title to a block number, and a tool for quickly uncompressing (only) a specified block from the archive.
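The three pieces described above (archive, title index, single-block extractor) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code used by OLPC or by the project under discussion: zlib and JSON stand in for whatever compressor and block layout the real tools use, and the names `build_archive`, `read_block`, and `lookup` are hypothetical.

```python
import io
import json
import zlib


def build_archive(articles, articles_per_block=2):
    """Pack articles into independently compressed blocks.

    Returns (archive_bytes, block_table, title_index), where
    block_table[n] is the (offset, length) of block n in the archive
    and title_index maps an article title to its block number.
    """
    archive = io.BytesIO()
    block_table = []
    title_index = {}
    titles = sorted(articles)
    for start in range(0, len(titles), articles_per_block):
        block_no = len(block_table)
        chunk = {t: articles[t] for t in titles[start:start + articles_per_block]}
        # Each block is compressed on its own so it can be read in isolation.
        payload = zlib.compress(json.dumps(chunk).encode("utf-8"))
        block_table.append((archive.tell(), len(payload)))
        archive.write(payload)
        for title in chunk:
            title_index[title] = block_no
    return archive.getvalue(), block_table, title_index


def read_block(archive, block_table, block_no):
    """Seek to one block and decompress only that block."""
    offset, length = block_table[block_no]
    archive.seek(offset)
    return json.loads(zlib.decompress(archive.read(length)).decode("utf-8"))


def lookup(archive, block_table, title_index, title):
    """Fetch one article by title, touching a single block of the archive."""
    block = read_block(archive, block_table, title_index[title])
    return block[title]


if __name__ == "__main__":
    articles = {"Linux": "A kernel.", "OLPC": "A laptop project.", "Wiki": "A website."}
    data, table, index = build_archive(articles)
    print(lookup(io.BytesIO(data), table, index, "OLPC"))
```

The block size is the main tuning knob: larger blocks compress better because the compressor sees more redundancy, while smaller blocks mean less data to decompress per lookup.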
Posted Oct 14, 2009 21:11 UTC (Wed) by nybble41 (subscriber, #55106)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.