Posted Oct 13, 2009 20:18 UTC (Tue) by nybble41 (subscriber, #55106)
Luckily I had over 200GB of free space at the time, as the format of the official archives precluded extraction of individual files on demand. Once I got it into SquashFS form, however, updates were trivially accomplished via FUSE-based union overlays.
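For the record, the setup looked something along these lines (paths are illustrative, and unionfs-fuse is just one of several FUSE union implementations):

```shell
# Build a compressed, read-only SquashFS image from the extracted article tree
mksquashfs wikipedia-html/ wikipedia.sqsh -comp gzip

# Mount the image read-only via a loop device
mount -t squashfs -o loop wikipedia.sqsh /mnt/wiki-ro

# Overlay a writable directory on top with copy-on-write semantics,
# so updates land in the upper branch and the image stays untouched
unionfs-fuse -o cow /mnt/wiki-rw=RW:/mnt/wiki-ro=RO /mnt/wiki
```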
Posted Oct 13, 2009 21:26 UTC (Tue) by popey (subscriber, #53979)
Posted Oct 13, 2009 21:45 UTC (Tue) by nybble41 (subscriber, #55106)
Wikipedia is *huge*; just the raw English articles in pure HTML really do take up some 200GB in uncompressed form. I actually had to create a loopback filesystem image to hold it, as my normal root filesystem, created with the default settings, didn't even have enough inodes for that many files.
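The loopback trick is roughly this (sizes and inode count are illustrative; the point is overriding the default bytes-per-inode ratio, which allocates far too few inodes for millions of small HTML files):

```shell
# Sparse 250 GB image file; blocks are allocated only as data is written
truncate -s 250G wiki.img

# ext4 with an explicit inode count, well above what the defaults would give
mkfs.ext4 -N 50000000 wiki.img

# Mount the image through a loop device
mount -o loop wiki.img /mnt/wikipedia
```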
Posted Oct 13, 2009 22:36 UTC (Tue) by cjb (guest, #40354)
The technique they're using, which is also the technique we used for our offline Wikipedia snapshot at OLPC, is to have a single compressed archive containing all of the content, an index mapping article titles to block numbers, and a tool that can quickly uncompress only the specified block from the archive.
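For anyone curious, here's a minimal sketch of that idea in Python (names and the 64 KiB block size are my own choices, not what OLPC actually used): articles are packed into fixed-size blocks, each block is compressed independently, and a lookup decompresses only the one block holding the requested title.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # group articles into ~64 KiB blocks before compressing

def build_archive(articles):
    """Pack {title: html} into independently compressed blocks.

    Returns (blob, index, offsets): blob is the concatenation of the
    compressed blocks; index maps title -> (block_no, offset_in_block,
    length); offsets gives (position, size) of each compressed block
    inside blob.
    """
    index = {}
    blocks = []
    current = b""
    block_no = 0
    for title, html in articles.items():
        data = html.encode("utf-8")
        index[title] = (block_no, len(current), len(data))
        current += data
        if len(current) >= BLOCK_SIZE:
            blocks.append(zlib.compress(current))
            current = b""
            block_no += 1
    if current:
        blocks.append(zlib.compress(current))
    offsets, pos = [], 0
    for b in blocks:
        offsets.append((pos, len(b)))
        pos += len(b)
    return b"".join(blocks), index, offsets

def fetch(blob, index, offsets, title):
    """Decompress only the single block that holds the requested article."""
    block_no, off, length = index[title]
    bpos, blen = offsets[block_no]
    block = zlib.decompress(blob[bpos:bpos + blen])
    return block[off:off + length].decode("utf-8")

# Tiny demonstration corpus
articles = {"Linux": "<p>Linux is a kernel.</p>",
            "OLPC": "<p>One Laptop per Child.</p>"}
blob, index, offsets = build_archive(articles)
```

Reads are then O(one block decompression) rather than O(whole archive), which is what makes on-demand access practical on slow flash.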
Posted Oct 14, 2009 21:11 UTC (Wed) by nybble41 (subscriber, #55106)
Posted Oct 14, 2009 0:55 UTC (Wed) by rfunk (subscriber, #4054)
Posted Oct 14, 2009 3:55 UTC (Wed) by AJWM (guest, #15888)
Posted Oct 15, 2009 1:22 UTC (Thu) by nybble41 (subscriber, #55106)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds