Wait, what?

Posted Oct 13, 2009 18:51 UTC (Tue) by aorth (subscriber, #55260)
Parent article: WikiReader: OpenMoko's "Project B"

The website says "No Internet connection needed." So does it have a dump of
Wikipedia, or does it use Wi-Fi/WiMAX (?) like the Kindle (if I recall
correctly)?


Wait, what?

Posted Oct 13, 2009 18:56 UTC (Tue) by mattdm (guest, #18)

It uses a microSD card; you can update it that way.

Wait, what?

Posted Oct 13, 2009 20:18 UTC (Tue) by nybble41 (subscriber, #55106)

I wonder how large the SD card is. I once transferred the articles from the August 2008 Wikipedia archive (HTML format) onto a 16GB SD card using SquashFS; the actual filesystem image was about 12GB. You could access it with a normal web browser, but of course none of the special pages (e.g. search) were functional. I had to optimize the mksquashfs tool to allow it to handle that many files without running out of memory; support for insanely large compressed filesystems wasn't among the developers' priorities at the time.

Luckily I had over 200GB of free space at the time, as the format of the official archives precluded extraction of individual files on demand. Once I got it into SquashFS form, however, updates were trivially accomplished via FUSE-based union overlays.
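For readers unfamiliar with union overlays, here is a toy model of the lookup rule they rely on: a writable upper layer sits over the read-only lower layer (here standing in for the SquashFS image), reads check the upper layer first, and deletions are recorded as "whiteout" markers rather than touching the lower layer. This is a hypothetical sketch; `UnionView` and `WHITEOUT` are made-up names, and it is not the code of unionfs-fuse or any other real union tool.

```python
# Toy model of a union mount. NOT the implementation of any real FUSE union
# filesystem; it only illustrates the lookup order and whiteouts.
WHITEOUT = object()  # hypothetical marker: "deleted in the upper layer"


class UnionView:
    def __init__(self, lower):
        self.lower = lower  # read-only layer: path -> contents (never modified)
        self.upper = {}     # writable layer: path -> contents or WHITEOUT

    def read(self, path):
        # The upper layer wins; a whiteout hides the lower-layer file.
        if path in self.upper:
            value = self.upper[path]
            if value is WHITEOUT:
                raise FileNotFoundError(path)
            return value
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def write(self, path, contents):
        # Updated or new articles go only into the upper layer.
        self.upper[path] = contents

    def delete(self, path):
        # Record the deletion instead of modifying the read-only image.
        self.upper[path] = WHITEOUT

    def listdir(self):
        names = set(self.lower) | set(self.upper)
        return sorted(n for n in names if self.upper.get(n) is not WHITEOUT)
```

The point of the design is that an "update" only ever writes the small upper layer, so the large compressed image never has to be rebuilt.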

Wait, what?

Posted Oct 13, 2009 21:26 UTC (Tue) by popey (subscriber, #53979)

It's text only, with no images, which I suspect brings the dump size down
considerably. Having only one language per dump may also shrink it somewhat.

Wait, what?

Posted Oct 13, 2009 21:45 UTC (Tue) by nybble41 (subscriber, #55106)

Mine was also imageless and English-only. I imagine this device just stores the raw text, perhaps with some short formatting codes, which would save space over full HTML pages. (Not all that much, however, given that both versions are compressed.)

Wikipedia is *huge*; just the raw English articles in pure HTML really do take up some 200GB in uncompressed form. I actually had to create a loopback filesystem image to hold it, as my normal root filesystem, created with the default settings, didn't even have enough inodes for that many files.
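The inode problem is simple arithmetic: ext2/3/4 size the inode table when the filesystem is created, either from a bytes-per-inode ratio (`mke2fs -i`) or an explicit count (`mke2fs -N`), so a filesystem made with default settings can run out of inodes long before it runs out of space. The sketch below assumes the 6,044,424 inode figure quoted later in this thread and a hypothetical 250 GB loopback image; the function names are made up for illustration, and real mkfs results differ slightly because of block-group rounding.

```python
# Back-of-the-envelope inode arithmetic for an ext2/3/4 filesystem whose
# inode count is fixed at mkfs time from a bytes-per-inode ratio.

FILES_NEEDED = 6_044_424  # inode count quoted later in this thread


def inode_count(fs_bytes, bytes_per_inode):
    """Approximate number of inodes created for a given ratio (ignores rounding)."""
    return fs_bytes // bytes_per_inode


def max_bytes_per_inode(fs_bytes, files_needed):
    """Largest integer ratio that still leaves at least one inode per file."""
    return fs_bytes // files_needed


if __name__ == "__main__":
    image = 250 * 10**9  # hypothetical 250 GB loopback image
    ratio = max_bytes_per_inode(image, FILES_NEEDED)
    print("bytes per inode:", ratio, "inodes:", inode_count(image, ratio))
```

Passing a ratio at or below that value to `mke2fs -i` (or the file count to `-N`) when creating the loopback image is what avoids the shortage.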

Wait, what?

Posted Oct 13, 2009 22:36 UTC (Tue) by cjb (guest, #40354)

> Wikipedia is *huge*; just the raw English articles in pure HTML really do take up some 200GB in uncompressed form. I actually had to create a loopback filesystem image to hold it, as my normal root filesystem, created with the default settings, didn't even have enough inodes for that many files.

The technique they're using, which is also the technique we used for our offline Wikipedia snapshot at OLPC, is to have a single compressed archive containing all of the content, an index from article title into block number, and a tool for uncompressing (only) a specified block number from the archive quickly.
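To make the block-index idea concrete, here is a minimal sketch, assuming a made-up layout rather than the actual WikiReader or OLPC format: articles are grouped into small blocks, each block is compressed independently with bz2, and an index maps each title to its block number plus its byte range inside the decompressed block. `build_archive`, `read_article`, and `per_block` are hypothetical names.

```python
import bz2

# Hypothetical layout, NOT the WikiReader or OLPC on-disk format: articles are
# grouped into blocks, each block is compressed on its own, and an index maps
# a title to (block number, start, end) within the decompressed block.


def build_archive(articles, per_block=2):
    """articles: dict mapping title -> text. Returns (blob, block_offsets, index)."""
    blob = bytearray()
    block_offsets = []  # byte offset where each compressed block starts
    index = {}          # title -> (block_no, start, end)
    titles = sorted(articles)
    for block_no, first in enumerate(range(0, len(titles), per_block)):
        raw = bytearray()
        for title in titles[first:first + per_block]:
            data = articles[title].encode("utf-8")
            index[title] = (block_no, len(raw), len(raw) + len(data))
            raw += data
        block_offsets.append(len(blob))
        blob += bz2.compress(bytes(raw))
    block_offsets.append(len(blob))  # sentinel: end of the last block
    return bytes(blob), block_offsets, index


def read_article(blob, block_offsets, index, title):
    """Decompress only the block that holds `title`, then slice out the article."""
    block_no, start, end = index[title]
    chunk = blob[block_offsets[block_no]:block_offsets[block_no + 1]]
    return bz2.decompress(chunk)[start:end].decode("utf-8")


if __name__ == "__main__":
    demo = {"Linux": "Text of the Linux article.",
            "SquashFS": "Text of the SquashFS article.",
            "WikiReader": "Text of the WikiReader article."}
    blob, offsets, index = build_archive(demo)
    print(read_article(blob, offsets, index, "SquashFS"))
```

The trade-off is block size: larger blocks compress better, but every lookup has to decompress the whole block containing the article.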

Wait, what?

Posted Oct 14, 2009 21:11 UTC (Wed) by nybble41 (subscriber, #55106)

Right, I've seen it done that way. However, I wanted to be able to access the articles as separate files without first preprocessing the archive to create a block index and writing a custom FUSE adapter to extract the files on demand. SquashFS is similar to an indexed archive, except that (a) it's more structured, (b) it's a more general solution, and (c) you don't need special software to read the filesystem image, as SquashFS is available by default in recent Linux kernels (with backports available for older ones).

Wikipedia size

Posted Oct 14, 2009 0:55 UTC (Wed) by rfunk (subscriber, #4054)

I used to have an offline copy of Wikipedia on my 8GB iPod Touch. No images,
English-only, and it took about 2-3 GB.

Wait, what?

Posted Oct 14, 2009 3:55 UTC (Wed) by AJWM (guest, #15888)

The website mentions a 4+ GB file if you want to download updates. (Download it to your own microSD card and then swap cards; there's no download capability in the WikiReader itself.)

Wait, what?

Posted Oct 15, 2009 1:22 UTC (Thu) by nybble41 (subscriber, #55106)

It looks like they used a 4GB microSD card, but only included a subset of the articles (about 3 million). Given that my stripped-down version had over twice that many (6,044,424 inodes, most of them articles), and that the full HTML almost certainly takes up more space than their custom-preprocessed pages, the difference in size is understandable. On the other hand, unless their selection criteria are way off, the other couple million articles probably won't be missed; at least half of them are probably stubs, advertisements, or niche fan pages.

Wait, what?

Posted Oct 13, 2009 18:57 UTC (Tue) by cjb (guest, #40354)

Yes, it has a dump of Wikipedia on the SD card. The Kindle used a cell modem, not Wi-Fi, and this thing doesn't have either.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds