LWN.net

OLPC's software update problem

From the outside, much of the work going on at the One Laptop Per Child project appears to be oriented toward hardware. Successive test versions of the now well-known little green computer have been produced, each with more powerful components and (presumably) fewer glitches than those which came before. Work on getting suspend/resume functioning properly - critical for the laptop to meet its power use goals - is heading toward the final stages. It looks like a nice machine.

The software side of the OLPC project is just as interesting as the hardware. The project has been occasionally criticized, though, for concentrating on hardware and being slow to get its software together. Much of this criticism is not really warranted; work on the Sugar environment has been underway for quite some time, and there are a number of interesting applications coming together for this platform. In an area or two, however, it does seem like problems are being addressed a little later than might have been optimal.

One of those areas, as evidenced by a series of discussions on the project's mailing list, is the issue of software updates. The OLPC project plans to deploy millions of laptops into environments where skilled system administrators are scarce. It seems certain that, sooner or later, there will be a need to update the software installed on those systems - perhaps urgently. It is reasonable to expect that the children using these laptops might just not be entirely diligent in checking for and installing updates. So something with a relatively high degree of automation will be required.

There are some additional complications which must be taken into account. The OLPC project has decided to dispense with Linux-style package managers in favor of a whole-image approach. OLPC has the resources to fund some fairly strong servers and network bandwidth, but putting together the resources which can handle pushing an update to millions of laptops at the same time might still be a challenge. In fact, simply coping with update-availability queries from that many laptops would require significant resources. So how will OLPC handle software updates? It turns out that they still don't really know.

Discussions started when Alexander Larsson showed up on the list with an announcement that he was working on the software update task. His proposal was an interesting combination of tools. In this scheme, a system image looks a lot like a git repository; it contains a "manifest" which (like a git index) has a list of files associated with SHA1 hashes of their contents. Updating a system involves getting a new manifest, seeing which files have changed, grabbing their contents, and dropping them in place. The actual safe updating of the system image is done by way of the Bitfrost security model which was announced last February.
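The manifest comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a manifest is simply a mapping from relative file path to the SHA1 hex digest of that file's contents. `build_manifest` and `diff_manifests` are hypothetical names, not part of Alex's code. The real proposal's manifest format and the Bitfrost-protected step that actually installs the new files are not shown.

```python
import hashlib
from pathlib import Path

def build_manifest(root):
    """Map each file's path (relative to root) to the SHA1 hex digest of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest

def diff_manifests(old, new):
    """Return (paths to fetch, paths to delete) when moving from old to new.

    A path must be fetched if it is new or its content hash changed;
    it must be deleted if it no longer appears in the new manifest.
    """
    to_fetch = sorted(p for p, h in new.items() if old.get(p) != h)
    to_delete = sorted(p for p in old if p not in new)
    return to_fetch, to_delete
```

Because files are identified by content hash, as in git, a client could also skip fetching any changed path whose new digest it already holds under another name.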

Alex's proposal uses Avahi, an implementation of the mDNS/DNS-SD (Zeroconf) service-discovery protocols, to find updates. Once one system on a given network (often the school server) obtains a copy of the update, it advertises it via Avahi. All laptops on the network can then notice the availability of the update and apply it. Once a laptop has the update, it, too, can make that update available over the mesh network, facilitating the distribution of the update to all systems on the net.

Ivan Krstić, the author of Bitfrost, has a different approach. It starts by taking advantage of one of the OLPC's more controversial features: the phone-home protocol. Laptops have to make regular contact with special servers to check whether they have been stolen; laptops which have been reported stolen can be shut down hard by the anti-theft server. Ivan's update proposal has the laptops checking for software updates while doing the "am I stolen?" check; the servers will be able to reply that the laptop remains with its owner, but that it is running old software and should update.
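A sketch of how a server might fold the update check into the anti-theft check follows. The reply format (`CheckinReply`, `Verdict`) and the use of a single integer build number are assumptions made for illustration; the article does not describe the actual protocol.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set

class Verdict(Enum):
    OK = "ok"          # with its owner, software is current
    UPDATE = "update"  # with its owner, but running old software
    STOLEN = "stolen"  # reported stolen; the laptop should shut down

@dataclass(frozen=True)
class CheckinReply:
    """Hypothetical reply to a laptop's periodic "am I stolen?" check-in."""
    verdict: Verdict
    target_build: Optional[int] = None  # build to update to, if any

def answer_checkin(serial: str, running_build: int,
                   stolen_serials: Set[str], current_build: int) -> CheckinReply:
    # The theft check always wins: a stolen laptop gets no update offer.
    if serial in stolen_serials:
        return CheckinReply(Verdict.STOLEN)
    # Otherwise, piggyback the update notice on the same reply.
    if running_build < current_build:
        return CheckinReply(Verdict.UPDATE, target_build=current_build)
    return CheckinReply(Verdict.OK)
```

Because the notice rides on a check the laptop already makes, the server learns about out-of-date machines without any extra round trip.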

If the laptop needs an update, it will attempt to obtain the necessary files (using rsync) from the school server. If these attempts fail for a day or so, the laptop will eventually fall back to an "upstream master server" for the update files. The use of rsync allows updates to be transferred in a relatively bandwidth-friendly manner. Only changed parts of changed files need be transmitted over the net. It also has the advantage of being a known quantity; there is no doubt that rsync can be made to work in this setting. There is some concern that rsync tends to be resource intensive on the server side, meaning that those upstream master servers would probably have to be relatively powerful systems. If all goes well, though, the load on those servers would be mitigated by distributing updates through the school servers and staggering updates over time.
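The school-server-first, upstream-later policy might look like the following sketch. The one-day threshold is a placeholder for the article's "a day or so", the server names and the `build` rsync module are hypothetical, and `rsync_command` only builds the argument list rather than running rsync.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Placeholder for the article's "a day or so"; not a value from the proposal.
FALLBACK_AFTER = timedelta(days=1)

def pick_update_source(first_failure: Optional[datetime], now: datetime,
                       school_server: str, upstream_server: str) -> str:
    """Choose which server to rsync the update from.

    first_failure is when attempts against the school server started
    failing, or None if they have not failed yet.
    """
    if first_failure is None or now - first_failure < FALLBACK_AFTER:
        return school_server
    return upstream_server

def rsync_command(server: str, dest_dir: str) -> List[str]:
    # -a preserves permissions and times; --partial keeps interrupted files so
    # a retry can reuse what was already transferred. "build" is a
    # hypothetical rsync daemon module name.
    return ["rsync", "-a", "--partial", f"rsync://{server}/build/", dest_dir]
```

Keeping the fallback decision in a pure function makes it easy to test without a network; a real client would still have to verify the downloaded files before applying them.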

Ivan's proposal has also been criticized because it requires the use of central servers rather than distributing updates through the mesh network. He responds:

It requires a server because I think it's outrageous to consider spending engineering time on inventing secure peer-to-peer OS upgrades, never before done in a mainstream system, over a network stack never before used in a mainstream system, two months before we ship.

As an aside, this conversation also brought out some serious unhappiness about the use of Linux-VServer in Bitfrost. The (seemingly permanent) out-of-tree status of Linux-VServer makes it harder to support over the long term; it seems that the project may well move to a different solution once it has shipped its first set of systems.

Back on the update front, yet another proposal was posted by C. Scott Ananian. In this scheme, each laptop will occasionally poll a master server to see if an update is available; this poll might take the form of a DNS lookup. The more systems there are on the local network, the less frequently these polls will happen.
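One way to make polls less frequent as the local network grows is to scale each laptop's polling interval by the number of laptops it can see, so that the network as a whole queries the master server at roughly a fixed rate. This is a sketch under that assumption; the one-hour base interval and the jitter range are arbitrary choices rather than values from the proposal, and the DNS-based query itself is not shown.

```python
import random

def next_poll_delay(laptops_on_network, base_seconds=3600.0, rng=random):
    """Seconds until this laptop should next poll the master server.

    With n laptops each waiting n * base_seconds on average, the network as a
    whole polls about once per base_seconds, no matter how large it grows.
    """
    n = max(1, laptops_on_network)  # count at least this laptop itself
    mean_delay = base_seconds * n
    # Uniform jitter between 0.5x and 1.5x the mean keeps laptops from polling
    # in lockstep; the average factor is 1, so the mean delay is unchanged.
    return mean_delay * rng.uniform(0.5, 1.5)
```

For example, a laptop that can see 40 peers (including itself) would wait 40 hours on average with the default base interval.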

If a laptop discovers that an update is available, it will start pulling it down from the master server. This update will be divided into a number of small chunks, each of which is independently checksummed and signed. As those chunks come in, the receiving laptop will send them out to a multicast address on the local mesh; all other laptops in the area should then see it and grab a copy as it goes by. Once all of the required pieces have been received, the update can be applied. If a laptop misses a segment as it goes by, it will eventually time out and start actively grabbing (and rebroadcasting) pieces itself.
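The receiving side of this scheme can be sketched as a small assembler that accepts chunks from any source (overheard multicast or an active fetch), drops any that fail verification, and reports which pieces are still missing once the timeout expires. Note one substitution: rather than checking a signature on every chunk, as the proposal describes, this sketch assumes a list of per-chunk SHA-256 digests has already been obtained and its signature verified, so each chunk only needs a hash comparison. The class and method names are hypothetical, and the multicast networking is left out.

```python
import hashlib

class ChunkAssembler:
    """Collects update chunks and checks each against a trusted digest list."""

    def __init__(self, expected_digests):
        # Assumed to come from an already signature-verified update index.
        self.expected = list(expected_digests)
        self.chunks = {}

    def offer(self, index, data):
        """Store a chunk if it is new and its digest matches; return True if stored."""
        if not 0 <= index < len(self.expected) or index in self.chunks:
            return False  # out of range or already have it
        if hashlib.sha256(data).hexdigest() != self.expected[index]:
            return False  # corrupt or forged chunk: drop it
        self.chunks[index] = data
        return True

    def missing(self):
        """Indices still needed; after a timeout these would be fetched
        actively (and rebroadcast) instead of waiting to overhear them."""
        return [i for i in range(len(self.expected)) if i not in self.chunks]

    def assemble(self):
        """Join all chunks in order; fails if any piece is still missing."""
        if self.missing():
            raise ValueError("update incomplete")
        return b"".join(self.chunks[i] for i in range(len(self.expected)))
```

Checking one signed digest list and then hashing each chunk is usually much cheaper on the CPU than verifying a public-key signature per chunk.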

Which approach will be adopted is not clear; if the project has decided on a proposal (or a combination of them), that decision has not been posted on a public list. Time is tight, though, and a rock-solid solution will have to be in place before the first production systems ship. It is, after all, risky to count on being able to fix the remote update system (remotely) after the fact.

For a more general view of the state of OLPC software, have a look at this message from Walter Bender (OLPC's president for software and content). A lot is happening, but a number of desired features (including the famous "view source" key) will not be functioning when the first systems ship. The OLPC software, he says, is a work in progress - much like the rest of our software. The "progress" part is clearly happening, though, and OLPC appears to be on course to deliver a system which will bring computing power and network connectivity to millions of children - and which will change our views of how that should be done.


OLPC's software update problem

Posted Jul 5, 2007 11:09 UTC (Thu) by smurf (subscriber, #17840)

Let's just hope that the last sentence of this contribution actually will come true. Given the number of idiots who want to install Windows on the machine (sorry, but I want the kids to learn real-world stuff, not Computer Frustration), and the number of other idiots who want Intel's ClassMate PCs instead (you couldn't build a machine more unsuitable for rural third-world countries if you tried), I'm still not entirely convinced that OLPC will succeed. :-(

OLPC's software update problem

Posted Jul 5, 2007 12:09 UTC (Thu) by smitty_one_each (subscriber, #28989)

Heuristic problem.
Even if round #1 doesn't do so well (and there is still plenty of room for success), take heart: it shall have at least paved the way for the next round.
Hats off to those willing to undertake effort sans certain success.

OLPC's software update problem

Posted Jul 5, 2007 13:41 UTC (Thu) by aglet (guest, #1334)

That last multicast update distribution mechanism thing sounds awfully like a weird implementation of bittorrent to me.

OLPC's software update problem

Posted Jul 7, 2007 12:01 UTC (Sat) by csamuel (✭ supporter ✭, #2624)

I was thinking more of Flamethrower; its Freshmeat entry says:

Flamethrower is a multicast file distribution system. It was originally created to add multicast install capabilities to SystemImager, but is designed as a stand-alone package. It works with entire directory hierarchies, rather than single files. It uses a server configuration file, which takes module entries similar to rsyncd.conf. It is an on-demand system; multicast of a module is initiated when a client connects, but it waits a predetermined period for other clients to connect before beginning.

Sounds rather handy to me.

If at first you don't ...

Posted Jul 5, 2007 19:51 UTC (Thu) by filker0 (guest, #31278)

If the first OLPC software release contains a robust but resource-intensive update distribution scheme (i.e., the rsync-based proposal) that does not scale well as the deployment grows, development on the mesh-based scheme ought to continue at full speed, and that scheme should be sent out to the OLPCs as an update once it has been developed and tested. The two methods would have to coexist on the servers for a while, but the load from the older method would decrease even as the number of deployed OLPC units goes up.

Someone (in another comment to a comment) compared one of the proposals outlined at the end of the article to BitTorrent. On the surface, it is similar; however, an OLPC will send the update packet to all of its peers, not just the ones that ask for it, and peers receiving the update packet passively need not acknowledge or retransmit it (except in the sense required by the mesh). A single transmitted packet might reach many peers, since it is multicast; BitTorrent uses unicast, which puts a higher load on network bandwidth when there are more than two peers.

If at first you don't ...

Posted Jul 6, 2007 13:39 UTC (Fri) by lamikr (guest, #2289)

There are, however, a couple of issues to think about in a situation where the system automatically accepts packages from other machines on the network:
- how to make sure that the system is able to store all the packages advertised.
(for example, what happens if the new packages require 150 MB even before they are extracted, and only about 100 MB of space is left?)
- how to prevent someone from setting up a malicious local laptop that advertises bad applications to other local OLPC laptops. Should every single chunk of the apps be signed? If so, verifying those chunks will also require some CPU...
- some kind of cleaner is needed to make sure that old "uploads" are removed instead of wasting space once a newer version is arriving or already installed.
(for example, when you have received 30% of glib 2.18.14 and it is updated again to something like glib 2.18.15)
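The first and third points in the list above (checking for space before accepting an update, and discarding partial downloads that have been superseded) are straightforward to sketch. This assumes, hypothetically, that each in-progress download lives in its own directory named after the version being fetched; the function names are illustrative only.

```python
import shutil
from pathlib import Path

def has_room_for(download_dir, needed_bytes, reserve_bytes=0):
    """True if the filesystem holding download_dir can take needed_bytes
    more while still leaving reserve_bytes free."""
    free = shutil.disk_usage(str(download_dir)).free
    return free - reserve_bytes >= needed_bytes

def remove_stale_partials(download_dir, keep_version):
    """Delete partial downloads for every version except keep_version.

    Assumes (hypothetically) one subdirectory per version being fetched.
    Returns the sorted names of the directories that were removed.
    """
    removed = []
    for entry in Path(download_dir).iterdir():
        if entry.is_dir() and entry.name != keep_version:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return sorted(removed)
```

A client would call `has_room_for` before accepting an advertised update, and `remove_stale_partials` whenever it learns of a newer version than the one it is currently fetching.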

I know those are all doable, but making them 100% bulletproof is not an easy task.

Mika

OLPC's software update problem

Posted Jul 16, 2007 12:04 UTC (Mon) by KazushiSakuraba (guest, #46282)

"...putting together the resources which can handle pushing an update to millions of laptops at the same time might still be a challenge."

This is getting ridiculous. One of the main problems in the poor areas where the OLPC will be used is that there is NO COMMUNICATION INFRASTRUCTURE WHATSOEVER: no telephone, no satellite communication, no cell phones, NOTHING. So millions of laptops running in poor areas of third-world countries will not be contacting a cluster of high-performance servers running at MIT.

The OLPC advocates need a reality check.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds