OLPC's software update problem
[Posted July 3, 2007 by corbet]
From the outside, much of the work going on at the One Laptop Per Child
project appears to be oriented toward hardware. Successive test versions
of the now well-known little green computer have been produced, each with
more powerful components and (presumably) fewer glitches than those which
came before. Work on getting suspend/resume functioning properly -
critical for the laptop to meet its power use goals - is heading toward the
final stages. It looks like a nice machine.
The software side of the OLPC project is just as interesting as the
hardware. The project has been occasionally criticized, though, for
concentrating on hardware and being
slow to get its software together. Much of this criticism is not really
warranted; work on the Sugar environment has been underway for quite some
time, and there are a number of interesting applications coming together
for this platform. In an area or two, however, it does seem like problems
are being addressed a little later than might have been optimal.
One of those areas, as evidenced by a series of discussions on the
project's mailing list, is the issue of software updates. The OLPC
project plans to deploy millions of laptops into environments where skilled
system administrators are scarce. It seems certain that, sooner or later,
there will be a need to update the software installed on those systems -
perhaps urgently. It is reasonable to expect that the children using these
laptops might just not be entirely diligent in checking for and installing
updates. So something with a relatively high degree of automation will be
required.
There are some additional complications which must be taken into account.
The OLPC project has decided to dispense with Linux-style package managers
in favor of a whole-image approach. OLPC has the resources to fund
some fairly strong servers and network bandwidth, but putting together the
resources which can handle pushing an update to millions of laptops at the
same time might still be a challenge. In fact, simply coping with
update-availability queries from that many laptops would require
significant resources. So how will OLPC handle software updates? It turns
out that they still don't really know.
Discussions started when Alexander Larsson showed up on the list with an
announcement that he was working on the software update task. His
proposal was an interesting combination of tools. In this scheme, a
system image looks a lot like a git repository; it contains a "manifest"
which (like a git index) has a list of files associated with SHA1 hashes of
their contents. Updating a system involves getting a new manifest, seeing
which files have changed, grabbing their contents, and dropping them in
place. The system image itself is then updated safely by way of the
Bitfrost security model, which
was announced last February.
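The manifest comparison at the heart of this scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not Alex's code: the function names (build_manifest, plan_update) and the sample paths are invented here, and a manifest is assumed to be a plain mapping from path to SHA1 hex digest.

```python
import hashlib


def build_manifest(files):
    """Map each path to the SHA1 hex digest of its contents (given as bytes)."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}


def plan_update(old_manifest, new_manifest):
    """Compare two manifests; return (paths to fetch, paths to delete)."""
    # A file must be fetched if it is new or if its hash has changed.
    to_fetch = sorted(
        path for path, digest in new_manifest.items()
        if old_manifest.get(path) != digest
    )
    # A file must be removed if the new image no longer lists it.
    to_delete = sorted(path for path in old_manifest if path not in new_manifest)
    return to_fetch, to_delete


# Hypothetical image contents, purely for illustration.
old = build_manifest({"bin/sh": b"v1", "etc/motd": b"hello", "lib/old.so": b"x"})
new = build_manifest({"bin/sh": b"v2", "etc/motd": b"hello", "lib/new.so": b"y"})
to_fetch, to_delete = plan_update(old, new)
```

Unchanged files (etc/motd above) never cross the network, which is the main attraction of the approach.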
Alex's proposal uses Avahi (an implementation of mDNS/DNS-SD service
discovery) to find updates. Once one system on a given network
(often the school server) obtains a copy of the update, it advertises it
via Avahi. All laptops on the network can then notice the availability of
the update and apply it. Once a laptop has the update, it, too, can make
that update available over the mesh network, facilitating the distribution
of the update to all systems on the net.
Ivan Krstić, the author of Bitfrost, has a
different approach. It starts by taking advantage of one of the OLPC's
more controversial features: the phone-home protocol. Laptops have to make
regular contact with special servers to check whether they have been
stolen; laptops which have been reported stolen can be shut down hard by
the anti-theft server. Ivan's update proposal has the laptops checking for
software updates while doing the "am I stolen?" check; the servers will be
able to reply that the laptop remains with its owner, but that it is running
old software and should update.
If the laptop needs an update, it will attempt to obtain the necessary
files (using rsync) from the school server. If these attempts fail for a
day or so, the laptop will eventually fall back to an "upstream master
server" for the update files.
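The fallback rule might look something like the sketch below. It is an assumption-laden illustration rather than anything from Ivan's proposal: the one-day threshold stands in for "a day or so", and the server names and the choose_source function are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: "a day or so" is modeled as exactly one day.
FALLBACK_AFTER = timedelta(days=1)


def choose_source(first_failure, now,
                  school="school-server", upstream="upstream-master"):
    """Pick the server to fetch from.

    first_failure is the time of the first failed attempt against the
    school server, or None if no attempt has failed yet.
    """
    if first_failure is None:
        return school
    if now - first_failure >= FALLBACK_AFTER:
        # The school server has been unreachable for too long.
        return upstream
    return school


start = datetime(2007, 7, 3, 8, 0)
after_two_hours = choose_source(start, start + timedelta(hours=2))
after_two_days = choose_source(start, start + timedelta(days=2))
```

Keeping the school server as the default for the first day is what keeps most of the load away from the upstream master servers.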
The use of rsync allows updates to be transferred in a relatively
bandwidth-friendly manner. Only changed parts of changed files need be
transmitted over the net. It also has the advantage of being a known
quantity; there is no doubt that rsync can be made to work in this
setting. There is some concern that rsync tends to be resource intensive
on the server side, meaning that those upstream master servers would
probably have to be relatively powerful systems. If all goes well, though,
the load on those servers would be mitigated by distributing updates
through the school servers and staggering updates over time.
Ivan's proposal has also been criticized because it requires the use of
central servers rather than distributing updates through the mesh network.
He responds:
It requires a server because I think it's outrageous to consider
spending engineering time on inventing secure peer-to-peer OS
upgrades, never before done in a mainstream system, over a network
stack never before used in a mainstream system, two months before
we ship.
As an aside, this conversation also brought out some serious unhappiness
about the use of Linux-VServer
in Bitfrost. The (seemingly permanent) out-of-tree status of
Linux-VServer makes it harder to support over the long term; it seems that
the project may well move to a different solution once it has shipped its
first set of systems.
Back on the update front, yet
another proposal was posted by C. Scott Ananian. In this scheme, each
laptop will occasionally poll a master server to see if an update is
available; this poll might take the form of a DNS lookup. The more systems
there are on the local network, the less frequently these polls will
happen.
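The proposal as described here does not give a formula for that scaling, so the following is only one plausible reading: each laptop multiplies a base interval by the number of peers it can see, then adds jitter so that the laptops do not all poll at once. The base interval and the name next_poll_delay are hypothetical.

```python
import random

# Hypothetical base interval: one poll per hour for a laptop on its own.
BASE_INTERVAL_S = 3600


def next_poll_delay(peers_seen, rng=random):
    """Return the seconds to wait before this laptop polls the master again.

    With n laptops each waiting about n * BASE_INTERVAL_S, the network as a
    whole still polls roughly once per base interval.
    """
    n = max(1, peers_seen)
    mean = BASE_INTERVAL_S * n
    # Jitter of +/- 50% spreads the polls out over time.
    return rng.uniform(0.5 * mean, 1.5 * mean)


rng = random.Random(0)
alone = next_poll_delay(1, rng)
crowded = next_poll_delay(10, rng)
```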
If a laptop discovers that an update is available, it will start pulling it
down from the master server. This update will be divided into a number of
small chunks, each of which is independently checksummed and signed. As
those chunks come in, the receiving laptop will send them out to a
multicast address on the local mesh; all other laptops in the area should
then see it and grab a copy as it goes by. Once all of the required pieces
have been received, the update can be applied. If a laptop misses a
segment as it goes by, it will eventually time out and start actively
grabbing (and rebroadcasting) pieces itself.
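A receiving laptop's bookkeeping could be sketched as follows. This is an illustration under stated assumptions, not Scott's design: the list of expected digests is assumed to arrive from a trusted (signed) source, signature verification itself is left out, SHA-256 is an arbitrary choice of checksum, and the ChunkCollector class is invented for the example.

```python
import hashlib


def split_chunks(payload, size):
    """Cut an update payload into fixed-size pieces (the last may be shorter)."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


class ChunkCollector:
    """Collect chunks seen on the network until the update is complete."""

    def __init__(self, expected_digests):
        # Assumption: these digests come from a trusted, signed list.
        self.expected = list(expected_digests)
        self.chunks = {}

    def offer(self, index, data):
        """Keep a chunk only if its checksum matches; report whether it was kept."""
        if not 0 <= index < len(self.expected):
            return False
        if hashlib.sha256(data).hexdigest() != self.expected[index]:
            return False
        self.chunks[index] = data
        return True

    def missing(self):
        """Indices still to be fetched actively once the timeout expires."""
        return [i for i in range(len(self.expected)) if i not in self.chunks]

    def assemble(self):
        """Return the full payload, or raise if pieces are still missing."""
        if self.missing():
            raise ValueError("update incomplete")
        return b"".join(self.chunks[i] for i in range(len(self.expected)))


payload = b"0123456789"
pieces = split_chunks(payload, 4)
collector = ChunkCollector(hashlib.sha256(p).hexdigest() for p in pieces)
collector.offer(0, pieces[0])
collector.offer(2, pieces[2])
rejected = collector.offer(1, b"corrupt")  # bad checksum, so it is dropped
```

Because each piece is checked on its own, a laptop can accept chunks from any neighbor in any order and needs to ask only for the indices that missing() reports.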
Which approach will be adopted is not clear; if the project has decided on a
proposal (or a combination of them), that decision has not been posted on a
public list. Time is tight, though, and a rock-solid solution will have to
be in place before the first production systems ship. It is, after all,
risky to count on being able to fix the remote update system (remotely)
after the fact.
For a more general view of the state of OLPC software, take a look at this message from Walter Bender (the OLPC
president for software and content). A lot is happening, but a number of
desired features (including the famous "view source" key) will not be
functioning when the first systems ship. The OLPC software, he says, is a
work in progress - much like the rest of our software. The "progress" part
is clearly happening, though, and OLPC appears to be on course to deliver a
system which will bring computing power and network connectivity to
millions of children - and which will change our views of how that should
be done.