Jim Gettys has a long history at the interesting edge of computing
development; his past projects include MIT's Project Athena and the X
Window System. Currently, Jim is working on the One Laptop Per Child
project, which seeks to
distribute low-cost, Linux-based systems by the millions to children in the
developing world.
Jim was kind enough to take what must have been a considerable amount of
time to answer our questions on this project. What follows is the first
part of the interview.
LWN: Could you briefly describe your role with the OLPC project?
Vice President of Software: in short, I worry about systems software
from one end of the project to the other and relations with the free and
open source software community.
The educational software and content are the province of others:
Nicholas Negroponte (the OLPC chairman), Walter Bender, Seymour Papert, Alan Kay, and
others, who have decades of experience in education of children with
computers, often in the developing world.
I also don't worry about how the bits get from machine to machine:
Michail Bletsas is our Chief Connectivity Officer.
Mary Lou Jepsen is our CTO, and responsible for our novel display
technology, and Mark Foster is V.P. of Engineering and chief hardware
Quanta Computers, founded by Barry Lam, who make almost 1/3 of the
laptops of the world, are building the OLPC machine.
It appears that few people appreciate the extent to which this project is
pushing the leading edge of free software development.
Our hardware is novel to meet the needs of children in the developing
world; much of the software we need to build in the short term is
driven by this novelty. We expect many of our innovations will appear in
conventional laptops over the coming years. In this case, Linux will be
leading rather than following the industry.
What are the features one would want for school-aged children, grades K-12?
A large fraction of such children are in parts of the developing
world where electricity is not available at home, or often even at
school, so for many children, a computer with low power consumption,
potentially human-powered, is a necessity, not a convenience.
Teaching may not even be inside, and certainly when children are at
home, they often will not be inside where conventional LCD screens are
usable. Children usually walk to and from school every day; weather is
unpredictable, rain, dirt and dust are commonplace. And cost is a major
consideration, if we are to bring computers and their great power to
help children learn, to children everywhere.
Much more about the hardware can be found in our wiki.
Consider the power
management issues, application slimming, system (non-)management
improvements, mesh networking, application checkpointing, pervasive IPv6,
localization problems, etc. Every one of these goals should benefit users
who will never see an OLPC system. How many of these goals do you think
you will be able to achieve by launch time?
Some of these items are all-or-nothing; others are suitable for
incremental progress. Let's take them one at a time.
Power management: We are doing at least two, if not three, true
innovations in this area:
- The Marvell wireless chip, which has an ARM 9 and 92K of RAM, can
forward packets in the mesh network while the processor is suspended to
RAM. This capability has been demonstrated in the lab, and Michail
Bletsas is confident of the outcome; in fact, it was an actual
demonstration that convinced us to use Marvell. Other wireless vendors
lack this capability. Our current estimate is that in this mode, the
wireless chip can be forwarding packets while the system consumes less
than half a watt. We want there to be as little incentive to defeat
wireless as possible, so this is a key innovation: if children aren't
confident there will be power when they need it, they might work to
defeat the mesh.
- The display can be on while the processor is suspended, saving
power. In some modes, we expect to be suspending the CPU whenever it is
idle, even for times as low as a second or two. Since our display is
also novel and consumes much less power than conventional LCDs, even
the Geode's low power consumption would have otherwise dominated total
- Look around you the next time you sit in a conference room. How
many of you are actively using your machine at any given instant? How
much of the time are you just reading the screen? In many modes of use,
once the screen power consumption has been solved (as it is in our
display), the remaining major power consumption is the processor, power
supply and motherboard components. By making suspend/resume
unnoticeable, we can save most of the remaining power used in the
Mark Foster described his novel extremely fast suspend/resume software
technique at the Linux Power Management Summit this spring. Whether we
will need to implement it on our hardware to reach our goals of < 200ms
suspend/resume cycles awaits some laboratory tests (an iPAQ can already
suspend and resume in a subsecond period), but I expect we may need to
implement this technique. Any performance work *must* be preceded by
measurement to be useful: spending time optimizing the wrong code is a
waste. Of course, the faster suspend/resume can be made to work, the
more aggressive we can be about suspending and saving power. This is an
example of an area where incremental improvement (once the basic
capabilities are in place) is possible.
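As a rough illustration of why faster suspend/resume saves power, the
time-weighted average draw can be estimated as below. This is a sketch
only: the function name and the wattage figures are assumptions chosen
for the example, not OLPC measurements.

```python
def average_power(active_watts, suspended_watts, active_fraction):
    """Time-weighted average power for a machine that is awake only
    `active_fraction` of the time and suspended for the rest."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return (active_fraction * active_watts
            + (1.0 - active_fraction) * suspended_watts)

# Hypothetical figures: 2.0 W awake, 0.5 W suspended.
for fraction in (1.0, 0.5, 0.1):
    print(fraction, round(average_power(2.0, 0.5, fraction), 3))
```

The shorter the suspend/resume cycle, the smaller the awake fraction can
be made without the user noticing, and the closer the average gets to
the suspended figure.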
We are also planning to dynamically change the refresh rate of the
screen depending on screen activity; as I've seen this capability in
graphics chips for cell phones, I won't claim this as fully innovative,
though it will be new for the X Window System or window systems on
It is hard to predict how long similar hardware capabilities will take
to reach conventional hardware; but by showing it is possible, we know
it will happen, and the software support required will be useful to everyone.
There are also a number of places where changes in Linux and the desktop
environment can help. For example, the tickless patches currently being
worked on obviate the need for the CPU to wake up 100 times a second;
the more of the time a processor is fully idle, the more power saved.
Another example is the places where desktop environments poll
periodically to find out about changes in the system: notification systems are
much more efficient, and allow the system to be idle more of the time.
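The difference between polling and notification can be sketched in a few
lines. This is an illustration using Python threads, not desktop code;
the function names and the 50 ms timer standing in for "a change in the
system" are assumptions made for the example.

```python
import threading
import time

def poll_until(check, interval, sleep=time.sleep):
    """Polling: wake up every `interval` seconds to ask whether anything
    changed. Returns the number of wakeups spent before the change was seen."""
    wakeups = 0
    while not check():
        sleep(interval)
        wakeups += 1
    return wakeups

def wait_for_notification(event, timeout=None):
    """Notification: block until another thread sets the event.
    Returns True if notified, False if the timeout expired first."""
    return event.wait(timeout)

# Hypothetical change source: a timer that fires once after 50 ms.
first = threading.Event()
threading.Timer(0.05, first.set).start()
print("polling wakeups:", poll_until(first.is_set, interval=0.01))

second = threading.Event()
threading.Timer(0.05, second.set).start()
print("notified:", wait_for_notification(second, timeout=5.0))
```

The polling loop wakes the processor repeatedly while nothing is
happening; the notified waiter stays asleep until there is something to
do, which is what lets the system remain idle.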
Out of memory behavior needs serious work: the current OOM killer's
policies are by current necessity very poor. Nokia has been
experimenting with more useful policies, exploiting information at the
user environment level, that can improve this behavior by informing the
kernel which processes are the most vital and which can be shot.
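A policy of this kind might look roughly like the following sketch. It
is not Nokia's or the kernel's implementation: the Proc record, the
"vital" hint, and the process names are all hypothetical, and the rule
(spare vital processes, then shoot the largest of the rest) is just one
plausible choice.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    rss_kb: int   # resident memory in kilobytes
    vital: bool   # hint supplied by the user environment (hypothetical)

def choose_victim(procs):
    """Pick the process to kill when memory runs out.
    Vital processes are spared if any non-vital process exists;
    among the candidates, the one using the most memory is chosen."""
    procs = list(procs)
    if not procs:
        return None
    candidates = [p for p in procs if not p.vital] or procs
    return max(candidates, key=lambda p: p.rss_kb)

running = [
    Proc("window-manager", 50_000, vital=True),
    Proc("idle-game", 20_000, vital=False),
    Proc("browser", 40_000, vital=False),
]
print(choose_victim(running).name)  # browser
```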
There seems to be a common fallacy among programmers that using memory
is good: on current hardware it is often much faster to recompute values than to
have to reference memory to get a precomputed value. A full cache miss
can be hundreds of cycles, and hundreds of times the power consumption
of an instruction that hits in the first level cache. Making things
smaller almost always makes them faster (and lower power). Similarly, it
can be much faster to redraw an area of the screen than to copy a saved
image from RAM to a screen buffer. Many programmers' presumptions are
now completely incorrect and we need to reeducate ourselves.
Sometimes we may just choose alternative applications. Of course, this
may not be what some application writers would like, and the solution
they can take is obvious. We have a large set of software to choose
from: this is one of open source's great strengths.
and others have been doing some very nice work identifying and fixing
some of the gratuitous memory usage.
A large part of this task is raising people's consciousness that we've
become very sloppy on memory usage, and often there is low-hanging fruit
in making things use less memory (and execute faster and use less power as
a result). Sometimes it is poor design of memory usage, and sometimes it
is out and out bugs leaking memory. On our class of a system, leaks are
of really serious concern: we don't want to be paging to our limited
In fact, much of the performance unpredictability of today's free
desktop can be attributed to the fact that several of our major
applications are wasting/leaking memory and driving even systems with
half a gigabyte of memory or more to paging quite quickly. Some of
these applications we care about, and some we don't: OpenOffice is just
not the right tool for someone learning to read and write, and we'll be
perfectly happy to use other tools. Some other major offenders need
fixing (and work has started): e.g. Firefox (Gecko), which, when using
tabs, has been hemorrhaging memory, which can force you to paging quite
quickly. Between evolution-data-server and Firefox alone, many people's
desktops exhibit unpredictable performance soon after boot due to
paging; fixing these problems will benefit all.
The memory usage display tools we have today are very misleading to
naive (and even journeyman) programmers, often leading them to massively
My biggest personal frustration (given my history with X) is people
saying: "X is bloated". The reality is: a) X maps all the frame buffer
and/or register space into its address space, so measurement of virtual
address space used is completely misleading: X may actually be
consuming only a very small amount of your DRAM, despite a virtual size
of a hundred megabytes, and b) X does what it's told: many applications
seem to think that storing pixmaps in the X server (and often forgetting
about them entirely) is a good strategy, whereas retransmitting or
repainting the pixmap may be both faster and use less memory. Once in a
while there is a memory leak in X (generally in the graphics drivers):
but almost always the problem is leaks in applications, which often
forget the pixmaps they were using.
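On Linux, the gap between virtual size and memory actually resident in
RAM can be seen in the VmSize and VmRSS fields of /proc/<pid>/status.
The sketch below parses those two fields; the helper names and the
sample text (with made-up numbers) are assumptions for illustration, and
resident size is itself only a closer approximation, not a complete
accounting.

```python
def parse_vm_fields(status_text):
    """Extract VmSize (virtual) and VmRSS (resident) in kB from the text
    of a /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])
    return fields

def read_vm_fields(pid="self"):
    """Linux only: read the two fields for a running process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())

# Hypothetical sample: a large virtual size with a small resident size.
sample = "Name:\tXorg\nVmSize:\t  102400 kB\nVmRSS:\t    8192 kB\n"
print(parse_vm_fields(sample))  # {'VmSize': 102400, 'VmRSS': 8192}
```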
RAM in the X server is just as much RAM of your program, though it is in
a different address space. People forget that the X Window System was
developed on systems with 2 meg of RAM, and works today on 16 megabyte
We need better tools; some are beginning to appear. OLPC is sponsoring
a Google Summer of Code student, Eduardo Silva, from Chile, who is
working on a new tool called Memphis to help with this problem.
Work done on memory consumption will benefit everyone: not everyone in
the world has a 2 GHz laptop with a gig or two of RAM...
System (non-)management improvements:
I think there are two, mostly separable areas here:
1) the kid's laptop, which we want to focus primarily on making "easy
to fix", rather than "hard to break", so interested children can learn
computing. And by working hard to automate backup, we'd like the restore
of a laptop to be dead simple if there is some problem. By using
LinuxBIOS, we expect to be able to boot and reinstall via the network
easily. Requiring cables and/or USB keys for restore is costly and
complicates logistics greatly.
2) the school servers need to be "hard to break" as well as "easy to
fix", and remotely manageable, as finding expertise at a remote school of
10 children and one teacher is very unlikely. This is one of the
factors driving us to IPv6 (much more below), since NATed IPv4 islands
cannot be easily remotely diagnosed or updated automatically without
expertise on the ground, which will often be rare in our deployment
I've recently become impressed by technology developed for and by
PlanetLab that Dave Reed brought to my attention. It is worth
everyone's careful look. See (www.planet-lab.org).
Pulling wires and having access points are very expensive and require
expertise, neither of which may be available; and we want kids to be
able to work together anytime they meet up, even under a tree 3
kilometers from nowhere.
MIT Roofnet and other
projects have shown the feasibility of mesh networking, where one
machine forwards packets on behalf of others. Michail Bletsas is OLPC's
expert in this area, and has a lot of first hand experience. In radio
quiet areas, quite long links become feasible; in urban areas only much
shorter links are feasible, but the density of machines is likely
Our system is interesting in a number of ways beyond mesh software:
- it has antennae that can be rotated up above the top of the machine
and are more efficient than what you find in a conventional laptop; this
should roughly increase the footprint of each machine by a factor of
four (in area).
- the Marvell wireless chip we are using can operate autonomously. So
it can forward packets in the mesh even if the processor is suspended to
RAM! This should cut power consumption for an unused laptop to well
under one watt (current estimate is about .5 watts), while still
providing support to other machines in the mesh.
One of the challenges that the community can help with later this year is
learning which techniques work best when the nodes of the mesh are
mobile machines. There are a number of routing protocols possible (some
of which should become power aware; not all machines may need to bother
to forward packets all the time), and which will work best in what
circumstances should be fun to learn.
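The basic idea of one machine forwarding packets on behalf of others can
be sketched as a fewest-hops route search over the current set of radio
links. This is not the routing protocol OLPC uses; real mesh protocols
also have to cope with moving nodes, link quality and power. The
topology and names below are hypothetical.

```python
from collections import deque

def shortest_route(links, source, destination):
    """Breadth-first search over a mesh: returns the fewest-hop path from
    source to destination as a list of nodes, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical mesh: each laptop lists the neighbors it can hear.
mesh = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],          # out of radio range of everyone
}
print(shortest_route(mesh, "A", "D"))  # ['A', 'B', 'C', 'D']
print(shortest_route(mesh, "A", "E"))  # None
```

A power-aware variant could leave some machines out of the links table
so that they are never asked to forward.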
128 megabytes of memory is enough to run (almost) any open source
application; there are a few exceptions, but few that are of educational
interest for young children. It isn't enough, on a system where paging
needs to be avoided, to run arbitrary numbers of the larger applications
at the same time.
In addition to the community working on dieting our environment (and
making it run faster as a result), application check-pointing could help
the user's experience greatly. When memory runs low, idle applications
not currently in use could save their state and be restarted later at
the same point. We see some of this being done in Maemo on the Nokia
770; the conventions for this need freedesktop codification and
implementation in applications.
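What check-pointing asks of an application can be sketched as a pair of
functions: save enough state to resume, and load it again on restart.
This is a minimal illustration, not the Maemo convention or any
freedesktop specification; the function names, the JSON format and the
example state are assumptions.

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Save application state. The data is written to a temporary file in
    the same directory and then renamed over the target, so an interrupted
    save never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)

def restore(path, default=None):
    """Load the last checkpoint, or return `default` if there is none."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default

# Hypothetical editor state: which document is open and where the cursor is.
with tempfile.TemporaryDirectory() as workdir:
    saved = os.path.join(workdir, "editor.checkpoint")
    checkpoint({"document": "essay.txt", "cursor": 120}, saved)
    print(restore(saved))
```

An application that can do this when asked can be shut down under memory
pressure and restarted later at the same point.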
In the developed world, we do not have a shortage of IPv4 addresses at
this time. We got to the Internet first, and grabbed enough "land" that
we don't yet feel the pain felt in other parts of the world.
We see differently from where we sit.
IPv6 to us is clearly essential on a number of grounds:
- address space, and not wanting a flag day conversion that would be
very difficult. There are good arguments that we have effectively
exhausted the IPv4 address space, and that even
conservation measures cannot change the situation by more than a year or
two. In the developing world the situation is already dire. In some
places, entire universities are hidden behind a single routable IPv4
address, and in others, NATs are as many as 5 levels deep.
Vint Cerf told us that part of this problem is artificial: some cultures
are so worried about losing face if they were turned down that they have
not been asking for addresses, even though they would have been granted.
And part of it is very real indeed: Brazil is planning a deployment of
100,000,000 IP TV sets, for example; this cannot be done using IPv4. And
we hope to be deploying at such scale within a few years as well. The
cliff is already visible, and we'd just as soon not fall off it; it
hurts so bad when you hit the bottom.
- it is impossible to diagnose problems if you can't observe them.
Initially, in many parts of the world, we have to presume limited
expertise is available on the ground, so local diagnosis could easily
become the limiting factor for deployment. If the school networks are
fragmented by NATs, remote diagnosis becomes much more complicated.
- Building collaborative applications (or almost any new network
application) has become extremely difficult due to the extensive
deployment of NATs in the Internet: Skype is one of the few
applications that, by standing on its head in many ways, has succeeded
in (usually) working despite this disaster. Building such applications
becomes radically easier if we go back to the end-to-end principles of
the Internet. NAT has made it very difficult to deploy new applications.
Given tunneling technology (and 6to4, when routable addresses are
available), in concert with the IPv6 deployment that has begun in many
parts of the world, we can again have a clean end-to-end network, in
which kids anywhere can share with their peers all over the world.
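For reference, 6to4 (RFC 3056) derives an IPv6 /48 prefix from a
routable IPv4 address by placing the 32-bit address directly after the
2002::/16 prefix. The sketch below shows the arithmetic with Python's
standard ipaddress module; the function name is hypothetical and
192.0.2.1 is a documentation address used only as an example.

```python
import ipaddress

def six_to_four_prefix(ipv4_text):
    """Return the 2002:V4ADDR::/48 network that 6to4 derives from an
    IPv4 address: 16 bits of 0x2002 followed by the 32-bit address."""
    v4 = int(ipaddress.IPv4Address(ipv4_text))
    prefix_int = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((prefix_int, 48))

prefix = six_to_four_prefix("192.0.2.1")
print(prefix)  # 2002:c000:201::/48

# The standard library can recover the embedded IPv4 address again.
host = ipaddress.IPv6Address("2002:c000:201::1")
print(host.sixtofour)  # 192.0.2.1
```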
So our judgment is that the time has really come, and (almost) all
applications are finally ready.
According to the Ethnologue, there are 347 languages with more than one
million speakers in the world, which together cover 94% of the world's
population. We already see localization in open source systems for
languages with fewer than one million speakers. If we continue along the
current path of localization, we're going to find ourselves with a real
problem within several years.
While I expect the current mechanisms and processes might get us through the first round of
deployments, the year after, this problem will become more acute. As a community, we
need to recognize this approaching problem and rise to the challenge. I wrote
about this in more detail in my blog.
Are you getting the needed
level of assistance from the community in reaching these goals?
It has been hard for people to help on the base hardware support, though
as the first few boards were distributed over the last month this has
been changing: this is about to change in a big way with our developer's
We are distributing almost 500 bare mother boards to enable people to
help on drivers, power management, code optimization (which not only
makes things faster, but reduces power consumption), mesh network
experimentation, etc. And there will be further opportunities later in
the program during beta test later this year.
you most urgently need help with at this time?
Power management is one key problem. And it can be subtle and indirect:
slow code, or bloated code, also wastes power.
Memory consumption, and how to manage low memory situations,
is also key. It would help greatly if applications would bother to be
able to checkpoint their state and restart exactly where they left off.
Let's take one of those goals: paring down applications so that they fit
into the OLPC's memory. This is clearly an activity which benefits
everybody - bloated applications are slow applications. Are you making
progress in putting the needed tools on a diet?
There are only a few things we are doing ourselves at this instant; the
responsibility for these problems is distributed among a myriad of
We have a simple principle everyone should be aware of: if your
application is bloated, it's much less likely people will be able to use
it on the machine. There are usually alternatives for any particular
piece of software. Given the healthy competition in free software, there
is only a much smaller subset of software that we care about to the
point of fixing it ourselves. If you want your software to be usable,
please make it so, and everyone will benefit from leaner, faster
applications, not only OLPC.
How are the upstream
communities responding to debloating patches?
In most areas, we're still pedal to the metal on basic problems like
device drivers, and finishing up LinuxBIOS + Linux as boot loader so
that we can support installation over the (mesh) network. Ron Minnich
and Ollie Lo of LANL and the LinuxBIOS community are rising to the
Often, rather than patches, it is helping people understand there are
problems that need to be fixed. Chris Blizzard, who is on the board of
the Mozilla Corporation, now works on OLPC (he's in charge of the Red
Hat team), and the Firefox team are finally aware they have a serious
problem and test cases are being generated. Chris says some progress has
already been made. Much more is needed, and there are viable
alternatives we could use if Firefox does not come through. But we think
they probably will by the time we ship in volume.
Many thanks to Jim Gettys for taking the time to answer these questions.
The second part of the interview will appear next week.