August 25, 2010
This article was contributed by James M. Leddy
Attendees at LinuxCon 2010 were lucky enough to have not just
one, but two presentations devoted to boot speed. The first
was "How We Made Ubuntu Faster", by
Upstart creator Scott James
Remnant; the other was
"Improving Android Boot-Up Time", by Tim Bird of Sony. As expected,
Scott's talk was centered on netbooks running Ubuntu, while Tim
focused on several development boards running Android. Nevertheless,
the two projects had some things in common.
Ubuntu
No discussion of boot-up speed would be complete without mentioning
the 5-second boot achieved by
Arjan van de Ven and Auke Kok of
Intel's Open Source Technology Center.
In fact, a number of points in Scott's session assumed familiarity
with that effort.
Good metrics are pivotal for improving boot time, and to get good
metrics one must standardize the variables. The hardest of these to
control is the hardware, because every computer has a different mix of
faster and slower components. The Ubuntu team
realized they would have to buy a batch of "standard"
computers. They chose the Dell Inspiron Mini 10 netbook, dubbed the
"touchpad from hell" by Scott because it was hard to
use without the pointer jumping around. More importantly, the machine
met the key requirement of being available in both SSD and rotational-disk
configurations, and it was cheap enough to keep the project under budget.
The next important piece is to have a goal in mind. They chose 10
seconds, by "doubling the numbers that Arjan came up
with". The kernel and initramfs get a total of two seconds. The
same is allocated for platform initialization, such as init scripts. The
X server gets another two seconds, and the desktop environment, Gnome,
gets four. These numbers turned out not to be accurate predictors in
the long run, but in some areas, such as the kernel, the team was able to
beat its target.
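The budget itself is simple arithmetic. The Python sketch below encodes the per-phase allocations from the talk and reports any phase that runs over; the phase names and the over_budget() helper are illustrative, not part of any Ubuntu tool.

```python
# Per-phase boot-time budget in seconds, as described in the talk.
# The dictionary keys and over_budget() are hypothetical names for illustration.
BUDGET = {
    "kernel+initramfs": 2.0,
    "platform init": 2.0,
    "X server": 2.0,
    "desktop (Gnome)": 4.0,
}

def over_budget(measured):
    """Return {phase: seconds over budget} for each measured phase that exceeded its allocation."""
    return {
        phase: round(measured[phase] - limit, 3)
        for phase, limit in BUDGET.items()
        if phase in measured and measured[phase] > limit
    }

total = sum(BUDGET.values())
print(f"total budget: {total} s")  # total budget: 10.0 s
```

Feeding in the roughly 10-second Gnome startup reported later in the talk, over_budget({"desktop (Gnome)": 10.0}) returns {"desktop (Gnome)": 6.0}, which alone is enough to sink the 10-second goal.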
In order to create an automated system to measure the changes over
time, the team threw together a pretty elaborate configuration where
the system would reinstall the latest nightly builds, and then profile
the resulting boot automatically. They compiled all the results and
put them on Scott's
people page.
One of the biggest parts of the Moblin kernel work was the early
use of asynchronous kernel threads. The Moblin developers improved boot time by
initializing the SATA controller, which handles
storage, at the same time as the USB host adapters. Canonical built on this work by moving
populate_rootfs(), the function responsible for unpacking the
initramfs, into yet another asynchronous thread.
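In the kernel this is done with the async infrastructure (async_schedule() and async_synchronize_full() in kernel/async.c). The payoff is easier to see in a user-space analogy: a Python sketch, assuming independent initialization steps that mostly wait on hardware, modeled here with time.sleep(). The probe functions are hypothetical stand-ins, not kernel code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for initialization steps that mostly wait on hardware.
def probe_sata():
    time.sleep(0.2)   # e.g. waiting for the disk to respond
    return "sata ready"

def probe_usb():
    time.sleep(0.2)   # e.g. waiting for USB hubs to settle
    return "usb ready"

def unpack_initramfs():
    time.sleep(0.1)   # loosely analogous to populate_rootfs()
    return "rootfs populated"

def boot_serial(steps):
    """Run each step after the previous one finishes; the waits add up."""
    start = time.monotonic()
    results = [step() for step in steps]
    return results, time.monotonic() - start

def boot_async(steps):
    """Start every step at once, then wait for all of them."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step) for step in steps]
        # Like async_synchronize_full(): block until every scheduled step is done.
        results = [f.result() for f in futures]
    return results, time.monotonic() - start

steps = [probe_sata, probe_usb, unpack_initramfs]
print(boot_serial(steps))  # elapsed time: at least 0.5 s
print(boot_async(steps))   # elapsed time: about 0.2 s, bounded by the slowest step
```

The overlap only helps because the steps are independent; anything that needs the root filesystem still has to wait for the synchronization point.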
Though Intel claimed a speed boost from statically compiling modules
into the kernel, that approach does not suit Ubuntu: the Canonical team
has to support far more hardware than just Intel netbooks, so it still
needs modules and an initramfs. Instead, the team cleaned up some of the
slower parts of the init script, for example by replacing a loop that
polled the blkid binary every 10 milliseconds with an event-based call to
libudev. In the end the team was able to beat its 2-second target, even
with the requirement to use an initramfs.
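The difference between polling and waiting for an event can be sketched in Python. This is an analogy, not libudev code: a threading.Event stands in for the udev notification, and the hypothetical device_present() check stands in for running blkid. The polling version wakes every 10 ms and can notice the device up to one interval late (and in the real script each check spawns a process); the event-driven version sleeps until it is told.

```python
import threading
import time

device_ready = threading.Event()   # stands in for a udev "device added" notification

def device_present():
    # Hypothetical stand-in for spawning blkid and inspecting its output.
    return device_ready.is_set()

def wait_by_polling(interval=0.010, timeout=5.0):
    """Check every `interval` seconds; return how many checks were made."""
    checks = 0
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        checks += 1
        if device_present():
            return checks
        time.sleep(interval)
    raise TimeoutError("device never appeared")

def wait_by_event(timeout=5.0):
    """Block until notified; nothing runs while waiting, so this is one 'check'."""
    if not device_ready.wait(timeout):
        raise TimeoutError("device never appeared")
    return 1

# Simulate the root device showing up 100 ms into boot.
threading.Timer(0.1, device_ready.set).start()
print("polling checks:", wait_by_polling())
device_ready.clear()
threading.Timer(0.1, device_ready.set).start()
print("event checks:", wait_by_event())   # event checks: 1
```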
Scott took some time here to plug Upstart. Though the Intel team
ultimately settled on a hand-tuned invocation of the old System V init
daemon to improve boot, Scott insisted that an event-based system is
better than "thousands of lines of shell script". This is
even more true today, because pretty much every system on the market
has more than one CPU, and an event-based init can start independent jobs in parallel.
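The core idea of an event-based init fits in a few lines of Python: each job names the event that starts it (Upstart job files express this with a `start on` stanza), and a finished job emits a new event. The job names, events, and the simple "emit one event when done" rule below are hypothetical simplifications, not Upstart's implementation.

```python
from collections import defaultdict, deque

# Hypothetical jobs: name -> (event that starts it, event it emits when finished)
JOBS = {
    "mountall":   ("startup",    "filesystem"),
    "udev":       ("startup",    "devices-ready"),
    "networking": ("filesystem", "net-up"),
    "dbus":       ("filesystem", "dbus-up"),
    "gdm":        ("dbus-up",    "login-screen"),
}

def run_events(jobs, initial_event="startup"):
    """Start jobs as their trigger events fire; return a list of (wave, job) pairs.

    Jobs triggered by the same event land in the same wave; a real init could
    run them concurrently on separate CPUs. This toy dispatcher is sequential."""
    waiting = defaultdict(list)
    for name, (start_on, _) in jobs.items():
        waiting[start_on].append(name)

    order = []
    pending = deque([(initial_event, 0)])
    while pending:
        event, wave = pending.popleft()
        for name in waiting.pop(event, []):
            order.append((wave, name))
            pending.append((jobs[name][1], wave + 1))
    return order

for wave, job in run_events(JOBS):
    print(wave, job)
```

No job lists an explicit sequence position; ordering falls out of the events, which is what lets independent work proceed without a hand-written script deciding what runs when.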
The Gnome environment took a while to
boot as well. Ubuntu uses Compiz by default, and it took almost half
of the time allocated for the desktop environment. Audience members asked
whether Compiz could be eliminated, but too many features of
Ubuntu depend on it. Other large offenders were
gnome-panel and Nautilus. Altogether, these components contributed to a
10-second Gnome startup, more than double the 4-second allotment.
Their research revealed that storage is the ultimate bottleneck.
"Hard drives suck, but SSDs suck too" was how Scott
put it. To improve the situation, Scott used a well-known tool called
readahead. Initially developed
at Red Hat, readahead logs the filename for every
call to open() and execve() during the first 60 seconds of
boot. On the next boot, a readahead process is spawned early that
pulls every file in the list into the page cache, so that later reads of
those files are simple memory accesses.
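The replay half of readahead can be sketched with os.posix_fadvise() and POSIX_FADV_WILLNEED, which asks the kernel to start reading a file into the page cache without making the caller wait for the data. This is a minimal sketch, assuming the list of paths has already been recorded one per line in a file whose name (boot-files.list) is hypothetical; the recording side is left out, and in C the Linux readahead(2) system call would serve the same purpose.

```python
import os

def preload(paths):
    """Ask the kernel to pull each file into the page cache; return how many were queued."""
    queued = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # the file vanished since it was recorded; skip it
        try:
            # offset=0, len=0 means "the whole file"; WILLNEED starts the read.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            queued += 1
        finally:
            os.close(fd)
    return queued

def preload_from_list(list_file="boot-files.list"):
    # Hypothetical list format: one absolute path per line, blank lines ignored.
    with open(list_file) as f:
        paths = [line.strip() for line in f if line.strip()]
    return preload(paths)
```

Run early enough, this keeps the disk busy while the rest of boot is still doing CPU work, so the later open() and read() calls find their data already cached.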
Intel improved Red Hat's readahead with super-readahead, or
sreadahead. This does the same
thing, but was modified to preload only the blocks that will actually be read,
instead of the entire file. Since it assumes a MeeGo
system running on an SSD, where seek time is negligible, it does not
take the order of blocks on disk into account. Using an SSD, the
Ubuntu system can read all blocks necessary for boot in 3 seconds.
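sreadahead's refinement amounts to replaying byte ranges instead of whole files. A sketch, assuming a recorded list of (path, offset, length) tuples; the tuple format is an assumption and how the ranges are gathered is outside its scope. Because the target is an SSD, the ranges are issued in recorded order with no sorting.

```python
import os
from itertools import groupby

def preload_ranges(ranges):
    """Issue WILLNEED for each (path, offset, length) range in the order given.

    Consecutive ranges from the same file share one open() call.
    Returns the total number of bytes requested."""
    requested = 0
    for path, group in groupby(ranges, key=lambda r: r[0]):
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # missing file: drop all of its ranges
        try:
            for _, offset, length in group:
                os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
                requested += length
        finally:
            os.close(fd)
    return requested
```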
However, Ubuntu has to run on rotating media as well, so yet another
iteration, called über-readahead,
was created by Scott. The daemon was modified so that it reads blocks
in on-disk order when using a rotating hard drive. The graph of this
optimization shows a few random metadata reads, followed by a smooth
linear path across the platter. On rotational media, all pages
necessary can be read into the page cache in under 7 seconds. Scott
said that things could go even faster if the initial reads could be sorted and
done in order before performing readahead on the file contents. A few
file system patches were sent to LKML,
but inclusion does not seem likely at this point.
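For a rotating disk, the change amounts to a sort on physical location. A sketch, assuming each recorded range already carries its physical block number (on Linux this can be looked up with the FIEMAP or FIBMAP ioctls, not shown), and that the disk type is read from the sysfs attribute /sys/block/<dev>/queue/rotational. The tuple layout and function names are hypothetical, not über-readahead's actual code.

```python
from pathlib import Path

def is_rotational(dev, sysfs="/sys/block"):
    """True if the kernel reports the block device as a spinning disk."""
    flag = Path(sysfs, dev, "queue", "rotational")
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        return False  # unknown device: treat it like an SSD and skip sorting

def plan_reads(ranges, rotational):
    """Order (physical_block, path, offset, length) tuples for replay.

    On an SSD the recorded order is kept, as sreadahead does; on a rotating
    disk the reads are sorted by physical block so the head sweeps the
    platter once instead of seeking back and forth."""
    if not rotational:
        return list(ranges)
    return sorted(ranges, key=lambda r: r[0])
```

The sorted plan would then be fed to a per-range preloader, with the disk servicing the requests in a single pass.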
Scott concluded the discussion by admitting they didn't achieve their
goal. The inability to reduce the desktop environment portion to fewer
than ten seconds precluded a sub-ten-second overall boot. Note this is
on a Dell netbook, so the numbers will likely be better for systems
with beefier processors and I/O subsystems, which includes almost all
desktops and traditional laptops. The presentation abstract states that
some machines boot Ubuntu in as little as 5 seconds. The good news is that the kernel
now takes less than 2 seconds to initialize, even with the initramfs
requirement. And Scott did a lot of useful work that can be used by
the larger community. Only time will tell if the other
distributions take advantage of it.
Android
Tim ran into a unique set of problems with Android handsets.
Whereas Scott's problems were already well known in the much wider
desktop Linux community, Tim is working with a suite of tools with
names like Dalvik and Zygote, whose source code has rarely been
modified outside of Google. As such, Tim focused on building an
initial performance profile to find which parts of the boot process would
yield the largest reduction in time, and should therefore get the most
developer effort.
He profiled three different platforms: the
ADP1 and
Nexus 1
from HTC, and an
EVM OMAP3 board
from Mistral Solutions. The overall times for these machines to boot
were 57, 36, and 62 seconds, respectively. Even allowing for the fact
that these are development machines, those numbers seem huge compared to the
netbook boot times, though it should be mentioned that these boards have
much slower processors and storage. On the other hand, you can do a
lot more with a fully functional Gnome desktop than with a smart phone. Tim
pointed out that "it's really sad that you can use a stopwatch to
accurately measure a phone booting up".
The boot chart for the EVM board revealed a number of areas for
improvement.
Android uses a rewritten, Java-like virtual machine called Dalvik in all
of its phones. For the best user experience, all of the classes must
be preloaded before the phone is used. Zygote, the utility responsible
for doing this work, spends about 21 seconds in I/O wait. The application
classes don't have to be preloaded; one can choose to load them on
demand, but this just pushes the problem back and causes longer
application load times. Worse, there is a memory penalty: each
class now has to be loaded into a different heap, so the memory for
identical classes can't be shared.
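Why preloading in one place matters comes down to fork(): Zygote loads the classes once and then forks each application from itself, so every child starts with the already-populated heap, and the kernel shares those pages copy-on-write until a child modifies them. The Python analogy below, for a Unix system, uses a dictionary as a stand-in for loaded classes; the class names and load_class() are hypothetical. It shows only that children inherit the preloaded data without reloading it. It does not measure memory sharing, which CPython's reference counting would undermine anyway.

```python
import os

LOAD_COUNT = 0

def load_class(name):
    """Hypothetical stand-in for reading and parsing a class from storage."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": name, "bytecode": bytes(1024)}

# The "zygote" preloads its classes once, before any application exists.
PRELOADED = {n: load_class(n) for n in ("java.lang.String", "android.view.View")}

def spawn_app(needed):
    """Fork a child 'application' and return how many classes it had to load itself.

    The child inherits PRELOADED through fork(); anything missing is loaded on
    demand into the child's own memory, which the parent never sees."""
    pid = os.fork()
    if pid == 0:
        before = LOAD_COUNT
        for name in needed:
            if name not in PRELOADED:
                load_class(name)  # on-demand load: private to this child
        os._exit(LOAD_COUNT - before)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

A child that needs only preloaded classes loads nothing; one that needs an extra class pays for it privately, which mirrors the per-heap penalty described above.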
A potential solution is to figure out how to load every class into
Zygote's heap, so that you can have something akin to shared libraries
in a conventional OS. Another possible solution is to make Zygote
multithreaded, and have one thread use the CPU while another reads
from storage. A more far-out possibility avoids reading in the classes
at all and loads the heap as a binary blob, though this would take
the most development effort and would require a rebuild whenever new
classes are installed.
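The threaded-Zygote idea is a classic producer/consumer pipeline. A sketch in Python, assuming a hypothetical list of class files and a hypothetical parse() step: one thread reads files from storage while the main thread parses whatever has already arrived, so I/O wait and CPU work overlap instead of alternating. In CPython, file reads release the global interpreter lock, so the overlap is real even though parsing stays on one thread.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the input

def reader(paths, out):
    """I/O thread: read raw class files and hand them to the parser."""
    try:
        for path in paths:
            with open(path, "rb") as f:
                out.put((path, f.read()))
    finally:
        out.put(DONE)  # always unblock the consumer, even if a read fails

def parse(data):
    """Hypothetical stand-in for CPU-bound class parsing and verification."""
    return sum(data) & 0xFFFF

def preload_pipelined(paths, depth=8):
    """Parse on this thread while a second thread keeps reading ahead."""
    q = queue.Queue(maxsize=depth)  # bounded, so the reader cannot run far ahead
    t = threading.Thread(target=reader, args=(paths, q), daemon=True)
    t.start()
    results = {}
    while (item := q.get()) is not DONE:
        path, data = item
        results[path] = parse(data)
    t.join()
    return results
```

The bounded queue caps memory use: if parsing falls behind, the reader blocks instead of pulling the whole class set into memory at once.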
The other potential speed gain lies in the package scanning tool. The
purpose of this tool wasn't exactly obvious to Tim, but he illustrated
its complexity by showing the call tree. At the bottom of it all
is the parseZipArchive() function, which is called 138 times. There
is some low-hanging fruit there; for example, Tim shaved off a few
seconds by commenting out a sanity check of the zip file headers.
Just above that is a ZipFileRO::open() call which will mmap() the
zip file into memory. The problem is that parseZipArchive() walks
the mmap()ed region and builds a hash table to make subsequent accesses
easier, causing page faults for the entire archive. All this is
done just to extract one file, AndroidManifest.xml, so the time and
memory spent to fault in all those pages and build the hash table is
essentially wasted.
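The ZIP format itself allows a cheaper lookup: the central directory at the end of the archive lists every entry's name, compressed size, and the offset of its local header, so a single file can be extracted after touching only the tail of the archive and that entry's own bytes. A sketch of that approach in Python (not Android's ZipFileRO code), handling only stored and deflated entries and ignoring ZIP64 and multi-disk archives:

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory record signature
CDIR_SIG = 0x02014B50      # central directory file header signature
LOCAL_SIG = 0x04034B50     # local file header signature

def read_entry(path, wanted="AndroidManifest.xml"):
    """Extract one entry by consulting only the central directory and that entry."""
    wanted_name = wanted.encode("utf-8")
    with open(path, "rb") as f:
        # The EOCD record (22 bytes plus an optional comment) sits at the very end.
        f.seek(0, 2)
        size = f.tell()
        tail_len = min(size, 22 + 0xFFFF)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
        pos = tail.rfind(EOCD_SIG)
        if pos < 0:
            raise ValueError("not a zip archive")
        _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack_from("<IHHHHIIH", tail, pos)

        # Walk the central directory: 46-byte fixed header + name + extra + comment.
        f.seek(cd_offset)
        cdir = f.read(cd_size)
        off = 0
        for _ in range(count):
            h = struct.unpack_from("<IHHHHHHIIIHHHHHII", cdir, off)
            if h[0] != CDIR_SIG:
                raise ValueError("corrupt central directory")
            method, csize = h[4], h[8]
            name_len, extra_len, comment_len, local_off = h[10], h[11], h[12], h[16]
            name = cdir[off + 46:off + 46 + name_len]
            off += 46 + name_len + extra_len + comment_len
            if name != wanted_name:
                continue
            # Skip the 30-byte local header; its extra field may differ from the
            # central directory's, so use the local header's own lengths.
            f.seek(local_off)
            local = f.read(30)
            if struct.unpack_from("<I", local, 0)[0] != LOCAL_SIG:
                raise ValueError("corrupt local header")
            lname_len, lextra_len = struct.unpack_from("<HH", local, 26)
            f.seek(local_off + 30 + lname_len + lextra_len)
            data = f.read(csize)
            if method == 0:        # stored
                return data
            if method == 8:        # raw deflate stream, no zlib header
                return zlib.decompress(data, -15)
            raise ValueError(f"unsupported compression method {method}")
    raise KeyError(wanted)
```

The package scanner may have other reasons to index every entry, so this only illustrates how little of the file a single lookup actually needs.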
There is an emerging consensus within the Android development
community that a lot of time can be saved by using readahead. But
Tim thought it would mask the underlying problem, and that some of
the blocks shouldn't be read at all, much less used to populate the page
cache. Scott, who was in the audience for this discussion, noted that
readahead isn't really about masking poor locality of reference or
"papering over problems", but about keeping the CPU busy
while the page cache is populated. Tim still felt that using readahead
would make the problems with the code less noticeable, and developers
wouldn't be as motivated to fix them. They both agreed that when this ships
on a device to consumers, it should have readahead enabled.
Unfortunately, there are no significant boot-time speedups to report yet,
but there is still work to do. Interested readers are encouraged to
sign up for the Android
mailing lists,
and check out the
eLinux wiki.
Conclusions
Though Android and Dalvik are a departure from the traditional GNU
userspace that Ubuntu uses, they do have some things in common. First,
because the kernel is not affected by user-space differences, kernel
improvements will be available to any and all Linux devices. Tim
didn't mention the kernel because there are already a lot of well
known techniques
to boot the kernel faster, so
it was outside of the scope of his talk. Presumably, the techniques covered in the Ubuntu
presentation would also help the Android system boot more quickly.
Some improvements in user space carry over as well. Readahead is a
generic enough technique that it can be included in pretty much any
environment. Similarly, profiling techniques like bootchart and
ftrace can be run in both environments. However, the generic GNU userspace
benefits from more code sharing and reuse than third-party
environments like Android. Improvements to booting the X server, for
example, will be felt across Ubuntu, MeeGo, and the other desktop
Linuxes out there. That isn't the case for Android.
Even so, the developer community for Android is growing and Tim is
evidence of that. The problem of slow Android boots has probably not
been thought about much outside of Google's office walls, but that is
changing. The potential for improvement is there, especially
in Android-specific places like the package scanner and Zygote. For
desktop distributions and even specialized distributions like MeeGo,
the fast boot story may be largely coming to an end. For Android, it's
only just beginning.
[ Bootcharts, graph, and photo courtesy of Scott James Remnant of
Canonical and Tim Bird of Sony. ]