
Debian switching to upstart

From:  Petter Reinholdtsen <pere-AT-hungry.com>
To:  debian-devel-announce-AT-lists.debian.org
Subject:  The future of the boot system in Debian
Date:  Sat, 05 Sep 2009 13:21:00 +0200

The future of the boot system in Debian
=======================================

Over the last few years, the boot system in Debian has progressively
deteriorated due to changes in the Linux kernel which make the kernel
more and more event based. For example, the kernel and its drivers no
longer block all processing while detecting disks, network interfaces
and other hardware, making the once trusty old boot system in Debian
increasingly fragile. During the current boot sequence device files in
/dev/ are often missing when fsck or mount are looking for them, or
the network is not available when the boot system tries to mount NFS
entries because network interfaces were slow to initialise, or audio
devices are missing when audio settings should be set. The problem is
fundamental to the way we boot Debian today - sequentially, and a
solution needs to address this fundamental problem. We believe the
solution is to migrate to an event based boot system.

In addition to this, there are long lasting problems with the boot
sequence of the existing init.d scripts, for some combination of
packages. The boot sequence is wrong in these cases, and to solve it
one needs to change the sequence numbers of all the immediate forward
and reverse dependencies of the init.d script in question - and their
forward and reverse dependencies and so forth until the boot sequence
is correct. In some cases the change needs to happen to several
scripts in different packages at the same time, which is impossible
with the old way of ordering init.d scripts. Previously the ordering
was done by asking the package maintainers to guess on and update
sequence numbers, a process that tended to introduce new problems and
took a long time to be solved properly. The solution to this problem
is to change how we order boot scripts: instead of using static
sequence numbers, calculate the boot sequence from dependency
information provided in the init.d scripts themselves. Since
2009-07-27 this is
the default in Debian unstable, and it will be the way init.d scripts
are ordered in Squeeze.  Switching to a dependency based boot
sequencing allows us to ensure its correctness, detect and fix
dependency loops, and in general fix a set of bugs in the distribution
that have been very hard to fix before.  Other solved problems with
the new system are incorrect stop sequences (the default should have
been S20/K80, not S20/K20), and the misleading runlevels 0 and 6,
where start symlinks are called with 'stop' as the argument to the
scripts. All of these problems are solved when Debian now moves to
dependency based boot sequencing.
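
As a rough illustration of what dependency based ordering means, the
sketch below computes a start order from declared dependencies using a
topological sort. The script names and their dependencies are made up
for the example (they are not taken from real init.d headers), and
Python's graphlib stands in for whatever tool actually does the
ordering; the stop order is simply the reverse of the start order.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical scripts mapped to the scripts they must start after.
# These dependencies are illustrative assumptions only.
REQUIRED_START = {
    "mountall": set(),
    "networking": {"mountall"},
    "portmap": {"networking"},
    "nfs-common": {"portmap"},
    "ssh": {"networking"},
}


def boot_order(deps):
    """Return a start order in which every script follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Stop in the reverse of the start order, so dependents stop first."""
    return list(reversed(boot_order(deps)))


def find_loop(deps):
    """Return the scripts forming a dependency loop, or None if there is none."""
    try:
        boot_order(deps)
    except CycleError as err:
        return err.args[1]
    return None
```

With static sequence numbers a loop like "a needs b, b needs a" goes
unnoticed; here find_loop() reports it instead of producing a bogus
order.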

It will take longer to solve the fundamental problem. It requires a
rewrite of how we handle the boot, and can not be done by just
modifying the boot sequence framework.

Before explaining the current plan, some background information. The
current boot system can be seen as consisting of three different parts:

 1. The implementation of /sbin/init, reading /etc/inittab and
    starting the script implementing part 2 (/etc/init.d/rc). This is
    normally done using the sysvinit package, but other replacements
    are available, like initng and upstart.

 2. The implementation of /etc/init.d/rc, which is responsible for
    calling the init.d scripts in the correct sequence. This is
    normally done using the sysv-rc package. An alternative is the
    file-rc package, which uses a file /etc/runlevel.conf instead of
    symlinks in /etc/rc?.d/ to decide what to execute and in which
    order.

 3. The individual init.d scripts, taking care of the tasks that need
    to be done during boot. The basic framework is provided by the
    initscripts package, and the rest is handled by individual
    packages like udev, netbase, ifupdown, apache, etc. :) There are
    approximately 850 packages with init.d scripts in Debian unstable.

Part 2 (sysv-rc) and 3 have been changed to use dependency based boot
sequencing. This was a release goal for Lenny and the continued work
is a release goal for Squeeze. The init.d scripts have seen review
with regard to dependency based ordering for more than 3 years. New
installs will use dependency based boot sequencing. For upgrades, a
critical debconf question will offer the choice between migrating
(the default, if testing finds no issues with the existing scripts)
and keeping the legacy boot ordering for now.

To solve the fundamental problem, the plan is to replace /sbin/init
with an implementation that is able to handle kernel events. It will
allow us to modify the boot system for the early boot to become event
based, while keeping the existing boot stuff working. We could rewrite
sysvinit to become event based, or have a look at the existing boot
systems that handle kernel events. After checking the options and the
systems used in other distributions, upstart seems like the most
promising candidate. It is used by Ubuntu and Fedora at the moment,
and solves the problem in a backwards compatible way. The plan is to
change upstart to actually use /etc/inittab, to ease the switch
between sysvinit and upstart. We will also change the init.d script
handling to treat upstart jobs as init.d scripts, to provide an
alternative for architectures lacking upstart support. These changes
should make it transparent for the users which package provides
/sbin/init, and thus make it easier to migrate from sysvinit to
upstart.

When /sbin/init is changed to an event based framework, the next step
is to rewrite the early boot system to use these events when
available, and behave the traditional way when there are no
events. When this step is finished the fundamental problem will be
solved, and the boot will be robust and should work correctly even in
edge cases with slow device buses.
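
The failure mode described earlier (fsck or mount running before the
device file exists) can be pictured with a small sketch. It polls for a
path with a timeout, which is only a stand-in for reacting to a real
kernel event; the function name and its parameters are invented for
this example.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Return True once `path` exists, or False if `timeout` seconds pass first.

    Polling is an assumption made for this sketch; an event based init
    would be told about the device instead of repeatedly looking for it.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would only run the check on, say, a USB disk partition after
wait_for_path() has returned True for its device file, and fall back to
the traditional behaviour when it returns False.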

The planned time frame for this is to replace /sbin/init with upstart
for Squeeze, and see if we manage to change the very early boot to
become event based in time for Squeeze too, fixing the most pressing
of the current boot problems (failing fsck and mount with USB
disks). For Squeeze+1, more of the early boot system will be
converted, to handle more of the existing problems.

According to the Linux Standard Base specification, all LSB compliant
distributions must handle packages with init.d scripts. As Debian
plans to continue to follow the LSB, this means the boot system needs
to continue to handle init.d scripts. Because of this, we need a boot
system in Debian that is both event based for the early boot, and
which also calls init.d scripts at the appropriate time.

References:
  http://wiki.debian.org/LSBInitScripts/DependencyBasedBoot

Petter Reinholdtsen, Kel Modderman, Armin Berres





Debian switching to upstart

Posted Sep 6, 2009 15:35 UTC (Sun) by salimma (subscriber, #34460) [Link] (5 responses)

That'd be great for commercial ISVs as well as upstream developers -- they can now write a single service initialization script that will work on Ubuntu, Fedora and Debian alike (not sure what openSUSE is using).

Debian switching to upstart

Posted Sep 6, 2009 15:54 UTC (Sun) by rahulsundaram (subscriber, #21946) [Link] (2 responses)

In theory, that would also be possible following the LSB spec. Note that the upstart format is a different and incompatible one although upstart does have a compatibility mode that enables distributions to transition in a phased manner.

Debian switching to upstart

Posted Sep 7, 2009 7:32 UTC (Mon) by arekm (guest, #4846) [Link] (1 responses)

That "compatibility mode" is almost unmaintained in upstart. Maybe that will change now.

Debian switching to upstart

Posted Sep 11, 2009 0:10 UTC (Fri) by hmh (subscriber, #3838) [Link]

It will have to be, or Debian will have to ditch the idea of switching to upstart. We take that stuff seriously.

Debian switching to upstart

Posted Sep 7, 2009 0:20 UTC (Mon) by jengelh (guest, #33263) [Link] (1 responses)

openSUSE continues to use LSB-compliant sysvinit init.d scripts. (And IIRC, one reason is that Upstart alone does not make the boot any faster - the scripts are the bottleneck.) Secondarily, openSUSE has for some years supported starting boot scripts in parallel, using either startpar(8) or make(1)'s dependency mechanism, and thus does not block where there is no dependent ordering enforced by the scripts themselves (e.g. it does not make sense to start NFS when you have no network interfaces up).

Precisely — Speed is not the main factor (but a nice side effect)

Posted Sep 7, 2009 1:39 UTC (Mon) by gwolf (subscriber, #14632) [Link]

What has been thoroughly discussed is that the current SysV init system is best suited for servers where you can assume quite a bit of things — Kind of hardware you can count on, general network topology, etc.

Of course, there are gross hacks, such as network-manager, but it makes no sense to wait (on the default installation, at least) for a user to be logged in so that nm-applet can ask network-manager to be so kind as to choose a profile.

What switching to upstart will provide is not about booting faster, but about booting smarter, and about behaving smarter. There is no use in having, e.g., a server that binds to a specific network interface running if said interface is not available and configured, right? So that server should declare it depends on "eth0 gets a valid IP address", or on "getting 192.168.1.15". Then, when the kernel fires that event, upstart will tell the server to start.

Your Ethernet switch died? Well, that might push the network down. If so, the daemon in question should know it is now useless and shut itself down, so there are no unexpected, meaningless failure messages. Of course, the admin should be notified about the problem, but about the fact that the problem happened in the network connection, not in the daemons' space. That will help him understand WTF. And of course, said daemons will come back to life once networking is restored.

The fact that upstart will allow for faster boots due to parallelizing all processes that wait for a specific event is just... A nice side effect.
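
To make the start-on-event / stop-on-event idea concrete, here is a toy sketch. It is not upstart's job format or API; the class, the job name "exampled" and the event strings are all made up for illustration.

```python
class EventInit:
    """Toy supervisor: jobs start and stop in response to named events."""

    def __init__(self):
        self.jobs = {}        # job name -> (start event, stop event)
        self.running = set()  # names of jobs currently running
        self.log = []         # what the supervisor did, in order

    def register(self, job, start_on, stop_on):
        """Declare which event starts the job and which event stops it."""
        self.jobs[job] = (start_on, stop_on)

    def emit(self, event):
        """Deliver an event and start or stop every job that cares about it."""
        for job, (start_on, stop_on) in self.jobs.items():
            if event == start_on and job not in self.running:
                self.running.add(job)
                self.log.append(f"start {job}")
            elif event == stop_on and job in self.running:
                self.running.discard(job)
                self.log.append(f"stop {job}")


# Hypothetical event names; real event names would come from the init system.
init = EventInit()
init.register("exampled", start_on="eth0-up", stop_on="eth0-down")
init.emit("eth0-up")    # interface configured: the daemon is started
init.emit("eth0-down")  # switch died: the daemon is stopped
init.emit("eth0-up")    # networking restored: the daemon comes back
```

The daemon never has to poll or fail noisily; it is simply not running while the thing it depends on is missing.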

Debian switching to upstart

Posted Sep 6, 2009 16:33 UTC (Sun) by obi (guest, #5784) [Link]

It makes sense, I was hoping for this.

Upstart wasn't the first init replacement, but it seemed to me that it's the most complete, with backwards compatibility, service supervision etc. Scott was able to look at the designs of launchd, initng, daemontools, smf, and in a way, his own - early upstart has seen a major deployment in Ubuntu, and it seems he learnt a lot from the early experience. Now it's seen adoption not only on Ubuntu and Fedora, but also on the Palm Pre's WebOS and Nokia's Maemo.

I'll be glad to be able to move away from racy, crufty boilerplate shell code for initscripts. Of course I suspect it'll be a while still until all the Debian packages with initscripts are converted to upstart jobs, and I'm curious to see what changes Debian wants to introduce in upstart.

Will it make it faster or slower?

Posted Sep 6, 2009 17:40 UTC (Sun) by endecotp (guest, #36428) [Link] (19 responses)

Interestingly there is no mention of boot time in the announcement. I would like to hope that the changes would have faster boot time as an objective, but when I wrote my "Booting Debian in 14 seconds" article I got the impression that this is not considered important by any of the relevant Debian people.

Will it make it faster or slower?

Posted Sep 6, 2009 17:48 UTC (Sun) by MattPerry (guest, #46341) [Link] (17 responses)

Is boot speed really important? With the ability to hibernate and suspend computers these days, what real value is gained in removing 15-30 seconds from booting? It seems like a "penny-wise, pound foolish" approach to optimization.

Will it make it faster or slower?

Posted Sep 6, 2009 17:59 UTC (Sun) by josh (subscriber, #17465) [Link]

Suspend still uses power, and has the potential for spurious wakeups. Many people don't feel comfortable suspending a system and tossing it in a laptop bag.

Given appropriate optimization, booting should take a lot less time than un-hibernating. In particular, the latter takes more time the more RAM you have in use.

In any case, many good reasons still exist to turn a laptop off rather than hibernating or suspending.

Will it make it faster or slower?

Posted Sep 6, 2009 18:56 UTC (Sun) by jreiser (subscriber, #11027) [Link]

> Is boot speed really important?

Yes. There is still mainstream hardware (PC clone 1 to 3 years old) that does not suspend and hibernate properly. Also, coming out of hibernation is slower than booting on many desktop and laptop systems, including several that I use regularly.

boot speed matters

Posted Sep 6, 2009 19:44 UTC (Sun) by DonDiego (guest, #24141) [Link] (11 responses)

Absolutely. Do you intend to hibernate or suspend your appliances? I bet not. You want your
Bluray player, router, etc. to be available as quickly as possible.

boot speed matters

Posted Sep 6, 2009 21:58 UTC (Sun) by drag (guest, #31333) [Link] (10 responses)

Well lots of modern complex appliances do do standby modes. Televisions, radios, media players, etc etc. The "shutoff button" is a lie.

The most classic example of this that I can think of is automobile radios. They maintain power continuously in order to maintain the internal memory of radio station selects. If you remove the battery from the car or let the car run out of battery capacity (so you'd need a "jump" to get started) then you would have to go and reprogram your radio.

Another classic example is that older non-solid-state radios and televisions would have to maintain the temperatures on their tubes in order to start up quicker. I had an old tube television that doubled as a nice space heater. When plugged in it would take a few seconds to start up. However if it was cold then it may take up to a few minutes for the image to fully stabilize on the screen.

More modern examples of this are things like HD TV sets that take long to do a cold boot. Also VCRs would often suck more electricity when shut off than when they were operating.

In fact I would not be surprised that most modern appliances use current while "shut off". Not that I have done a survey.

---------------

The thing that sucks about Linux here is that power management is still unreliable. It's unreliable in the sense that it is not consistent and it causes crashes and data loss occasionally.

If everything was right and proper in the world when you go to shut off your computer there should only be one option

"standby"

And that will send it into low power mode. No shutoff, reboot, suspend, hibernate, or anything like that. To get to those options you should have to dig further because there is no reason to use those during normal operations.

It should suspend AND save a system image to swap. That way when you start it up again then you're golden.. even if you remove the battery the system can recover from the memory image in your storage. But Linux is not there yet, so we still need a half a dozen different options. This way no matter the expectations of the user it will more than likely do the right thing.

If Linux was reliable then there would be no hesitation for people to close the lid and stick it in the bag.

There are Linux systems that do this sort of thing correctly. Linux cell phones are one. The e-paper tablets like Amazon's go into standby in between each screen render. My Dell Mini9 works like a champ now (I still wait and look for the power light to pulsate (indicating suspend) before sticking it in a bag though).

boot speed matters

Posted Sep 6, 2009 22:17 UTC (Sun) by dlang (guest, #313) [Link] (3 responses)

as long as you also have no need to dual boot, and as long as you are willing to wait the time for your memory image to be written to disk, etc

some people want their computer to actually shut off, so that they can use the power in the battery later when they turn it on rather than have it drain away in 'suspend mode'

the 'standby only' mode may be what you consider ideal, but many other people would not find it ideal, even if there were no bugs in the linux suspend

boot speed matters

Posted Sep 7, 2009 0:01 UTC (Mon) by foom (subscriber, #14868) [Link] (2 responses)

> some people want their computer to actually shut off, so that they can use the power in the
battery later when they turn it on rather than have it drain away in 'suspend mode'

It's certainly possible some people want that, but I doubt there's _very many_ that go weeks
between uses of their laptop, on battery.

If linux power management worked as well on all the multitude of supported laptops/desktops, as
Apple has done for their 10 or so models, this conversation wouldn't be happening. Everyone I
know with a mac laptop simply closes the lid when they're done using it, no question. It's nearly
100% reliable and uses extremely minimal power.

boot speed matters

Posted Sep 7, 2009 0:22 UTC (Mon) by dlang (guest, #313) [Link]

well, if you can select exactly what hardware you will run on, supporting that hardware gets _much_ easier than the situation with linux where it needs to work on hardware designed by others.

it's not necessarily that people want to wait weeks between times that they use their computers, but if a battery will last a week on standby, letting the system sit in standby mode uses 1/7 of your power per day, that's a significant amount if you aren't going to be near a power outlet.

don't get me wrong, it's not that I don't want to see improvements in linux power management, it's just that I don't see 'suspend' replacing 'off' as being either realistic or desirable for all cases.

boot speed matters

Posted Sep 7, 2009 3:22 UTC (Mon) by daniels (subscriber, #16193) [Link]

OS X still has bugs in this regard too -- I've seen 15" unibody MacBook Pros attempting to cook themselves in bags many times, with the lid shut.

boot speed and standby power waste

Posted Sep 7, 2009 4:55 UTC (Mon) by eru (subscriber, #2753) [Link] (1 responses)

> In fact I would not be surprised that most modern appliances use current while "shut off". Not that I have done a survey.

Yes, they do. And a lot. Plugging them into a power meter is eye-opening. For example a few years old Dell desktop PC I tested consumes 9 watts while supposedly "off", and my DTV set-top box seems to draw about the same amount, whether it is nominally "on" or "off". Google for "standby power waste" for more complaints...

With CO2 reduction being the hot issue, authorities have started to look into this. EU and some other legislators have already set standby power limits that will gradually come into effect.

A few years on, only consumer devices that draw practically no current at all when "off" will be seen as acceptable. For Linux to live on then, it either has to boot fast, or be reliably able to suspend into non-volatile memory.

boot speed and standby power waste

Posted Sep 7, 2009 11:02 UTC (Mon) by etienne_lorrain@yahoo.fr (guest, #38022) [Link]

Ever tried to plug that power meter on a laptop power supply without any laptop connected? Mine seems to consume 14 Watt all alone...
A long time ago there were good old voltage transformers made of steel; energy leaks were low, but now that is really different.

boot speed matters

Posted Sep 7, 2009 22:11 UTC (Mon) by ignacio.hernandez (guest, #56157) [Link] (3 responses)

Not sure what car radios you are talking about, but working on them for a while has shown me that almost all radios shipping in an automobile from the factory use a form of non-volatile memory for storing your preferred stations.

boot speed matters

Posted Sep 8, 2009 8:24 UTC (Tue) by drag (guest, #31333) [Link] (1 responses)

All I can say is that you obviously have not been working on car stereos long enough.

boot speed matters

Posted Sep 8, 2009 20:26 UTC (Tue) by njs (subscriber, #40338) [Link]

It's probably a function of nvram economics and manufacturer philosophy at different times -- my 15-year-old oem radio from Volvo does preserve presets over battery disconnection.

Not that this is terribly relevant to anything.

boot speed matters

Posted Sep 8, 2009 16:22 UTC (Tue) by HenrikH (subscriber, #31152) [Link]

None of the cars that I have ever owned, including a 2001 Toyota Avensis, had any form of non-volatile memory in the radio. Whenever there was service performed on the car(s) the stations went.

Will it make it faster or slower?

Posted Sep 7, 2009 6:03 UTC (Mon) by jmm82 (guest, #59425) [Link]

I still do not feel hibernate and suspend are completely reliable and always appropriate. Boot time may not be important for a server or desktop which goes months between reboots, but for a laptop or netbook it is very important. When I use my netbook I often am performing a task which may take only 30 seconds once I boot up, therefore depending on the boot time my task could take twice as long to complete.

I would not be surprised if Debian did not care about boot time given my earlier logic. Most people who care about boot time will probably be using Ubuntu or a similar distro.

Will it make it faster or slower?

Posted Sep 10, 2009 7:36 UTC (Thu) by PaulWay (guest, #45600) [Link]

> Is boot speed really important? With the ability to hibernate and suspend computers these days, what real value is gained in removing 15-30 seconds from booting?

The real value of upstart is that it also handles the hibernate and suspend processes. Instead of having separate, semi-magical scripts to handle the suspend process, upstart uses the same scripts as it would if it were shutting down whatever needs to be shut down. That way you know that everything's shut down cleanly and the developer hasn't patched a bug in one place and not in another.

In fact, you can extend this to most state transitions. Turn the wifi and bluetooth off to save power or to work in an airplane? Upstart will do the same things as required if you were shutting down the whole laptop. Shut down the USB devices to save power? Same thing. You get to use the same scripts all the way through, meaning everything is much more robust. And those start-ups start everything up in parallel, just like your boot sequence.

So upstart is great for speed and flexibility whether you suspend, hibernate, shut down, or whatever.

Have fun,

Paul

Will it make it faster or slower?

Posted Sep 10, 2009 16:37 UTC (Thu) by Quazatron (guest, #4368) [Link]

> Is boot speed really important?
I have several systems, and not one of them suspends/resumes cleanly (let alone hibernate!). And while I envy the people who use iMacs, that simply open the laptop and are ready to work, I prefer the freedom Linux gives me. So I'll keep shutting down the machines until the day comes when suspend/resume works reliably (or nvram becomes standard, whichever comes first).

Will it make it faster or slower?

Posted Sep 11, 2009 0:04 UTC (Fri) by cjwatson (subscriber, #7322) [Link]

Speed is important, but in many cases before you can make it faster you have to make it reliable. Firstly, with a system that's full of race conditions, trying to speed it up will make it more likely that you'll run into more of them, and nobody will be very happy with that. Secondly, many of the speed problems with the current setup are essentially due to sledgehammer workarounds for things that would otherwise be unreliable. Put the system together in a better-thought-out way with more precise constraints on its behaviour and *then* it's a lot easier to make it faster.

(Petter said much the same thing at the boot BoF at DebConf, IIRC.)

Ubuntu - not a big deal

Posted Sep 6, 2009 18:25 UTC (Sun) by muwlgr (guest, #35359) [Link] (1 responses)

They still use upstart in a sysvinit-compatible manner, i.e., to get to runlevel 2, it runs /etc/init.d/rc with parameter '2', which in turn runs each of /etc/rc2.d/S??* with parameter 'start', and so on. You can still switch between upstart and sysvinit, and between /etc/inittab and /etc/event.d, as you wish. None of this fancy dependency tracking and startup order optimization is ever seen.
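
For readers who have not looked inside an rc?.d directory, the sketch below shows roughly what "runs each of /etc/rc2.d/S??* with parameter 'start'" amounts to. It only plans the calls from a list of link names; the function name and the sample names are made up, and it ignores special cases such as runlevels 0 and 6.

```python
import re

# A link name is S or K, a two-digit sequence number, then the script name.
LINK = re.compile(r"^([SK])(\d\d)(.+)$")


def rc_plan(entries):
    """Return (script, action) pairs in the order a sysv-style rc would run them.

    Simplified sketch: K links are run with 'stop' first, then S links
    with 'start', each group in lexical (sequence number) order.
    """
    kills, starts = [], []
    for name in sorted(entries):
        match = LINK.match(name)
        if match is None:
            continue  # not a start/kill link, e.g. a README file
        kind, _seq, script = match.groups()
        if kind == "K":
            kills.append((script, "stop"))
        else:
            starts.append((script, "start"))
    return kills + starts
```

For example, rc_plan(["S20ssh", "S10sysklogd", "K01foo", "README"]) plans "foo stop", then "sysklogd start", then "ssh start". The order comes purely from the numbers in the link names, not from anything the scripts declare about each other.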

Ubuntu - not a big deal

Posted Sep 6, 2009 21:20 UTC (Sun) by keybuk (guest, #18473) [Link]

The first switch to native Upstart jobs in Ubuntu happens tomorrow.

Debian switching to upstart

Posted Sep 6, 2009 23:01 UTC (Sun) by k8to (guest, #15413) [Link] (2 responses)

I'm not a fan. This will make troubleshooting much more difficult.

Debian switching to upstart

Posted Sep 7, 2009 0:01 UTC (Mon) by sbergman27 (guest, #10767) [Link] (1 responses)

According to the article, the current boot process is broken beyond repair. Troubleshooting doesn't get any harder than that.

I would say that this will make troubleshooting *easier*. Especially since it is a simple config option to tell Upstart whether to allow concurrent operations, or to serialize the entire process.

Debian switching to upstart

Posted Sep 16, 2009 18:47 UTC (Wed) by oak (guest, #2786) [Link]

Parallelism will make things harder to debug because then there's more
chance for timing related issues (-> whether something works, differs from
machine to machine). Besides switching to Upstart, the race conditions
exposed by parallelism need to be fixed.

One can get from Upstart scripts a graph of how the things are started
which helps a bit on understanding how things work.

Debian switching to upstart

Posted Sep 7, 2009 0:58 UTC (Mon) by jimmybgood (guest, #26142) [Link] (2 responses)

What are these events that the kernel notices? Is this like udev detecting hardware? How would a kernel event imply that an SSH server or a user space file system need to be started (or initialized)?

Debian switching to upstart

Posted Sep 7, 2009 1:34 UTC (Mon) by jgg (subscriber, #55211) [Link] (1 responses)

Therein lies the actual problem.

Stuff starting at boot largely has an ordering dependency because daemons are not really all that clever.

NFS needs to wait for networking? Well, the NFS daemon should be able to block on netlink until a route to the server appears.

Why does ssh need to wait? Because in some cases resolv.conf isn't updated yet, or the address it wants to bind to doesn't exist yet. Fix that and the dependency goes away.

Hopefully upstart has a better dependency mechanism than just waiting for script X to finish. It should be things like 'run me after /var/run/ appears' or 'run me after eth0 appears' or 'run me after a route to x.x.x.x appears'. ie things that actually matter.

Debian switching to upstart

Posted Sep 7, 2009 10:26 UTC (Mon) by hppnq (guest, #14462) [Link]

> Therein lies the actual problem.

Not really. The problem is that the distribution's boot sequence logic is not as clever as some of the programs it is trying to start, or the kernel it is running. Dependencies will always be there: it will remain difficult to communicate with another host if that host cannot be reached. The boot sequence dependencies are special because both the events, as well as their ordering, in most cases, are known in advance. You can circumvent them with the existing init/init.d setup as well -- but it is probably a better idea to move to an infrastructure that is more event based and up-to-date.

(Where it matters, some programs are, or should have been, clever enough to detect and solve dependency problems themselves -- IIRC for years it has been possible to bind() to a certain address while the interface is not there yet. This, however, has for obvious reasons never been perceived as the solution to the problem that a user may plug in to different networks, or forgets to plug in the network cable in time.)

Debian switching to upstart

Posted Sep 7, 2009 9:11 UTC (Mon) by lmb (subscriber, #39048) [Link]

A dependency mechanism and service supervision are quite useful; the various cluster stacks such as Pacemaker implement this at a multi-node level, and are sometimes (not incorrectly) described as a "cluster-wide init".

A move to an event-based system is very desirable.


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds