Debian switching to upstart
From: Petter Reinholdtsen <pere-AT-hungry.com>
To: debian-devel-announce-AT-lists.debian.org
Subject: The future of the boot system in Debian
Date: Sat, 05 Sep 2009 13:21:00 +0200
The future of the boot system in Debian
=======================================

Over the last few years, the boot system in Debian has progressively deteriorated due to changes in the Linux kernel which make the kernel more and more event based. For example, the kernel and its drivers no longer block all processing while detecting disks, network interfaces and other hardware, making the once trusty old boot system in Debian increasingly fragile. During the current boot sequence device files in /dev/ are often missing when fsck or mount are looking for them, or the network is not available when the boot system tries to mount NFS entries because network interfaces were slow to initialise, or audio devices are missing when audio settings should be set.

The problem is fundamental to the way we boot Debian today - sequentially, and a solution needs to address this fundamental problem. We believe the solution is to migrate to an event based boot system.

In addition to this, there are long lasting problems with the boot sequence of the existing init.d scripts, for some combination of packages. The boot sequence is wrong in these cases, and to solve it one needs to change the sequence numbers of all the immediate forward and reverse dependencies of the init.d script in question - and their forward and reverse dependencies and so forth until the boot sequence is correct. In some cases the change needs to happen to several scripts in different packages at the same time, which is impossible with the old way of ordering init.d scripts. Previously the ordering was done by asking the package maintainers to guess on and update sequence numbers, a process that tended to introduce new problems and took a long time to be solved properly.

The solution to this problem is to change how we order boot scripts. Change it from static sequence numbers to calculate the boot sequence using dependency information provided in the init.d scripts themselves.
Since 2009-07-27 this is the default in Debian unstable, and it will be the way init.d scripts are ordered in Squeeze. Switching to a dependency based boot sequencing allows us to ensure its correctness, detect and fix dependency loops, and in general fix a set of bugs in the distribution that have been very hard to fix before. Other solved problems with the new system are incorrect stop sequences (the default should have been S20/K80, not S20/K20), and the misleading runlevels 0 and 6, where start symlinks are called with 'stop' as the argument to the scripts. All of these problems are solved when Debian now moves to dependency based boot sequencing.

It will take longer to solve the fundamental problem. It requires a rewrite of how we handle the boot, and can not be done by just modifying the boot sequence framework.

Before explaining the current plan, some background information. The current boot system can be seen as consisting of three different parts:

  1. The implementation of /sbin/init, reading /etc/inittab and starting the script implementing part 2 (/etc/init.d/rc). This is normally done using the sysvinit package, but other replacements are available, like initng and upstart.

  2. The implementation of /etc/init.d/rc, which is responsible for calling the init.d scripts in the correct sequence. This is normally done using the sysv-rc package. An alternative is the file-rc package, which uses a file /etc/runlevel.conf instead of symlinks in /etc/rc?.d/ to decide what to execute and in which order.

  3. The individual init.d scripts, taking care of the tasks that need to be done during boot. The basic framework is provided by the initscripts package, and the rest is handled by individual packages like udev, netbase, ifupdown, apache, etc. :)

There are approximately 850 packages with init.d scripts in Debian unstable.

Part 2 (sysv-rc) and 3 have been changed to use dependency based boot sequencing.
This was a release goal for Lenny and the continued work is a release goal for Squeeze. The init.d scripts have seen review with regard to dependency based ordering for more than 3 years. New installs will use dependency based boot sequencing. For upgrades, a critical debconf question will offer, as the default option, to migrate if testing finds no issues with the existing scripts, or to keep the legacy boot ordering for now.

To solve the fundamental problem, the plan is to replace /sbin/init with an implementation that is able to handle kernel events. It will allow us to modify the boot system for the early boot to become event based, while keeping the existing boot stuff working.

We could rewrite sysvinit to become event based, or have a look at the existing boot systems that handle kernel events. After checking the options and the systems used in other distributions, upstart seems like the most promising candidate. It is used by Ubuntu and Fedora at the moment, and solves the problem in a backwards compatible way.

The plan is to change upstart to actually use /etc/inittab, to ease the switch between sysvinit and upstart. We will also change the init.d script handling to treat upstart jobs as init.d scripts, to provide an alternative for architectures lacking upstart support. These changes should make it transparent for the users which package provides /sbin/init, and thus make it easier to migrate from sysvinit to upstart.

When /sbin/init is changed to an event based framework, the next step is to rewrite the early boot system to use these events when available, and behave the traditional way when there are no events. When this step is finished the fundamental problem will be solved, and the boot will be robust and should work correctly even in edge cases with slow device buses.
The planned time frame for this is to replace /sbin/init with upstart for Squeeze, and see if we manage to change the very early boot to become event based in time for Squeeze too, fixing the most pressing of the current boot problems (failing fsck and mount with USB disks). For Squeeze+1, more of the early boot system will be converted, to handle more of the existing problems.

According to the Linux Standard Base specification, all LSB compliant distributions must handle packages with init.d scripts. As Debian plans to continue to follow the LSB, this means the boot system needs to continue to handle init.d scripts. Because of this, we need a boot system in Debian that is both event based for the early boot, and which also calls init.d scripts at the appropriate time.

References:
  http://wiki.debian.org/LSBInitScripts/DependencyBasedBoot

Petter Reinholdtsen, Kel Modderman, Armin Berres
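As a rough illustration of the dependency based ordering described in the announcement, the following Python sketch reads LSB-style header blocks and derives a start order with a topological sort. It is not the code Debian uses. The script names and their dependencies are invented for the example, only the `Provides` and `Required-Start` fields are considered, and virtual facilities are ignored. It assumes Python 3.9 or later for `graphlib`.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical init.d headers; names and dependencies are made up.
SCRIPTS = {
    "webserver": """\
### BEGIN INIT INFO
# Provides:          webserver
# Required-Start:    network localfs
### END INIT INFO
""",
    "nfsmount": """\
### BEGIN INIT INFO
# Provides:          nfsmount
# Required-Start:    network
### END INIT INFO
""",
    "network": """\
### BEGIN INIT INFO
# Provides:          network
# Required-Start:    localfs
### END INIT INFO
""",
    "localfs": """\
### BEGIN INIT INFO
# Provides:          localfs
# Required-Start:
### END INIT INFO
""",
}


def parse_lsb_header(text):
    """Return (provides, required_start) lists from an LSB comment block."""
    fields = {}
    in_block = False
    for line in text.splitlines():
        if line.startswith("### BEGIN INIT INFO"):
            in_block = True
        elif line.startswith("### END INIT INFO"):
            break
        elif in_block:
            match = re.match(r"#\s*([A-Za-z-]+):\s*(.*)", line)
            if match:
                fields[match.group(1)] = match.group(2).split()
    return fields.get("Provides", []), fields.get("Required-Start", [])


def boot_order(scripts):
    """Compute a start order in which every script follows its dependencies."""
    provider = {}   # facility name -> script that provides it
    requires = {}   # script name -> facilities it needs started first
    for name, text in scripts.items():
        provides, required = parse_lsb_header(text)
        for facility in provides:
            provider[facility] = name
        requires[name] = required
    # Map each script to the set of scripts that must start before it.
    graph = {
        name: {provider[f] for f in needed if f in provider}
        for name, needed in requires.items()
    }
    # static_order() raises graphlib.CycleError on a dependency loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for position, name in enumerate(boot_order(SCRIPTS), start=1):
        print(f"S{position:02d}{name}")
```

Because the order is computed from the declared dependencies, adding a script never requires renumbering the others, and a dependency loop shows up as a `graphlib.CycleError` instead of a silently wrong sequence.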
Posted Sep 6, 2009 15:35 UTC (Sun)
by salimma (subscriber, #34460)
[Link] (5 responses)
Posted Sep 6, 2009 15:54 UTC (Sun)
by rahulsundaram (subscriber, #21946)
[Link] (2 responses)
Posted Sep 7, 2009 7:32 UTC (Mon)
by arekm (guest, #4846)
[Link] (1 responses)
Posted Sep 11, 2009 0:10 UTC (Fri)
by hmh (subscriber, #3838)
[Link]
Posted Sep 7, 2009 0:20 UTC (Mon)
by jengelh (guest, #33263)
[Link] (1 responses)
Posted Sep 7, 2009 1:39 UTC (Mon)
by gwolf (subscriber, #14632)
[Link]
What has been thoroughly discussed is that the current SysV init system is best suited for servers, where you can assume quite a few things: the kind of hardware you can count on, the general network topology, etc. Of course, there are gross hacks, such as network-manager, but it makes no sense to wait (on the default installation, at least) for a user to be logged in so that nm-applet can ask network-manager to be so kind as to choose a profile.

What switching to upstart will provide is not about booting faster, but about booting smarter, and about behaving smarter. There is no use in having, e.g., a server that binds to a specific network interface running if said interface is not available and configured, right? So that server should declare it depends on "eth0 gets a valid IP address", or on "getting 192.168.1.15". Then, when the kernel fires that event, upstart will tell the server to start.

Your Ethernet switch died? Well, that might push the network down. If so, the daemon in question should know it is now useless and shut itself down, so there are no unexpected, meaningless failure messages. (Of course, the admin should be notified about the problem, but about the fact that the problem happened in the network connection, not in the daemons' space. That will help him understand WTF.) And of course, said daemons will come back to life once networking is restored.

The fact that upstart will allow for faster boots due to parallelizing all processes that wait for a specific event is just a nice side effect.
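The start-on-event, stop-on-event behaviour described in this comment can be sketched in a few lines of Python. This is a toy model only, not upstart's job syntax or API: the `Service` and `EventInit` classes and the event strings are invented for illustration, and "starting" a service just flips a flag.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    start_on: str          # event that makes the service useful
    stop_on: str           # event that makes the service useless
    running: bool = False


class EventInit:
    """Toy event dispatcher: starts and stops services as events arrive."""

    def __init__(self):
        self.services = []
        self.log = []

    def register(self, service):
        self.services.append(service)

    def emit(self, event):
        for service in self.services:
            if event == service.start_on and not service.running:
                service.running = True
                self.log.append(f"start {service.name} (on {event})")
            elif event == service.stop_on and service.running:
                service.running = False
                self.log.append(f"stop {service.name} (on {event})")


if __name__ == "__main__":
    init = EventInit()
    # Hypothetical daemon that is only useful while eth0 has an address.
    init.register(Service("exampled", "eth0-address-up", "eth0-address-down"))
    for event in ("eth0-address-up", "eth0-address-down", "eth0-address-up"):
        init.emit(event)
    print("\n".join(init.log))
```

The log records which event caused each transition, which is the point made above: the admin sees that the network went away, not a pile of unrelated daemon errors.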
Posted Sep 6, 2009 16:33 UTC (Sun)
by obi (guest, #5784)
[Link]
Upstart wasn't the first init replacement, but it seemed to me that it's the most complete, with backwards compatibility, service supervision etc. Scott was able to look at the designs of launchd, initng, daemontools, smf, and in a way, his own - early upstart has seen a major deployment in Ubuntu, and it seems he learnt a lot from the early experience. Now it's seen adoption not only on Ubuntu and Fedora, but also on the Palm Pre's WebOS and Nokia's Maemo.
I'll be glad to be able to move away from racy, crufty boilerplate shell code for initscripts. Of course I suspect it'll be a while still until all the Debian packages with initscripts are converted to upstart jobs, and I'm curious to see what changes Debian wants to introduce in upstart.
Posted Sep 6, 2009 17:40 UTC (Sun)
by endecotp (guest, #36428)
[Link] (19 responses)
Posted Sep 6, 2009 17:48 UTC (Sun)
by MattPerry (guest, #46341)
[Link] (17 responses)
Posted Sep 6, 2009 17:59 UTC (Sun)
by josh (subscriber, #17465)
[Link]
Given appropriate optimization, booting should take a lot less time than un-hibernating. In particular, the latter takes more time the more RAM you have in use.
In any case, many good reasons still exist to turn a laptop off rather than hibernating or suspending.
Posted Sep 6, 2009 18:56 UTC (Sun)
by jreiser (subscriber, #11027)
[Link]
Yes. There is still mainstream hardware (PC clone 1 to 3 years old) that does not suspend and hibernate properly. Also, coming out of hibernation is slower than booting on many desktop and laptop systems, including several that I use regularly.
Posted Sep 6, 2009 19:44 UTC (Sun)
by DonDiego (guest, #24141)
[Link] (11 responses)
Posted Sep 6, 2009 21:58 UTC (Sun)
by drag (guest, #31333)
[Link] (10 responses)
The most classic example of this that I can think of is automobile radios. They maintain power continuously in order to keep the radio station selections in internal memory. If you remove the battery from the car or let the battery run flat (so you'd need a "jump" to get started) then you would have to go and reprogram your radio.
Another classic example is that older non-solid-state radios and televisions would have to maintain the temperatures on their tubes in order to start up quicker. I had an old tube television that doubled as a nice space heater. When plugged in it would take a few seconds to start up. However, if it was cold then it might take up to a few minutes for the image to fully stabilize on the screen.
More modern examples of this are things like HD TV sets that take a long time to do a cold boot. Also, VCRs would often draw more electricity when shut off than when they were operating.
In fact I would not be surprised if most modern appliances use current while "shut off". Not that I have done a survey.
---------------
The thing that sucks about Linux here is that power management is still unreliable. It's unreliable in the sense that it is not consistent and it occasionally causes crashes and data loss.
If everything was right and proper in the world, when you go to shut off your computer there should only be one option:
"standby"
And that will send it into low power mode. No shutoff, reboot, suspend, hibernate, or anything like that. To get to those options you should have to dig further because there is no reason to use those during normal operations.
It should suspend AND save a system image to swap. That way, when you start it up again, you're golden. Even if you remove the battery the system can recover from the memory image in your storage. But Linux is not there yet, so we still need half a dozen different options. This way, no matter the expectations of the user, it will more than likely do the right thing.
If Linux was reliable then there would be no hesitation for people to close the lid and stick it in the bag.
There are Linux systems that do this sort of thing correctly. Linux cell phones are one. The e-paper tablets like Amazon's go into standby in between each screen render. My Dell Mini9 works like a champ now (I still wait and look for the power light to pulsate, indicating suspend, before sticking it in a bag though).
Posted Sep 6, 2009 22:17 UTC (Sun)
by dlang (guest, #313)
[Link] (3 responses)
some people want their computer to actually shut off, so that they can use the power in the battery later when they turn it on rather than have it drain away in 'suspend mode'
the 'standby only' mode may be what you consider ideal, but many other people would not find it ideal, even if there were no bugs in the linux suspend
Posted Sep 7, 2009 0:01 UTC (Mon)
by foom (subscriber, #14868)
[Link] (2 responses)
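It's certainly possible some people want that, but I doubt there's _very many_ that go weeks between uses of their laptop, on battery.
If linux power management worked as well on all the multitude of supported laptops/desktops, as Apple has done for their 10 or so models, this conversation wouldn't be happening. Everyone I know with a mac laptop simply closes the lid when they're done using it, no question. It's nearly 100% reliable and uses extremely minimal power.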
Posted Sep 7, 2009 0:22 UTC (Mon)
by dlang (guest, #313)
[Link]
it's not necessarily that people want to wait weeks between times that they use their computers, but if a battery will last a week on standby, letting the system sit in standby mode for a day uses 1/7 of your power; that's a significant amount if you aren't going to be near a power outlet.
don't get me wrong, it's not that I don't want to see improvements in linux power management, it's just that I don't see 'suspend' replacing 'off' as being either realistic or desirable for all cases.
Posted Sep 7, 2009 3:22 UTC (Mon)
by daniels (subscriber, #16193)
[Link]
Posted Sep 7, 2009 4:55 UTC (Mon)
by eru (subscriber, #2753)
[Link] (1 responses)
Yes, they do. And a lot. Plugging them into a power meter is eye-opening. For example a few years old Dell desktop PC I tested consumes 9 watts while supposedly "off", and my DTV set-top box seems to draw about the same amount, whether it is nominally "on" or "off". Google for "standby power waste" for more complaints...
With CO2 reduction being the hot issue, authorities have started to look into this. EU and some other legislators have already set standby power limits that will gradually come into effect.
A few years on, only consumer devices that draw practically no current at all when "off" will be seen as acceptable. For Linux to live on then, it either has to boot fast, or be reliably able to suspend into non-volatile memory.
Posted Sep 7, 2009 11:02 UTC (Mon)
by etienne_lorrain@yahoo.fr (guest, #38022)
[Link]
Posted Sep 7, 2009 22:11 UTC (Mon)
by ignacio.hernandez (guest, #56157)
[Link] (3 responses)
Posted Sep 8, 2009 8:24 UTC (Tue)
by drag (guest, #31333)
[Link] (1 responses)
Posted Sep 8, 2009 20:26 UTC (Tue)
by njs (subscriber, #40338)
[Link]
Not that this is terribly relevant to anything.
Posted Sep 8, 2009 16:22 UTC (Tue)
by HenrikH (subscriber, #31152)
[Link]
Posted Sep 7, 2009 6:03 UTC (Mon)
by jmm82 (guest, #59425)
[Link]
I would not be surprised if Debian did not care about boot time given my earlier logic. Most people who care about boot time will probably be using Ubuntu or a similar distro.
Posted Sep 10, 2009 7:36 UTC (Thu)
by PaulWay (guest, #45600)
[Link]
The real value of upstart is that it also handles the hibernate and suspend processes. Instead of having separate, semi-magical scripts to handle the suspend process, upstart uses the same scripts as it would if it were shutting down whatever needs to be shut down. That way you know that everything's shut down cleanly and the developer hasn't patched a bug in one place and not in another.
In fact, you can extend this to most state transitions. Turn the wifi and bluetooth off to save power or to work in an airplane? Upstart will do the same things as required if you were shutting down the whole laptop. Shut down the USB devices to save power? Same thing. You get to use the same scripts all the way through, meaning everything is much more robust. And those start-ups start everything up in parallel, just like your boot sequence.
So upstart is great for speed and flexibility whether you suspend, hibernate, shut down, or whatever.
Have fun,
Paul
Posted Sep 10, 2009 16:37 UTC (Thu)
by Quazatron (guest, #4368)
[Link]
Posted Sep 11, 2009 0:04 UTC (Fri)
by cjwatson (subscriber, #7322)
[Link]
(Petter said much the same thing at the boot BoF at DebConf, IIRC.)
Posted Sep 6, 2009 18:25 UTC (Sun)
by muwlgr (guest, #35359)
[Link] (1 responses)
Posted Sep 6, 2009 21:20 UTC (Sun)
by keybuk (guest, #18473)
[Link]
Posted Sep 6, 2009 23:01 UTC (Sun)
by k8to (guest, #15413)
[Link] (2 responses)
Posted Sep 7, 2009 0:01 UTC (Mon)
by sbergman27 (guest, #10767)
[Link] (1 responses)
I would say that this will make troubleshooting *easier*. Especially since it is a simple config option to tell Upstart whether to allow concurrent operations, or to serialize the entire process.
Posted Sep 16, 2009 18:47 UTC (Wed)
by oak (guest, #2786)
[Link]
One can get from Upstart scripts a graph of how things are started, which helps a bit in understanding how things work.
Posted Sep 7, 2009 0:58 UTC (Mon)
by jimmybgood (guest, #26142)
[Link] (2 responses)
Posted Sep 7, 2009 1:34 UTC (Mon)
by jgg (subscriber, #55211)
[Link] (1 responses)
Stuff starting at boot largely has an ordering dependency because the daemons are not really all that clever.
NFS needs to wait for networking? Well, the NFS daemon should be able to block on netlink until a route to the server appears.
Why does ssh need to wait? Because in some cases resolv.conf isn't updated yet, or the address it wants to bind to doesn't exist yet. Fix that and the dependency goes away.
Hopefully upstart has a better dependency mechanism than just waiting for script X to finish. It should be things like 'run me after /var/run/ appears' or 'run me after eth0 appears' or 'run me after a route to x.x.x.x appears', i.e. things that actually matter.
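To make the "fix the daemon and the dependency goes away" idea concrete, here is a small Python sketch of a daemon that keeps trying to bind until its address exists, instead of relying on boot ordering. Note that it polls with a delay and does not block on netlink as suggested above; the function name and its parameters are invented for this example.

```python
import errno
import socket
import time


def bind_when_available(address, port, retry_delay=1.0, max_attempts=None):
    """Bind a TCP socket, retrying while the local address does not exist yet.

    Only EADDRNOTAVAIL (address not configured on any interface) is retried;
    any other error is raised immediately.
    """
    attempts = 0
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((address, port))
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            # The interface may still be coming up; wait and try again.
            time.sleep(retry_delay)


if __name__ == "__main__":
    # Port 0 asks the kernel for any free port; loopback is normally present.
    listener = bind_when_available("127.0.0.1", 0)
    print("bound to", listener.getsockname())
    listener.close()
```

A daemon written this way can be started at any point during boot, at the cost of some polling; an event based init moves that waiting out of every daemon and into one place.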
Posted Sep 7, 2009 10:26 UTC (Mon)
by hppnq (guest, #14462)
[Link]
Not really. The problem is that the distribution's boot sequence logic is not as clever as some of the programs it is trying to start, or the kernel it is running. Dependencies will always be there: it will remain difficult to communicate with another host if that host cannot be reached. The boot sequence dependencies are special because both the events, as well as their ordering, in most cases, are known in advance. You can circumvent them with the existing init/init.d setup as well -- but it is probably a better idea to move to an infrastructure that is more event based and up-to-date.
(Where it matters, some programs are, or should have been, clever enough to detect and solve dependency problems themselves -- IIRC for years it has been possible to bind() to a certain address while the interface is not there yet. This, however, has for obvious reasons never been perceived as the solution to the problem that a user may plug in to different networks, or forget to plug in the network cable in time.)
Posted Sep 7, 2009 9:11 UTC (Mon)
by lmb (subscriber, #39048)
[Link]
A move to an event-based system is very desirable.
Precisely: speed is not the main factor (but a nice side effect)
Will it make it faster or slower?
Is boot speed really important?
boot speed matters
Bluray player, router, etc. to be available as quickly as possible.
between uses of their laptop, on battery.
Apple has done for their 10 or so models, this conversation wouldn't be happening. Everyone I
know with a mac laptop simply closes the lid when they're done using it, no question. It's nearly
100% reliable and uses extremely minimal power.
boot speed and standby power waste
A long time ago there were good old voltage transformers made of steel, and energy leaks were low, but now things are really different.
I have several systems, and not one of them suspends/resumes cleanly (let alone hibernate!).
And while I envy the people who use iMacs, who simply open the laptop and are ready to work, I prefer the freedom Linux gives me.
So I'll keep shutting down the machines until the day comes when suspend/resume works reliably (or nvram becomes standard, whichever comes first).
Ubuntu - not a big deal
chance for timing related issues (-> whether something works, differs from
machine to machine). Besides switching to Upstart, the race conditions
exposed by parallelism need to be fixed.
which helps a bit on understanding how things work.
Therein lies the actual problem.