Faster is nice but ...

Posted Sep 20, 2003 6:35 UTC (Sat) by AnswerGuy (subscriber, #1256)
Parent article: Boot Linux faster (IBM developerWorks)

While we might tune a system for faster reboots with this technique,
it raises the main question: why are we rebooting? The cost of a few
minutes in rc script handling on each reboot is amortized over months
of uptime in the common case of production servers.

Other factors are also far more important. For example, I would like
to see the major distributions perform an in-depth failure point
analysis of the start-up sequence to make the whole process more
robust.

For example, a corrupt /etc/ld.so.cache file used to cause a system
to just go belly up on boot, and required someone to manually boot
from a rescue disc/diskette and run ldconfig. (This appears to have been
fixed somehow in recent versions of the ld libraries.)

As another example, what happens if /proc or /dev is missing or corrupt?
If /proc is missing, a /proc mount point should be created. (Obviously
/ must be mounted read-write for that, which had better NOT require a
mounted /proc, so Red Hat's volume-label-laden /etc/fstab is a problem
here.) If /proc is non-empty it should be renamed, a warning message
should be logged, and a new /proc should be made. (Inadvertently restoring
an archive of /proc and then booting the restored system can cost a lot of
space hidden under the virtual /proc, with kcore being the main culprit;
that's easier to do than you might think.)
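
The /proc check described above could look roughly like the sketch below. It is only an illustration: the function name, the ".stale-<timestamp>" naming and the log messages are my own assumptions, and a real rc script would do this in sh before mounting procfs.

```python
import logging
import os
import time

log = logging.getLogger("procfix")


def prepare_proc_mount_point(root="/"):
    """Make sure <root>/proc is an empty directory, ready to be mounted on.

    Must run before procfs is mounted. Returns the path the old /proc was
    moved to, or None if nothing had to be moved.
    """
    proc = os.path.join(root, "proc")
    moved_to = None

    if os.path.lexists(proc):
        is_empty_dir = (
            os.path.isdir(proc)
            and not os.path.islink(proc)
            and not os.listdir(proc)
        )
        if is_empty_dir:
            return None  # already a usable mount point

        # Stale content (e.g. a restored archive of /proc) would be hidden
        # under the mount and waste space, so move it aside. The suffix is
        # an assumed naming convention, not an existing one.
        moved_to = f"{proc}.stale-{int(time.time())}"
        os.rename(proc, moved_to)
        log.warning("%s was not an empty directory; moved to %s", proc, moved_to)
    else:
        log.warning("%s was missing; creating it", proc)

    os.mkdir(proc, 0o755)
    return moved_to
```

Note that this only works if the root filesystem is already writable at that point, which is exactly the ordering problem mentioned above.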

This is just the tip of the iceberg.

To be truly more robust, the kernel should have a couple of command
line parameter changes: root= should take a list of possibilities
(comma, colon or semicolon delimited), and the watchdog driver should
take an option to pre-initialize itself. On a panic or watchdog timeout,
the kernel should reboot USING THE NEXT AVAILABLE rootfs (from the list).
That might require a change to the reboot() system call, adding a new
magic value and/or flag, to indicate that the reboot is the result of a
panic or watchdog failure.

This whole approach would allow us to build systems that would
automatically try alternate root filesystems, removing yet another
SPOF (single point of failure) from the overall system.
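
To make the proposal concrete, here is a small sketch of the selection logic only. No kernel accepts a multi-valued root= today; the function names and the idea of tracking "failed" devices across reboots are assumptions for illustration.

```python
import re


def parse_root_list(value):
    """Split a hypothetical multi-valued root= argument into candidates.

    Accepts comma, colon or semicolon as delimiter, as proposed above.
    A real implementation would probably settle on one delimiter.
    """
    return [item for item in re.split(r"[,:;]", value) if item]


def next_root(candidates, failed):
    """Return the first candidate not yet marked as failed, or None."""
    for device in candidates:
        if device not in failed:
            return device
    return None
```

For example, with root=/dev/hda1,/dev/hdb1 and /dev/hda1 recorded as failed after a panic, next_root() would pick /dev/hdb1. Where the "failed" set is stored between reboots (CMOS, a flag passed through reboot(), the bootloader) is the hard part and is not addressed here.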

Of course the bootloader and BIOS are also in this SPOF chain.
Ultimately, we need LinuxBIOS or OpenBIOS to implement this sort of
failover list.

(I've worked with one motherboard that had a similar feature: there
was a CMOS setting and BIOS code to pre-initialize the watchdog timer
and cycle through the available drives until some copy of Linux came
up to start petting the watchdog in time. That was a custom change
for Snapserver/GuardianOS (Linux) NAS servers.)

My point is: speed is NOT of the essence here; reliability is more
important.

As for changing the system to use 'make' instead of sh to run all
of the rc scripts ... nice idea with a certain elegance; but the
automated (package manager driven) maintenance of start-up scripts
is too valuable to throw away for a few seconds of boot time savings.
I want to see how the package maintainers will drop their start/stop
scripts into place under such a scheme. Then we have to re-train
EVERY package maintainer (at least for one distribution) to handle
this new scheme.

Maintainability and serviceability are also key considerations for
production systems and professional administration.

I also think we should move sshd and its libraries onto the root fs
for the increasingly common case of colo facilities or other remotely
administered machines.


Faster is nice but ...

Posted Sep 20, 2003 8:00 UTC (Sat) by ressu (subscriber, #14615)

"While we might tune a system for faster reboots with this technique,
it raises the main question: why are we rebooting?"

There was a pretty good comment on Slashdot about this. The main reason for rebooting seems to be that people who can't or don't want to keep their machines running 24/7 want to shut them down from time to time. This is the biggest issue for laptop users, who of course can use suspend mode, but even that is out of the question in most cases.

Of course, I can't see the issue with boot times at all. I got used to my old P166 booting W2K, and it took somewhere around 15 minutes from power-on to reach a usable state.

Faster is nice but ...

Posted Sep 20, 2003 23:38 UTC (Sat) by euvitudo (subscriber, #98)

I currently have two laptops: one for personal use, one for work. I can suspend the personal laptop just fine, and do not have to reboot it. However, since I currently have no battery, I have to reboot if I need to move it across the room, or unplug it for any reason. However, this is really not a problem.

The real problem I have is with my work computer. It has fairly new hardware (February/March 2003), which is the main problem. APM works, but when I wake up the system, it is frozen, and totally unusable. ACPI seems to work somewhat, but I still have problems with it freezing. This is due to an integrated mini-pci wireless card, and/or the Radeon 9000 that X can't seem to bring back up. Quite annoying really. Hence, I have to reboot it when I travel to and from work.

Therefore, faster reboot times are relevant and could be very useful to me--at least until ACPI is supported for my machine.

Cheers!

Faster is nice but ...

Posted Sep 22, 2003 0:58 UTC (Mon) by gdt (subscriber, #6284)

Reboot time matters in telco and control applications. When a watchdog timer expires, it reboots the OS uncleanly, and that eats into the roughly 5 minutes of outage per annum implied by 99.999% availability.
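
The arithmetic behind that five-minute figure is easy to check. The sketch below is just a calculator; the reboot durations in the usage note are made-up values for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability):
    """Minutes of outage per year permitted by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def reboots_allowed(availability, reboot_minutes):
    """How many whole reboots of the given length fit into the budget."""
    return int(downtime_budget_minutes(availability) // reboot_minutes)
```

downtime_budget_minutes(0.99999) comes to about 5.26 minutes. With a hypothetical 2-minute reboot that budget covers two unclean reboots a year; cutting the reboot to 30 seconds would cover ten.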

Of more value to the typical distribution is the listing of dependencies in the init script. No more guessing at init sequence numbers.

Note that parallel starting of daemons can also make the machine boot slower (e.g., if daemons fight over disk access), so dependencies need to be a bit more refined than suggested by the article.
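
As a rough sketch of what declared dependencies buy you: given a map of service to prerequisites, the start order can be computed instead of guessed, and a cap on how many daemons start at once is one crude way to limit the disk contention mentioned above. This uses graphlib from the Python standard library (3.9 or later); the service names and the max_parallel knob are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def start_waves(depends_on, max_parallel=None):
    """Group services into waves that may be started in parallel.

    depends_on maps each service to the set of services it needs first.
    max_parallel, if given, caps the size of each wave.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()

    pending = []  # ready to start, but not yet assigned to a wave
    waves = []
    while sorter.is_active():
        pending.extend(sorter.get_ready())
        pending.sort()  # deterministic order within a wave
        limit = max_parallel or len(pending)
        wave, pending = pending[:limit], pending[limit:]
        waves.append(wave)
        sorter.done(*wave)  # pretend the whole wave started successfully
    return waves
```

With syslog needing nothing, network and cron needing syslog, and sshd needing network, this yields syslog first, then cron and network together, then sshd. A count cap is only a stand-in; a more refined scheme would need to know which daemons actually compete for the disk.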

Faster is nice but ...

Posted Sep 20, 2003 17:51 UTC (Sat) by NAR (subscriber, #1313)

"While we might tune a system for faster reboots with this technique, it raises the main question: why are we rebooting? The cost of a few minutes in rc script handling on each reboot is amortized over months of uptime in the common case of production servers."

I think this article is written for people who, like me, use Linux on their desktops at home. They don't keep the PC turned on 24/7, and they frequently reboot into another operating system to play a game or browse some "IE-only" site. They might "enjoy" a Linux booting process 2-3 times a day. But I think the seconds won by starting some services in parallel are negligible compared to the minute KDE 2 takes to start up...

Bye, NAR

Faster is nice but ...

Posted Sep 21, 2003 19:04 UTC (Sun) by danielos (guest, #6053)

"As for changing the system to use 'make' instead of sh to run all of the rc scripts ... nice idea with a certain elegance; but the automated (package manager driven) maintenance of start-up scripts is too valuable to throw away for a few seconds of boot time savings. I want to see how the package maintainers will drop their start/stop scripts into place under such a scheme. Then we have to re-train EVERY package maintainer (at least for one distribution) to handle this new scheme."

Well, I guess it's not so hard to add boot dependencies in a special file parsed by a debhelper script, and this is only needed for service packages. For example, if you maintain apache, you create a file (e.g. boot-depend) in the debian dir and add a line in rules (e.g. dh_bootdepend).

The only work is the script to generate a makefile, and this is fairly simple.

It seems simple to me
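
Something along these lines, as a sketch only. The boot-depend file format (one service name per line, # for comments) is an assumption, since no such format exists, and the generated rules simply call the usual init scripts with "start".

```python
def parse_boot_depend(text):
    """Read a hypothetical boot-depend file: one service name per line."""
    deps = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            deps.add(line)
    return deps


def render_boot_makefile(depends_on, init_dir="/etc/init.d"):
    """Render a makefile with one phony target per service.

    depends_on maps each service to the set of services it needs first.
    Every dependency is assumed to appear as a key as well; otherwise
    make would complain about a missing rule.
    """
    services = sorted(depends_on)
    lines = [
        ".PHONY: all " + " ".join(services),
        "all: " + " ".join(services),
    ]
    for service in services:
        deps = " ".join(sorted(depends_on[service]))
        lines.append("")
        lines.append(f"{service}: {deps}".rstrip())
        lines.append(f"\t{init_dir}/{service} start")
    return "\n".join(lines) + "\n"
```

Running make -j on the result would start independent services in parallel while respecting the declared order. The package manager would regenerate the makefile whenever a service package is installed or removed.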

Faster is nice but ...

Posted Sep 22, 2003 20:53 UTC (Mon) by ernest (subscriber, #2355)


When an error occurs, parallelizing will add a lot of confusion as messages get mixed on screen.

A solution would be to "store" all messages from a startup script until it exits, and only then print them on screen.
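
A minimal sketch of that idea: each script runs in parallel with its output captured, and the captured text is written out as one block once the script has exited. The function name and the header line format are assumptions; a real init system would do this in its own supervisor rather than in Python.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_buffered(commands, max_workers=4, out=sys.stdout):
    """Run commands in parallel, printing each one's output as a single block.

    commands maps a label to an argv list. Returns {label: exit code}.
    """
    def run(name, argv):
        # Merge stderr into stdout so error messages stay in context.
        proc = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        return name, proc.returncode, proc.stdout

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run, name, argv) for name, argv in commands.items()]
        for future in as_completed(futures):
            name, code, output = future.result()
            # Only this thread writes, so blocks never interleave.
            if output and not output.endswith("\n"):
                output += "\n"
            out.write(f"== {name} (exit {code}) ==\n{output}")
            results[name] = code
    return results
```

The trade-off is that nothing from a script shows up until it finishes, so a script that hangs stays silent. A timeout per script would be the obvious next refinement.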

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds