Faster is nice but ...
Posted Sep 20, 2003 6:35 UTC (Sat) by
AnswerGuy (guest, #1256)
Parent article:
Boot Linux faster (IBM developerWorks)
While we might tune a system for faster reboots with this technique,
it raises the main question: why are we rebooting? The cost of a few
minutes in rc script handling on each reboot is amortized over months
of uptime in the common case of production servers.
Also, other factors are far more important. For example, I would like
to see the major distributions perform an in-depth failure point
analysis of the start-up sequence to make the whole process more
robust.
For example, a corrupt /etc/ld.so.cache file used to cause a system
to just go belly up on boot, and required someone to manually boot
from a rescue disc/diskette and run ldconfig. (This appears to have been
fixed somehow in recent versions of the ld libraries.)
As another example, what happens if /proc or /dev is missing or corrupt?
If /proc is missing, a /proc mount point should be created (obviously
/ must be mounted read-write for that, which had better NOT require a
mounted /proc, so Red Hat's volume-label-laden /etc/fstab is a problem
here). If /proc is non-empty it should be renamed, a warning message
should be logged and a new /proc should be made. (Inadvertently restoring
an archive of /proc and then booting the restored system can cost a lot of
space hidden under the virtual /proc, with kcore being the main culprit;
that's easier to do than you might think.)
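The /proc handling described above could be sketched roughly as follows. This is only an illustration in Python, not anything a distribution's rc scripts actually do; the function name prepare_proc_mountpoint, the ".stale.<timestamp>" suffix and the log callback are all made up for the example. It assumes the root filesystem is already mounted read-write and that procfs has not yet been mounted.

```python
import os
import time


def prepare_proc_mountpoint(root="/", log=print):
    """Make sure <root>/proc is an empty directory, ready to be mounted on.

    Returns the path that stale content was moved to, or None if nothing
    had to be moved. Hypothetical helper: assumes <root> is mounted
    read-write and procfs is NOT yet mounted on <root>/proc.
    """
    proc = os.path.join(root, "proc")
    moved_to = None

    if os.path.lexists(proc):
        is_empty_dir = (
            os.path.isdir(proc)
            and not os.path.islink(proc)
            and not os.listdir(proc)
        )
        if not is_empty_dir:
            # Stale content (e.g. a restored archive of /proc): set it
            # aside rather than deleting it, and say so.
            moved_to = "%s.stale.%d" % (proc, int(time.time()))
            os.rename(proc, moved_to)
            log("warning: %s was not an empty directory; moved to %s"
                % (proc, moved_to))

    if not os.path.lexists(proc):
        # Missing (or just renamed away): create a fresh mount point.
        os.mkdir(proc, 0o755)
        log("created mount point %s" % proc)

    return moved_to
```

Renaming instead of deleting keeps whatever was there available for inspection, at the cost of leaving the disk space in use until someone cleans it up.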
This is just the tip of the iceberg.
To be truly more robust the kernel should have a couple of command
line parameter changes: root= should take a list of possibilities
(comma, colon or semicolon delimited), and the watchdog driver should
take an option to pre-initialize itself. On a panic or watchdog timeout
the kernel should reboot USING THE NEXT AVAILABLE rootfs (from the list).
This would perhaps require a change to the reboot() system call, adding
a new magic value and/or flag to indicate that the reboot is the result
of a panic or watchdog failure, or otherwise that the next root
filesystem should be used.
This whole approach would allow us to build systems that would
automatically try alternate root filesystems, removing yet another
SPOF from the overall system.
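To make the proposed root= list concrete, here is a small sketch in Python of the parsing and fallback logic. The multi-valued root= syntax is the proposal above, not something the stock kernel accepts, and the names parse_root_list and next_root are invented for the illustration. Real logic would live in the kernel or bootloader, and how failures are remembered across reboots is left open.

```python
import re


def parse_root_list(cmdline):
    """Return the root= candidates found on a kernel command line string.

    Hypothetical syntax: the candidates are separated by ',', ':' or ';'.
    (A real design would have to pick a delimiter that cannot appear
    inside a device specification.)
    """
    for token in cmdline.split():
        if token.startswith("root="):
            value = token[len("root="):]
            return [dev for dev in re.split(r"[,;:]", value) if dev]
    return []


def next_root(candidates, failed):
    """Pick the first candidate that has not already failed.

    Returns None once every candidate has been tried.
    """
    for dev in candidates:
        if dev not in failed:
            return dev
    return None
```

For example, with "root=/dev/hda1,/dev/hdb1" and /dev/hda1 recorded as failed after a panic or watchdog timeout, next_root would select /dev/hdb1.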
Of course the bootloader and BIOS are also in this SPOF chain.
Ultimately, we need LinuxBIOS or OpenBIOS to implement this sort of
failover list.
(I've worked with one motherboard that had a similar feature: there
was a CMOS setting and BIOS code to pre-initialize the watchdog timer
and cycle through the available drives until some copy of Linux came
up and started petting the watchdog in time. That was a custom change
for Snapserver/GuardianOS (Linux) NAS servers.)
My point is: speed is NOT of the essence here; reliability is more
important.
As for changing the system to use 'make' instead of sh to run all
of the rc scripts ... nice idea with a certain elegance, but the
automated (package manager driven) maintenance of start-up scripts
is too valuable to throw away for a few seconds of boot time savings.
I want to see how the package maintainers will drop their start/stop
scripts into place under such a scheme. Then we have to re-train
EVERY package maintainer (at least for one distribution) to handle
this new scheme.
Maintainability and serviceability are also key considerations for
production systems and professional administration.
I also think we should move sshd and its libraries onto the root fs
for the increasingly common case: colo facility or other remotely
administered machines.