Andrew Tanenbaum is a huge figure in the field of computer science;
developers who work in the area of operating systems tend to have at least
one of his books on their shelf. Linus Torvalds also occupies a prominent
position. But when these two people are discussed together, the topic is
almost always the famous
debate between the two which happened early in the history of Linux.
![Linus and Andrew](/images/conf/lca2007/lt-ast-sm.jpg)
Mr. Tanenbaum called Linux "obsolete," and made it clear that he would not
have been proud to have Mr. Torvalds as a student; Linus made some choice
comments of his own in return.
So it was pleasant to see Andrew Tanenbaum introduced in Sydney by none
other than Linus Torvalds. According to Linus, Andrew introduced him to
Unix by way of Minix. Minix also convinced Linus (wrongly, he says) that
writing an operating system was not hard. The similarities between the
two, he said, far outweigh any differences they may have had.
The talk began by quoting Myhrvold's laws: (1) software is a
gas which expands to fill its container, and (2) software is getting
slower faster than hardware is getting faster. Software bloat, he says, is
a huge problem. He discussed the size of various Windows releases, ending
up with Windows XP at 60 million lines. Nobody, he says,
understands XP. That leads to situations where people - even those well
educated in computer science - do not understand their systems and cannot
fix them.
The way things should be, instead, is described by the "TV model."
Generally, one buys a television, plugs it in, and it just works for ten
years. The computer model, instead, goes something like this: buy the
computer, plug it in, install the service packs, install the security
patches, install the device drivers, install the anti-virus application,
install the anti-spyware system, and reboot...
...and it doesn't work. So call the helpdesk, wait on hold, and be told to
reinstall Windows. A recent article in the New York Times reported that
25% of computer users have become so upset with their systems that they
have hit them.
So what we want to do is to build more reliable systems. The working
definition of a reliable system is this: a typical heavy user never
experiences a single failure, and does not know anybody who has ever
experienced a failure. Some systems which can meet this definition now
include televisions, stereos, DVD players, cellular phones (though some in
the audience have had different experiences), and automobiles (at least,
with regard to the software systems they run). Reliability is possible,
and it is necessary: "Just ask Grandma."
As an aside, Mr. Tanenbaum asked whether Linux was more reliable than
Windows. His answer was "probably," based mainly on the fact that the
kernel is much smaller. Even so, doing some quick back-of-the-envelope
calculations, he concluded that there must be about 10,000 bugs in the
Linux kernel. So Linux has not yet achieved the level of reliability he is
looking for.
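As a rough illustration of how such an estimate works (the kernel size and defect density below are assumed figures chosen for illustration, not numbers given in the talk):

```latex
\text{bugs} \;\approx\; \text{lines of code} \times \text{defect density}
            \;\approx\; 5\,000\,000 \times 0.002 \;=\; 10\,000
```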
Is reliability achievable? It was noted that there are systems which can
survive hardware failures; RAID arrays and ECC memory were the examples
given. TCP/IP can survive lost packets, and CDROMs can handle all kinds of
read failures. What we need is a way to survive software failures too.
We'll have succeeded, he says, when no computer comes equipped with a reset
button.
It is time, says Mr. Tanenbaum, to rethink operating systems. Linux,
however good it is, is really a better version of Multics, a system which
dates from the 1960s. It is time to refocus, bearing in mind that the
environment has changed. We have "nearly infinite" hardware, but we have
filled it with software weighed down with tons of useless features. This
software is slow, bloated, and buggy; it is a bad direction to have taken.
To achieve the TV model we need to build software which is small, modular,
and self-healing. In particular, it needs to be able to replace crashed
modules on the fly.
So we get into Andrew Tanenbaum's notion of "intelligent design," as
applied to software. The core rules are:
- Isolate components from each other so that they cannot interfere
with each other - or even communicate unless there is a reason to do
so.
- Stick to the "principle of least authority"; no component should have
more privilege than it needs to get its job done.
- The failure of one component should not cause others to fail.
- The health of components should be monitored; if one stops operating
properly, the system should know about it.
- One must be prepared to replace components in a running system.
There is a series of steps to take to apply these principles. The first is
to move all loadable modules out of the kernel; these include drivers,
filesystems, and more. Each should run as a separate process with limited
authority. He pointed out that this is beginning to happen in Linux with
the interest in user-space drivers - though it is not clear how far Linux
will go in that direction.
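As an illustration of that first step, here is a minimal sketch of a driver written as an ordinary user process with an event loop; the message layout and the ipc_receive()/ipc_reply() stubs are invented for this example, not the real Minix (or Linux) interfaces:

```c
/* Hypothetical sketch of a driver as an unprivileged user process.
 * The message format and the ipc_receive()/ipc_reply() stubs are
 * invented for illustration; they stand in for whatever IPC the
 * host system actually provides.                                  */
#include <stddef.h>
#include <stdio.h>

enum req_type { DEV_READ, DEV_WRITE, DEV_SHUTDOWN };

struct message {
    int source;             /* endpoint of the requesting process */
    enum req_type type;     /* what the caller wants done         */
    long position;          /* device offset for the transfer     */
    size_t count;           /* number of bytes requested          */
};

/* Illustration-only stub: produce one canned read request, then ask
 * the driver to shut down.                                         */
static int ipc_receive(struct message *m)
{
    static int calls;
    m->source = 42;
    m->type = (calls++ == 0) ? DEV_READ : DEV_SHUTDOWN;
    m->position = 0;
    m->count = 512;
    return 0;
}

static void ipc_reply(int to, int status)
{
    printf("reply to endpoint %d: status %d\n", to, status);
}

int main(void)
{
    struct message m;

    /* The driver is just an event loop: wait for a request, do the
     * work it is authorized to do, send a reply.  It never touches
     * kernel data structures directly.                             */
    for (;;) {
        if (ipc_receive(&m) != 0)
            continue;
        switch (m.type) {
        case DEV_READ:
        case DEV_WRITE:
            /* ... perform the transfer via permitted kernel calls ... */
            ipc_reply(m.source, 0);
            break;
        case DEV_SHUTDOWN:
            return 0;
        }
    }
}
```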
Then it's time to isolate I/O devices. One key to reliability is to do
away with memory-mapped I/O; it just brings too many race conditions and
opportunities for trouble. Access to devices is through I/O ports, and
that is strictly limited; device drivers can only work with the ports they
have been specifically authorized to use. Finally, DMA operations should
be constrained to memory areas which the driver has been authorized to
access; this requires a higher level of support from the hardware, however.
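A sketch of per-driver port authorization; the table, function names, and port numbers here are invented, but they show the idea of a privileged layer checking a driver's granted ranges before touching the hardware:

```c
/* Hedged sketch of per-driver I/O port authorization.  The grant
 * table and checked_port_out() are made up; the point is only that
 * a privileged layer refuses port accesses the driver was never
 * authorized to make.                                              */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct port_range { uint16_t base; uint16_t limit; };

/* Ports this (hypothetical) driver was granted when it was loaded. */
static const struct port_range allowed[] = {
    { 0x1F0, 0x1F7 },   /* e.g. a legacy ATA command block */
};

static bool port_allowed(uint16_t port)
{
    for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
        if (port >= allowed[i].base && port <= allowed[i].limit)
            return true;
    return false;
}

/* Stand-in for a kernel call that performs the write on the driver's
 * behalf only if the driver holds a grant for that port.             */
static int checked_port_out(uint16_t port, uint8_t value)
{
    if (!port_allowed(port))
        return -1;          /* refused: driver not authorized for this port */
    /* real code would issue the actual port write here */
    printf("out 0x%02x -> port 0x%03x\n", (unsigned)value, (unsigned)port);
    return 0;
}

int main(void)
{
    checked_port_out(0x1F6, 0xA0);   /* permitted: inside the grant   */
    checked_port_out(0x3F8, 0x41);   /* refused: outside the grant    */
    return 0;
}
```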
The third step is minimizing privileges to the greatest extent possible.
Kernel calls should be limited to those which are needed to get a job done;
device drivers, for example, should not be able to create new processes.
Communication between processes should be limited to those which truly need
to talk to each other. And, when dealing with communications, a faulty
receiver should never be able to block the sender.
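For that last rule, here is a toy model of a non-blocking notification, with made-up names: the sender only sets a pending bit and returns immediately, so a hung or misbehaving receiver can never hold it up:

```c
/* Sketch of "a faulty receiver must never block the sender", using an
 * invented notify() primitive: sending just records a pending bit.    */
#include <stdbool.h>
#include <stdio.h>

#define MAX_PROCS 8

static bool pending[MAX_PROCS];   /* one pending-notification bit per process */

/* Non-blocking: record that 'dst' has a notification and return at once. */
static void notify(int dst)
{
    pending[dst] = true;
}

/* The receiver picks notifications up whenever (if ever) it gets around
 * to receiving; a hung receiver only hurts itself.                      */
static bool check_notifications(int self)
{
    bool had = pending[self];
    pending[self] = false;
    return had;
}

int main(void)
{
    notify(3);                 /* returns immediately even if process 3 is hung */
    if (check_notifications(3))
        printf("process 3 eventually saw the notification\n");
    return 0;
}
```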
Mr. Tanenbaum (with students) has set out to implement all of this in
Minix. He has had trouble with people continually asking for new features,
but he has been "keeping it simple waiting for the messiah." That remark
was accompanied with a picture of Richard Stallman in full St. Ignucious
attire. Minix 3 has been completely redesigned with reliability in
mind; the current version does not have all of the features described, but
3.1.3 (due around March) will.
Minix is a microkernel system, so, at the bottom level, it has a very small
kernel. It handles interrupts, the core notion of processes, and the
system clock. There is a simple inter-process communication mechanism for
sending messages around the system. It is built on a request/reply
structure, so that the kernel always knows which requests have not yet been
acted upon.
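As a toy model of that bookkeeping (all names invented for this example), each request is recorded until the matching reply clears it, so the unanswered requests can always be enumerated:

```c
/* Toy model of request/reply IPC bookkeeping: every request stays in a
 * table until the matching reply clears it, so the kernel can always
 * list which requests have not yet been acted upon.                    */
#include <stdbool.h>
#include <stdio.h>

#define MAX_PENDING 16

struct pending_req {
    bool in_use;
    int from;       /* requesting endpoint */
    int to;         /* serving endpoint    */
};

static struct pending_req table[MAX_PENDING];

static void record_request(int from, int to)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (!table[i].in_use) {
            table[i] = (struct pending_req){ true, from, to };
            return;
        }
}

static void record_reply(int from, int to)
{
    /* A reply from 'from' to 'to' answers the request 'to' -> 'from'. */
    for (int i = 0; i < MAX_PENDING; i++)
        if (table[i].in_use && table[i].from == to && table[i].to == from)
            table[i].in_use = false;
}

static void dump_outstanding(void)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (table[i].in_use)
            printf("request from %d to %d still unanswered\n",
                   table[i].from, table[i].to);
}

int main(void)
{
    record_request(7, 2);   /* e.g. a filesystem asks a disk driver */
    dump_outstanding();     /* the request shows up as still open   */
    record_reply(2, 7);     /* the driver answers                   */
    dump_outstanding();     /* nothing left outstanding             */
    return 0;
}
```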
There is also a simple kernel API for device drivers. These include
reading and writing I/O ports (drivers do not have direct access to ports),
setting interrupt policies, and copying data to and from a process's
virtual address space. For virtual address space access, the driver will
be constrained to a range of addresses explicitly authorized by the calling
process.
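A sketch of what such a range-limited copy might look like; the grant structure and checked_copy() function are invented, but they capture the idea that a copy falling outside the range the caller authorized is simply refused:

```c
/* Illustrative sketch of range-limited copying between a driver and a
 * caller's address space.  The grant structure and checked_copy() are
 * made up for this example.                                           */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct grant {
    char  *base;    /* start of the region the caller exposed */
    size_t length;  /* size of that region                    */
};

/* Copy 'len' bytes into the granted region at 'offset', or refuse. */
static int checked_copy(const struct grant *g, size_t offset,
                        const void *src, size_t len)
{
    if (offset > g->length || len > g->length - offset)
        return -1;                       /* outside the authorized range */
    memcpy(g->base + offset, src, len);
    return 0;
}

int main(void)
{
    char caller_buffer[64];
    struct grant g = { caller_buffer, sizeof(caller_buffer) };

    if (checked_copy(&g, 0, "sector data", 12) == 0)
        printf("copy inside the grant succeeded: %s\n", caller_buffer);
    if (checked_copy(&g, 60, "too much data", 14) != 0)
        printf("copy past the end of the grant was refused\n");
    return 0;
}
```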
Everything else runs in user mode. Low-level user-mode processes include
the device drivers, filesystems, a process server, a "reincarnation
server," an information server, a data store, a network server
(implementing TCP/IP), and more. The reincarnation server's job is to be
the parent of all low-level system processes. It gets notified if any of
them die, and occasionally pings them to be sure that they are still
responsive. Should a process go away, a table of actions is consulted to
see how the system should respond; often that response involves restarting
the process.
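A minimal sketch of the supervision idea, using ordinary POSIX fork() and waitpid() rather than anything Minix-specific; the supervised "service" simply exits after a second to simulate a crash:

```c
/* Minimal sketch of a reincarnation-server-style watchdog, written
 * with plain POSIX fork()/waitpid() rather than the real Minix
 * mechanisms.  The supervised service just sleeps and exits to
 * simulate a crash; the parent notices and restarts it.           */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t start_service(void)
{
    pid_t pid = fork();
    if (pid == 0) {          /* child: the "driver" being supervised */
        sleep(1);            /* pretend to work for a while...       */
        _exit(1);            /* ...then die unexpectedly             */
    }
    return pid;
}

int main(void)
{
    pid_t service = start_service();

    for (int restarts = 0; restarts < 3; restarts++) {
        int status;
        waitpid(service, &status, 0);   /* notified when the child dies */
        printf("service %d died, restarting\n", (int)service);

        /* A real system would consult a per-service action table here;
         * this sketch always chooses "restart".                        */
        service = start_service();
    }

    kill(service, SIGTERM);             /* clean up the last instance */
    return 0;
}
```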
If, for example, a disk driver dies, the reincarnation server will start a
new one. It will also tell the filesystem process(es) about the fact that
there is a new disk driver; the filesystems can then restart any requests
that had been outstanding at the time of the failure. Things pick up where
they were before. Disks are relatively easy to handle this way; servers
which maintain a higher level of internal or device state can be harder.
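Here is a sketch of the client side of that recovery (structures and names invented for illustration): when told about the new driver endpoint, the filesystem walks its list of unanswered requests and sends them again:

```c
/* Sketch of a client (say, a filesystem) re-issuing the requests that
 * were outstanding when its driver died.  The pending list and the
 * resend() stub are invented for this example.                       */
#include <stdbool.h>
#include <stdio.h>

#define MAX_PENDING 4

struct pending {
    bool in_use;
    long block;            /* which block the request was for */
};

static struct pending outstanding[MAX_PENDING] = {
    { true, 100 },         /* two requests the dead driver never answered */
    { true, 101 },
};

/* Stand-in for sending a request message to a driver endpoint. */
static void resend(int driver, long block)
{
    printf("re-issuing request for block %ld to driver %d\n", block, driver);
}

/* Called when the reincarnation server reports a new driver endpoint. */
static void driver_restarted(int new_driver)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (outstanding[i].in_use)
            resend(new_driver, outstanding[i].block);
    /* The entries stay marked until the new driver's replies arrive. */
}

int main(void)
{
    driver_restarted(17);   /* hypothetical endpoint of the restarted driver */
    return 0;
}
```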
A key point is that most operating system failures in deployed systems tend
to result from transient events. If a race condition leads to the demise
of a device driver, that same race is unlikely to repeat after the driver
is restarted. Algorithmic errors which are repeatable will get fixed
eventually, but the transient problems can be much harder to track down.
So the next best thing is to be able to restart failing code and expect
that things will work better the second time.
There were a number of performance figures presented. Running disk
benchmarks while occasionally killing the driver had the unsurprising
result of hurting performance a bit - but the system continued to run.
Another set of numbers made the claim that the performance impact of the
microkernel architecture was on the order of 5-10%. It's worth noting that
not everybody buys those numbers; there were not a whole lot of details on
how they were generated.
In summary, Mr. Tanenbaum listed a number of goals for the Minix project.
Minix may well be applicable for high-reliability systems, and for embedded
applications as well. But, primarily, the purpose is to demonstrate that
the creation of ultra-reliable systems is possible.
The talk did show that it is possible to code systems which can isolate
certain kinds of faults and attempt to recover from them. It was an
entertaining and well-presented discussion. Your editor has not, however,
noticed a surge of sympathy for the idea of moving Linux over to a
microkernel architecture. So it is not clear whether the ideas presented
in this talk will have an influence over how Linux is developed in the
future.