LCA: Andrew Tanenbaum on creating reliable systems

[Posted January 18, 2007 by corbet]

Andrew Tanenbaum is a huge figure in the field of computer science; developers who work in the area of operating systems tend to have at least one of his books on their shelf. Linus Torvalds also occupies a prominent position. But when these two people are discussed together, the topic is

almost always the famous debate between the two which happened early in the history of Linux. Mr. Tanenbaum called Linux "obsolete," and made it clear that he would not have been proud to have Mr. Torvalds as a student; Linus made some choice comments of his own in return.

So it was pleasant to see Andrew Tanenbaum introduced in Sydney by none other than Linus Torvalds. According to Linus, Andrew introduced him to Unix by way of Minix. Minix also convinced Linus (wrongly, he says) that writing an operating system was not hard. The similarities between the two, he said, far outweigh any differences they may have had.

The talk began with a quoting of Myhrvold's laws: (1) software is a gas which expands to fill its container, and (2) software is getting slower faster than hardware is getting faster. Software bloat, he says, is a huge problem. He discussed the size of various Windows releases, ending up with Windows XP at 60 million lines. Nobody, he says, understands XP. That leads to situations where people - even those well educated in computer science, do not understand their systems and cannot fix them.

The way things should be, instead, is described by the "TV model." Generally, one buys a television, plugs it in, and it just works for ten years. The computer model, instead, goes something like this: buy the computer, plug it in, install the service packs, install the security patches, install the device drivers, install the anti-virus application, install the anti-spyware system, and reboot...

...and it doesn't work. So call the helpdesk, wait on hold, and be told to reinstall Windows. A recent article in the New York Times reported that 25% of computer users have become so upset with their systems that they have hit them.

So what we want to do is to build more reliable systems. The working definition of a reliable system is this: a typical heavy user never experiences a single failure, and does not know anybody who has ever experienced a failure. Some systems which can meet this definition now include televisions, stereos, DVD players, cellular phones (though some in the audience have had different experiences), and automobiles (at least, with regard to the software systems they run). Reliability is possible, and it is necessary: "Just ask Grandma."

As an aside, Mr. Tanenbaum asked whether Linux was more reliable than Windows. His answer was "probably," based mainly on the fact that the kernel is much smaller. Even so, doing some quick back-of-the-envelope calculations, he concluded that there must be about 10,000 bugs in the Linux kernel. So Linux has not yet achieved the level of reliability he is looking for.

Is reliability achievable? It was noted that there are systems which can survive hardware failures; RAID arrays and ECC memory were the examples given. TCP/IP can survive lost packets, and CDROMs can handle all kinds of read failures. What we need is a way to survive software failures too. We'll have succeeded, he says, when no computer comes equipped with a reset button.

It is time, says Mr. Tanenbaum, to rethink operating systems. Linux, for how good it is, is really a better version of Multics, a system which dates from the 1960's. It is time to refocus, bearing in mind that the environment has changed. We have "nearly infinite" hardware, but we have filled it with software weighed down with tons of useless features. This software is slow, bloated, and buggy; it is a bad direction to have taken. To achieve the TV model we need to build software which is small, modular, and self-healing. In particular, it needs to be able to replace crashed modules on the fly.

So we get into Andrew Tanenbaum's notion of "intelligent design," as applied to software. The core rules are:

Isolate components from each other so that they cannot interfere with each other - or even communicate unless there is a reason to do so.
Stick to the "principle of least authority"; no component should have more privilege than it needs to get its job done.
The failure of one component should not cause others to fail.
The health of components should be monitored; if one stops operating properly, the system should know about it.
One must be prepared to replace components in a running system.

There is a series of steps to take to apply these principles. The first is to move all loadable modules out of the kernel; these include drivers, filesystems, and more. Each should run as a separate process with limited authority. He pointed out that this is beginning to happen with Linux with the interest in user-space drivers - though it is not clear how far Linux will go in that direction.

Then it's time to isolate I/O devices. One key to reliability is to do away with memory-mapped I/O; it just brings too many race conditions and opportunities for trouble. Access to devices is through I/O ports, and that is strictly limited; device drivers can only work with the ports they have been specifically authorized to use. Finally, DMA operations should be constrained to memory areas which the driver has been authorized to access; this requires a higher level of support from the hardware, however.

The third step is minimizing privileges to the greatest extent possible. Kernel calls should be limited to those which are needed to get a job done; device drivers, for example, should not be able to create new processes. Communication between processes should be limited to those which truly need to talk to each other. And, when dealing with communications, a faulty receiver should never be able to block the sender.

Mr. Tanenbaum (with students) has set out to implement all of this in Minix. He has had trouble with people continually asking for new features, but, he has been "keeping it simple waiting for the messiah." That remark was accompanied with a picture of Richard Stallman in full St. Ignucious attire. Minix 3 has been completely redesigned with reliability in mind; the current version does not have all of the features described, but 3.1.3 (due around March) will.

Minix is a microkernel system, so, at the bottom level, it has a very small kernel. It handles interrupts, the core notion of processes, and the system clock. There is a simple inter-process communication mechanism for sending messages around the system. It is built on a request/reply structure, so that the kernel always knows which requests have not yet been acted upon.

There is also a simple kernel API for device drivers. These include reading and writing I/O ports (drivers do not have direct access to ports), setting interrupt policies, and copying data to and from a process's virtual address space. For virtual address space access, the driver will be constrained to a range of addresses explicitly authorized by the calling process.

Everything else runs in user mode. Low-level user-mode processes include the device drivers, filesystems, a process server, a "reincarnation server," an information server, a data store, a network server (implementing TCP/IP), and more. The reincarnation server's job is to be the parent of all low-level system processes. It gets notified if any of them die, and occasionally pings them to be sure that they are still responsive. Should a process go away, a table of actions is consulted to see how the system should respond; often that response involves restarting the process.

If, for example, a disk driver dies, the reincarnation server will start a new one. It will also tell the filesystem process(es) about the fact that there is a new disk driver; the filesystems can then restart any requests that had been outstanding at the time of the failure. Things pick up where they were before. Disks are relatively easy to handle this way; servers which maintain a higher level of internal or device state can be harder.

A key point is that most operating system failures in deployed systems tend to result from transient events. If a race condition leads to the demise of a device driver, that same race is unlikely to repeat after the driver is restarted. Algorithmic errors which are repeatable will get fixed eventually, but the transient problems can be much harder to track down. So the next best thing is to be able to restart failing code and expect that things will work better the second time.

There were a number of performance figures presented. Running disk benchmarks while occasionally killing the driver had the unsurprising result of hurting performance a bit - but the system continued to run. Another set of numbers made the claim that the performance impact of the microkernel architecture was on the order of 5-10%. It's worth noting that not everybody buys those numbers; there were not a whole lot of details on how they were generated.

In summary, Mr. Tanenbaum listed a number of goals for the Minix project. Minix may well be applicable for high-reliability systems, and for embedded applications as well. But, primarily, the purpose is to demonstrate the the creation of ultra-reliable systems is possible.

The talk did show that it is possible to code systems which can isolate certain kinds of faults and attempt to recover from them. It was an entertaining and well-presented discussion. Your editor has not, however, noticed a surge of sympathy for the idea of moving Linux over to a microkernel architecture. So it is not clear whether the ideas presented in this talk will have an influence over how Linux is developed in the future.

Index entries for this article
Conference	linux.conf.au/2007

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 3:53 UTC (Thu) by drag (guest, #31333) [Link] (29 responses)

This sort of thing always sounds so kick-ass.

I mean, seriously, who wouldn't want to spend a extra 50 bucks on their computer to make up for the 10% drop in performance if that makes the computer much much more reliable?

Also since everything is divided up into proccesses then doesn't this sort of think make sense to be used on the 8+ core machines of the future?

Oh well. So much promise, but so little results so far.

Maybe Hurd will have another port from L4 to Minix, like they did from Mach to the L4 kernel.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 5:46 UTC (Thu) by elanthis (guest, #6227) [Link] (25 responses)

"I mean, seriously, who wouldn't want to spend a extra 50 bucks on their computer to make up for the 10% drop in performance if that makes the computer much much more reliable?"

The problem with that sentiment, and the whole article, is that it focuses solely on the kernel.

I don't think I've had more than 3 or 4 Linux failures in my life, and most of those were when using very new drivers (or NVIDIA).

I have had X crash or lock, various GNOME and KDE components crash or lock, various regular applications crash and lock more times than I can possibly count. Definitely into the triple digits, if not quadruple by now.

If you take Tanenbaum's suggestion to heart, the 5-10% "penalty" of the micro-kernel design is irrelevant, because you won't just be swapping in a micro-kernel underneath the bloated, unreliable layers we've built on top of Linux. You'll be building an entire new system, bottom to top, with less bloat and more reliability. Will that total system have a 5-10% penalty over my current system? I doubt it. You can't even *begin* to speculate, because there are just far, far too many variables to really judge that.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 6:25 UTC (Thu) by drag (guest, #31333) [Link] (9 responses)

Well in application cases it's probably simplier.

Gnome-session can restart applications that crash and such.

For a while when I logged out of gnome I didn't bother 'logging out', I'd just ctrl-alt-backspace and kill X.

Worked fine, for me. And it was much quicker and guess what? Logging in afterwards seemed a bit quicker also.

Wasn't there a article somewere that read that delt with 'crash proof software' of some sort? (I can't recall it well enough to find it)

The concept was that applications at any point should be always at a state were they can instantly crap out and recover.later. Like a OD that at any point you could sync (truely sync), then kill -9 everything. Next time you reboot everything is back to were you left it.

The other part of the theory is that it allows for much faster shutdowns and reboots. Typically software that has these capabilities is able to recover a session faster then it is able to create a new one, ironicly.

that seems to be the user-land counter part to this Microkernel reliability and that other article "KHB: Recovering Device Drivers: From Sandboxing to Surviving" http://lwn.net/Articles/217119/

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 8:35 UTC (Thu) by oak (guest, #2786) [Link] (7 responses)

> Gnome-session can restart applications that crash and such.

This wasn't much of a consolation when I tried to run Ubuntu on
a system that didn't have enough memory. Nautilus died to kernel
OOM-kill and it was always restarted and as a result, the computer
was unusable. If it wouldn't have tried to continously restart
Nautilus, the system would have been usable. (moral: if it fails
too many times in a row, let it rest in peace)

> The concept was that applications at any point should be always at a
> state were they can instantly crap out and recover later.

But you can still lose data...

Btw. According to my limited experience, if there's a "reliability"
feature which papers over software faults, fixing of those faults will
be delayed (or sometimes not fixed at all) because "everything" works
"well enough" and debugging & fixing things is costly.

"Fault tolerance" should be used only on a system which you do not
expect/cannot fix or update.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 9:59 UTC (Thu) by filipjoelsson (guest, #2622) [Link] (4 responses)

> "Fault tolerance" should be used only on a system which you do not
> expect/cannot fix or update.

Which is pretty much any end user system.

Sure, I'm a gentooer as well as a programmer - I can easily browse around for a patch in bugzilla, or whip up something on my own. But my wife can't, my brothers can't (engineer all), my parents can't. So, in order to let up on the helpdesk in computer matters (ie me) - fault tolerance would be much appreciated. Let the professionals run without fault tolerance, and give the world some stability!

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 10:44 UTC (Thu) by oak (guest, #2786) [Link] (3 responses)

The effort for making things more fault tolerant could be spent on
making them more bugfree instead.

The problem is that in the long run, the end result could be just
more fault tolerant system, but not more stable one because bugs
aren't found promptly and fixed. Most of the bugs are found by
users, not developers.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 16:22 UTC (Thu) by mrfredsmoothie (guest, #3100) [Link]

It is not either/or.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 18:23 UTC (Thu) by emkey (guest, #144) [Link] (1 responses)

Making a system fault tolerant would in theory mask all bugs. Fixing a bug fixes ONE bug. Thus fault tolerance is a much better short to mid term investment. Also, debugging problems is potentially much easier in the fault tolerant model. For example, many bugs can cause a system to become unresponsive. It is thus nearly impossible to gather data that might help in identifying and solving the problem. With a fault tolerant system you could optionally enter some sort of debugging environment when a particular component failed. This could greatly reduce the amount of time needed to fix problems.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 18:55 UTC (Thu) by oak (guest, #2786) [Link]

Good points, but I've seen "fault tolerance" implementations which
make the system less responsive[1] and/or obliterate the traces of
the actual fault[2]. :-)

[1] Windows virus scanning software repeatedly starting some crashing
service so that opening any application window takes >20 minutes
[2] Linux SW restarting the crashed service which act changes the system
HW state that caused the original crash and results in a different
crash. You could fix the constant service restarts only by examining
the HW state for the first fault

So, I would say that if fault tolerance is done, great care would need
to be taken that it will really help also in finding and fixing the bugs
(by notifying user about the fault, saving data about the fault state,
allowing debugging of the fault when it happens etc), not just hiding
them. And this code should be fairly simple to assure that it actually
works, more complicated code is always harder to maintain and usually
contains more bugs...

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 14:27 UTC (Thu) by pphaneuf (guest, #23480) [Link]

I remember, a very long time ago, Mac OS ("classic") used to be very stable compared to the Windows of the time. And yet, when you looked at the software architecteure, you couldn't help but think this thing ought to fall apart and crash all the time (no memory protection, cooperative multitasking, bounded memory arena, no virtual memory etc). But somehow, it didn't?

Turns out the reason was quite simple. Failures were so spectacular that developers had no choice but to write their software carefully, because when it crashed on them, they had to reboot their entire development environment!

Also, users would tend to notice quickly when their system became less stable, would correlate it to some software they installed recently, then would stop using, or at least would whine about it all the time. So buggy software would just tend not to catch on, because people kicked them off after it crashed their whole system a few times, and they'd tell fellow users to steer clear.

So yes, these are difficult questions. In my opinion, it'd be nice if those automatic recovery features would still notify the user of their action, and try to make the culprit clear, so that there would be some motivation for users to adjust their software usage toward more reliable software, or at least whine on their blogs. ;-)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 20, 2007 1:07 UTC (Sat) by bluefoxicy (guest, #25366) [Link]

That whole argument is silly. Fault tolerant systems don't COME TO A SCREECHING HALT when they have a fault. When the file system driver dies on Minix, it comes back and life goes on. On Linux, the world stops.

Notice that you can keep going on after disk/FS driver crashes? Know what else you can do? Make logs of the state of the driver at crash (ever core dump a file system?). Linux can do this with kexec and some tricks, although you still could suffer data loss from other applications or manage to critically damage the FS.

What else is interesting is drivers are all small and isolated. The only information you need is the state of the driver; and the driver uses itself entirely. To debug a component, you debug that component; you don't have to worry about the blurred, gray lines between drivers and VFS and such. Things are easier to chew in small bites.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 11:11 UTC (Thu) by nix (subscriber, #2304) [Link]

The crashproof software stuff was another Val Henson special: Failure-oblivious computing.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 8:24 UTC (Thu) by mingo (guest, #31122) [Link] (5 responses)

Yes. The other cost is not performance but flexibility of design and /flexibility of bugfixes/. Both matter very much. You dont win more reliability by making bugs harder to fix. A 'monolithic' kernel's state might be harder to debug, but you've got everything in one place - if you need to change a few drivers to fix a bug in a core infrastructure API - no problem, you just do it. If you need to expose a data structure to another subsystem - no problem.

In a microkernel design you have explicit, documented, relied on APIs (which are more like ABIs) between subsystems, making both the ad-hoc sharing of information and fast fixing of those interfaces alot more cumbersome.

Furthermore, if there's a failure in any of the subsystems, i definitely do not want to hide this fact by having a "restart and try again" feature. I really want to achieve a bug free kernel, not a kernel that appears bug-free.

My opinion is that we'll win far more reliability by concentrating on transparent debugging facilities (static ones such as Sparse and dynamic ones such as [plug alert] lockdep), than via limiting the basic flexibility of the kernel's design. I'd rather burn CPU time on running with lockdep enabled to find deadlocks, than to slow down and hinder /all/ kernel development by forcibly isolating components from each other.

Also, there are some areas and subsystems where isolation wins us /more/ flexibility: for example filesystems. But here Linux already has FUSE, which is an /optional/ feature to write filesystems in user-space. NTFS-3G has already proven (by being leagues better than the in-kernel ntfs driver) that at least for that type of filesystem, and in that stage of its lifecycle, development was faster and more flexible in user-space.

Anyway ... we'll see how this works out. I have a huge amount of respect for Mr. Tanenbaum, his books are great and i am sure he is having tons of fun with Minix - and i definitely agree with him that reliability is the #1 challenge of modern OS design. Diversity of opinion and diversity of approach does not bother me, it will only enrich the end result.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 15:26 UTC (Thu) by tjc (guest, #137) [Link] (4 responses)

Furthermore, if there's a failure in any of the subsystems, i definitely do not want to hide this fact by having a "restart and try again" feature.

My understanding is that MINIX 3 will log server/driver crashes and email the developer if so configured. I can't remember if I read this somewhere here, or in one of the whitepapers.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 27, 2007 22:50 UTC (Sat) by pascal.martin (guest, #2995) [Link] (3 responses)

Minix will log server/driver crashes? To disk ? even if the disk driver crashed? :-)

Lets assume the disk driver was restarted. What happens if the disk driver crashes again, because of the activity caused by the crash log? 8-)

That may seems silly, but I have seen similar "death trap" problems in actual life.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 29, 2007 15:22 UTC (Mon) by tjc (guest, #137) [Link]

Well yes, there is some chance of that happening, but there's also some chance that you will be hit by a bus and killed before you read this post.

I expect the logging system works in enough cases to be a benefit.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 31, 2007 22:50 UTC (Wed) by tjc (guest, #137) [Link] (1 responses)

I just found this bit of information in the paper "Reorganizing UNIX for Reliability"

If crashes reoccur, a binary exponential backoff protocol could be used to prevent bogging down the system with repeated recoveries.

Unfortunately, no specifics are given. It sounds like something from Star Trek TNG.

Data: "Captain, I could use an binary exponential backoff protocol to restart the warp engines."

Picard: "Very good Mr. Data -- make it so!"

http://www.minix3.org/doc/ACSAC-2006.pdf

exponential backoff

Posted Feb 1, 2007 12:54 UTC (Thu) by robbe (guest, #16131) [Link]

Exponential backoff is a standard technique used, for example by mail
servers, in the face of transient failures: after the n-th consequitve
error, wait f * k^n seconds, then retry. Suitable values for f and k
depend on the application -- k is often 2 -> binary exponential backoff.

Example with f = 300, i.e. 5 minutes (a viable value for SMTP):

* First try ... fails
* Wait 5 minutes
* Second try ... fails
* Wait 10 minutes
* Third try ... fails
* Wait 20 minutes
* Fourth try ... fails
* Wait 40 minutes
* Fifth try ...
etc.

It would work the same for OS-component restart, of course with values
for f in the milliseconds.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 12:21 UTC (Thu) by lysse (guest, #3190) [Link] (1 responses)

I thought QNX had settled the whole "microkernels are a performance sink" question a long time ago?

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 13:41 UTC (Thu) by RobSeace (subscriber, #4435) [Link]

I'm not sure it really has... Or, if it has, then it settled it in the
affirmative, in my mind, at least...

I've written commercial code running under versions of QNX from 2.x to 4.x
for many, many years now, and let me tell you: it doesn't even come close to
living up to its hype... Speed? Sure, local IPC via Send()/Receive()/Reply()
is nice and quick, and the same remotely isn't bad, either... But, that's
about the only thing it's got going for it, speedwise... Compare normal
real-world-used standard IPC interfaces, such as pipes or TCP/IP, and the
situation changes dramatically, because they're all welded on almost as an
afterthought, and built on top of that fast SRR messaging, but adding more
layers to go through (and often multiple user-space processes that need to
be communicated with in order to get things done), rather than being first-class
interfaces in their own right... And, reliability? I've seen FAR more
examples of QNX crashing in various unpleasant ways than I ever have seen
from Linux (or pretty much anything outside of a Microsoft OS)... Sure,
most stuff is just a user-space app; but, if your "Dev" app or your "Fsys"
app goes away, you end up pretty well screwed, just as badly as if it were
part of the kernel itself...

QNX has lots of nifty features, and it's great for some specific uses...
For embedded systems, it's probably perfect... But, for a normal server or
desktop/workstation computer for normal everyday use, it's absolutely horrible,
and Linux has it outclassed by miles in every possible area I can think of...
This coming from someone whose main workstation ran QNX 4.x for many years,
so I'm not just making stuff up... When I switched my main dev environment
to Linux, it was absolute nirvana in comparison... Maybe it's because I
come from a Unix background, and QNX is just enough like Unix to make you
frustrated at all the ways it's NOT like Unix... But, whatever it is, I
find myself a LOT happier programming under Linux, that's for sure... (And,
when Linux differs from standard Unix stuff, it's usually in a much more
pleasant and superior way, rather than a frustratingly annoying way... ;-))

(And, note: I have no experience with the newest incarnations of QNX;
"Neutrino", or whatever it is they're calling it these days... 4.2x was the
last version I ever dealt with... So, maybe all my complaints are baseless
these days... *shrug* I've heard they've moved to gcc instead of that lousy
Watcom compiler, so that'd be ONE major improvement right there...)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 17:52 UTC (Thu) by JoeBuck (subscriber, #2330) [Link] (4 responses)

But the X server (at least large parts of it) is like an extended kernel. It runs as root, accesses the hardware directly, without going through the kernel, so bugs have the same ability to toast your system as the kernel does.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 20:54 UTC (Thu) by jamesh (guest, #1159) [Link] (3 responses)

Well, the Minix setup basically required all IO port access to go through the kernel, and the policies for each daemon would say what it was allowed to access.

As for intelligent hardware like a modern GPU that can DMA to arbitrary memory locations, his solution was to use the IOMMU to limit where the device could write to. It wasn't clear whether they've implemented use of the IOMMU like this yet.

I've got no idea what impact this would have on performance of graphics operations.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 22:42 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

So Minix I/O port access always involves at least two ring transitions
*per port I/O*?

Given the timing-sensitive nature of much port I/O that strikes me as both
wildly impractical and somewhat dangerous.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 24, 2007 14:01 UTC (Wed) by kleptog (subscriber, #1183) [Link] (1 responses)

On i386 hardware, it's possible to grant an unprivelidged processes access to particular I/O ports, without having to do any ring transitions. On Linux it's the ioperm() function call.

These days people use memory-mapped I/O so mmap() is what you mostly need.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Feb 2, 2007 13:46 UTC (Fri) by willy (subscriber, #9762) [Link]

Yes, but minix explicitly doesn't do ioperm, it really does call down to the microkernel to do IO port accesses. He talked about how 'evil' mmaped IO was.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 20:14 UTC (Thu) by eklitzke (subscriber, #36426) [Link]

The problem with that sentiment, and the whole article, is that it focuses solely on the kernel. I don't think I've had more than 3 or 4 Linux failures in my life, and most of those were when using very new drivers (or NVIDIA). I have had X crash or lock, various GNOME and KDE components crash or lock, various regular applications crash and lock more times than I can possibly count. Definitely into the triple digits, if not quadruple by now.

I tend to agree with you here. The kernel is very stable -- I've only had one real, bona fide kernel oops in the past 18 months or so (I think it was pdflush that crashed it). And I can't even begin to count how many times X has totally locked up the system (usually after starting a misbehaving Gnome application). But that just means that those applications just need to implement a fault tolerant model as well. It's totally unacceptable that an application can cause X to lock up the whole computer. If X was self-healing that would be spectacular.

A lot of the most modular pieces of software on my system (I am thinking particularly of Postfix and Apache) are also the most stable. TCP/IP is another example of a modular (well, layered) system that is particularly resilient to failure. Certainly this level of modularity isn't needed in all cases, but for any really critical software I think that taking some lessons from the microkernel model is a great idea.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 19, 2007 20:39 UTC (Fri) by dark (guest, #8483) [Link]

Still, I'd happily give up 90% of my computing power in exchange for a
reliable system. That'll put is back about 7 years in terms of hardware
development. I was happy enough in 2000, I can live with that :-)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 13:26 UTC (Thu) by vonbrand (subscriber, #4458) [Link] (2 responses)

Minor problems in sight here...

To get decent performance out of current hardware the system has to be able to shove large amounts of data in one go. Bye, bye "Use IO to do data movement"
The whole "serialize request, send it over, unserialize, check, act, serialize results, send them back, unserialize, check" business is costly on current hardware. The relative cost of context switches is going up, so this isn't getting cheaper.
I just fail to see how microkernels (which require each separate part of the system to be able to handle multiple requests simultaneously (if you don't want to make even a single user system unbearably slow) can win over handling the whole synchronization problem once
Once everybody has to keep enough state to know how to restart requests that failed because the server crashed, we are in a whole new universe of pain. The tendency is exactly in the opposite direction: TCP is so nice because it handles all sorts of problems in the underlying net transparently. Few people are able to write software that is able to handle random failures, that is the reason why highly-reliable software is so expensive to produce.

A nice pipe dream.

Yes, I know that way back when people resisted compilers for the fear of loosing complete control over the machine, and getting slower programs. I do know that with today's plummeting hardware costs and balloning capabilities, and the ever better compilers and subtly changing hardware underneath, that it is madness to write complete programs in assembler. Maybe awt's time will come, but not in the near future.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 14:18 UTC (Thu) by nix (subscriber, #2304) [Link]

Agreed with everything you say: I'm just being picky here and pointing out that your (common) typo of `loosing' for `losing' completely inverts the meaning of one sentence in your post :)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 19:21 UTC (Thu) by oak (guest, #2786) [Link]

Amen.

However, you missed this one:
The health of components should be monitored; if one stops operating
properly, the system should know about it.

I.e. polling / wakeups? -> goodbye battery life

I've also seen a case where the monitor thought the component was
misbehaving and killed & restarted it constantly. Yes, the component
was not communicating "according to spec" but from the user's perspective
it worked correctly. Killing was worse than letting it live and constant
restarting of course also drains the battery.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 4:20 UTC (Thu) by jwb (guest, #15467) [Link] (19 responses)

It sounds like the guy doesn't do that much with his computer. People who play games or edit broadcast video are using their computers right up to the very edge of their capabilities. Consider a GPU. Most GPUs cannot survive a stream of bad commands. If you send the wrong command you will deadlock the part and the computer will need to be reset. You could redesign the GPU to analyze the incoming command stream and reject bad commands, returning to a known-good state afterwards. Basically you want the GPU to be a device on a network with its own operating system. That would not be cheap nor easy, and the resulting system will be considerably slower.

High performance software relies on tricks which are, for the most part, quite unsafe. Your 3D game only works because the GPU is allowed unchecked access to main memory. If you were to start being careful about that access, performance will suffer. The same is true of cluster supercomputing with remote DMA.

Perhaps Tanenbaum envisions two classes of computer users: those who are willing to absorb the performance hit (because they only run PINE) as opposed to those who demand all the capabilities technology can offer.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 6:54 UTC (Thu) by khim (subscriber, #9252) [Link] (6 responses)

If you send the wrong command you will deadlock the part and the computer will need to be reset.

Computer ? Probably not. GPU ? Absolutely. It can be done without any hardware redesign. ATI drivers for Windows are doing it (not so sure about Linux ones).

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 19, 2007 4:40 UTC (Fri) by drag (guest, #31333) [Link] (5 responses)

Well I asked this on the Xorg mailing list and the basic respons was when the drivers in Linux bork, X restarts. Also you can do the ctl-alt-backspace to break out of X when it locks up. This is basicly what Vista has implimented, but in a more automatic manner.

If the driver fails in a manner that borks the hardware then your screwed irregardless in both operating systems.

That's my limited understanding. The details on the internet on what actually happens with the video card reset features in Vista is few and far between. So don't take it as the gospil truth.

The video drivers in Linux have always been userspace, which is suppose to be a big new feature for Vista. The 'DRM' Linux kernel modules allow the 'DRI' drivers (drivername_dri.so) to control the hardware. As have been the USB drivers, optionally.

Another example seems like Microsoft is full of shit about a lot of other aspects of Vista that are touted as huge improvements.

For example they toute their 'resolution independant ui'. this is to make your UI to work better with very high resolution displays. In reality all this means is that you can change the DPI for the display (with a reboot, I beleive).

Effectively they just implimented a feature that was aviable in other OSes for years and years and it's this big new selling point for them.

We need clients that can survive an X restart.

Posted Jan 19, 2007 18:43 UTC (Fri) by AJWM (guest, #15888) [Link] (4 responses)

> when the drivers in Linux bork, X restarts.

The problem is, that pretty much takes all the X clients with it, so you've lost your whole session anyway.

Now, arguably that's the fault of the client(s) rather than the X server, but I've yet to see an X client program written that could gracefully recover (ie to the point of picking up exactly where it left off) if its X display was yanked out from under it and restarted. It's been long enough since I programmed at the Xlib level that I don't recall how much state (of windows, etc) is in the server vs the client (via the X libraries), but I imagine that enough state _could_ be kept in the client (again, courtesy of the xlib) to redisplay to the new X server everything that was there when the old one borked.

It'd be an interesting and non-trivial programming exercise, but a massively useful one. (Nothing worse than having numerous windows open, having your X display lock up, and knowing that the only thing you can do is blow it all away even though the client programs are still perfectly OK - or will be until their display connection is killed.)

We need clients that can survive an X restart.

Posted Jan 19, 2007 21:27 UTC (Fri) by jwb (guest, #15467) [Link] (3 responses)

Actually X11 is (used to be) stateless. These days with backing stores and compositing it's not quite true. GTK+ has had for years the ability to disconnect from one display and reconnect to another, which also means that you can connect it to a dummy display while your real display restarts. However this toolkit capability has been long neglected by application writers.

We need clients that can survive an X restart.

Posted Jan 20, 2007 0:31 UTC (Sat) by drag (guest, #31333) [Link]

Well I would think that compositing would make it easier to make things stateless, since windows and such are not rendered on the actual display but in off-screen buffers.

I'd think you'd just have to keep those off-screen buffers alive and when the main display comes back up then you "re-composite" them.

Maybe you need a different thread for the application management vs the part of the X server that does the actual rendering or something, I don't know.

We need clients that can survive an X restart.

Posted Jan 20, 2007 8:21 UTC (Sat) by cworth (subscriber, #27653) [Link] (1 responses)

> GTK+ has had for years the ability to disconnect from one display and
> reconnect to another, which also means that you can connect it to a dummy
> display while your real display restarts.

There is a missing piece here though. The GTK+ code can successfully
migrate an X connection through a client-initiated disconnect. But it
turns out that design flaws in Xlib make it impossible for a client
to cleanly recover from an X server that disappears out from under the
client.

I've actually looked into what it would take to retrofit Xlib to add
what's missing. It'd be possible, but it would require a programmer
with a stronger constitution than I have to wade through the Xlib
internals to make the fix. And then one would still need to fixup
GTK+ to properly respond to the new XServerDisconnected event that
would have to be added.

Meanwhile, a more realistic approach is to get toolkits to switch to
XCB which doesn't suffer from the same shortcoming as Xlib in this area.

> However this toolkit capability has been long neglected by application
> writers.

I agree that there are some interesting aspects of migrating applications
from one X server to another that applications aren't taking advantage of.

But for the idea of replacing an X server for an entire session---I'd much
rather that be something that not require any application knowledge at all.
That's a much quicker route to making it work reliably for as many
applications as possible.

-Carl

We need clients that can survive an X restart.

Posted Jan 22, 2007 5:19 UTC (Mon) by elanthis (guest, #6227) [Link]

With XCB around, is it really necessary to retrofit Xlib with those changes? Either way, client apps will need to be updated, and XCB brings a lot of other benefits with it, no? Most apps use a toolkit, so once you get the major ones ported (including whatever today's popular Motif clone is, and Tk) you should be set.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 11:28 UTC (Thu) by nix (subscriber, #2304) [Link] (7 responses)

What's more, banning DMA has a *really* high price. Yes, bus-mastering DMA means that misprogrammed hardware can scribble over any memory it likes: but the cost of avoiding it is immense (certainly far more than 5% in e.g. I/O-bound loads).

What we really need is a better MMIO controller such that devices can have multiple privilege rings (or capability tokens); with that in place, it could be made *impossible* for devices to DMA into memory other than that the CPU wants it to DMA into.

But as far as I know nobody has written such a controller, let alone put it in any sort of affordable hardware. I'd be overjoyed to be corrected.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 12:08 UTC (Thu) by Los__D (guest, #15263) [Link] (6 responses)

He talked about constraining DMA to the memory areas needed, not banning DMA... If the first is possible without the last, I have no idea.

The ban was on mmap.

Dennis

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 14:19 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

Banning mmap() of hardware would be reasonable except that... anything a bug can do to a memory-mapped region, external hardware can do to you anyway through a bug in DMA programming.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 15:28 UTC (Thu) by gnb (subscriber, #5132) [Link] (3 responses)

So you need an IOMMU. They are arriving on server-grade x86 hardware, so
I assume they'll make their way into people's desktops eventually. And
eventually into sub-PC priced devices.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 15:41 UTC (Thu) by cventers (guest, #31465) [Link] (2 responses)

Even then, isn't it fairly trivial to hang the bus on common PC
architecture?

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 16:17 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

Certainly a lot of hardware has bugs/misfeatures whereby it can be convinced to grab the bus and never let it go: again, graphics cards are the most common crashers. Graphics card interfaces always seem to me to have been written by madmen, from state machines where if you don't do exactly the right thing the bus locks up, through write-only memory locations, to entire undocumented languages on modern cards...

I remain impressed that Dave Airlie and the other free software graphics cards retain their sanity. I'm sure I wouldn't.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 17:19 UTC (Thu) by nix (subscriber, #2304) [Link]

Um, the other free software graphics card *hackers*. As far as I know you can't buy Dave on the high street yet (and I'm not sure how fast he'd be able to do 3D rendering).

(I'll, um, blame it on the weather. I was warned that `high winds and heavy rain are forecast and this will disruption', so presumably as well as disrupting their grammar it's disrupted my posts.)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 19, 2007 2:17 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

Doing the "banning" right presumes faultless software (elsewhere). I don't see that that software will be any simpler (and thus more probably right) than the one futzing around. Looks to me like the sum total will be buggier.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 18:15 UTC (Thu) by bcd (guest, #11759) [Link]

There's always a performance/reliability tradeoff. But if ask the users of *most* systems -- outside of the limited scope of 3D gaming -- you'll find that reliability is more important, as it should be. And we're not just talking about desktop PCs: it's also about the embedded devices that we're all putting more trust into these days.

It's hard to focus on performance first and reliability later -- I've tried this in my own software, and what usually happens is the bug fixes and redesigns to address instability blow away all of the performance gains you started with. It's much easier as a developer to get it right first, and worry about the performance second. Sure, it's a close second, but it's still second.

Tanenbaum's points should be first discussed and debated on their own merits, regardless of performance implications. Do these principles really guarantee higher reliability? Are microkernels the best way to implement these principles? Trying to address performance concerns at the same time only complicates things.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 22:30 UTC (Thu) by intgr (subscriber, #39733) [Link] (1 responses)

Your 3D game only works because the GPU is allowed unchecked access to main memory.

As far as I can tell, this statement alone is incorrect. All graphics cards produced since AGP become widespread, are now using an IOMMU called GART, which ought to stop the GPU from making memory requests to bad addresses in the main memory.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 10:55 UTC (Thu) by nix (subscriber, #2304) [Link]

What about all the other devices relying on DMA for decent performance, like disk controllers? An IOMMU specific only to graphics cards strikes me as silly.

PINE and reliability is especially punny

Posted Feb 1, 2007 11:30 UTC (Thu) by gvy (guest, #11981) [Link]

I know lots of people who conside "UW", "WU" and other signs of "made in Washington University" being a label of inherent insecurity, a kind of non-reliability too.

So running microkernel for reliability and then for pine is yeah nice joke :)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 4:36 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (1 responses)

Linux may not move towards a microkernel design anytime soon. But it can possibly be done
piecemeal, in the sense that kernel modules could be migrated to userspace. This would,
however, require a "stable module API", which they don't want for reasons Greg KH has outlined.

Still, Minix could provide an interesting platform to work on this design. I don't see it replacing
Linux anytime soon, but who knows what may happen in ten years?

contras (and a bit of philosophy)

Posted Feb 1, 2007 13:36 UTC (Thu) by gvy (guest, #11981) [Link]

With this license and sort of leadership, I guess it will be as irrelevant in ten years as it is now.

BSDL has proven to facilitate intra-projects which rather are in their own loop and not easily doing I/O in terms of development, and a stance like "if you were my student" doesn't help to create real following.

I guess it's sort of "QNX going opensource" might change things, but not for the mass market (where we see de-appliancization of mobile phones, which can hang or crash these days -- and drain battery in a day).

Maybe it's continuation of "CIO's problem with Linux", namely "nobody to blame". If we use flakey software like Windows (as Tannenbum does), we have lots of code by others to blame; if it's inherently reliable (and running on proper hardware, as one can see it's yet another problem) then it's clearly us to blame, and we don't like that very much deep inside.

---

One might start with actually enjoying the day job, particularly its high-quality, useful results which should not require arcane marketing to put down someone else's throat; and not resorting to the entertainment industry out of frustration... to sort of feel the scale of the problem.

Remember the iAPX 432?

Posted Jan 18, 2007 5:38 UTC (Thu) by BrucePerens (guest, #2510) [Link] (6 responses)

The iAPX 432 was an Intel CPU designed to run Ada reliably. It used a message-passing paradigm for communication between functions, and every function ran in its own privilege ring. So, a single function, rather than an entire microkernel, could protect itself from the rest of the system. At the time (around 1980) it took 4 PC boards to implement the CPU, and ran slow as molasses. It could be implemented within a single chip today, and would run at a reasonable speed.

Why isn't anyone working on this? There are lots of applications that could use it.

Bruce

Remember the iAPX 432?

Posted Jan 18, 2007 6:57 UTC (Thu) by drag (guest, #31333) [Link] (3 responses)

Lisp machines redux?

Remember the iAPX 432?

Posted Jan 18, 2007 16:26 UTC (Thu) by AJWM (guest, #15888) [Link] (2 responses)

Kind of.

The iAPX 432 was (as I recall) designed to be an Ada machine in the same way that e.g. the Burroughs B6700 et al were designed to be Algol machines (or Algol & Cobol machines). (Actually I think the 432 started out as a more generic HLL machine but when DOD finalized on Ada, Intel made the obvious moves to accomodate.)

Remember the iAPX 432?

Posted Jan 20, 2007 0:43 UTC (Sat) by brouhaha (subscriber, #1698) [Link] (1 responses)

That's correct. Ada was only in a very early stage of development when the 432 project started (originally as the 8800), and was not the target at that time. The first language to ship for the 432 was actually a dialect of Smalltalk.

The 432 architecture was very similar in concept, if not in detail, to the JVM.

Remember the iAPX 432?

Posted Jan 20, 2007 11:03 UTC (Sat) by drag (guest, #31333) [Link]

Well you have the 'Open Graphics' project recently got their first prototype board. It's all re-programmable and all that.

They have pictures and specifications at:
http://wiki.duskglow.com/tiki-index.php?page=OGD1&PHP...

Also HP has their new innovations with their FPGA designs were it allows faster speeds and much less waste of die space compared to older FPGA technology...

And you have Sun GPL'ng their cpu design, which is interesting (if currently a bit pointless).

And now you have Sun with their open source JVM.

Now I am a bit ignorant, and I know that there is a huge difference between programming for C vs programming a FPGA or whatnot, but what is the posibility for using a programmable proccessor for running, or at least accelerating, a Java Virtual Machine?

Or maybe a simplified Lisp environment?

Would it be worth it for a certain group of people to have hardware specificly tailored for a paticular software programming language?

Remember the iAPX 432?

Posted Jan 18, 2007 7:03 UTC (Thu) by eru (subscriber, #2753) [Link]

You could do a lot of that by using a regular Intel 32-bit x86 architecture chip the way the original designers of its security features apparently intended: Put almost everything in its own segment, use the 4 rings to give least privilege to code modules, use call gates, etc.

And you get quite a performance hit over the "unixy" way of using it. But it could be a lot more secure.

Remember the iAPX 432?

Posted Jan 18, 2007 22:27 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

I've met the former Intel exec who was in charge of the 432. The 432 is a great example of an interesting failure, interesting because of how much it taught people and the influence that it had on the industry. It certainly influenced David Patterson, father of RISC (and no, the x86 didn't really defeat RISC, since all modern x86 machines are RISC internally, with the CISC instructions translated to RISC micro-ops on the fly).

The idea of the 432 was to close the semantic gap, by directly supporting high-level language constructs in hardware. The VAX architecture had similar ideas, though the VAX didn't take things as far. The problem is that you pay through the nose for this stuff, and there's no way to avoid the penalty, even in cases where the compiler can easily prove that the condition the hardware is protecting against can't happen. Complex microcode is needed to handle all the possible corner-cases.

The end result is that even though the hardware has specialized stuff to handle the complex language constructs, much simpler hardware beat the pants off of heavyweight monstrosities, with vastly less silicon area and power.

High-availability operating systems not anything new

Posted Jan 18, 2007 7:17 UTC (Thu) by eru (subscriber, #2753) [Link] (1 responses)

The design principles Tanenbaum describes have been for decades in daily use in operating systems for telecommunications and other applications where seriously high availability is needed (and often enforced by regulators). What he is advocating is bringing that kind of technology to more mainstream computer users. Could be a good idea in principle, but are people willing to pay the cost? Selling reliability is harder than flashy performance.

High-availability operating systems not anything new

Posted Jan 18, 2007 11:29 UTC (Thu) by gdt (subscriber, #6284) [Link]

It's not that hard to sell reliability. Mobile phone manufacturers for one are getting very worried about the implications of the huge amount of code on their phones. Which is why they are funding things like the formal proof of the L4 microkernel.

Most ISPs are deeply interested in the stability and uptime of their routers and switches. One of the great disappointments of Cisco's IOS XR is that although it runs on the QNX microkernel it doesn't exploit that and runs huge processes that need to be restarted to have bug fixes applied.

The talk was timely because it refocusses the discussion on to building the best computer possible, rather than merely building an operating system better than Microsoft's.

About Andrew and Linus - A good article on AST's Web page

Posted Jan 18, 2007 7:26 UTC (Thu) by pr1268 (guest, #24648) [Link] (3 responses)

I've known for several years now that Andrew Tanenbaum and Linus buried the hatchet long ago over the infamous flame war back in 1993. But for those of you not familiar, AST has written about his correspondence with Ken Brown of the Alexis de Tocqueville Institute. Brown appeared to be writing a book on the history of Unix; in particular he seemed to be prying (mis)information out of AST as to whether Linus could actually write the entire Linux kernel himself without "borrowing" others' (presumably copyrighted and/or patented) Unix code (the double-quotes are mine).

I feel that AST's article is an enlightening account of the high praise AST actually has for Linus and Linux.

http://www.cs.vu.nl/~ast/brown/

About Andrew and Linus - A good article on AST's Web page

Posted Jan 18, 2007 15:38 UTC (Thu) by tjc (guest, #137) [Link] (1 responses)

Did anyone else notice that they both look the same? Check out that first picture -- Andy could be Linus' father.

Linus in 30 years...

About Andrew and Linus - A good article on AST's Web page

Posted Jan 18, 2007 17:12 UTC (Thu) by bronson (subscriber, #4806) [Link]

TANNENBAUM (advancing on Linus): There is no escape. Don't make me destroy your kernel. You do not yet realize your code's importance. You have only begun to discover microkernels. Join me and I will complete your training. With our combined efforts, we can end this destructive conflict and bring order to the industry.

LINUS: I'll never join you!!

About Andrew and Linus

Posted Jan 26, 2007 2:24 UTC (Fri) by nicku (guest, #777) [Link]

Andrew Tanenbaum and Linus buried the hatchet long ago

I don't think they were ever close to bearing hatchets in each other's company, but I was sitting in the row behind Linus in the talk, and I noticed that his applause was not the most heartfelt at the end of Andrew's talk. I don't expect a microkernel from Linus in the next year.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 8:10 UTC (Thu) by alejluther (subscriber, #5404) [Link] (2 responses)

I think some linux users are very sensitive when someone criticizes Linux, but if you think about what is the main reason to use microkernels you have to agree with Tanenbaum. Security, robustness, ..., yes, Linux is very stable and secure but the cost to get it is too high and just minor changes in the kernel can crash the system. OK, microkernels have other drawbacks, but my opinion is operating systems will follow the approach sooner or later. Look at virtualization technology: it is not just to save money, it is for security too. Some security protocols does not allow to run critical applications with others normal ones, but virtualization enables to do that in the same machine, but not in the same OS. Microkernels have the same goal, but here the execution domains are not full isolated, but they are full protected.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 10:34 UTC (Thu) by Nick (guest, #15060) [Link]

>I think some linux users are very sensitive when someone criticizes Linux,

That's obviously relative to microkernel advocates, they don't go away no matter how much people criticise them. :)

>but if you think about what is the main reason to use microkernels you have to agree with Tanenbaum. Security, robustness, ..., yes, Linux is very stable and secure but the cost to get it is too high and just minor changes in the kernel can crash the system. OK, microkernels have other drawbacks, but my opinion is operating systems will follow the approach sooner or later. Look at virtualization technology: it is not just to save money, it is for security too. Some security protocols does not allow to run critical applications with others normal ones, but virtualization enables to do that in the same machine, but not in the same OS. Microkernels have the same goal, but here the execution domains are not full isolated, but they are full protected.

I didn't understand how Tanenbaum's design and robustness features were supposed to help so much.

What was shown was that he was able to kill a really simple IDE driver and have it recover, in his test environment (as opposed to hitting a real bug).

Minix apparently can't address issues where the hardware isn't being programmed in exactly the right way. I think this is where a lot of Linux driver bugs come from. These bugs are far worse than a simple driver crash, because your data can get trashed or lost.

It also doesn't address the issue of the wrong bit of data being written, or data being written in the wrong place. Andrew claimed these are usually caught very early in alpha testing. I'm sure this is true for a simple block device driver. I can't say the same would be true for something with one or two orders of magnitude more complexity, like an advanced filesystem.

From what I see, most bugs in Linux are not caused by one kernel component stepping on another, so microkernel protection won't help much; nor by simple mistakes that are easy to detect and/or recover from. So what do minix's tricks do? Hide the most mundane 10% of the bugs encountered, maybe. I can't see how it would reduce the amount of bugs by even one order of magnitude. In fairness it is a work in progress, so let's see what happens with it.

Actually I'm sure that if such a design could be built that meets all the promises then it would be widely used, and Linux would probably adopt ideas from it. It is also great to have such smart people like Tanenbaum researching all these different ideas and making these interesting kernels.

But is minix any less an unproven research toy now than it was during the Torvalds vs Tanenbaum debate?

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 16:49 UTC (Thu) by i3839 (guest, #31386) [Link]

The kernel is a bunch of software you're running of about 2 Mb big. Close your eyes, point in an arbitrary direction of stench, and you'll find a big blob unmanagable, buggy bloatware, which is in practice as crucial for using your computer as the OS is.

So all talk about a stable kernel are great, but the real problem lies elsewhere.

I hope that in this time where PCs and handheld consumer devices are converging in functionality, the software developers will recognize how bloated and slow their software is, and do something about it.

Comparing closed firmware with general purpose OS is bogus

Posted Jan 18, 2007 17:38 UTC (Thu) by pascal.martin (guest, #2995) [Link] (5 responses)

I dislike Tannenbaum's (and other's) references to TV or DVD players. These devices are closed, fixed, firmware. I never heard of any user installing software on any of these.

Well, one could hardly imagine a PC on which you cannot install any additional software: any buyer?

Now many of the problems people have with computer is application setup. Looking at the Tannenbaum's list of OS recommended feature, I do not see how he addresses any of these.

If my text processor dies and minix has restarted it, what about the 300 pages document I typed? Of course, good text processors maintain a recovery log on disk, but that is a design of the application and the OS cannot take credit for this.

Mr Tannenbaum has a (OS) hammer, and anything looks like a (OS) nail to him.

Comparing closed firmware with general purpose OS is bogus

Posted Jan 18, 2007 18:12 UTC (Thu) by pascal.martin (guest, #2995) [Link] (4 responses)

To add to my own comment, I also disagree with Tanenbaum's comment that DVD players are bug free and just work.

I recently created a DVD with some simple menus using dvdauthor. It worked fine with Oggle. It worked fine on my $25 DVD player. It worked fine on my sister's DVD player in France. It worked fine on my mother's neighbour DVD player. It does not work on my mother's Philips DVD player. Only God knows...

Go to any web site dedicaced to the very subject of DVD players and you will find that this model plays this but not that, etc...

Comparing closed firmware with general purpose OS is bogus

Posted Jan 19, 2007 13:49 UTC (Fri) by hummassa (subscriber, #307) [Link] (2 responses)

Only for your information, there are plenty of DVD players out there that
are firmware-upgradable. I happen to have one of those (Philips model 5100
IIRC -- I am at the office right now), and its firmware has its bugs (*),
but you can write a firmware image unto a CD-R, boot it, and it will
reflash the player.

(*) mostly, rendering problems (pixelations on some type of DivX movies)
and caption positioning problems...

Those that read Spanish, can take a look at http://dvp5100.blogspot.com/

Comparing closed firmware with general purpose OS is bogus--not!

Posted Jan 24, 2007 2:26 UTC (Wed) by ldo (guest, #40946) [Link]

Only for your information, there are plenty of DVD players out there that are firmware-upgradable.

Sure there are. And there are plenty of operating systems and other such pieces of software that offer upgrades to new versions, too, where you can trade in your old bugs for new ones.

Which reinforces the point, that Tanenbaum's analogy that a common household appliance like a TV is somehow inherently more reliable than a computer or a piece of software, is false. As such appliances incorporate full working computers into them, running complex pieces of software, they inevitably become just as unreliable as our PCs. Nobody is immune to writing buggy code.

Comparing closed firmware with general purpose OS is bogus

Posted Jan 27, 2007 22:54 UTC (Sat) by pascal.martin (guest, #2995) [Link]

Thanks. My mother is 80 years old, and never used a computer.

I am not sure I want to tell her to upgrade the DVD firmware. I found that not using any menu worked. So my DVDs now behaves like old VHS tapes :-)

The point is: DVD firmware hell makes the original claim ("DVD just work") look silly.

Comparing closed firmware with general purpose OS is bogus

Posted Jan 19, 2007 15:31 UTC (Fri) by tjc (guest, #137) [Link]

To add to my own comment, I also disagree with Tanenbaum's comment that DVD players are bug free and just work.

Second. I have a (fairly old and inexpensive) DVD player that is subject to buffer underruns. Unexpected things happen when this occurs, all well outside the category of "just working."

But I don't disagree with his premise, my cheap DVD player notwithstanding.

Interrupt handling?

Posted Jan 18, 2007 21:31 UTC (Thu) by seanyoung (subscriber, #28711) [Link]

There is also the issue of interrupts. Disabling an interrupt is device-specific, so it cannot be handled in the micro kernel.

What would happen with a shared, level-triggered interrupt line? As I understand, each device driver process will be scheduled (eventually), possibly re-asserting the interrupt line before it is disabled.

Please correct me if I'm wrong, but with I/O ports, DMA and interrupts it seems the hardware is a bigger issue than the implementation.

Besides, the amount of context-switching is enormous. A simple read()
would context switch to kernel, vfs, kernel, fs, kernel, ide, kernel and all the way back.

Meta: Excellent summary

Posted Jan 18, 2007 22:11 UTC (Thu) by kmself (guest, #11565) [Link]

Jon, just wanted to say that this is the sort of excellent, technical coverage that we really love from LWN (well, that, the Grumpy Editor series, and Debian developer licensing objections). Well worth the subscription.

On the technical side, it's interesting to note that there are also two developing routes to Tannenbaum's nirvana (nerdvana?). Microkernel architectures are one. Domain-separation through paravirtualization, a'la Xen, is another. While kernels themselves might remain monolithic, potentially hazardous activities (hardware interactions, user interfaces) can be split from one another, while the bogeyman of microkernels -- massive message-passing overheads -- are largely avoided.

Still, the key principles as reported are very sound.

Thanks!

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 23:49 UTC (Thu) by iabervon (subscriber, #722) [Link]

I obviously have a somewhat unusual experience, but the only Linux kernel bug I've run into worked like this: I have an ethernet card which, in a certain configuration, sends, in addition to the interrupts that the kernel expects, interrupts on the IRQ for the hard drive controller. If the system ran for a while without any hard drive traffic, the kernel would decide that some unknown device was screwing with that interrupt, and shut it of. At this point, the hard drive stops working, because its interrupts are ignored.

There are a number of belt-and-suspenders ways that Linux could keep the system stable (if an interrupt isn't handled, but gets shut off somehow anyway, that's as good as handling it for the purposes of stuck interrupt detection; if a shared interrupt with handlers is stuck and gets disabled, call the handlers from the timer interrupt or something, which will be bad for performance but keep the system running slowly), but I don't see a way that a microkernel could have helped. The bug would always trigger for this particular system, but wouldn't have any effect on a system in which the misdirected interrupt wasn't to an IRQ with a significant device on it. When the bug showed symptoms, the problem was not that the driver having trouble was wrong at all, and restarting it would have no effect. The bug didn't affect the misbehaving hardware or the driver that put it into the non-compliant state. And the bug involved only standard access to the device's own I/O space, in a way which the PCI spec says is correct.

Probably not everybody's pet bug has this sort of characteristic, but it seems to me that, while there is a clear benefit to defensive designs in which each component doubts the correctness of the rest of the system, works around dysfunction, and reports (for debugging) things that are wrong but survivable, I'm not convinced that a microkernel design is the ideal implementation of this practice.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 19, 2007 19:35 UTC (Fri) by jbailey (guest, #16890) [Link]

I was convinced by ast's comments for a while until I wanted to contribute there. I'd hacked on the Hurd for a while and was thinking that it would be nice to work on a project that actually had momentum.

There are a series of things that kept me from going further:

* A strong resistance of things GPL. To the point where they're not really interested in hearing about bugs compiling things like coreutils.

* A strong rejection of testsuites. They said that they would accept tests, but that if a test ever broke, the test would simply be removed. One of the things that I think is mostly interesting about the Minix architecture is that so much of the system could be build-time tested while being built as a regular user.

* Private development model. Discussions happen around their lab rather than on lists and newsgroups. I started working on an ELF interpretor only to be told that a core dev was reworking this file, and that I should come back in a few months.

I still argue that the Microkernel debate was never lost - simply that to this day, no real contender has ever shown up.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 20, 2007 12:27 UTC (Sat) by dw (subscriber, #12017) [Link] (2 responses)

The most interesting Great Leap Forward in thinking about the safety and reliability of operating systems that I have seen recently has come from none other than Microsoft Research. Their Singularity project already has many of the qualities that Tanenbaum is after, while introducing the (IMHO utterly cool) idea of software isolated processes. I'd highly recommend a read of this paper:

ftp://ftp.research.microsoft.com/pub/tr/TR-2005-135.pdf

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 22, 2007 18:19 UTC (Mon) by tjc (guest, #137) [Link] (1 responses)

See section titled "language-based protection":

http://preview.tinyurl.com/qhuhg

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 24, 2007 2:20 UTC (Wed) by ldo (guest, #40946) [Link]

That "tiny" URL points here.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 24, 2007 18:14 UTC (Wed) by elmarco (guest, #42897) [Link] (2 responses)

OGG Video of the lca opening & Tanenbaum speech is available here:

http://mirror.linux.org.au/pub/linux.conf.au/2007/video/w...

have fun,

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 17:49 UTC (Thu) by kamil (guest, #3802) [Link] (1 responses)

How did you come about that link? At http://lca2007.linux.org.au/Programme there are links to the talks from Wednesday on, but not to Monday or Tuesday, or to the keynotes...

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 21:04 UTC (Thu) by elmarco (guest, #42897) [Link]

Hehe :) Someone posted a link to the K. Packard talk that was not listed in the programme. I realized that may be there will be other not listed, so I browse the directories. Finaly, I found some new one (see yourself). I was so glad that it was online!

Thanks to the LCA video team. Thanks Jeff: you really know that some of us would have been crying. You make us happy :)

(it could be *nice* if someone take the time to update the programme to reference all the talk recorded)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 17:32 UTC (Thu) by pm101 (guest, #3011) [Link] (2 responses)

One downside of microkernel-style architectures is that they don't just impact performance -- they also impact complexity and code size. The system Andy describes does away with shared memory, so communication gets more difficult. It is highly threaded, and so the developer needs to worry about deadlocks. You also need to be tolerant of processes going away to have any benefit -- if the file system crashes while my word processor is saving, it needs to catch the fault and try saving again. This potentially dramatically increases code size and complexity, which leads to more bugs, and potentially a less stable overall system. Sticking compartments in a ship makes great sense, since it adds robustness without adding much design complexity. Sticking them in software is sometimes a good idea, but just as often, it sounds convincing, but actually leads to more bloated and less stable software.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 18:48 UTC (Thu) by tjc (guest, #137) [Link]

MINIX 3 is about the same size as MINIX 2 -- a bit less than 30,000 lines of code. I wouldn't call this bloated. It lacks some important features, and only supports a few common devices, but it's still fairly impressive for a POSIX-compatible OS. At the very least it's a successful proof of concept.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Feb 1, 2007 13:39 UTC (Thu) by renox (guest, #23785) [Link]

>highly threaded, and so the developer needs to worry about deadlocks.
Note that there are some telecom equipment SW written in Erlang with massive threading where they still manage to have a high reliability.

>if the file system crashes while my word processor is saving, it needs to catch the fault and try saving again.
This is not necessarily the word processor responsibility to retry the action: after all, the word processor did ask the OS to write some data on the disk. Whether it took one or two tries for the OS to do it doesn't really matter to the word processor..
Don't you remember 'KHB: Recovering Device Drivers: From Sandboxing to Surviving' from the week before?