LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 8:24 UTC (Thu) by mingo (guest, #31122)
In reply to: LCA: Andrew Tanenbaum on creating reliable systems by elanthis
Parent article: LCA: Andrew Tanenbaum on creating reliable systems

If you take Tanenbaum's suggestion to heart, the 5-10% "penalty" of the micro-kernel design is irrelevant, because you won't just be swapping in a micro-kernel underneath the bloated, unreliable layers we've built on top of Linux. You'll be building an entire new system, bottom to top, with less bloat and more reliability. Will that total system have a 5-10% penalty over my current system? I doubt it. You can't even *begin* to speculate, because there are just far, far too many variables to really judge that.

Yes. The other cost is not performance but flexibility of design and /flexibility of bugfixes/. Both matter very much. You don't win more reliability by making bugs harder to fix. A 'monolithic' kernel's state might be harder to debug, but you've got everything in one place - if you need to change a few drivers to fix a bug in a core infrastructure API - no problem, you just do it. If you need to expose a data structure to another subsystem - no problem.

In a microkernel design you have explicit, documented, relied-upon APIs (which are more like ABIs) between subsystems, making both the ad-hoc sharing of information and the fast fixing of those interfaces a lot more cumbersome.

Furthermore, if there's a failure in any of the subsystems, i definitely do not want to hide this fact by having a "restart and try again" feature. I really want to achieve a bug-free kernel, not a kernel that merely appears bug-free.

My opinion is that we'll win far more reliability by concentrating on transparent debugging facilities (static ones such as Sparse and dynamic ones such as [plug alert] lockdep), than via limiting the basic flexibility of the kernel's design. I'd rather burn CPU time on running with lockdep enabled to find deadlocks, than to slow down and hinder /all/ kernel development by forcibly isolating components from each other.

Also, there are some areas and subsystems where isolation wins us /more/ flexibility: for example filesystems. But here Linux already has FUSE, which is an /optional/ feature to write filesystems in user-space. NTFS-3G has already proven (by being leagues better than the in-kernel ntfs driver) that at least for that type of filesystem, and in that stage of its lifecycle, development was faster and more flexible in user-space.

Anyway ... we'll see how this works out. I have a huge amount of respect for Mr. Tanenbaum: his books are great and i am sure he is having tons of fun with Minix - and i definitely agree with him that reliability is the #1 challenge of modern OS design. Diversity of opinion and diversity of approach do not bother me; they will only enrich the end result.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 25, 2007 15:26 UTC (Thu) by tjc (guest, #137)

Furthermore, if there's a failure in any of the subsystems, i definitely do not want to hide this fact by having a "restart and try again" feature.

My understanding is that MINIX 3 will log server/driver crashes and email the developer if so configured. I can't remember if I read this somewhere here, or in one of the whitepapers.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 27, 2007 22:50 UTC (Sat) by pascal.martin (guest, #2995)

Minix will log server/driver crashes? To disk? Even if the disk driver crashed? :-)

Let's assume the disk driver was restarted. What happens if the disk driver crashes again, because of the activity caused by the crash log? 8-)

That may seem silly, but I have seen similar "death trap" problems in real life.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 29, 2007 15:22 UTC (Mon) by tjc (guest, #137)

Well yes, there is some chance of that happening, but there's also some chance that you will be hit by a bus and killed before you read this post.

I expect the logging system works in enough cases to be a benefit.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 31, 2007 22:50 UTC (Wed) by tjc (guest, #137)

I just found this bit of information in the paper "Reorganizing UNIX for Reliability":

If crashes reoccur, a binary exponential backoff protocol could be used to prevent bogging down the system with repeated recoveries.

Unfortunately, no specifics are given. It sounds like something from Star Trek TNG.

Data: "Captain, I could use a binary exponential backoff protocol to restart the warp engines."

Picard: "Very good Mr. Data -- make it so!"

http://www.minix3.org/doc/ACSAC-2006.pdf

exponential backoff

Posted Feb 1, 2007 12:54 UTC (Thu) by robbe (guest, #16131)

Exponential backoff is a standard technique used, for example by mail
servers, in the face of transient failures: after the n-th consecutive
error, wait f * k^(n-1) seconds, then retry. Suitable values for f and k
depend on the application -- k is often 2, hence "binary" exponential
backoff.

Example with f = 300 seconds, i.e. 5 minutes (a viable value for SMTP), and k = 2:

* First try ... fails
* Wait 5 minutes
* Second try ... fails
* Wait 10 minutes
* Third try ... fails
* Wait 20 minutes
* Fourth try ... fails
* Wait 40 minutes
* Fifth try ...
etc.
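
The schedule above can be reproduced in a few lines. This is only a minimal sketch: the function name backoff_delays and its parameters are made up for illustration, and the exponent is chosen so that the first wait equals f, as in the list above.

```python
def backoff_delays(f, k=2, tries=5):
    """Return the waits, in seconds, between `tries` successive attempts.

    f     -- base delay in seconds (the wait after the first failure)
    k     -- growth factor; k = 2 gives binary exponential backoff
    tries -- total number of attempts, so there are tries - 1 waits
    """
    # The wait after the n-th consecutive failure is f * k**(n - 1).
    return [f * k ** (n - 1) for n in range(1, tries)]


if __name__ == "__main__":
    # Five tries with f = 300 s: print the four waits in minutes.
    print([d // 60 for d in backoff_delays(300)])
```

With f = 300 and k = 2 this gives waits of 5, 10, 20 and 40 minutes between the five tries.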

It would work the same for OS-component restart, though of course with
values for f in the millisecond range.
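
For the component-restart case, a rough sketch might look like the following. The paper gives no specifics, so everything here is an assumption for illustration: the start callable, the 10 ms base delay, the restart cap and the function name are all hypothetical, and this is not how MINIX 3 does it.

```python
import time


def restart_with_backoff(start, f_ms=10, k=2, max_restarts=6, sleep=time.sleep):
    """Call start() until it succeeds, backing off between attempts.

    start        -- hypothetical callable that (re)starts the component and
                    raises an exception if the component crashes again
    f_ms         -- base delay in milliseconds (assumed value)
    k            -- growth factor; 2 means binary exponential backoff
    max_restarts -- give up once this many waits have been used up
    sleep        -- injectable so the loop can be tested without waiting

    Returns the number of failed attempts that preceded the success.
    """
    failures = 0
    while True:
        try:
            start()
            return failures
        except Exception:
            failures += 1
            if failures > max_restarts:
                raise  # stop retrying and let the caller report the crash
            # Wait f_ms * k**(failures - 1) milliseconds before retrying.
            delay_ms = f_ms * k ** (failures - 1)
            sleep(delay_ms / 1000.0)
```

A component that crashes three times and then comes up would be retried after roughly 10, 20 and 40 ms. The cap matters: without it, a driver that can never start would be retried forever, just with ever longer pauses.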