LCA: Andrew Tanenbaum on creating reliable systems
Posted Jan 18, 2007 8:24 UTC (Thu) by mingo
If you take Tanenbaum's suggestion to heart, the 5-10% "penalty" of the micro-kernel design is irrelevant, because you won't just be swapping in a micro-kernel underneath the bloated, unreliable layers we've built on top of Linux. You'll be building an entire new system, bottom to top, with less bloat and more reliability. Will that total system have a 5-10% penalty over my current system? I doubt it. You can't even *begin* to speculate, because there are just far, far too many variables to really judge that.
Yes. The other cost is not performance but flexibility of design and /flexibility of bugfixes/. Both matter very much. You don't win more reliability by making bugs harder to fix. A 'monolithic' kernel's state might be harder to debug, but you've got everything in one place: if you need to change a few drivers to fix a bug in a core infrastructure API, no problem, you just do it. If you need to expose a data structure to another subsystem, no problem.
In a microkernel design you have explicit, documented, relied-upon APIs (which are more like ABIs) between subsystems, making both the ad-hoc sharing of information and the fast fixing of those interfaces a lot more cumbersome.
Furthermore, if there's a failure in any of the subsystems, I definitely do not want to hide this fact with a "restart and try again" feature. I really want to achieve a bug-free kernel, not a kernel that merely appears bug-free.
My opinion is that we'll win far more reliability by concentrating on transparent debugging facilities (static ones such as Sparse and dynamic ones such as [plug alert] lockdep) than by limiting the basic flexibility of the kernel's design. I'd rather burn CPU time on running with lockdep enabled to find deadlocks than slow down and hinder /all/ kernel development by forcibly isolating components from each other.
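The kind of problem lockdep catches is a lock-ordering inversion: one code path takes lock A then B, another takes B then A, and the two can deadlock if they ever run concurrently. A minimal sketch of that idea, as an illustrative user-space model in Python (the names and structure here are my own, not the kernel's lockdep implementation):

```python
# Toy model of lockdep's core idea: record which locks have ever been
# acquired while other locks were held, and flag an inversion (the
# classic ABBA pattern) the moment the conflicting order is attempted,
# even if no deadlock actually happens on this run.

class LockOrderTracker:
    def __init__(self):
        self.after = {}   # lock -> set of locks ever acquired while it was held
        self.held = []    # stack of currently held locks

    def acquire(self, lock):
        for h in self.held:
            # If 'h' was ever acquired while 'lock' was held, then the
            # orders 'lock -> h' and 'h -> lock' both exist: a potential
            # ABBA deadlock, reported here without needing it to occur.
            if h in self.after.get(lock, set()):
                raise RuntimeError(f"lock order inversion: {h} <-> {lock}")
            self.after.setdefault(h, set()).add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)

t = LockOrderTracker()
t.acquire("A"); t.acquire("B")      # establishes the order A -> B
t.release("B"); t.release("A")
t.acquire("B")
try:
    t.acquire("A")                  # inverted order B -> A is flagged
except RuntimeError as e:
    print(e)
```

The point of the design is that the check is proactive: the bad ordering is reported the first time both orders are observed, not only on the unlucky run where two CPUs actually interleave into a deadlock.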
Also, there are some areas and subsystems where isolation wins us /more/ flexibility: for example filesystems. But here Linux already has FUSE, which is an /optional/ feature for writing filesystems in user-space. NTFS-3G has already proven (by being leagues better than the in-kernel ntfs driver) that at least for that type of filesystem, and at that stage of its lifecycle, development was faster and more flexible in user-space.
Anyway ... we'll see how this works out. I have a huge amount of respect for Mr. Tanenbaum, his books are great and I am sure he is having tons of fun with Minix - and I definitely agree with him that reliability is the #1 challenge of modern OS design. Diversity of opinion and diversity of approach do not bother me; they will only enrich the end result.