LCA: Andrew Tanenbaum on creating reliable systems
Posted Jan 18, 2007 20:14 UTC (Thu) by eklitzke
In reply to: LCA: Andrew Tanenbaum on creating reliable systems
Parent article: LCA: Andrew Tanenbaum on creating reliable systems
The problem with that sentiment, and the whole article, is that it focuses solely on the kernel.
I don't think I've had more than 3 or 4 Linux failures in my life, and most of those were when using very new drivers (or NVIDIA).
I have had X crash or lock, various GNOME and KDE components crash or lock, and various regular applications crash or lock more times than I can possibly count. Definitely into the triple digits, if not quadruple by now.
I tend to agree with you here. The kernel is very stable -- I've had only one real, bona fide kernel oops in the past 18 months or so (I think it was pdflush that crashed it). And I can't even begin to count how many times X has totally locked up the system (usually after starting a misbehaving Gnome application). But that just means those applications need to implement a fault-tolerant model as well. It's totally unacceptable that a single application can cause X to lock up the whole computer. If X were self-healing, that would be spectacular.
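For what it's worth, the restart-on-failure idea isn't hard to sketch in user space. Below is a minimal, illustrative C supervisor that fork()s a service and respawns it whenever it exits abnormally -- roughly the job MINIX 3's reincarnation server does for drivers. The "./my-service" path is just a placeholder, not any real X or GNOME component.

    /* Minimal sketch of the restart-on-failure idea discussed above:
     * a supervisor fork/execs a service and restarts it if it dies.
     * "./my-service" is a placeholder for whatever is being supervised. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        for (;;) {
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");
                return 1;
            }
            if (pid == 0) {
                /* Child: run the supervised service. */
                execl("./my-service", "my-service", (char *)NULL);
                perror("execl");       /* only reached if exec fails */
                _exit(127);
            }
            /* Parent: wait for the child; restart it if it crashed. */
            int status;
            if (waitpid(pid, &status, 0) < 0) {
                perror("waitpid");
                return 1;
            }
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                break;                 /* clean exit: stop supervising */
            fprintf(stderr, "service died (status %d), restarting\n", status);
            sleep(1);                  /* back off a bit before restarting */
        }
        return 0;
    }

Of course a real self-healing X server would need to preserve client state across restarts, which is the hard part; this only shows the supervision skeleton.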
A lot of the most modular pieces of software on my system (I am thinking particularly of Postfix and Apache) are also the most stable. TCP/IP is another example of a modular (well, layered) system that is particularly resilient to failure. Certainly this level of modularity isn't needed in all cases, but for really critical software, I think taking some lessons from the microkernel model is a great idea.
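To make the modularity point concrete, here is a small sketch of fault containment through process isolation, loosely in the spirit of Postfix's master/worker split: the parent hands a request to a worker over a socketpair and survives even if the worker crashes mid-request. The worker() function and the "ping" exchange are invented purely for illustration.

    /* Sketch of fault containment via process isolation: a bug in the
     * worker (say, a NULL dereference) kills only the worker process,
     * and the parent can notice and recover. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static void worker(int fd)
    {
        char buf[64];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0)
            _exit(1);
        buf[n] = '\0';
        /* Any crash in here is contained to this process. */
        char reply[80];
        snprintf(reply, sizeof(reply), "handled: %s", buf);
        write(fd, reply, strlen(reply));
        _exit(0);
    }

    int main(void)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            perror("socketpair");
            return 1;
        }
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            close(sv[0]);
            worker(sv[1]);        /* child handles one request */
        }
        close(sv[1]);
        write(sv[0], "ping", 4);
        char buf[80];
        ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("reply: %s\n", buf);
        } else {
            /* Worker died or closed the socket: the parent is unharmed
             * and could respawn it, as in the supervisor sketch above. */
            fprintf(stderr, "worker failed; parent still running\n");
        }
        waitpid(pid, NULL, 0);
        return 0;
    }

That's essentially the structural trick Postfix and microkernels share: put each job in its own address space, talk over a narrow channel, and treat the death of any one component as a recoverable event.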