LCA: Andrew Tanenbaum on creating reliable systems
Posted Jan 18, 2007 8:35 UTC (Thu) by oak
In reply to: LCA: Andrew Tanenbaum on creating reliable systems
Parent article: LCA: Andrew Tanenbaum on creating reliable systems
> Gnome-session can restart applications that crash and such.
This wasn't much of a consolation when I tried to run Ubuntu on
a system that didn't have enough memory. Nautilus kept getting killed
by the kernel OOM killer and was restarted every time, and as a result
the computer was unusable. If gnome-session hadn't tried to continuously
restart Nautilus, the system would have been usable. (Moral: if it fails
too many times in a row, let it rest in peace.)
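To make the moral concrete, here is a rough sketch of a restart loop that gives up after repeated failures. This is not how gnome-session works; the function name `supervise` and the parameters `max_failures` and `window` are made up for illustration, and the `run` and `clock` arguments exist only so the loop can be tried without spawning real processes.

```python
import subprocess
import time


def supervise(cmd, max_failures=3, window=60.0,
              run=subprocess.call, clock=time.monotonic):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Gives up once max_failures failures have happened within the last
    `window` seconds. Returns how many times cmd was started.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    failures = []  # timestamps of recent failures
    starts = 0
    while True:
        starts += 1
        status = run(cmd)  # blocks until the process exits
        if status == 0:
            # Clean exit: nothing to restart.
            return starts
        now = clock()
        # Forget failures that are older than the window.
        failures = [t for t in failures if now - t < window]
        failures.append(now)
        if len(failures) >= max_failures:
            # Too many failures in a row: let it rest in peace.
            return starts
```

With `subprocess.call`, a process killed by a signal (as with an OOM kill) shows up as a negative return code, so it counts as a failure here. Something like `supervise(["nautilus"])` would then stop after three quick deaths instead of looping forever.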
> The concept was that applications at any point should be always at a
> state were they can instantly crap out and recover later.
But you can still lose data...
Btw, in my limited experience, if there's a "reliability"
feature which papers over software faults, fixing those faults gets
delayed (or sometimes never happens at all) because "everything" works
"well enough" and debugging and fixing things is costly.
"Fault tolerance" should be used only on a system which you do not
expect to, or cannot, fix or update.