LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 13:26 UTC (Thu) by vonbrand (subscriber, #4458)
In reply to: LCA: Andrew Tanenbaum on creating reliable systems by drag
Parent article: LCA: Andrew Tanenbaum on creating reliable systems

Minor problems in sight here...

To get decent performance out of current hardware the system has to be able to shove large amounts of data around in one go. Bye, bye "Use IO to do data movement".
The whole "serialize request, send it over, unserialize, check, act, serialize results, send them back, unserialize, check" business is costly on current hardware. The relative cost of context switches is going up, so this isn't getting cheaper.
I just fail to see how microkernels (which require each separate part of the system to be able to handle multiple requests simultaneously, if you don't want to make even a single-user system unbearably slow) can win over handling the whole synchronization problem once.
Once everybody has to keep enough state to know how to restart requests that failed because the server crashed, we are in a whole new universe of pain. The tendency is exactly in the opposite direction: TCP is so nice because it handles all sorts of problems in the underlying net transparently. Few people are able to write software that can handle random failures, which is why highly reliable software is so expensive to produce.
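
To make the last two points concrete, here is a minimal Python sketch of the round trip and of the state a client would have to keep in order to resend after a server crash. It is an illustration only: `FlakyServer`, `ServerCrashed` and `call` are hypothetical names, JSON stands in for whatever wire format a real microkernel would use, and the "server" is just an object in the same process.

```python
import json


class ServerCrashed(Exception):
    """Raised when the hypothetical server dies before replying."""


class FlakyServer:
    """Toy user-space server that crashes on its first `crashes` requests."""

    def __init__(self, crashes=1):
        self.crashes_left = crashes

    def handle(self, wire_request: bytes) -> bytes:
        if self.crashes_left > 0:
            self.crashes_left -= 1
            raise ServerCrashed()
        request = json.loads(wire_request)               # unserialize
        if request.get("op") != "add":                   # check
            return json.dumps({"error": "bad op"}).encode()
        result = sum(request["args"])                    # act
        return json.dumps({"result": result}).encode()   # serialize results


def call(server, op, args, max_attempts=3):
    """Send one request, resending it if the server crashes mid-request."""
    # Serialize once and keep the bytes: this is the state the client
    # must hold on to in order to restart a failed request.
    wire_request = json.dumps({"op": op, "args": args}).encode()
    for attempt in range(1, max_attempts + 1):
        try:
            wire_reply = server.handle(wire_request)     # send it over
        except ServerCrashed:
            continue                                     # server restarted; resend
        reply = json.loads(wire_reply)                   # unserialize
        if "error" in reply:                             # check
            raise ValueError(reply["error"])
        return reply["result"], attempt
    raise ServerCrashed(f"gave up after {max_attempts} attempts")


result, attempts = call(FlakyServer(crashes=1), "add", [1, 2, 3])
```

Note that blindly resending is only safe when the request is idempotent; a request that had already taken effect before the crash would need extra bookkeeping on both sides, which is exactly the pain described above.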

A nice pipe dream.

Yes, I know that way back when, people resisted compilers for fear of loosing complete control over the machine and getting slower programs. I do know that with today's plummeting hardware costs and ballooning capabilities, ever better compilers, and subtly changing hardware underneath, it is madness to write complete programs in assembler. Maybe awt's time will come, but not in the near future.

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 14:18 UTC (Thu) by nix (subscriber, #2304)

Agreed with everything you say: I'm just being picky here and pointing out that your (common) typo of `loosing' for `losing' completely inverts the meaning of one sentence in your post :)

LCA: Andrew Tanenbaum on creating reliable systems

Posted Jan 18, 2007 19:21 UTC (Thu) by oak (guest, #2786)

Amen.

However, you missed this one:
"The health of components should be monitored; if one stops operating
properly, the system should know about it."

I.e. polling / wakeups? -> goodbye battery life

I've also seen a case where the monitor thought the component was
misbehaving and killed & restarted it constantly. Yes, the component
was not communicating "according to spec" but from the user's perspective
it worked correctly. Killing it was worse than letting it live, and constant
restarting of course also drains the battery.
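
A rough sketch of what would have avoided that kill-and-restart loop: a monitor that allows only a limited number of restarts per time window and then leaves the component alone. This is a hypothetical illustration in Python, not how any real monitor (MINIX's or the one in the case above) works; `RestartPolicy` and its parameters are made up for the example, and the clock is passed in so the behaviour is easy to follow.

```python
class RestartPolicy:
    """Allow at most `max_restarts` restarts within any `window` seconds."""

    def __init__(self, max_restarts=3, window=60.0):
        self.max_restarts = max_restarts
        self.window = window
        self.history = []  # timestamps of restarts still inside the window

    def should_restart(self, now: float) -> bool:
        # Forget restarts that have aged out of the window.
        self.history = [t for t in self.history if now - t < self.window]
        if len(self.history) >= self.max_restarts:
            return False   # budget used up: leave the component running
        self.history.append(now)
        return True


policy = RestartPolicy(max_restarts=2, window=10.0)
decisions = [policy.should_restart(t) for t in (0.0, 1.0, 2.0, 12.0)]
```

With these numbers the third complaint inside ten seconds is ignored, and restarts are allowed again once the earlier ones have aged out. It does nothing about the polling cost itself, of course.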