Crash-only software: More than meets the eye

Posted Jul 13, 2006 18:54 UTC (Thu) by Segora (subscriber, #8209)
Parent article: Crash-only software: More than meets the eye

Hi,

this made me think of Joe Armstrong's (of Erlang fame) work on fault-tolerant systems[1]. The canonical way to build an Erlang/OTP system is to divide it into one or more applications, each of which has a supervisor tree of processes. When a process crashes, the restart strategy determines whether only the crashed process is restarted, all processes on the same level are restarted, or the supervisor itself crashes and the fault is propagated upwards, leading in the extreme case to the whole node being restarted via a hardware watchdog.
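
As a rough illustration of the restart strategies described above, here is a toy model in Python. It is not Erlang/OTP code and not the OTP API: the `Supervisor`, `Child`, and `EscalateError` names are hypothetical, and the restart limit is a plain counter rather than OTP's restarts-per-time-window rule. Only the strategy names `one_for_one` and `one_for_all` are borrowed from OTP.

```python
from dataclasses import dataclass
from typing import Callable, List


class EscalateError(Exception):
    """Raised when a supervisor gives up and passes the fault to its parent."""


@dataclass
class Child:
    name: str
    start: Callable[[], None]  # hypothetical hook that (re)initialises the worker
    starts: int = 0

    def restart(self) -> None:
        self.starts += 1
        self.start()


@dataclass
class Supervisor:
    strategy: str              # "one_for_one" or "one_for_all"
    children: List[Child]
    max_restarts: int = 3      # simplification: a total count, not a rate
    restarts: int = 0

    def child_crashed(self, name: str) -> List[str]:
        """Handle a crash of the named child; return the names restarted."""
        self.restarts += 1
        if self.restarts > self.max_restarts:
            # Too many crashes: the supervisor itself "crashes" and the
            # fault propagates to whoever supervises this supervisor.
            raise EscalateError(f"giving up after crash of {name!r}")
        if self.strategy == "one_for_one":
            targets = [c for c in self.children if c.name == name]
        elif self.strategy == "one_for_all":
            targets = list(self.children)
        else:
            raise ValueError(f"unknown strategy {self.strategy!r}")
        for child in targets:
            child.restart()
        return [child.name for child in targets]


if __name__ == "__main__":
    workers = [Child("db", lambda: None), Child("web", lambda: None)]
    sup = Supervisor("one_for_all", workers, max_restarts=1)
    print(sup.child_crashed("db"))   # both siblings are restarted
    try:
        sup.child_crashed("web")     # exceeds the limit, so the fault escalates
    except EscalateError as err:
        print("escalated:", err)
```

In a tree, a parent supervisor would catch `EscalateError` from a child supervisor and apply its own strategy in turn; the topmost failure corresponds to the whole-node restart mentioned above.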

Segora

[1] Making Reliable Distributed Systems in the Presence of Software Errors (2003), http://citeseer.ist.psu.edu/armstrong03making.html


Crash-only software: More than meets the eye

Posted Jul 14, 2006 0:18 UTC (Fri) by pimlott (subscriber, #1535)

Rats, I was going to mention Erlang! Erlang's motto is "let it fail", and this philosophy (counterintuitively!) is used to build extremely high-reliability telecommunications routers. Walking through the Erlang tutorial is a great exercise in this style of design.

Crash-only software: More than meets the eye

Posted Jul 14, 2006 3:25 UTC (Fri) by im14u2c (subscriber, #5246)

What's interesting is that it's not only the software that fails. This is especially true in infrastructure computing (such as telecom), where boxes are literally everywhere and deployed "forever." Bit flips due to radiation, aging components, and so on: none of those should bring down the phone network, though you might drop a phone call.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.