KHB: Failure-oblivious computing
Posted Jun 30, 2006 9:22 UTC (Fri) by oak
In reply to: KHB: Failure-oblivious computing
Parent article: KHB: Failure-oblivious computing
Note that this is not a general solution for "fixing" buggy
programs, but a way to increase the reliability of programs in which:
- Uptime / not crashing
- (Performance / speed)
are more important than the program working correctly.
This might be the case where the handled data is either:
- Not written, just read and sent somewhere else
(another machine or process)
- You don't care about the data as much as about the rest of the service
("good enough" data reliability is satisfactory)
Even in those kinds of situations, I would assume this feature
to be enabled only after the software:
- Has finished its development phase and been deployed in
place(s) where it's hard to update (e.g. set-top boxes)
- Has been pretty thoroughly tested in an environment where similar
bugs cause the program to, e.g., dump core
I would say that for this thing to be generally useful,
the following should be possible:
- Switching the program, without recompiling, to terminate/dump core
instead of continuing
- Keeping this run-time configurability fast enough
...as I'm pretty sure administrators will still want to be able to
debug the problems they encounter.
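To make the run-time switch concrete, here is a minimal sketch in C, assuming bounds checks are written by hand rather than inserted by a compiler as the real failure-oblivious work does. In "oblivious" mode an out-of-bounds write is discarded and an out-of-bounds read returns a manufactured value (this sketch simply uses 0); otherwise the program calls abort(), which raises SIGABRT and dumps core if core dumps are enabled. The FO_POLICY environment variable and the names oob_policy, checked_buf, buf_write and buf_read are all hypothetical, chosen only for illustration. The policy is a single global read once at start-up, so the per-access cost is one extra branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy: what to do on an out-of-bounds access. */
typedef enum { POLICY_OBLIVIOUS, POLICY_ABORT } oob_policy;

/* Default to aborting, so untested builds still dump core on bugs. */
static oob_policy g_policy = POLICY_ABORT;

/* Read the policy from the hypothetical FO_POLICY environment variable.
 * Only the exact value "oblivious" enables failure-oblivious mode. */
static oob_policy policy_from_env(void)
{
    const char *v = getenv("FO_POLICY");
    return (v != NULL && strcmp(v, "oblivious") == 0) ? POLICY_OBLIVIOUS
                                                      : POLICY_ABORT;
}

/* A buffer that carries its own length so accesses can be checked. */
typedef struct {
    unsigned char *data;
    size_t len;
} checked_buf;

/* Checked write: an invalid write is either discarded or fatal. */
static void buf_write(checked_buf *b, size_t i, unsigned char value)
{
    assert(b != NULL);
    if (i < b->len) {
        b->data[i] = value;
        return;
    }
    if (g_policy == POLICY_OBLIVIOUS)
        return; /* discard the invalid write and keep running */
    fprintf(stderr, "out-of-bounds write at index %zu\n", i);
    abort();
}

/* Checked read: an invalid read either yields a made-up value or is fatal. */
static unsigned char buf_read(const checked_buf *b, size_t i)
{
    assert(b != NULL);
    if (i < b->len)
        return b->data[i];
    if (g_policy == POLICY_OBLIVIOUS)
        return 0; /* manufactured value instead of reading out of bounds */
    fprintf(stderr, "out-of-bounds read at index %zu\n", i);
    abort();
}
```

A program would set `g_policy = policy_from_env();` once in `main` before doing any work; an administrator could then run a deployed box with `FO_POLICY=oblivious` and leave it unset while testing or debugging, so the same binary crashes loudly on the same bug.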
The more you value the data the program handles, the less you want
it to continue after there's some problem in handling the data.
Compare, for example, a program that manipulates / writes the same
data / files constantly (e.g. database server) to a program that
acts as a filter for data that's different each time (e.g. mail
server) or doesn't write it at all (e.g. www-server).