It seems to me that your article leaves one area uncovered.
OK for "crash-only" (i.e., backward recovery in dependability terminology), fault containment, increased reliability, etc.
But what about fault detection? In more practical terms: when do you fire up the crash/kill/terminate procedure? Do you let the user decide when to hit the power button? (Do you really trust users? What if one cuts the power cord with a knife?) Do you have some magical watchdog program running in a corner that knows what to do?
Fault management should not be limited to the recovery procedure. The detection procedure is sometimes just as important, and it exposes the overall assumptions made about the system (fail-stop, fail-silent, fail-arbitrary, etc.).
I don't know how this applies to OS development, though (except for pragmatic ideas like the Linux software watchdog).
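To make the detection question concrete, here is a minimal sketch of the watchdog idea in Python. It is only an illustration under stated assumptions, not how any particular kernel or crash-only system does it: the class name `HeartbeatWatchdog`, the `kick`/`check` methods, and the `on_expire` callback are all hypothetical names of mine. The monitored component is assumed to call `kick()` periodically while healthy, in the same spirit as a process that keeps writing to `/dev/watchdog`, and the supervisor fires the crash procedure when the heartbeats stop.

```python
import time


class HeartbeatWatchdog:
    """Detects a silent (hung or dead) component by its missing heartbeats.

    Hypothetical sketch: names and structure are illustrative assumptions.
    """

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s    # longest tolerated silence, in seconds
        self.on_expire = on_expire    # the crash/kill/restart procedure
        self.clock = clock            # injectable so the logic can be tested
        self.last_beat = clock()
        self.expired = False

    def kick(self):
        """Called by the monitored component to signal that it is alive."""
        self.last_beat = self.clock()
        self.expired = False

    def check(self):
        """Called periodically by the supervisor; fires on_expire once per expiry."""
        silent_for = self.clock() - self.last_beat
        if not self.expired and silent_for > self.timeout_s:
            self.expired = True
            self.on_expire()
        return self.expired
```

Note what this does and does not detect: a timeout like this only catches components that go quiet, so it fits the fail-stop and fail-silent assumptions. A component that keeps kicking the watchdog while producing garbage (fail-arbitrary) sails straight through, which is exactly why the failure assumptions matter as much as the recovery path.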