writes about self-healing networks on O'Reilly.
"Wouldn't it be nice if your network services could detect their own failures
and gracefully restart? Sure, you could have cron or FAM jobs always
checking them, but that's so unrefined. Instead, consider Greg Retkowski's
solution: building a small Cfengine and Nagios combination to detect and
recover from failure."