First off, they are talking about failures much more significant than killing random processes; they include network/system/power/building failures as well.
Secondly, in theory planning for every possible failure and setting up explicit handling for it is the best approach; in practice people have blind spots, and something will go wrong that they didn't think of. It gets even worse when you start talking about combinations of failures.
As a result, the practice of randomly killing devices/systems/processes actually makes you far more resilient in the long run.
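To make that concrete, here is a minimal sketch of the "randomly kill a process" part in Python. The service names and the log path are made up for illustration; a real setup would pull its candidate list from service discovery and would never run against things outside an agreed allow-list.

    #!/usr/bin/env python3
    """Minimal chaos-run sketch: kill one random process from an allow-list
    of candidate service names. Names and log path are hypothetical."""
    import logging
    import os
    import random
    import signal
    import subprocess

    # Hypothetical list of services agreed to be safe targets for this exercise.
    CANDIDATE_SERVICES = ["web-frontend", "cache-worker", "report-batch"]

    logging.basicConfig(filename="/var/log/chaos.log",
                        format="%(asctime)s %(message)s",
                        level=logging.INFO)

    def pids_for(name):
        """Return PIDs whose command name matches exactly (via pgrep -x)."""
        out = subprocess.run(["pgrep", "-x", name],
                             capture_output=True, text=True)
        return [int(p) for p in out.stdout.split()]

    def kill_one_random_process():
        victims = [(svc, pid)
                   for svc in CANDIDATE_SERVICES
                   for pid in pids_for(svc)]
        if not victims:
            logging.info("chaos run: no candidate processes found")
            return
        svc, pid = random.choice(victims)
        # Log *before* killing so the record exists even if things go sideways.
        logging.info("chaos run: killing %s (pid %d)", svc, pid)
        os.kill(pid, signal.SIGKILL)

    if __name__ == "__main__":
        kill_one_random_process()

The point isn't the script, it's the habit: the kill is random, it's logged, and it happens while people are awake and watching.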
It's rough to get started with this: you need to make a real effort to make your system handle all the normal outages you can think of, and you need management that agrees with the approach and is willing to accept the occasional outage that results when you find a new problem.
But when you have deliberately taken something down, it's usually far easier to bring it up again than when the same failure happens for real.
Plus you have your logs of what was deliberately "failed" and when, which can greatly cut down on your troubleshooting time if there is an outage.
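In practice the first troubleshooting step is often just "did we break this on purpose?" Assuming the hypothetical chaos log format from the sketch above, that check is a few lines:

    # Hedged follow-on sketch: given an outage time, list deliberate kills
    # logged in the preceding window (log path/format from the sketch above).
    from datetime import datetime, timedelta

    def deliberate_failures_before(outage_time, window_minutes=30,
                                   log_path="/var/log/chaos.log"):
        start = outage_time - timedelta(minutes=window_minutes)
        hits = []
        with open(log_path) as f:
            for line in f:
                # Lines look like: "2024-06-01 14:03:22,123 chaos run: killing ..."
                stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
                if start <= stamp <= outage_time and "killing" in line:
                    hits.append(line.rstrip())
        return hits

    # Example: what did we deliberately kill in the half hour before the page?
    print(deliberate_failures_before(datetime(2024, 6, 1, 14, 5)))

If that comes back empty, you know you're looking at a real, unplanned failure and can escalate accordingly.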