Posted Jan 19, 2012 9:35 UTC (Thu) by khim
In reply to: The future calculus of memory management
Parent article: The future calculus of memory management
However, I still think that systems can't be overcommitted safely, so I remain skeptical what possible tricks there are that an administrator could pull while not violating service's expectations.
This depends on what services you have available. Google's infrastructure is a good example. It runs different kinds of tasks on the same machines. Some serve search results and have a "no overquota" policy. Some are batch processes: crawlers, exacycle, etc. These can be killed at any time because you can always just start them on another system.
Now, not only can batch processes be run in overcommit mode - Google can even lend out memory reserved for a critical search process! If that process actually asks for the memory later, you can kill a non-critical process with extreme prejudice and give the memory to the critical process. Not sure if Google actually does it or not, but this is obviously doable.
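To make the idea concrete, here is a toy sketch in Python. It is not how Google (or any real scheduler) does it; the `Machine` and `Task` classes and all their method names are made up for illustration. The assumptions: critical tasks get a hard reservation they may never exceed, the sum of reservations never exceeds physical memory, and batch tasks may borrow any memory that is physically free, including memory that is reserved but unused.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    used_mb: int
    critical: bool
    reserved_mb: int = 0  # only meaningful for critical tasks


@dataclass
class Machine:
    """Hypothetical single-machine memory model, for illustration only."""
    total_mb: int
    tasks: dict = field(default_factory=dict)

    def free(self):
        # Physically unused memory, regardless of who has reserved it.
        return self.total_mb - sum(t.used_mb for t in self.tasks.values())

    def start_critical(self, name, reserved_mb):
        # Reservations of critical tasks are never overcommitted.
        reserved = sum(t.reserved_mb for t in self.tasks.values() if t.critical)
        if reserved + reserved_mb > self.total_mb:
            return False
        self.tasks[name] = Task(name, 0, critical=True, reserved_mb=reserved_mb)
        return True

    def start_batch(self, name, mb):
        # Batch tasks may borrow memory that is reserved but currently unused.
        if mb > self.free():
            return False
        self.tasks[name] = Task(name, mb, critical=False)
        return True

    def grow_critical(self, name, mb):
        """Give a critical task more memory, killing batch tasks if needed.

        Returns the names of the batch tasks that were killed.
        """
        task = self.tasks[name]
        if task.used_mb + mb > task.reserved_mb:
            raise MemoryError("no overquota for critical tasks")
        killed = []
        while self.free() < mb:
            batch = [t for t in self.tasks.values() if not t.critical]
            if not batch:
                # Cannot happen while reservations fit in physical memory.
                raise MemoryError("nothing left to evict")
            victim = max(batch, key=lambda t: t.used_mb)  # largest first
            del self.tasks[victim.name]  # restart it elsewhere later
            killed.append(victim.name)
        task.used_mb += mb
        return killed
```

Killing the largest batch task first is an arbitrary choice here; a real system would presumably weigh how much work is lost and how cheap the restart is.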
If you think about what real clusters are doing, you may be surprised just how much work is done by processes which can actually be killed and restarted. Sadly, today such tasks are usually run synchronously in the context of a critical user-facing process, so to use memory efficiently you'll need to do serious refactoring.