By Jonathan Corbet
November 4, 2009
The Linux memory management code does its best to ensure that memory will
always be available when some part of the system needs it. That effort
notwithstanding, it is still possible for a system to reach a point where
no memory is available. At that point, things can grind to a painful halt,
with the only possible solution (other than rebooting the system) being to
kill off processes until a sufficient amount of memory is freed up. That
grim task falls to the out-of-memory (OOM) killer. Anybody who has ever
had the OOM killer unleashed on a system knows that it does not always pick
the best processes to kill, so it is not surprising that making the OOM
killer smarter is a recurring theme in Linux virtual memory development.
Before looking at the latest attempt to improve the OOM killer, it is worth
mentioning that it is possible to configure a Linux system in a way which
all but guarantees that the OOM killer will never make an appearance. OOM
situations are caused by the kernel's willingness to overcommit memory. As
a general rule, processes only use a portion of the address space they have
allocated, so limiting allocations to the total amount of RAM and swap
space on the system would lead to underutilization of system memory. But
that limitation can be imposed on systems which can never be allowed to go
into an OOM state; simply set the vm.overcommit_memory sysctl knob
to 2. Individual processes are much more likely to see allocation
failures in this mode, but the system as a whole will not overcommit its
resources.
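As a rough illustration of what strict mode means in numbers, the sketch below models the commit limit the kernel enforces when vm.overcommit_memory is 2. It brings in one knob the text above does not mention, vm.overcommit_ratio, which defaults to 50; with that default the limit is swap plus half of RAM rather than swap plus all of RAM. Huge pages are ignored, and the function names are invented for this example.

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50) -> int:
    """Approximate commit limit under vm.overcommit_memory=2.

    Simplified model: swap plus overcommit_ratio percent of RAM.
    Huge-page reservations are deliberately ignored here.
    """
    return swap_kb + ram_kb * overcommit_ratio // 100


def allocation_allowed(committed_kb: int, request_kb: int, limit_kb: int) -> bool:
    """In strict mode a request fails once it would push the total
    committed address space past the limit."""
    return committed_kb + request_kb <= limit_kb


if __name__ == "__main__":
    limit = commit_limit_kb(ram_kb=8_000_000, swap_kb=2_000_000)
    print("commit limit (kB):", limit)
    print("1 GB request allowed:", allocation_allowed(5_500_000, 1_000_000, limit))
```

Raising the ratio to 100 in this model gives the "RAM plus swap" limit described above.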
Most systems will allow overcommitted memory, though, because the
alternative is too limiting. Overcommit almost always works, but the
threat of a day when the Firefox developers add one memory leak too many
always looms. When that sad occasion comes to be, it would be nice if the
OOM killer would target that leaky Firefox process instead of, say, the X
server and PostgreSQL. Many attempts have been made to add
smarts to the OOM killer over the years; there's also a means by which the system
administrator can steer the OOM killer toward or away from specific
processes. But manual configuration is only suitable for certain,
relatively static workloads; for the rest, the OOM killer often proves less
discriminating than one would like.
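For context, the administrative knob in kernels of this era is /proc/<pid>/oom_adj, which is not named in the text above. The following is a simplified model, not the kernel's code, of how that value is understood to scale a process's badness score: positive values shift the score left, negative values shift it right, and -17 exempts the process entirely. The function name is hypothetical.

```python
OOM_DISABLE = -17  # special oom_adj value: never select this process


def adjusted_badness(points: int, oom_adj: int) -> int:
    """Scale a raw badness score by an oom_adj value (simplified model)."""
    if oom_adj == OOM_DISABLE:
        return 0  # a score of zero means "not a candidate"
    if oom_adj > 0:
        # A zero score is bumped to one so that the shift has an effect.
        return max(points, 1) << oom_adj
    if oom_adj < 0:
        return points >> -oom_adj
    return points


if __name__ == "__main__":
    for adj in (-17, -2, 0, 3):
        print(adj, adjusted_badness(1000, adj))
```

Because each step doubles or halves the score, a setting of a few units in either direction is usually enough to move a process to one end of the list.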
The latest attempt to fix the OOM
killer comes from Hiroyuki Kamezawa. This patch makes a number of
fundamental changes to the selection of OOM victims. The result is an OOM
killer which is smarter in some ways, but which takes a somewhat different
approach to the selection of its victims.
One of the factors that the current OOM killer takes into account, naturally, is
the amount of memory being used by each process. But the measure used
(mm->total_vm) is somewhat crude: it penalizes processes using
a lot of shared memory and says little about how much physical memory the
process is using. Hiroyuki's patch tries to move away from total_vm in
most situations, looking at the actual resident set size (RSS) and possibly
taking into account the amount of swap space used as well.
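A small sketch may make the difference between the two measures concrete. The process names and page counts below are invented, and the scoring functions are illustrative stand-ins rather than the code in the patch.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    total_vm: int  # pages mapped in the address space, shared or not
    rss: int       # pages actually resident in RAM
    swap: int      # pages currently swapped out


def score_total_vm(p: Proc) -> int:
    """Crude measure: size of the address space."""
    return p.total_vm


def score_rss_swap(p: Proc, count_swap: bool = True) -> int:
    """Measure based on resident pages, optionally adding swap usage."""
    return p.rss + (p.swap if count_swap else 0)


procs = [
    # Large shared mapping, little of it resident (made-up numbers).
    Proc("database", total_vm=500_000, rss=40_000, swap=0),
    # Smaller address space, but most of it is in RAM or swap.
    Proc("browser", total_vm=300_000, rss=250_000, swap=30_000),
]

if __name__ == "__main__":
    print("total_vm picks:", max(procs, key=score_total_vm).name)
    print("rss+swap picks:", max(procs, key=score_rss_swap).name)
```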
Figuring in swap usage is controversial. A program which is using a lot of
swap is clearly putting pressure on memory, but, if that program has been
mostly swapped out, killing it will not immediately free much RAM. Eventually
other processes can be shifted into the newly-freed swap space, but it
might make more sense to just do away with those other processes at the
outset. Even so, Hiroyuki's patch, for now, will figure in swap space if
specific constraints do not force the use of other criteria.
One constraint which can change the calculation is when the memory shortage
is specific to low memory - the region of memory which can be directly
addressed by the kernel. When a low-memory allocation is required, nothing
else will do, so there is little value in killing processes which are not
hogging low-memory pages. With Hiroyuki's patch, the VM subsystem tracks
how much low memory each process is using as a separate statistic. If the
OOM situation is caused by an attempt to allocate low memory, the OOM
killer's "badness" function will focus on processes holding large amounts
of low memory.
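The constraint-dependent choice can be sketched as a victim selector that switches its ranking key when the failed allocation needed low memory. The field names, the sample numbers, and the pick_victim() helper are all hypothetical; the patch's real accounting is more involved.

```python
from typing import NamedTuple, Sequence


class Usage(NamedTuple):
    name: str
    rss: int         # resident pages, any zone
    swap: int        # swapped-out pages
    lowmem_rss: int  # resident pages located in low memory


def pick_victim(procs: Sequence[Usage], lowmem_shortage: bool) -> Usage:
    """Choose the process whose death should relieve the shortage.

    When the shortage is specific to low memory, only low-memory
    usage counts; otherwise rank by RSS plus swap.
    """
    if lowmem_shortage:
        return max(procs, key=lambda p: p.lowmem_rss)
    return max(procs, key=lambda p: p.rss + p.swap)


procs = [
    Usage("big_highmem_user", rss=400_000, swap=50_000, lowmem_rss=2_000),
    Usage("lowmem_hog", rss=60_000, swap=0, lowmem_rss=45_000),
]

if __name__ == "__main__":
    print("general shortage:", pick_victim(procs, lowmem_shortage=False).name)
    print("lowmem shortage: ", pick_victim(procs, lowmem_shortage=True).name)
```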
The current OOM killer makes an attempt to target "fork bomb" processes by
adding half of each child's "badness" value to its parent. A process with
a lot of children will thus have a high badness and will come under
the OOM killer's baleful gaze sooner. The problem here, of course, is that
some processes legitimately have lots of children - the session manager
for the user's desktop environment is a good example. Killing
gnome-session is likely to free substantial amounts of memory, but
the user's gratitude may be surprisingly limited.
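The arithmetic behind that problem is easy to reproduce. The sketch below follows the description above (half of each child's badness is added to the parent); the scores are invented, and the helper name is not taken from the kernel.

```python
from typing import Iterable


def badness_with_children(own: int, children: Iterable[int]) -> int:
    """Parent's score: its own badness plus half of each child's."""
    return own + sum(child // 2 for child in children)


# A session manager with a modest footprint but 40 ordinary children.
session_manager = badness_with_children(own=2_000, children=[5_000] * 40)

# A leaky browser with a large footprint and no children.
leaky_browser = badness_with_children(own=60_000, children=[])

if __name__ == "__main__":
    print("session manager:", session_manager)
    print("leaky browser:  ", leaky_browser)
```

With these numbers the session manager scores well above the process that is actually leaking, simply because it has many children.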
The patch changes the fork bomb detector significantly. The new code
counts only the child processes which have been running for less than a
specific amount of time (five minutes in the posted patch). If one process
has newborn children which make up at least 1/8 of the processes on the system,
that process is deemed to be a fork bomb; it is duly rewarded with a spot
at the top of the OOM killer's short list.
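As described, the new test reduces to a count and a comparison. Here is a minimal sketch, assuming child ages are available in seconds and using the five-minute and 1/8 figures quoted above; the function and constant names are made up for illustration.

```python
from typing import Iterable

NEWBORN_SECONDS = 5 * 60  # age threshold quoted for the posted patch
FORK_BOMB_DIVISOR = 8     # newborn children must be >= 1/8 of all processes


def is_fork_bomb(child_ages: Iterable[float], total_processes: int) -> bool:
    """Return True if recently created children make up at least 1/8
    of the processes on the system."""
    newborn = sum(1 for age in child_ages if age < NEWBORN_SECONDS)
    # Integer comparison avoids floating-point division.
    return newborn > 0 and newborn * FORK_BOMB_DIVISOR >= total_processes


if __name__ == "__main__":
    # Session manager: many children, all of them running for an hour.
    print(is_fork_bomb([3600.0] * 40, total_processes=200))
    # Runaway script: 30 children created in the last ten seconds.
    print(is_fork_bomb([10.0] * 30, total_processes=200))
```

The long-lived children of a session manager no longer count against it, which is the point of the change.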
Finally, the current OOM killer tries to kill newly-created processes,
while allowing long-running processes to continue. Hiroyuki feels that
this approach creates a loophole for long-running processes which slowly
leak memory. That web browser may have been running for a long time and is
thus a high-value process, but it has been dropping memory on the floor for
all of that time and is also the cause of the problem. So the new code
changes the calculation to look at how long it has been since the process
last expanded its virtual memory size. A process which has been running for a
long time, but which has not grown in that time, will look better than one
which has been expanding.
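The shift from "time since start" to "time since last growth" can be illustrated with a simple ordering. How the patch actually weights this factor is not described here, so the sketch only ranks processes by the two clocks; the field names and numbers are assumptions.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    name: str
    runtime: float            # seconds since the process started
    since_last_growth: float  # seconds since its virtual memory last expanded


def order_by_runtime(tasks: List[Task]) -> List[str]:
    """Old idea: the youngest processes are considered first."""
    return [t.name for t in sorted(tasks, key=lambda t: t.runtime)]


def order_by_growth(tasks: List[Task]) -> List[str]:
    """New idea: the most recently expanded processes are considered first."""
    return [t.name for t in sorted(tasks, key=lambda t: t.since_last_growth)]


tasks = [
    Task("leaky_browser", runtime=7 * 86_400, since_last_growth=5),
    Task("stable_daemon", runtime=30 * 86_400, since_last_growth=29 * 86_400),
    Task("new_shell", runtime=120, since_last_growth=100),
]

if __name__ == "__main__":
    print("by runtime:", order_by_runtime(tasks))
    print("by growth: ", order_by_growth(tasks))
```

Under the second ordering the week-old browser that is still growing moves to the front, while the stable daemon stays at the back.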
There seems to be little disagreement with the idea that the OOM killer
needs a rework, but not everybody is sold on this approach yet. It looks
like a very large change, which makes some people nervous. It also shifts
the focus of the OOM killer's attention in a significant way: the current
heuristics were designed to be as unsurprising to the user as possible,
while the new ones are focused more strongly on freeing RAM quickly. But,
given that the existing heuristics are still clearly producing plenty of
surprises, perhaps a more goal-oriented approach makes sense.
(Naturally, no article on the OOM killer is complete without a link to this 2004 comment from Andries
Brouwer.)