Fighting fork bombs

By Jonathan Corbet
March 29, 2011

Unix-like systems tend to be well hardened against attacks from outside, but more vulnerable to attacks by local users. One of the softer spots in most systems has to do with "fork bombs" - processes which madly fork() until they run the system out of resources. These attacks are difficult to defend against and difficult to stop without a reboot; they can also, at times, be created inadvertently. If Hiroyuki Kamezawa has his way, fork bombs will be less of a problem in the future.

The problem with fork bombs is that they are moving targets; by the time a system administrator notices a rapidly-forking process, it may have created vast numbers of children and exited. Killing processes individually in a fork bomb situation is not really an option; even a program written especially for this task can be hard put to keep up with the stream of new processes. There is just no way to get a handle on the entire tree of offending processes from user space. So it is not surprising that the best response in this situation can be to hit the Big Red Button and start over. Even if, as in Kamezawa-san's case, hitting the button involves walking to another building where the afflicted system is housed.

Indeed, it can be hard to get a handle on this tree from kernel space as well. The process tree only exists, as such, as long as the parent processes remain alive; once a process exits, all of its children are reparented to the init process. That causes a flattening of the tree structure and makes it hard to identify all of the processes involved in the attack. So Kamezawa-san's patch starts with the addition of a new process tracking structure. It is organized as a simple tree reflecting the actual family structure of the processes on the system. It differs from existing data structures, though, in that this "history tree" persists even when some processes exit. That allows the kernel to view the entire tree of processes involved in a fork bomb even if those which launched the attack have long since gone away.
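To make the idea of a persistent history tree concrete, here is a minimal user-space sketch in Python. It is not the patch's data structure; the class and method names (HistoryNode, HistoryTree, record_fork, record_exit, ancestry) are invented for illustration. The point it shows is only that an exit marks a node dead without reparenting anything, so the original ancestry can still be recovered.

```python
from typing import Dict, List, Optional


class HistoryNode:
    """One entry in a hypothetical process history tree."""

    def __init__(self, pid: int, parent: Optional["HistoryNode"] = None) -> None:
        self.pid = pid
        self.parent = parent
        self.children: List["HistoryNode"] = []
        self.alive = True


class HistoryTree:
    """Tracks who forked whom, even after the parents have exited."""

    def __init__(self, root_pid: int = 1) -> None:
        self.root = HistoryNode(root_pid)
        self.nodes: Dict[int, HistoryNode] = {root_pid: self.root}

    def record_fork(self, parent_pid: int, child_pid: int) -> HistoryNode:
        parent = self.nodes[parent_pid]
        child = HistoryNode(child_pid, parent=parent)
        parent.children.append(child)
        self.nodes[child_pid] = child
        return child

    def record_exit(self, pid: int) -> None:
        # Unlike the live process tree, nothing is reparented to init here:
        # the node simply stays in place, marked as dead.
        self.nodes[pid].alive = False

    def ancestry(self, pid: int) -> List[int]:
        """Return the chain of pids from `pid` up to the root."""
        chain: List[int] = []
        node: Optional[HistoryNode] = self.nodes[pid]
        while node is not None:
            chain.append(node.pid)
            node = node.parent
        return chain
```

With this structure, a process whose parent and grandparent have both exited still reports its full lineage, which is the property the live process tree loses on reparenting.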

Keeping the entire history of all processes created over the lifetime of a Linux system would be a costly endeavor. Clearly, there comes a point where history needs to be discarded. Every so often (30 seconds by default), the kernel will try to determine whether there might possibly be a fork bomb attack in progress; if no signs of an attack are detected, any tracking history which has existed for more than 30 seconds will be deleted.

How does the kernel decide whether it might be under attack? The way fork bombs incapacitate a system is usually through memory exhaustion, so the code looks for signs of memory stress: in particular, it looks to see if there have been any memory allocation stalls or kswapd runs since the last check. It also looks at whether the total number of processes on the system has increased. If none of those checks shows any reason for concern, the older history data will be removed from the system. If, instead, memory allocations are getting harder to come by or the number of processes is growing, the tracking structure will be kept around.
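The periodic check described above can be sketched as follows. This is a Python model of the decision only, not kernel code: the Snapshot fields, the function names, and the representation of history as a pid-to-timestamp mapping are all assumptions made for illustration. Only the three signals (allocation stalls, kswapd runs, process count) and the 30-second default come from the description above.

```python
from dataclasses import dataclass
from typing import Dict

RESET_INTERVAL_SECS = 30.0  # default interval described in the article


@dataclass
class Snapshot:
    """Counters sampled at one periodic check (field names are hypothetical)."""
    alloc_stalls: int
    kswapd_runs: int
    nr_processes: int


def looks_like_attack(prev: Snapshot, cur: Snapshot) -> bool:
    """True if any of the three stress signals moved since the last check."""
    return (
        cur.alloc_stalls > prev.alloc_stalls
        or cur.kswapd_runs > prev.kswapd_runs
        or cur.nr_processes > prev.nr_processes
    )


def prune_history(records: Dict[int, float], now: float,
                  prev: Snapshot, cur: Snapshot) -> Dict[int, float]:
    """records maps pid -> time the history entry was created.

    Under stress everything is kept; otherwise entries older than the
    reset interval are dropped.
    """
    if looks_like_attack(prev, cur):
        return dict(records)
    return {pid: created for pid, created in records.items()
            if now - created <= RESET_INTERVAL_SECS}
```

Note the asymmetry in this model: a single stressed interval is enough to keep all history, however old, until the next quiet check.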

If a fork bomb runs the system out of memory, the kernel's first response will be to fire up the out-of-memory (OOM) killer. Given time, the OOM killer might manage to clean up the mess, but the fact of the matter is that the OOM killer is designed around finding the one process which is creating the problem and killing it. The OOM killer cannot identify a whole tree of rapidly-forking processes and do away with all of them.

Enter the fork bomb killer, which is invoked by the OOM killer. The fork bomb killer will perform a depth-first traversal of the process history tree, filling in each node with information on the total number of processes below that node and the total memory used by those processes. At the end, the process with the highest score is examined; if there are at least ten processes in the history below the high scorer, it is deemed to be a fork bomb; that process and all of its descendants will be killed. Problem solved - hopefully.
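The traversal and selection steps might look roughly like the Python sketch below. Several details are assumptions rather than facts from the patch: the score is taken to be the total memory of a node's descendants, the root (init) is never considered as a candidate, and the names Node, fill_totals, find_fork_bomb, and victims are invented. Only the depth-first accumulation and the threshold of ten descendants come from the description above.

```python
from typing import List, Optional

MIN_DESCENDANTS = 10  # hardcoded threshold described in the article


class Node:
    """History-tree entry; mem_kb would be 0 for a process that has exited."""

    def __init__(self, pid: int, mem_kb: int, alive: bool = True) -> None:
        self.pid = pid
        self.mem_kb = mem_kb
        self.alive = alive
        self.children: List["Node"] = []
        self.nr_below = 0   # number of history entries below this node
        self.mem_below = 0  # total memory used by those entries


def fill_totals(node: Node) -> None:
    """Post-order depth-first walk that fills nr_below and mem_below.

    Recursive for brevity; a very deep tree would need an explicit stack.
    """
    node.nr_below = 0
    node.mem_below = 0
    for child in node.children:
        fill_totals(child)
        node.nr_below += 1 + child.nr_below
        node.mem_below += child.mem_kb + child.mem_below


def find_fork_bomb(root: Node) -> Optional[Node]:
    """Return the top scorer if it has enough descendants, else None."""
    fill_totals(root)
    best: Optional[Node] = None
    stack = list(root.children)  # assumption: the root itself is exempt
    while stack:
        node = stack.pop()
        if best is None or node.mem_below > best.mem_below:
            best = node
        stack.extend(node.children)
    if best is not None and best.nr_below >= MIN_DESCENDANTS:
        return best
    return None


def victims(node: Node) -> List[int]:
    """Pids of the node and all of its descendants that are still alive."""
    pids: List[int] = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.alive:
            pids.append(current.pid)
        stack.extend(current.children)
    return sorted(pids)
```

One consequence of scoring by subtree, at least in this sketch: whatever launched the bomb (a login shell, say) encloses the bomb's entire subtree and so scores at least as high as the bomb itself.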

There are a couple of control knobs which have been placed under /sys/kernel/mm/oom. History tracking will only be performed if mm_tracking_enabled is set to "enabled" (which is the default setting). The value in mm_tracking_reset_interval_msecs controls how often the process tracking tree is cleaned up; the default value is 30,000 milliseconds. A possibly surprising omission is the lack of a knob controlling how many descendants a process must have before it is declared to be a fork bomb; the hardcoded value of ten seems low.
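For illustration, a small Python helper could read the two knobs as shown below. The file names and the "enabled" value are taken from the description above, but these files would exist only on a kernel carrying this patch, so the helper returns None when they are absent. The function name and the returned dictionary keys are invented for this sketch.

```python
from pathlib import Path
from typing import Dict, Optional, Union

# Directory named in the article; present only on a kernel with the patch.
OOM_SYSFS_DIR = Path("/sys/kernel/mm/oom")


def read_tracking_settings(
    base: Path = OOM_SYSFS_DIR,
) -> Optional[Dict[str, Union[bool, float]]]:
    """Return the tracking settings, or None if the knobs are not present."""
    enabled_file = base / "mm_tracking_enabled"
    interval_file = base / "mm_tracking_reset_interval_msecs"
    if not enabled_file.exists() or not interval_file.exists():
        return None
    return {
        "enabled": enabled_file.read_text().strip() == "enabled",
        # The knob is in milliseconds; convert to seconds for convenience.
        "reset_interval_secs": int(interval_file.read_text().strip()) / 1000.0,
    }
```

On a kernel with the default settings described above, this would report tracking as enabled with a 30-second reset interval.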

The reception for this patch has not been entirely favorable; commenters worry about the runtime cost of maintaining the tracking structure and suggest that user-space solutions may be better. Kamezawa-san seems resigned that the patch may not go in, saying "To go to other buildings to press reset-button is good for my health." Other administrators, who may not be within easy walking distance of their systems, may feel their health is better served by some extra fork bomb protection, though.

Fighting fork bombs

Posted Mar 31, 2011 2:44 UTC (Thu) by jengelh (guest, #33263) [Link]

>once a process exits, all of its children are reparented to the init process. That causes a flattening of the tree structure and makes it hard to identify all of the processes involved in the attack.

Eh.. this sounds very much like a case that cgroups can handle. systemd is said to use them already to kill all processes spawned from a master even if the children have detached and reparented (think sshd).

Given that, the oom-killer may be tuned to group killable targets by cgroup rather than just tgid/tid.

Fighting fork bombs

Posted Mar 31, 2011 7:43 UTC (Thu) by alonz (subscriber, #815) [Link] (1 responses)

This mechanism appears to be very naive, and is easily bypassed.

For example: it is easy to develop a “creeping” fork-bomb that will just wait 30s (or even 1m, or 5m) between spawning successive generations of children. When this bomb begins to make its impact, it will already have tens (or hundreds, or thousands) of children, and the history will be long gone.

Fighting fork bombs

Posted Mar 31, 2011 11:10 UTC (Thu) by dholland (subscriber, #14680) [Link]

I haven't read the patch and I'm not familiar with kernel code... but this looks like it would catch the most common (non-malicious) forkbombs, which in my experience are usually due to user error.

(and something about not letting "perfect" be the enemy of "good"?)

Fighting fork bombs

Posted Mar 31, 2011 14:21 UTC (Thu) by cesarb (subscriber, #6266) [Link] (9 responses)

> Keeping the entire history of all processes created over the lifetime of a Linux system would be a costly endeavor. Clearly, there comes a point where history needs to be discarded.

I am failing to see why. You only need to keep the family tree of live processes (thus, branches with only dead leaves can be pruned). You do not need to keep all the inner nodes too; if you have a dead inner node with a single dead child, you can collapse both into a single dead inner node (how many intermediate dead nodes you had does not matter, and even if it did they could be replaced by a counter in the collapsed node). Unless I am visualizing it incorrectly, the worst case then is a binary tree with all the live nodes being the leaves, and so it has a bounded size (which is not that large).
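The pruning and collapsing described in this comment can be sketched in Python as follows. This is one possible reading of the idea, not code from the patch or from the commenter; the names Node, compact, and the collapsed counter are invented for illustration.

```python
from typing import List


class Node:
    def __init__(self, pid: int, alive: bool = True) -> None:
        self.pid = pid
        self.alive = alive
        self.children: List["Node"] = []
        self.collapsed = 0  # dead nodes folded into this one


def compact(node: Node) -> bool:
    """Prune and collapse the subtree in place.

    Returns False when the node is a dead leaf after compaction, meaning
    the caller can drop it from its own child list.
    """
    # First compact the children and drop branches holding only dead nodes.
    node.children = [child for child in node.children if compact(child)]

    # A dead node whose only child is also dead: fold the child into it,
    # remembering how many nodes were folded away.
    if (not node.alive and len(node.children) == 1
            and not node.children[0].alive):
        only = node.children[0]
        node.collapsed += 1 + only.collapsed
        node.children = only.children

    return node.alive or bool(node.children)
```

After compaction, every remaining dead node has either a live child or at least two children, which is what bounds the tree size in terms of the number of live processes.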

Fighting fork bombs

Posted Mar 31, 2011 14:54 UTC (Thu) by Seegras (guest, #20463) [Link] (8 responses)

Yes. And keeping the tree makes even more sense in a forensic context:

You will still know which process spawned what "inetd", even if the parent is long gone from memory or even disk.

Definitely worth some consideration.

Fighting fork bombs

Posted Mar 31, 2011 21:04 UTC (Thu) by dafid_b (guest, #67424) [Link] (7 responses)

Such a tree could provide a framework for a more user-friendly process inspection tool.

Hold in this tree the reason the process was created...
eg
"login-shell" (init hard code)
"Firefox Web Browser" (menu entry text)
"print-spooler"
"Chrome - BBC News Home" (Window title)

Background
I find myself uneasy when evaluating the safety of my system - the process list of 140-odd processes, with perhaps 10 recognised, leaves me no wiser.

There are a couple of use-cases I think the above tool could help with
1)
Should I use the browser to transfer cash between bank accounts?
Or should I reboot first?
How can I become more confident of code running on my system?

2)
Was that web-site really benign?
I allowed the site to run scripts in order to see content more clearly...
Has it created a process to execute in the background after I closed the frame?

Fighting fork bombs

Posted Apr 3, 2011 1:58 UTC (Sun) by giraffedata (guest, #1954) [Link] (6 responses)

I have long been frustrated by the Unix concept of orphan processes, for all the reasons mentioned here.

If I were redesigning Unix, I would just say that a process cannot exit as long as it has children, and there would be two forms of exit(): kill all my children and exit, and exit as soon as my children are all gone. And when a signal kills a process, it kills all its children as well.

Furthermore, rlimits would be extended to cover all of a process' descendants as well, and be refreshable over time. Goodbye, fork bomb.

There are probably applications somewhere that create a neverending chain of forks, but I don't know how important that is.

Fighting fork bombs

Posted Apr 3, 2011 2:52 UTC (Sun) by vonbrand (subscriber, #4458) [Link] (5 responses)

Keeping processes around just because some descendent is still running is a waste of resources.

Fighting fork bombs

Posted Apr 3, 2011 19:06 UTC (Sun) by giraffedata (guest, #1954) [Link] (2 responses)

> Keeping processes around just because some descendent is still running is a waste of resources.

Seems like a pretty good return on investment for me. Maybe 50 cents worth of memory (system-wide) to be able to avoid system failures due to runaway resource usage and always be able to know where processes came from. It's about the same tradeoff as keeping a process around just because its parent hasn't yet looked at its termination status, which Unix has always done.

A process that no longer has to execute shouldn't use an appreciable amount of resource.

Fighting fork bombs

Posted Apr 7, 2011 9:24 UTC (Thu) by renox (guest, #23785) [Link] (1 responses)

Currently, when the parent exits its memory is totally freed; you're suggesting keeping the whole process until its children exit, which can be expensive. Maybe a middle ground could be more useful, i.e. keep only the 'identity' of the parent process and free the rest.

Fighting fork bombs

Posted Apr 7, 2011 15:16 UTC (Thu) by giraffedata (guest, #1954) [Link]

> you're [suggesting] keeping the whole process until its children exits which can be expensive, maybe a middleground could be more useful ie keep only the 'identity' of the parent process and free the rest.

I don't think "whole process" implies the program memory and I agree - if I were implementing this, I would have exit() free all the resources the process holds that aren't needed after the program is done running, as Linux does for zombie processes today. But like existing zombies, I would probably keep the whole task control block for simplicity.

Fighting fork bombs

Posted Apr 4, 2011 16:51 UTC (Mon) by sorpigal (guest, #36106) [Link] (1 responses)

Isn't "disk/ram/cpu is cheap" typically the argument used to dismiss Unix design decisions based on efficiency?

Fighting fork bombs

Posted Apr 5, 2011 6:29 UTC (Tue) by giraffedata (guest, #1954) [Link]

> Isn't "disk/ram/cpu is cheap" typically the argument used to dismiss Unix design decisions based on efficiency?

This appears to be a rhetorical question, but I can't tell what the point is.

Fighting fork bombs

Posted Mar 31, 2011 23:31 UTC (Thu) by mrons (subscriber, #1751) [Link] (3 responses)

I administer a system shared by comp sci students and see a lot of fork bombs.

Sending a signal to the process group kills all fork bombs in my experience.

A signal to the process group also kills what we call "comets", a process that forks then exits. You can never catch a PID to kill the comet directly. They can even be hard to detect on a busy system. lastcomm process logs are often the only way to see one.

The other requirement is process limits on users. Fork bombs will make a system unusable if there are no limits.

I don't really see the need for this patch in the kernel. The current facilities of process groups and user process limits solve all the problems that I've seen.

Fighting fork bombs

Posted Apr 1, 2011 0:29 UTC (Fri) by dtlin (subscriber, #36537) [Link] (2 responses)

A process can easily setsid() and make itself a session and process group leader, escaping kill(-pgid) in the same way that fork() escapes kill(pid).

RLIMIT_NPROC/ulimit -u is good, though.
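As an illustration of the per-user process limit, the Python sketch below lowers the soft RLIMIT_NPROC in a child process before running a command, which is roughly what "ulimit -u" does in a shell. The helper names and the cap value are invented for this example; resource.setrlimit, resource.getrlimit, and the preexec_fn argument of subprocess.run are standard library interfaces on Unix-like systems.

```python
import resource
import subprocess
from typing import List


def lowered_soft_limit(cap: int, current_soft: int) -> int:
    """Pick a new soft limit that never raises the current one."""
    if current_soft == resource.RLIM_INFINITY:
        return cap
    return min(cap, current_soft)


def run_with_process_cap(argv: List[str], cap: int) -> subprocess.CompletedProcess:
    """Run argv with the soft RLIMIT_NPROC lowered to at most `cap`.

    The limit is set in the child only (between fork and exec), so the
    calling process keeps its own limits.
    """
    def limit_child() -> None:
        soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        resource.setrlimit(resource.RLIMIT_NPROC,
                           (lowered_soft_limit(cap, soft), hard))

    return subprocess.run(argv, preexec_fn=limit_child,
                          capture_output=True, text=True)
```

Keep in mind that RLIMIT_NPROC counts all processes belonging to the user, not just those started by the command, and that it is generally not enforced for root.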

Fighting fork bombs

Posted Apr 1, 2011 0:57 UTC (Fri) by mrons (subscriber, #1751) [Link]

Yes process limits are the most useful tool.

To kill a fork bomb that you can't send a kill(-pgid), you need to send a STOP signal to each of the processes. The fork bomb won't grow past the user's process limits and a STOPped process can't fork. So once all the processes are stopped you can KILL them.

Many years ago we had a lot of fun here in a fork bomb arms race. That's where several forms of "comets" mentioned above were invented in an effort to find something that the sys admin (me) could not kill.

One neat way to kill a comet, is to create a fork bomb as the user of the comet! That will slow down the comet enough so you can STOP it. Then you kill the fork bomb in the usual way.
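The stop-then-kill sequence described in this comment can be sketched in Python as below. The function name is invented, and how the list of pids is gathered (for example, by scanning for the user's processes) is left out. The sketch only shows the two-pass ordering: freeze everything first so that nothing can fork, then kill.

```python
import os
import signal
from typing import Iterable, List


def stop_then_kill(pids: Iterable[int]) -> List[int]:
    """Send SIGSTOP to every pid, then SIGKILL to those that were stopped.

    Returns the pids that were successfully stopped. Processes that have
    already exited are skipped; a PermissionError is left to propagate.
    """
    stopped: List[int] = []
    for pid in pids:
        try:
            os.kill(pid, signal.SIGSTOP)  # a stopped process cannot fork
            stopped.append(pid)
        except ProcessLookupError:
            pass  # already gone
    for pid in stopped:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
    return stopped
```

Against a real fork bomb, the first pass would have to be repeated over a fresh process list until no running processes remain, since new children can appear while earlier ones are being stopped.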

Fighting fork bombs

Posted Apr 4, 2011 15:35 UTC (Mon) by jeremiah (subscriber, #1221) [Link]

So why not have this fork bomb killer use the limits data to know when a fork bomb has gone off, and kill it? I guess to me it seems like if something is constantly bumping up against a limit, there must be something wrong going on, and some configurable action could take place. This seems better than some sort of time-based polling mechanism.

Fighting fork bombs

Posted May 31, 2011 13:15 UTC (Tue) by mehuljv (guest, #52868) [Link]

I don't know whether I understood the solution correctly. I have the following doubt.

How does this patch handle the scenario below?
Consider a history cleanup time of 30 seconds.

Process A starts and forks 9 children. Let's refer to these new children as GROUP-B. Now Process A exits, so init becomes the parent of all GROUP-B processes.

Now suppose all GROUP-B processes wait for 1 minute so that the history of their original parent, Process A, gets cleared. After 1 minute, each GROUP-B process forks 9 children, so in total GROUP-B will spawn 81 processes. Let's refer to these 81 processes as GROUP-C.

Now if all processes in GROUP-B exit, init will become the parent of all processes in GROUP-C. Again, all GROUP-C processes will wait for 1 minute so that the history of all GROUP-B processes gets cleared, and fork again...

If the above iterations continue, then after a while there will be many processes waiting, forking, and exiting to avoid the OOM killer, and the system is still under fork attack.
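The arithmetic of this scenario can be worked through with a short Python sketch. It assumes, as the scenario does, that each generation's history is fully cleared before the next generation forks, which in turn requires that no memory stress is detected in the meantime. The threshold of ten is the one described in the article; the function names are invented.

```python
FANOUT = 9       # children forked by each process in the scenario
THRESHOLD = 10   # descendants needed to be declared a fork bomb


def live_processes(generation: int, fanout: int = FANOUT) -> int:
    """Generation 0 is Process A alone, 1 is GROUP-B, 2 is GROUP-C, ..."""
    return fanout ** generation


def largest_recorded_subtree(fanout: int = FANOUT) -> int:
    """Most descendants any one node has in the history at any moment.

    With older history cleared between generations, a parent only ever
    has its own direct children recorded below it.
    """
    return fanout


def would_be_detected(fanout: int = FANOUT, threshold: int = THRESHOLD) -> bool:
    return largest_recorded_subtree(fanout) >= threshold
```

Under these assumptions GROUP-C has 81 processes and the fourth generation already has 6,561, while no single history subtree ever holds more than nine descendants, one short of the threshold.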

Can anyone explain to me what happens in the above scenario?

Mehul.