LWN: Comments on "Killing processes that don't want to die" https://lwn.net/Articles/754980/ This is a special feed containing comments posted to the individual LWN article titled "Killing processes that don't want to die". en-us Tue, 30 Sep 2025 09:19:23 +0000 Tue, 30 Sep 2025 09:19:23 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Single Unix https://lwn.net/Articles/766616/ https://lwn.net/Articles/766616/ lukeshu <div class="FormattedComment"> Back in the day, there were many differences between SUS and POSIX, but today SUS is just POSIX+Curses. SUSv4 is literally a document set (Open Group T101) of two separate documents; Open Group C165 (POSIX-2008, 2016 edition), and Open Group C094 (X/Open Curses, Issue 7).<br> </div> Tue, 25 Sep 2018 02:41:05 +0000 Killing processes that don't want to die https://lwn.net/Articles/766615/ https://lwn.net/Articles/766615/ lukeshu <div class="FormattedComment"> That page is a bit confusing in what it is and what it's describing. It's describing the major changes between POSIX Issue 6 (AKA POSIX-2001) and POSIX Issue 7 (AKA POSIX-2008).<br> <p> So where do 2017 &amp; 2018 come in on that page? POSIX Issue 7 has had several "bugfix" releases since it was released in 2008. The most recent of which was "1003.1-2017", which didn't actually become official until January 2018.<br> <p> There are real changes and additions being worked on by the POSIX committee, but they won't show up in a "bugfix" release to Issue 7, they're being held until Issue 8. I'm unsure what the release timeline looks like for Issue 8.<br> </div> Tue, 25 Sep 2018 02:29:07 +0000 Killing processes that don't want to die https://lwn.net/Articles/766614/ https://lwn.net/Articles/766614/ lukeshu <div class="FormattedComment"> Looping over /sys/fs/cgroup/…/cgroup.procs until it's empty *does* resolve the race-condition that it's difficult to determine whether you've successfully killed the process. 
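The "loop until cgroup.procs is empty" idea can be sketched roughly as follows. This is only an illustration, not systemd's actual kill loop: the function names, the `send` hook and the `max_rounds`/`pause` parameters are made up for the example, and the path to `cgroup.procs` is left to the caller because it depends on the system.

```python
import os
import signal
import time


def read_pids(procs_path):
    """Return the PIDs currently listed in a cgroup.procs-style file."""
    with open(procs_path) as f:
        return [int(line) for line in f if line.strip()]


def kill_until_empty(procs_path, send=os.kill, max_rounds=1000, pause=0.01):
    """SIGKILL every listed PID, re-reading the list until it is empty.

    Returns the number of kill rounds that were needed.
    `send`, `max_rounds` and `pause` are hypothetical knobs for this sketch.
    """
    for round_no in range(max_rounds):
        pids = read_pids(procs_path)
        if not pids:
            # An empty list is the "when" answer: nothing is left in the group.
            return round_no
        for pid in pids:
            try:
                send(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process exited between the read and the kill
        time.sleep(pause)
    raise TimeoutError("cgroup still had members after %d rounds" % max_rounds)
```

Each individual kill can still miss a freshly forked PID, but the loop only returns once the membership list is empty, so success is observable.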
You are correct in that it doesn't resolve the race-condition that it's difficult to get the active PID to kill. However, changing the question from "if" to "when" is *significant*.<br> <p> Additionally, in the case of systemd: since systemd is the process that will be collecting the dead parent PIDs, this removes the safety concern that another process re-uses the PID between the time the target process abandons the PID and the time the reaper calls kill(PID). If not using systemd, the same thing can be accomplished by having a trusted parent process mark itself as a subreaper before invoking the untrusted executable.<br> </div> Tue, 25 Sep 2018 02:06:14 +0000 kill a kernel thread https://lwn.net/Articles/756699/ https://lwn.net/Articles/756699/ liam I had <a href=https://lwn.net/Articles/192632/>this</a> in mind:<br> <blockquote> Other potential uses exist as well; consider, for example, disconnecting a process from a file which is preventing the unmounting of a filesystem. </blockquote> Wed, 06 Jun 2018 06:45:08 +0000 Killing processes that don't want to die https://lwn.net/Articles/756385/ https://lwn.net/Articles/756385/ sbaugh <div class="FormattedComment"> As the article says, process groups can be changed by unprivileged processes, so a fork bomb can easily switch process group and avoid your attempt at killing.<br> <p> Your SIGSTOP suggestion is likewise flawed. Nothing prevents a normal unprivileged process from simply sending SIGCONT to its parents.<br> </div> Sun, 03 Jun 2018 18:19:45 +0000 Killing processes that don't want to die https://lwn.net/Articles/756381/ https://lwn.net/Articles/756381/ anselm <p> If you're dealing with a fork bomb that fills up the process table, SIGKILLing a process will free a slot in the process table that a new instance of the fork bomb can immediately fill. If you SIGSTOP them first, that can't happen because none of the still-existing-in-the-process-table-but-stopped fork bomb processes will be able to spawn new children. 
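A minimal sketch of that two-pass approach, assuming the list of PIDs has already been gathered somehow. The function name and the `send` hook are hypothetical, and nothing here stops a process outside the list from sending SIGCONT in between the two passes.

```python
import os
import signal


def stop_then_kill(pids, send=os.kill):
    """Freeze every PID with SIGSTOP first, then SIGKILL them in a second pass.

    Stopped processes keep their process-table slots but cannot fork,
    so killing them one by one does not free slots for new copies.
    Returns the PIDs that were successfully stopped.
    """
    stopped = []
    for pid in pids:
        try:
            send(pid, signal.SIGSTOP)
            stopped.append(pid)
        except ProcessLookupError:
            pass  # already gone, nothing to freeze
    for pid in stopped:
        try:
            send(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # exited on its own between the two passes
    return stopped
```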
</p> Sun, 03 Jun 2018 16:30:41 +0000 Single Unix https://lwn.net/Articles/756286/ https://lwn.net/Articles/756286/ jschrod <div class="FormattedComment"> <font class="QuotedText">&gt; Sorry I meant: only the latter is behind a registration/paywall? (which?)</font><br> <p> registration<br> <p> <font class="QuotedText">&gt; what are in a nutshell the *technical* differences between today's POSIX and today's Single Unix?</font><br> <p> POSIX is a part of SUS.<br> <p> In fact, current POSIX publication is also done by OpenGroup; e.g. <a href="http://pubs.opengroup.org/onlinepubs/9699919799/nframe.html">http://pubs.opengroup.org/onlinepubs/9699919799/nframe.html</a> is POSIX.1-2017, which is the most important part of SUS Version 4.<br> </div> Fri, 01 Jun 2018 20:00:45 +0000 Killing processes that don't want to die https://lwn.net/Articles/756282/ https://lwn.net/Articles/756282/ smurf <div class="FormattedComment"> ?<br> <p> Please explain how SIGSTOP+SIGKILL could possibly be more effective than SIGKILL.<br> </div> Fri, 01 Jun 2018 19:07:44 +0000 Single Unix https://lwn.net/Articles/756275/ https://lwn.net/Articles/756275/ marcH <div class="FormattedComment"> Sorry I meant: only the latter is behind a registration/paywall? (which?)<br> <p> This was just an aside and half-joke actually, I don't really care that much. My more important question is: what are in a nutshell the *technical* differences between today's POSIX and today's Single Unix? Assuming of course these can fit in a nutshell. For instance: is Single Unix just a fancy new name for what could have been just called POSIX 2018? Or is POSIX an outdated and significantly smaller subset of Single Unix? Are the exact same players shooting again? 
Etc.<br> <p> xtifr seemed to know much more than he shared.<br> </div> Fri, 01 Jun 2018 18:32:09 +0000 Killing processes that don't want to die https://lwn.net/Articles/756232/ https://lwn.net/Articles/756232/ dps <div class="FormattedComment"> Nobody seems to have mentioned this, so maybe I should.<br> <p> One of the standard ways of dealing with fork bombs is to send the processes you want to kill SIGSTOP (19), to prevent new processes appearing, before killing them with SIGKILL. All you need is the kill(1) command and the privileges required to use SIGKILL. The only major exception is init(1) and maybe some kernel threads.<br> <p> Also note that kill(-1, ...) is liable to hit more than you probably want to hit. It is probably better to target a process group instead. Most fork bombs only have one process group id and therefore the signal will be delivered to all of their components.<br> <p> <p> </div> Fri, 01 Jun 2018 16:01:20 +0000 Single Unix https://lwn.net/Articles/756230/ https://lwn.net/Articles/756230/ jschrod <div class="FormattedComment"> Download the latter at <a href="https://publications.opengroup.org/t101">https://publications.opengroup.org/t101</a><br> </div> Fri, 01 Jun 2018 13:30:52 +0000 Killing processes that don't want to die https://lwn.net/Articles/756131/ https://lwn.net/Articles/756131/ sbaugh <div class="FormattedComment"> The primary issue with a UID-based approach to killing processes is that it requires privileges to set up. Also, this isn't robust in the presence of setuid binaries such as sudo. You can get around this by setting PR_SET_NO_NEW_PRIVS. But then of course your subprocesses can't use setuid binaries.<br> <p> I hacked together <a href="https://github.com/catern/supervise">https://github.com/catern/supervise</a> which uses PR_SET_CHILD_SUBREAPER to solve this issue without requiring privileges. 
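For readers unfamiliar with the flag, marking the calling process as a child subreaper looks roughly like this. It is a Linux-only sketch going through ctypes, not code taken from supervise; the helper names are made up, and the two constants are the values defined in <linux/prctl.h>.

```python
import ctypes
import os

# Values from <linux/prctl.h>; Linux-only.
PR_SET_CHILD_SUBREAPER = 36
PR_GET_CHILD_SUBREAPER = 37


def set_child_subreaper(enable=True):
    """Mark (or unmark) the calling process as a subreaper for its descendants."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(int(enable)),
                    ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_child_subreaper():
    """Return True if the calling process currently has the subreaper flag set."""
    libc = ctypes.CDLL(None, use_errno=True)
    flag = ctypes.c_int(0)
    rc = libc.prctl(PR_GET_CHILD_SUBREAPER, ctypes.byref(flag),
                    ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return bool(flag.value)
```

Once the flag is set, orphaned descendants are reparented to this process rather than to init, so it can keep waiting on and signalling them; no special privileges are needed to set it.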
Even in the presence of fork-bombs, supervise will still kill all its children in finite time, without privileges or any special setup of the system.<br> <p> Unfortunately, supervise is also not robust to arbitrary setuid binaries by default, but you can again set PR_SET_NO_NEW_PRIVS to make that issue go away.<br> </div> Thu, 31 May 2018 15:09:53 +0000 Killing processes that don't want to die https://lwn.net/Articles/756129/ https://lwn.net/Articles/756129/ sbaugh <div class="FormattedComment"> You can do killbelow in userspace, as long as you're a subreaper. I implemented it here: <a href="https://github.com/catern/supervise/blob/master/src/subreap_lib.c#L192">https://github.com/catern/supervise/blob/master/src/subre...</a><br> </div> Thu, 31 May 2018 14:53:44 +0000 Killing processes that don't want to die https://lwn.net/Articles/756111/ https://lwn.net/Articles/756111/ fanf <div class="FormattedComment"> I call these kinds of processes "fork rabbits". I got into a sticky situation with one of them once...<br> <p> I was hacking on a production server (I didn't have an adequate test environment). I had a daemon that was supposed to re-open its log files etc. when it got a signal. In order to cope with slow cleanup of the old file descriptors, it would fork and reopen the new file descriptors in the child, allowing the parent to clean up at leisure.<br> <p> I refactored the signal handling code, and screwed it up.<br> <p> When the daemon received a signal, it became a rabbit.<br> <p> It was running as root on a production server.<br> <p> I couldn't use `kill -KILL -1` and I couldn't reboot the machine. (I might have been able to kill by pgid, but I didn't think of that at the time.)<br> <p> Fortunately this machine did not have randomized pids, so I could anticipate the future pid of the rabbit a few seconds in advance and run a `while :; do kill $pid; done` loop. 
Of course the rabbit raced right through the trap.<br> <p> I rewrote the killer in C, and tried again, but the rabbit kept winning the race. So I tried running multiple concurrent killers targeting several adjacent pids. Eventually this worked!<br> <p> (The side effect would have been a number of failed FTP connections...)<br> </div> Thu, 31 May 2018 13:26:18 +0000 Killing processes that don't want to die https://lwn.net/Articles/756095/ https://lwn.net/Articles/756095/ grawity <p>They could now that the <tt>pids</tt> controller exists. As systemd already supports it for limiting the number of processes per-cgroup, it could drop the limit to 1 and prevent fork() from being used.</p> Thu, 31 May 2018 11:17:21 +0000 Single Unix https://lwn.net/Articles/756035/ https://lwn.net/Articles/756035/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; So we no longer need a watered-down "in-between" standard like Posix. </font><br> <p> What is the difference today between "POSIX" and "Single Unix"? Only the former is available online? :-)<br> </div> Wed, 30 May 2018 16:58:24 +0000 kill a kernel thread https://lwn.net/Articles/755949/ https://lwn.net/Articles/755949/ flussence <div class="FormattedComment"> I'll admit i915 *used to* be pretty bad about locking up (around 2010-2012?), but it's improved a lot since then. Chromium still manages to provoke long pauses in it somehow, but it looks like there's plenty of bugs open about that already, and my problems mysteriously vanished when I switched browsers.<br> <p> Anyway those symptoms above were actually things I'm getting in amdgpu. 
There's corresponding bugs for them too (and a bunch of other irritants I didn't mention), so I can't really do anything but wait, and scowl at the company… their management's been overpromising and underdelivering since they bought out ATi.<br> </div> Wed, 30 May 2018 02:41:22 +0000 Killing processes that don't want to die https://lwn.net/Articles/755945/ https://lwn.net/Articles/755945/ Cyberax <div class="FormattedComment"> Try it. cgroup-based killing is way faster than the classic /proc-based tools and you can't realistically outrace it in regular conditions.<br> <p> I used to worry about it, but in practice it's not a problem. It'd be interesting to add support for atomic signalling to cgroups, though.<br> <p> If your cgroups also have attached controllers, you can start by decreasing the cgroups CPU and memory priority.<br> </div> Wed, 30 May 2018 00:14:18 +0000 Killing processes that don't want to die https://lwn.net/Articles/755943/ https://lwn.net/Articles/755943/ wahern <div class="FormattedComment"> s#/proc#/sys#<br> </div> Tue, 29 May 2018 23:46:55 +0000 Killing processes that don't want to die https://lwn.net/Articles/755942/ https://lwn.net/Articles/755942/ wahern <div class="FormattedComment"> Reading a cgroups list from /proc is not intrinsically different than reading any other process list. Yes, it's probably faster.[citation needed] But you could always have 2 or 3 or 20 processes in a fork loop. As a practical matter it's a distinction without a difference. We shouldn't address TOCTTOU races by making loops faster, and it would be foolhardy to think doing so presents any significant barrier. It would be nice to have a way to reliably, consistently, and *provably* do process management without having to roll dice. Preferably using a mechanism that isn't easily broken with the next absent-minded patch set. 
The nice thing about relying on UID semantics is that it's not an area where people tend to be oblivious to the ramifications of their changes because it's always been possible to atomically kill, e.g., process groups. Though, by all means lobby for making it possible to atomically kill a cgroup. Not my choice, but defensible.<br> <p> </div> Tue, 29 May 2018 23:46:23 +0000 Killing processes that don't want to die https://lwn.net/Articles/755939/ https://lwn.net/Articles/755939/ Cyberax <div class="FormattedComment"> In theory.<br> <p> In practice the kill loop in systemd works much faster than forking. And crucially processes can't escape their cgroup (unless they have sufficient privileges).<br> </div> Tue, 29 May 2018 21:24:29 +0000 Killing processes that don't want to die https://lwn.net/Articles/755938/ https://lwn.net/Articles/755938/ csigler <div class="FormattedComment"> I've got to ask:<br> <p> Who else remembers RWAST...?<br> <p> </div> Tue, 29 May 2018 21:14:49 +0000 Killing processes that don't want to die https://lwn.net/Articles/755931/ https://lwn.net/Articles/755931/ ay <div class="FormattedComment"> With sufficient capability you can PTRACE_SEIZE a process and become its parent, but that doesn't do much good for children of children and so on.<br> </div> Tue, 29 May 2018 20:00:53 +0000 Killing processes that don't want to die https://lwn.net/Articles/755930/ https://lwn.net/Articles/755930/ wahern <div class="FormattedComment"> systemd has the same race condition as killall. cgroups doesn't solve that.<br> <p> </div> Tue, 29 May 2018 19:45:56 +0000 Killing processes that don't want to die https://lwn.net/Articles/755929/ https://lwn.net/Articles/755929/ hawski <div class="FormattedComment"> You are right. I think that namespaces nowadays are a correct answer. That was just my exploration of ideas. Getting subreaper status is pretty much close to using namespaces. 
I don't know how well namespaces are supported on different systems, so killbelow and subreaper could probably be a simple solution for other systems.<br> </div> Tue, 29 May 2018 19:35:03 +0000 Killing processes that don't want to die https://lwn.net/Articles/755928/ https://lwn.net/Articles/755928/ hawski <div class="FormattedComment"> It's covered by this part:<br> <p> <font class="QuotedText">&gt; killbelow() is most useful if a process calling it is a reaper for its descendant processes. Reaper status can be acquired by using procctl(2) with PROC_REAP_ACQUIRE or prctl(2) with PR_SET_CHILD_SUBREAPER.</font><br> <p> But yes, with this it's quite close to just using namespaces.<br> </div> Tue, 29 May 2018 19:31:52 +0000 Killing processes that don't want to die https://lwn.net/Articles/755927/ https://lwn.net/Articles/755927/ ebiederm <div class="FormattedComment"> The practical problem with killbelow is that a child can daemonize itself and then not be your child. So even if you could kill every child, escaping from being killed is still straightforward.<br> </div> Tue, 29 May 2018 19:27:31 +0000 Killing processes that don't want to die https://lwn.net/Articles/755926/ https://lwn.net/Articles/755926/ xtifr <div class="FormattedComment"> The replacement is basically SUS (the Single Unix Specification).<br> <p> The big things that changed are that 1. VMS died, and 2. the Open Group took over the Unix trademark. Which means, modern OSes can basically be divided into two families: Unix-like, which includes Linux and MacOS, and Microsoft. So we no longer need a watered-down "in-between" standard like Posix. Microsoft (unlike DEC) just doesn't care, and everyone else just went ahead and became a more-or-less "real" Unix.<br> </div> Tue, 29 May 2018 19:15:46 +0000 Killing processes that don't want to die https://lwn.net/Articles/755892/ https://lwn.net/Articles/755892/ k8to <div class="FormattedComment"> Or comparably, do you find the opengroup docs very readable? 
I don't. They're not bad for what they are, but I find them laborious to follow and missing information about purpose and intent.<br> <p> Basically the value to me as a higher level user of computer systems is that someone has created more digestible information that contains what is in them and more. Is anyone doing that anymore with libc &amp; system calls? <br> <p> For example, I can work out for myself that call_l(..., locale_choice) allows me to write code that doesn't break when someone creates a goofy set of env vars, but can most modern developers work that out on their own with the information provided in POSIX? I'd expect not.<br> </div> Tue, 29 May 2018 15:45:48 +0000 Killing processes that don't want to die https://lwn.net/Articles/755885/ https://lwn.net/Articles/755885/ k8to <div class="FormattedComment"> In industry, I find most people want to pretend the operating system isn't a matter of much concern anymore. Various abstractions should make it go away, for example no one wants to see what's going wrong with software, just reboot the VM/container programmatically with more software.<br> <p> In the opengroup docs, scanning for "what changed in here for POSIX.1-2017" which seems to be forming into SUS2018, I find items like:<br> <p> "The UUCP utilities option is added."<br> <p> It seems like mostly "a clarifying type was added to these two arguments to one function" is a pretty big change for this update. Mostly it seems like it's a matter of officially dropping already deprecated things.<br> <p> The most significant set of changes appear to be here:<br> <p> <a href="http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap01.html#tag_22_01_01">http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_x...</a><br> <p> They seem to have sort of unbroken locale a bit, by letting the program ask for an answer in a specific locale.<br> The *at set of functions are moved from some kind of API annex to the main spec. 
<br> <p> It's difficult to figure out what's truly new.<br> <p> </div> Tue, 29 May 2018 15:13:53 +0000 Killing processes that don't want to die https://lwn.net/Articles/755869/ https://lwn.net/Articles/755869/ NightMonkey <div class="FormattedComment"> This is just a question: as a sysadmin, I've been spoiled by POSIX and Linux's attempts to keep within its specifications. As you say, I have nice documentation on expected behaviors, when I need it for that wonderful 3:37 AM troubleshooting call. :)<br> <p> I see at <a href="https://en.wikipedia.org/wiki/POSIX">https://en.wikipedia.org/wiki/POSIX</a> that indeed, there is some modern work (2017?), but is that work just whistling past the graveyard? Is there something waiting in the wings to replace POSIX? Or is that what the JVM is now for (as a "standard system interface"). (I'm half-joking with the last one, but I'm in a world of web and mobile commercial apps where few with the money or authority apparently care what is running between the metal and the application... and the JVM under Linux seems like the worst of both worlds, at least from a troubleshooting perspective.)<br> </div> Tue, 29 May 2018 14:37:13 +0000 Killing processes that don't want to die https://lwn.net/Articles/755852/ https://lwn.net/Articles/755852/ k8to <div class="FormattedComment"> The main upside to the ancient posix behavior is often someone wrote down a clear explanation of the behavior (typically R Stevens). 
The new interfaces coming along tend to have an out of date text file somewhere or a half-maintained website that eventually goes offline.<br> <p> Otherwise, sure, strict compliance with a spec that isn't really living anymore doesn't seem very valuable.<br> </div> Tue, 29 May 2018 07:44:11 +0000 Killing processes that don't want to die https://lwn.net/Articles/755845/ https://lwn.net/Articles/755845/ mrons <div class="FormattedComment"> These days I would use systemd cgroups to kill these rogue fork bombs.<br> <p> Back in the day, on a computer science teaching system, it would be sport for the students to try to make fork() bombs that were hard to kill (for the sys admin (me)).<br> <p> The example used in this article, where the PID is rapidly changing, was one such technique used by students. We used to call such fork bombs "comets".<br> <p> One amusing way I found to kill a comet was to use the limit of max user processes (ulimit -u). I would create a standard fork bomb, run as the rogue user, and exhaust the max number of processes the user could run. The comet would then no longer be able to fork(). Then I could killall the user processes to recover.<br> <p> So using a fork bomb to kill a fork bomb.<br> </div> Tue, 29 May 2018 04:35:04 +0000 kill a kernel thread https://lwn.net/Articles/755843/ https://lwn.net/Articles/755843/ willy <div class="FormattedComment"> No. It means if you ^C a read() and the task hasn't set a handler for SIGTERM, the task will die without waiting for the read to complete.<br> <p> If it did set a handler, the read() will block indefinitely as before.<br> </div> Tue, 29 May 2018 03:23:23 +0000 kill a kernel thread https://lwn.net/Articles/755840/ https://lwn.net/Articles/755840/ zlynx <div class="FormattedComment"> Well, when trying to use Nouveau drivers in an i915 Wayland session in Gnome and running anything more complex than glxgears with DRI_PRIME=1, the DRM subsystem locks up almost every time. 
It's not subtle or hard to reproduce.<br> </div> Tue, 29 May 2018 02:14:30 +0000 kill a kernel thread https://lwn.net/Articles/755835/ https://lwn.net/Articles/755835/ smurf <div class="FormattedComment"> Umm, no. revoke() is wanted for taking *devices* away from strange processes which might still be using them (like your tty or your microphone). Never intended for file systems.<br> </div> Mon, 28 May 2018 23:51:04 +0000 Killing processes that don't want to die https://lwn.net/Articles/755834/ https://lwn.net/Articles/755834/ smurf <div class="FormattedComment"> What's a "descendant process"? When its parent dies the child is "descended from" init, or the pid namespace's master. You can kill that instead, today. Problem solved. In Linux anyway.<br> </div> Mon, 28 May 2018 23:49:23 +0000 kill a kernel thread https://lwn.net/Articles/755832/ https://lwn.net/Articles/755832/ liam <div class="FormattedComment"> I believe that's one of the reasons why there have been numerous attempts to implement a revoke() syscall.<br> </div> Mon, 28 May 2018 23:33:32 +0000 Killing processes that don't want to die https://lwn.net/Articles/755821/ https://lwn.net/Articles/755821/ hawski <div class="FormattedComment"> I once wondered about this. I had an idea of a new syscall and I described it here: <a href="https://hadrianw.github.io/condemned-rfc/killbelow.2.html">https://hadrianw.github.io/condemned-rfc/killbelow.2.html</a><br> <p> Excerpt from it:<br> <p> <font class="QuotedText">&gt; int killbelow(int signal, int timeout);</font><br> &gt;<br> <font class="QuotedText">&gt; killbelow() sends the signal signal to all descendant processes. The timeout argument specifies the maximal interval in milliseconds to wait until there are no descendant processes.</font><br> &gt;<br> <font class="QuotedText">&gt; killbelow() is most useful if a process calling it is a reaper for its descendant processes. 
Reaper status can be acquired by using procctl(2) with PROC_REAP_ACQUIRE or prctl(2) with PR_SET_CHILD_SUBREAPER.</font><br> &gt;<br> <font class="QuotedText">&gt; Signal will be delivered to every descendant process even if its user ID is different from the process calling killbelow().</font><br> </div> Mon, 28 May 2018 21:03:34 +0000 kill a kernel thread https://lwn.net/Articles/755823/ https://lwn.net/Articles/755823/ blackwood <div class="FormattedComment"> We're extensively testing all the error paths and making sure (or trying to, at least) that all lock acquisition paths and anything blocking is interruptible. At least on the drm/i915 driver. Not being able to kill stuff after a gpu hang (especially when the driver didn't manage to recover the hw) is a bug. Reports would be appreciated.<br> </div> Mon, 28 May 2018 21:02:49 +0000