Cook: security things in Linux v5.1
Now /proc/$pid can be opened and used as an argument for sending signals with the new pidfd_send_signal() syscall. This handle will only refer to the original process at the time the open() happened, and not to any later 'reused' pid if the process dies and a new process is assigned the same pid. Using this method, it’s now possible to racelessly send signals to exactly the intended process without having to worry about pid reuse. (BTW, this commit wins the 2019 award for Most Well Documented Commit Log Justification.)"
Posted May 29, 2019 18:35 UTC (Wed)
by epa (subscriber, #39769)
[Link] (6 responses)
Posted May 29, 2019 19:14 UTC (Wed)
by josh (subscriber, #17465)
[Link] (5 responses)
Posted May 30, 2019 3:26 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (2 responses)
Posted May 30, 2019 15:43 UTC (Thu)
by mgedmin (subscriber, #34497)
[Link] (1 responses)
Posted May 30, 2019 16:21 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
No, the O_DIRECTORY flag just causes the open() call to fail if the path does not refer to a directory. You can open directories without it. The shell has no issues redirecting from a directory rather than a file. You can test that yourself easily:
Posted May 30, 2019 4:53 UTC (Thu)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted May 30, 2019 5:00 UTC (Thu)
by epa (subscriber, #39769)
[Link]
Posted May 29, 2019 21:01 UTC (Wed)
by rweikusat2 (subscriber, #117920)
[Link] (20 responses)
Posted May 29, 2019 21:15 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
Posted May 29, 2019 21:51 UTC (Wed)
by rweikusat2 (subscriber, #117920)
[Link] (12 responses)
Putting this in another way, processes with the given name which are started after listProcess returned and before the loop terminates will only be killed if they happen to reuse a pid which hasn't already been used in the loop.
Posted May 29, 2019 21:54 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (11 responses)
My code does exactly what the contract of pkill is: killing processes by name.
> Putting this in another way, processes with the given name which are started after listProcess returned and before the loop terminates will only be killed if they happen to reuse a pid which hasn't already been used in the loop.
Yet my code guarantees that any matching processes that existed before the pkill launch will be killed.
Posted May 29, 2019 23:50 UTC (Wed)
by ms-tg (subscriber, #89231)
[Link] (9 responses)
My understanding of the race condition being asked about:
2. a process with name: “foo” and pid: 42 is filtered by the lambda from that list
3. This process exits. A new process named “dont_kill_me” is started and received recycled pid 42
4. The pid fd for 42 is opened. It refers to dont_kill_me
5. This new process is killed
I’d be super interested in learning more about:
(a) does the pseudocode as presented actually avoid suffering from this race, and how?
(b) is there something offensive about asking about this?
Thanks!
Posted May 29, 2019 23:57 UTC (Wed)
by ms-tg (subscriber, #89231)
[Link] (1 responses)
Is that context correct? At least that’s the inference I made about the relevance of a parent process.
Posted May 30, 2019 12:55 UTC (Thu)
by rweikusat2 (subscriber, #117920)
[Link]
[verbal redundancies due to LWN blocking 'yes' as an answer ...]
Posted May 30, 2019 0:05 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
> (a) does the pseudocode as presented actually avoid suffering from this race, and how?
In this case the pid_open() call will return the handle to that incorrect process. However, this is fine because the next check for checkProcess() will fail and nothing is going to be killed.
If this check does succeed then there are two options:
> (b) is there something offensive about asking about this?
Posted May 30, 2019 9:36 UTC (Thu)
by moltonel (guest, #45207)
[Link]
Might as well not do the initial matching then ? Just go through the unfiltered process list, open /proc/<pid>, match using content of /proc/<pid>/, and kill if it matches ? I imagine performance cost of opening /proc/<pid> is dwarfed by the general cost of process matching ?
The commit message doesn't say clearly whether "the entire procfs directory remains visible or just one entry" when /proc/<pid> is opened (it says "same as happens right now" but I don't know what that is). I'm tempted to understand that the whole directory remains visible (otherwise the patch only barely reduces the RC window), but at the same time the message says that opening /proc/<pid> doesn't reserve the pid, so I wonder.
Posted May 30, 2019 12:07 UTC (Thu)
by rweikusat2 (subscriber, #117920)
[Link] (4 responses)
It's probably possible to prevent this somehow but at least, this wasn't mentioned so far.
Posted May 30, 2019 16:12 UTC (Thu)
by FLHerne (guest, #105373)
[Link] (3 responses)
Posted May 30, 2019 17:21 UTC (Thu)
by rweikusat2 (subscriber, #117920)
[Link] (2 responses)
Posted May 30, 2019 22:57 UTC (Thu)
by cyphar (subscriber, #110703)
[Link] (1 responses)
Also, pkill is a toy example. The real users of this syscall are more complicated systems which are sure that the pidfd is correct, and need to store it somewhere so they can detect the process dying properly (and send it signals if its not dead). Container runtimes and init systems are obvious contenders.
Posted May 31, 2019 17:28 UTC (Fri)
by rweikusat2 (subscriber, #117920)
[Link]
I also didn't come up with the pkill example, I made a general statement about pidfd open calls in absence of some guarantee that the process the pid originally referred to didn't terminate in the meantime. pkill was presentend as an example where one can claim that the race wouldn't matter in practice for more or less good reasons.
[*] Both the "process terminates and pid gets reused" and the "process starts to execute a different program" races could be avoided without any new system calls by sending a SIGSTOP to a process before checking its attributes, BTW.
Posted May 30, 2019 0:38 UTC (Thu)
by rweikusat2 (subscriber, #117920)
[Link]
Posted May 29, 2019 21:33 UTC (Wed)
by sbaugh (guest, #103291)
[Link] (5 responses)
Using pkill to kill processes based on name is a *bad idea*, and it's always been a bad idea, and it remains a bad idea even with pidfd_send_signal. There's nothing preventing some random other process from having the same name and getting caught up in your killing spree.
From this standpoint, you're completely right that the now "reliable" pkill is still essentially random.
The real benefit of pidfd requires coordination with the parent of the process, as you say.
As one example: pidfds can be passed around over Unix sockets, so a process can spawn a child, pass the pidfd for that child to some other supervisor process, and exit. This supervisor process then can actually kill the child process without having a race condition! This can be provided as a service to the rest of the system as well - rather than reading a pid out of a pidfile and killing it, or using pkill to kill all processes with a specific name, I can contact some supervisor and say, "kill this process", uniquely identified via whatever means. I could even ask the supervisor to give me the pidfd, and do the killing myself!
But all this reliability still requires coordination with the parent at start time. Otherwise it's not possible for me to uniquely identify that specific process, no matter how unique I make the name, plus the executable, plus whatever else.
Posted May 29, 2019 21:54 UTC (Wed)
by rweikusat2 (subscriber, #117920)
[Link] (2 responses)
Posted May 29, 2019 23:03 UTC (Wed)
by roc (subscriber, #30627)
[Link] (1 responses)
The specification "all matching processes 'which started before or while pkill is executing' have been signaled or died" is silly because it can't be implemented: in general a matching process could start at the moment just before pkill calls exit(). It's also silly because it's useless: a matching process could start the moment *after* pkill calls exit(), so you couldn't depend on no matching process running.
Posted May 30, 2019 0:35 UTC (Thu)
by rweikusat2 (subscriber, #117920)
[Link]
Posted May 29, 2019 22:51 UTC (Wed)
by roc (subscriber, #30627)
[Link] (1 responses)
Posted May 29, 2019 23:41 UTC (Wed)
by sbaugh (guest, #103291)
[Link]
I do use pkill myself on my dev boxes, but let's not pretend pkill is a good way to do things. It's a stopgap measure that we use only because the process supervision mechanisms on Unix are insufficient. I think it's entirely possible that with further development of pidfd, along with various other mechanisms, we can replace pkill with more reliable, powerful, and easier to use tools.
For example, perhaps a tool that kills all the processes started from this terminal or its children, even if those processes have been reparented to init, called setsid, or whatever. That would solve pretty much all the cases I use pkill for.
Posted May 31, 2019 23:01 UTC (Fri)
by peda (subscriber, #5285)
[Link] (16 responses)
Fixing the spelling in a comment (or adding a comment) is not precisely the same thing as a patch for a "missing break" in my book...
Posted May 31, 2019 23:11 UTC (Fri)
by rahulsundaram (subscriber, #21946)
[Link] (2 responses)
Posted Jun 1, 2019 7:02 UTC (Sat)
by peda (subscriber, #5285)
[Link] (1 responses)
The description made it sound like the full list was patches fixed real bugs resulting from all the boring work, when in fact most of the patches /are/ just the boring work. Only a fraction of the patches fix real bugs. It is far more interesting to see the fruition of the boring work, rather than the work itself.
Posted Jun 3, 2019 22:30 UTC (Mon)
by rahulsundaram (subscriber, #21946)
[Link]
I fundamentally disagree with this as well as the notion that there was any misrepresentation. It is great to see all the work being done.
Posted Jun 3, 2019 22:02 UTC (Mon)
by bnorris (subscriber, #92090)
[Link] (12 responses)
* restricted to v5.0..v5.1, I only found a single commit that fixed spelling:
What "most" are you looking at?
Posted Jun 4, 2019 17:59 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link] (10 responses)
Posted Jun 4, 2019 18:44 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (9 responses)
Posted Jun 4, 2019 18:51 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link] (8 responses)
That's a statement-of-fact whose content you may not like but it's not going to change because of this.
Posted Jun 4, 2019 18:56 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
> Instead, it 'uncovered' that the vast majority of of switch cases without break were correct.
At this point any opposition to this feature is basically boneheaded politics.
Posted Jun 4, 2019 19:53 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link] (5 responses)
As to the "feature": As I already wrote when this nonsense came up for the last time, the sensible way to handle this situation would be to add a multiway conditional without fallthrough to the language. This would do away with the need to use break at all unless one specifically needs fallthrough, thereby actually eliminating the problem. Eliminating the default policy of the existing construct just means more text to type for programmers, thus increasing and not decreasing the likeliness of compile-time errors while changing absolutely nothing about run-time errors as both variants can always be used incorrectly. This will still need testing and code-audits.
Posted Jun 4, 2019 20:21 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
The only realistic way to have reliable systems is to plan for failure and make sure that the system can fail safely.
> As to the "feature": As I already wrote when this nonsense came up for the last time, the sensible way to handle this situation would be to add a multiway conditional without fallthrough to the language.
Posted Jun 4, 2019 21:31 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link] (3 responses)
You should perhaps consider jettisoning the "discussion tactic" of asserting that whoever disagrees with one of your opinions must be "somehow mad". This not only ought to qualify as abusive, it also really just communicates that you have no arguments in favour of your standpoint and thus, prefer attacking people who disagree with it. In the real word, not in the fuzztesting playpen, Linux has a well-deserved reputation for reliability and that's why it's ubquitiously being used. For instance, in Android phones. Like any other complex programs, it also has bugs.
Considering that the fallthough comment overload is a language extension, adding a language extension adressing what's claimed to be the actual issue is obviously possible. It may not be a realistic option for ISO-C but Linux uses lots of language extensions already. I would wholeheartedly welcome this as it would eliminate the need to write lots of break-statements for the most common case. In contrast to this, forcing people to write yet more semantically pointless text in order to get a program through the compiler just means "more syntax errors beause of typos", as I already wrote.
Posted Jun 4, 2019 21:57 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link]
Posted Jun 7, 2019 0:18 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
I've worked with Linux a lot, working with clusters with hundreds of thousands of nodes and Linux failures were commonplace. OOPSes, hangups, livelocks, you name it. So yes, Linux is not reliable.
This is fine, engineering is a science of building reliable components from unreliable parts. You just have to plan for the worst case.
> Considering that the fallthough comment overload is a language extension, adding a language extension adressing what's claimed to be the actual issue is obviously possible.
First, we need to find a new keyword (not easy in itself) to replace "switch". Let's settle on "new_switch". This construction will require "fallthrough" keywords (again...) at the end of cases.
How do we deploy it? Linux supports quite old compilers, so you can't just mandate GCC 11 for it. Well, macros come to rescue. There's going to be a #define for the switch statement and then a #define for the fallthrough. And then you'll have to change ALL the statements in Linux to use the new statement, resulting in an even bigger patch.
And it'll take more than several years for the committee to agree on it.
With the current solution the change can be done NOW, without any performance or usage impact. With lots of actual bugfixes.
Posted Jun 11, 2019 19:23 UTC (Tue)
by nix (subscriber, #2304)
[Link]
It's not as if rweikusat's solution is a solution in any case: as you note, using any switch statement replacement would require going through all of them and changing them, inspecting every case to see if it needs to be fallthrough or not. This is exactly the same as what is being done now, except with years of added pointless delay in which actual bugs go disregarded. The perfect really *is* the enemy of the good in this case.
Posted Jun 4, 2019 19:42 UTC (Tue)
by tao (subscriber, #17563)
[Link]
In other words, it 'uncovered' that there were indeed switch cases without break that weren't correct, and that would've gone unnoticed
This patch set, when all annotations are in place, will allow all fall-throughs without such comments to be treated as errors, thus eliminating a whole class of bugs. I'd say that's pretty nice for the oh so big inconvenience for flawless programmers such as yourself that have to go through the extreme extra job of making every fall-through explicit.
Posted Jun 4, 2019 19:43 UTC (Tue)
by bnorris (subscriber, #92090)
[Link]
Ah, so I see there were multiple links:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...
The first shows all the noisy patches that don't fix real bugs, where the latter showed true bugfixes. It's possible there was a miscommunication in the OP to which I originally replied. (It's also possible the blog has been updated in the meantime.) As I see it, the "missing break" link was not misleading: it lists a moderate number of real bugfixes.
But indeed, there were many more patches to silence the warnings than there were patches to fix bugs. Such is life when cleaning up the edges of a language like C.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
$ ls -l /proc/self/fd 3</etc
total 0
lrwx------ 1 user group 64 May 30 11:16 0 -> /dev/pts/N
lrwx------ 1 user group 64 May 30 11:16 1 -> /dev/pts/N
lrwx------ 1 user group 64 May 30 11:16 2 -> /dev/pts/N
lr-x------ 1 user group 64 May 30 11:16 3 -> /etc/
lr-x------ 1 user group 64 May 30 11:16 4 -> /proc/NNNN/fd/
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
> Unless the process doing the open is the parent of the process referred to by the process ID, there's no way to determine if the process the pid originally referred to is still the process it's referring to at the time of the open.Cook: security things in Linux v5.1
What is this obsession with the parent process?
Here's pseudocode for absolutely reliable pkill:
processes = listProcesses(filter=lambda n: n.name == target)
for p in processes:
pidfd = pid_open("/proc/%d", p.pid)
if hasProcess(name=target, pid=p.pid): // Check that the invariant still holds
pidfd_kill(pidfd, SIGKILL)
close(pidfd)
It's guaranteed only to kill processes with the name matching the "target" name.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Because you're stupidly wrong?
This is true with ANY variant of pkill. A matching process can be started at the moment pkill exits and it'll be missed.
Cook: security things in Linux v5.1
1. listProcesses is called. Imagine it atomically returns a set of (name, pid) tuples alive at the time of the call
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Because rweikusat2 seems obsessed with insisting that the old Unix API is the only source of truth.
Yes. Imagine that one of the initially matched processes dies and is replaced by another process.
1) The new process also matches the target name and so it's OK to kill it (according to pkill's contract).
2) The new process has actually died in the time between the pid_open and checkProcess, and it's been replaced by a new process that matches the target. In this case the further pidfd_kill() call will be a no-op (since its target is dead), but this is still fine within the pkill's contract.
No.
Cook: security things in Linux v5.1
>
> In this case the pid_open() call will return the handle to that incorrect process. However, this is fine because the next check for checkProcess() will fail and nothing is going to be killed.
>
> If this check does succeed then there are two options:
> 1) The new process also matches the target name and so it's OK to kill it (according to pkill's contract).
> 2) The new process has actually died in the time between the pid_open and checkProcess, and it's been replaced by a new process that matches the target. In this case the further pidfd_kill() call will be a no-op (since its target is dead), but this is still fine within the pkill's contract.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
09186e503486 security: mark expected switch fall-throughs and add a missing break
but it also adds a missing break. There weren't many other "missing break" patches in that range.
* another arbitrary sampling of those commits turned up zero spelling fixes.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
And it is.
And a lot of incorrect ones. So actual bugs were fixed and a mechanism to prevent this entire class of bugs has been instituted. Moreover, it's completely zero-cost at runtime.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
You clearly have a rosy view of the world. In the real world out here, Linux crashes quite easily by running perf fuzz tests, for example. Or you can look at the never-ending list of CVEs.
This is not a realistic option.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
Yeah, it's so great that Google is now developing a whole new OS.
OK. So let's see how it would work.
Cook: security things in Linux v5.1
Cook: security things in Linux v5.1
otherwise, and that there was a non-zero chance of more such bugs being merged as time goes by.
Cook: security things in Linux v5.1
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...