Handling argc==0 in the kernel [LWN.net]

Handling argc==0 in the kernel

Posted Jan 28, 2022 19:16 UTC (Fri) by NYKevin (subscriber, #129325) (8 responses)

IMHO the kernel should only try to provide a meaningful value for argv[0] if we can be 100% sure it will actually work (because if it only works 95% of the time, someone will depend on it until it breaks). Problems include:

* /proc might not be mounted at all, or it might be mounted somewhere other than /proc.
* / might not be / (e.g. chroot, containers, etc.), and so /proc might be inaccessible.
* The callee might try to pass the argv[0] string on to some other process, but then /proc/self will be interpreted as that process.
* If you try to fix the above using an absolute PID instead of "self," then you run into PID namespaces.
* There are a variety of situations where the called process's binary is not directly accessible to the called process as a regular file, and /proc/self/exe appears to be a broken symlink. The only way to fix this problem is for the caller to provide a suitable alternative path, if a functional argv[0] is desired at all.
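Given those failure modes, a program that wants a name for itself can only treat /proc/self/exe as a best-effort fallback. A minimal C sketch, assuming Linux with procfs mounted at /proc; the names `self_exe_path` and `program_label` are hypothetical:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Best-effort read of /proc/self/exe. Returns 0 and a NUL-terminated path
 * in buf on success, or -1 if the link cannot be read (procfs missing or
 * not visible from this root, etc.) or the result may have been truncated. */
int self_exe_path(char *buf, size_t len)
{
    if (len < 2)
        return -1;
    ssize_t n = readlink("/proc/self/exe", buf, len - 1);
    if (n < 0 || (size_t)n >= len - 1)
        return -1;              /* error, or result filled the buffer */
    buf[n] = '\0';              /* readlink() does not NUL-terminate */
    return 0;
}

/* Choose a name to report: the caller's argv[0] if there is a non-empty
 * one, then /proc/self/exe, then a fixed placeholder. A path read from
 * /proc/self/exe may end in " (deleted)" and need not be a usable file. */
const char *program_label(int argc, char *argv[], char *buf, size_t len,
                          const char *placeholder)
{
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        return argv[0];
    if (self_exe_path(buf, len) == 0)
        return buf;
    return placeholder;
}
```

The caller-supplied argv[0] is checked first because, as noted above, only the caller can reliably provide a working path.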

Handling argc==0 in the kernel

Posted Jan 30, 2022 21:42 UTC (Sun) by developer122 (guest, #152928) (7 responses)

Would the case where you delete the executable file also fall into this?

Handling argc==0 in the kernel

Posted Feb 1, 2022 1:34 UTC (Tue) by NYKevin (subscriber, #129325) (6 responses)

I don't know the inner guts of procfs symlink resolution, but I imagine so.

Handling argc==0 in the kernel

Posted Feb 1, 2022 14:57 UTC (Tue) by mathstuf (subscriber, #69389) (5 responses)

I've actually played around with this before. If you create and run an executable and check its `/proc/$pid/exe` (I think I was actually playing with `/proc/$pid/fd/$n`, but I imagine the behavior is similar), it links to `/path/to/exe`. If you delete the file, the symlink now "points to" `/path/to/exe (deleted)`. If you then create *this* path, the symlink is still broken, even though the "target path" exists if you manually use the `readlink` result. Recreating the original path also leaves it in the "(deleted)" state, IIRC.
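The "(deleted)" suffix is easy to observe without deleting a running binary. The C sketch below uses an open file descriptor, as mathstuf suspects he was doing, rather than /proc/$pid/exe; it assumes Linux, procfs at /proc and a writable /tmp, and the function name is hypothetical:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file, keep it open, unlink it, and then read what
 * /proc/self/fd/<fd> reports as its target. On success the target is
 * stored NUL-terminated in out and the still-open fd is returned (the
 * caller closes it); returns -1 on error. */
int unlinked_fd_target(char *out, size_t outlen)
{
    if (outlen < 2)
        return -1;
    char path[] = "/tmp/deleted-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }
    char proc[64];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(proc, out, outlen - 1);
    if (n < 0) {
        close(fd);
        return -1;
    }
    out[n] = '\0';              /* the old name followed by " (deleted)" */
    return fd;
}
```

The reported string is the name the file had when it was unlinked, not the result of a fresh lookup, which is consistent with recreating either path making no difference.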

Handling argc==0 in the kernel

Posted Feb 1, 2022 18:07 UTC (Tue) by nybble41 (subscriber, #55106) (4 responses)

> If you delete it, the symlink now "points to" `/path/to/exe (deleted)`.

Interesting fact: Even though the /proc/$PID/exe symlink is "broken" you can still open it, producing a reference to the original file. The /proc filesystem has its own special rules for resolving symlinks. However, attempting to restore the file by creating a hard-link with `ln` or `link` gives the error "Invalid cross-device link" even though the original (deleted) file and the target directory are on the same filesystem. (You can *copy* /proc/$PID/exe to a new file, though.)
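The claim that the "broken" link can still be opened can be checked the same way. This C sketch again uses /proc/self/fd/N instead of /proc/$PID/exe, assuming both magic links are resolved the same way when opened; it assumes Linux, procfs at /proc and a writable /tmp, and the function name is hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg to a temporary file, unlink it while the fd stays open, then
 * open it again through /proc/self/fd/<fd> and read the data back into
 * buf. Returns the number of bytes read, or -1 on error. */
ssize_t reread_deleted(const char *msg, char *buf, size_t buflen)
{
    char path[] = "/tmp/reopen-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* the file no longer has a name */

    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t)len) {
        char proc[64];
        snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
        int fd2 = open(proc, O_RDONLY);     /* procfs follows the link anyway */
        if (fd2 >= 0) {
            n = read(fd2, buf, buflen);     /* new open file description: offset 0 */
            close(fd2);
        }
    }
    close(fd);
    return n;
}
```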

Handling argc==0 in the kernel

Posted Feb 7, 2022 9:28 UTC (Mon) by jwilk (subscriber, #63328) (3 responses)

"ln /proc/$PID/exe foo" fails with EXDEV, because it's trying to hardlink the symlink that resides in /proc.

You probably wanted "ln -L /proc/$PID/exe foo" to dereference the symlink… which still doesn't work, but at least you get a more understandable error message. (The dereferencing happens on the kernel side, so it could work in principle.)

Handling argc==0 in the kernel

Posted Feb 7, 2022 18:27 UTC (Mon) by nybble41 (subscriber, #55106) (2 responses)

> You probably wanted "ln -L /proc/$PID/exe foo" to dereference the symlink...

Yes, I tried it both with and without the -L option—I suppose I should have mentioned the difference in the error message. The kernel specifically blocks the creation of a hard-link to a previously deleted (zero-refcount) inode. It does work, however, if the inode still exists in the filesystem due to hard-links, even if the original path was deleted and the /proc/$PID/fd/N symlink is broken.
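The two error messages correspond to two flavours of linkat(2): plain `ln` links the symlink itself, while `ln -L` passes AT_SYMLINK_FOLLOW. The C sketch below records the errno from each. It assumes Linux, procfs at /proc, and a /tmp that is a different filesystem from /proc; the function name is hypothetical. As far as we know, the plain call should fail with EXDEV, because it names the symlink inside /proc, and the following call with ENOENT, because the kernel refuses to link an inode whose link count has dropped to zero:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to give an already-unlinked file a new name via /proc/self/fd/<fd>.
 * *plain_err gets the errno of linkat() without flags (which refers to the
 * procfs symlink itself, like plain `ln`), and *follow_err that of linkat()
 * with AT_SYMLINK_FOLLOW (which refers to the file, like `ln -L`).
 * 0 means that call succeeded. Returns -1 if the setup itself fails. */
int relink_deleted(int *plain_err, int *follow_err)
{
    char dir[] = "/tmp/relink-demo-XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;
    char path[64], target[64], proc[64];
    snprintf(path, sizeof path, "%s/orig", dir);
    snprintf(target, sizeof target, "%s/again", dir);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    unlink(path);                         /* link count drops to zero */
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);

    *plain_err = linkat(AT_FDCWD, proc, AT_FDCWD, target, 0) == 0 ? 0 : errno;
    unlink(target);                       /* only matters if it succeeded */
    *follow_err = linkat(AT_FDCWD, proc, AT_FDCWD, target,
                         AT_SYMLINK_FOLLOW) == 0 ? 0 : errno;
    unlink(target);

    close(fd);
    rmdir(dir);
    return 0;
}
```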

Handling argc==0 in the kernel

Posted Feb 10, 2022 21:21 UTC (Thu) by Jandar (subscriber, #85683) (1 response)

> The kernel specifically blocks the creation of a hard-link to a previously deleted (zero-refcount) inode.

If you can open the deleted /proc/pid/exe, why would linkat(fd, "", AT_FDCWD, "/path/for/file", AT_EMPTY_PATH); not work? A file descriptor obtained with open(".", O_TMPFILE | O_RDWR) has to have a zero st_nlink as well.

I have never done an open(,O_TMPFILE) + linkat(fd, "", AT_FDCWD,...) so this may be a misconception.
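For comparison, here is the O_TMPFILE recipe as a C sketch, assuming a Linux filesystem that supports O_TMPFILE; the function name is hypothetical. The open(2) man page documents linking such a file into the filesystem through /proc/self/fd with AT_SYMLINK_FOLLOW unless O_EXCL was given (the AT_EMPTY_PATH form has, per linkat(2), required CAP_DAC_READ_SEARCH). As we understand it, that is what differs from the deleted-file case: the kernel marks an O_TMPFILE inode as linkable, but not an inode whose link count dropped to zero through unlink().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an unnamed file with O_TMPFILE in dir, report its link count in
 * *nlink_before, then name it `name` via /proc/self/fd + AT_SYMLINK_FOLLOW.
 * Returns 0 on success, 1 if O_TMPFILE is unsupported here, -1 on error. */
int tmpfile_then_link(const char *dir, const char *name, nlink_t *nlink_before)
{
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd < 0)
        return (errno == EOPNOTSUPP || errno == EISDIR) ? 1 : -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *nlink_before = st.st_nlink;          /* expected to be 0: no name yet */

    char proc[64];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);
    int rc = linkat(AT_FDCWD, proc, AT_FDCWD, name, AT_SYMLINK_FOLLOW);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

So Jandar's intuition about st_nlink is right; the difference lies in the linkable flag, not in the link count.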

Handling argc==0 in the kernel

Posted Feb 11, 2022 0:29 UTC (Fri) by nybble41 (subscriber, #55106)

I also find the restrictions a bit surprising, but Linus's response[0] when someone proposed an flink() system call[1]—which would have been more-or-less equivalent to hard-linking from /proc/$PID/fd/$N, with the primary use-case involving creating links to previously unlinked files—was a flat rejection: "there is no way in HELL we can do this securely without major other incursions." I believe the result of this discussion was a number of new restrictions on hard-linking files via /proc.

There are various things you can still do with /proc/$PID/fd/$N which may prove surprising from a security point of view. For example, if a file descriptor is opened read-only and passed to another user over a socket, or inherited by a child process, then normally the recipient can only read from that FD, even if they would have had write access to the original file. However, the target process inheriting that file descriptor can open /proc/self/fd/$N *for writing* and modify the contents of the file, even without access to the original path, so long as permissions on the file itself are compatible:

user$ mkdir my_dir
user$ chmod 0700 my_dir
user$ cd my_dir
user$ echo test > testfile
user$ chmod 0666 testfile
user$ exec 3< testfile
user$ su -c 'su nobody -s /bin/bash' # can't use sudo here as by default it would close the FD
nobody$ ls -l /proc/self/fd/3
lr-x------ 1 nobody nogroup 64 Feb 10 00:00 /proc/self/fd/3 -> /home/user/my_dir/testfile
nobody$ cat /home/user/my_dir/testfile # no access via my_dir
cat: /home/user/my_dir/testfile: Permission denied
nobody$ cat /proc/self/fd/3
test
nobody$ echo attempt1 1>&3 # writing to the FD itself fails, because it was opened read-only
bash: echo: write error: Bad file descriptor
nobody$ echo attempt2 > /proc/self/fd/3 # but this succeeds!
nobody$ cat /proc/self/fd/3
attempt2
nobody$ mkdir /tmp/other_dir && cd /tmp/other_dir
nobody$ ln -L /proc/self/fd/3 testfile_link # this works despite /proc/sys/fs/protected_hardlinks because we have read and write access to the file
nobody$ ls -l testfile_link
-rw-rw-rw- 3 user user 9 Feb 10 00:00 testfile_link
nobody$ echo attempt3 > testfile_link
nobody$ cat testfile_link
attempt3
nobody$ exit
user$ cat testfile # confirm that the original file was changed
attempt3

The overall moral of this example is that if you're relying on restrictive permissions on a parent directory to restrict access to otherwise-writable files it may not work out quite the way you hoped.
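The key step of the transcript, turning a read-only descriptor into a writable one by reopening it through procfs, can be reproduced in a single process. A C sketch, assuming Linux with procfs at /proc; the function name is hypothetical. Opening the magic link performs a fresh permission check against the file itself, so neither the directory permissions on the original path nor the access mode of the existing descriptor come into it:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/* Given a descriptor opened O_RDONLY, try to obtain a writable descriptor
 * for the same file by reopening /proc/self/fd/<ro_fd>. Whether this works
 * depends on the file's own permission bits, not on ro_fd's access mode.
 * Returns the new fd, or -1 with errno set. */
int reopen_for_write(int ro_fd)
{
    char proc[64];
    snprintf(proc, sizeof proc, "/proc/self/fd/%d", ro_fd);
    return open(proc, O_WRONLY | O_TRUNC);
}
```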

[0] https://marc.info/?l=linux-kernel&m=104973707408577&...

[1] https://marc.info/?l=linux-kernel&m=104965452917349

Handling argc==0 in the kernel

Posted Jan 28, 2022 20:06 UTC (Fri) by dskoll (subscriber, #1630) (18 responses)

You don't need to muck about in /proc. The first argument to execve() is the pathname being executed; that could be placed as argv[0].
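A user-space sketch of that fallback, applied before calling execve(); it is not the kernel's code, and `ensure_argv0` is a hypothetical helper. If the caller supplied no arguments at all, it substitutes a one-element vector whose only entry is either the pathname being executed, as suggested here, or an empty string, the alternative raised in the replies:

```c
#include <stddef.h>

/* If argv is NULL or empty (argv[0] == NULL), return the vector
 * { fallback, NULL } built in caller-provided storage; otherwise return
 * argv unchanged. `fallback` might be the execve() pathname or "".
 * `storage` must stay alive until the vector has been used. */
char *const *ensure_argv0(char *const argv[], char *fallback, char *storage[2])
{
    if (argv != NULL && argv[0] != NULL)
        return argv;            /* caller already provided an argv[0] */
    storage[0] = fallback;
    storage[1] = NULL;
    return storage;
}
```

A caller would use it as `execve(path, ensure_argv0(argv, path, slots), envp)`, with `path` a `char *` and `slots` a `char *[2]`; existing non-empty vectors pass through untouched.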

Handling argc==0 in the kernel

Posted Jan 28, 2022 20:20 UTC (Fri) by matthias (subscriber, #94967) (1 response)

Why not just pass "", i.e. the empty string?

If a callee does not inspect argv[0], then any string is good. If a callee expects a non-empty string as argv[0], then calling it with argv pointing to { NULL } does not work today, and there is no reason why it should work tomorrow. Doing anything other than aborting with an error would only serve backward compatibility with the current situation.

The only way I can imagine this breaking something that works now is if some callee actually expects to be called with argc==0 and argv pointing to { NULL }. But that would be really weird.

Handling argc==0 in the kernel

Posted Jan 30, 2022 21:43 UTC (Sun) by developer122 (guest, #152928)

You could run into issues with programs that check argc. They may expect it to be zero, or they may expect it to be 1 with a valid path, but they may not expect it to be 1 with an invalid path.

Handling argc==0 in the kernel

Posted Jan 28, 2022 20:24 UTC (Fri) by khim (subscriber, #9252) (4 responses)

Nope, it couldn't, since that one, too, may be /proc/self/exe (and indeed there is one case where the pattern execve("/proc/self/exe", NULL, NULL); is used).

So it's not bulletproof, and I think the few programs that exploit this Linux-only misfeature are not worth inventing crazy schemes for.

Handling argc==0 in the kernel

Posted Jan 28, 2022 21:07 UTC (Fri) by dskoll (subscriber, #1630) (3 responses)

But if execve("/proc/self/exe", NULL, NULL); succeeds, under what circumstances could /proc/self/exe not be relied on in the execed program?

Handling argc==0 in the kernel

Posted Jan 28, 2022 21:15 UTC (Fri) by khim (subscriber, #9252) (2 responses)

When someone calls `realpath` on it? That never happens in real-world use cases, so yes, you are right, that may be a way out.

Of course, this would also break most tests anyway, so it's not clear whether a couple of real programs are worth that complexity.

A kernel config option looks more and more sensible: we know such programs are rare, but how rare exactly?

Handling argc==0 in the kernel

Posted Jan 28, 2022 21:26 UTC (Fri) by dskoll (subscriber, #1630) (1 response)

realpath is an executable; I think you meant readlink(2). But my point is this: if the original execve call succeeds, then "/proc/self/exe" must still be valid in the exec'd program. If /proc/self/exe were not valid, the execve call would fail.

Handling argc==0 in the kernel

Posted Jan 29, 2022 2:48 UTC (Sat) by comex (subscriber, #71521)

Not to get lost in the weeds, but realpath is also the name of a function; see `man 3 realpath`.
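For reference, a C sketch of calling that function, realpath(3) from <stdlib.h>, on /proc/self/exe, assuming Linux with procfs; the wrapper name is hypothetical. Because it resolves the link's target through the filesystem, it presumably fails once the binary has been deleted, even though /proc/self/exe itself can still be opened:

```c
#define _GNU_SOURCE
#include <stdlib.h>

/* Canonicalize /proc/self/exe with realpath(3). With a NULL second
 * argument realpath() allocates the result, which the caller must free().
 * Returns NULL (with errno set) if resolution fails. */
char *resolve_self_exe(void)
{
    return realpath("/proc/self/exe", NULL);
}
```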

Handling argc==0 in the kernel

Posted Jan 28, 2022 20:31 UTC (Fri) by floppus (guest, #137245) (7 responses)

But if any setuid program actually relies on argv[0] being the name of the program, that's a severe bug (symlinks, TOCTOU, etc., never mind the fact that the caller can always specify any string they want.)

If the kernel were to silently replace argc==0 with argc==1, it seems like using an empty string would be most parsimonious. At least an empty string is guaranteed not to be a valid filename, whereas /proc/self/exe or the name of the executable might or might not be.

Handling argc==0 in the kernel

Posted Jan 28, 2022 21:11 UTC (Fri) by dskoll (subscriber, #1630)

I think either an empty string or a copy of the first argument to execve is a good choice.

Handling argc==0 in the kernel

Posted Jan 28, 2022 21:21 UTC (Fri) by khim (subscriber, #9252) (3 responses)

> But if any setuid program actually relies on argv[0] being the name of the program, that's a severe bug (symlinks, TOCTOU, etc., never mind the fact that the caller can always specify any string they want.)

How? All the examples I have ever seen use argv[0] only to multiplex binaries. They never look at the actual binary named in argv[0]; they are only interested in its `basename`. Symlinks, TOCTOU, and even the fact that the caller can specify anything there are irrelevant: for them it's just a command-line argument that happens to contain the name of the program.

And empty arguments have tendency to make programs wonky.

Handling argc==0 in the kernel

Posted Jan 28, 2022 22:12 UTC (Fri) by NYKevin (subscriber, #129325) (2 responses)

> And empty arguments have tendency to make programs wonky.

setuid programs already need to defend against argv[0] == "" (or any other actual string, for that matter) because they must not trust the caller. POSIX explicitly specifies that the caller can set argv[0] to whatever it wants, and passing an invalid or incorrect argv[0] has been well understood for a long time as A Thing That Can Be Done (even though it's probably a bad idea in most cases). OTOH, the "there is no argv[0]" case is much less obvious and (I think) is not supported at all on some platforms, so it should not be too surprising that at least one setuid program failed to check for it.

So, from a security perspective, changing "the string doesn't exist" to "the string is empty" is the most straightforward way to close the security hole, since the callee is very likely already checking for an empty string (to the extent that it cares about argv[0] at all).
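A C sketch of what such a check can look like in a multi-call binary of the kind khim describes; the function name is hypothetical. A missing argv[0] and an empty one fall into the same branch, which is the point of coalescing the two cases:

```c
#include <stddef.h>
#include <string.h>

/* Pick an applet name from the basename of argv[0]. argc == 0, a NULL
 * argv[0], an empty argv[0], and a name ending in '/' all yield NULL,
 * so the caller needs only one "no usable name" branch. */
const char *applet_name(int argc, char *argv[])
{
    if (argc < 1 || argv == NULL || argv[0] == NULL || argv[0][0] == '\0')
        return NULL;
    const char *slash = strrchr(argv[0], '/');
    const char *base = slash != NULL ? slash + 1 : argv[0];
    return base[0] != '\0' ? base : NULL;
}
```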

Handling argc==0 in the kernel

Posted Jan 28, 2022 22:41 UTC (Fri) by JoeBuck (subscriber, #2330)

Replacing argv[0] via an exec call was a common trick back in the days of students sharing some Vax machine and wanting to hide the fact that they were playing games from "ps".

Handling argc==0 in the kernel

Posted Feb 10, 2022 4:45 UTC (Thu) by rlhamil (guest, #6472)

And arguments against altering argv in the kernel don't make sense, since that already happens for interpreter execs, right?

That's a much more complex transformation than inserting argv[0]="" if argv is NULL or argv[0] is NULL.

Handling argc==0 in the kernel

Posted Jan 30, 2022 21:48 UTC (Sun) by developer122 (guest, #152928) (1 response)

I think changing argc from 0 to 1 and adding an empty string will probably trip up legitimate programs. If argc is 0, then they may handle that by not checking argv and all is well. If argc is one, then they may expect a valid entry in argv[0]. It's still not great to require a valid name in argv[0], but regardless of what vulnerabilities may exist already I can see this causing more breakage to programs that attempt to handle argc==0 correctly.

Handling argc==0 in the kernel

Posted Feb 1, 2022 1:37 UTC (Tue) by NYKevin (subscriber, #129325)

As I explained in another comment, the caller can set argv[0] to any string it wishes, and a setuid program must not trust the caller (because the caller is unprivileged). So all setuid programs *must* behave correctly for invalid or "wrong" argv[0]. If they don't, then that's a separate vulnerability which must be fixed regardless of what the kernel does in the argc == 0 case. By coalescing the two cases into one case, you halve the number of potential vulnerabilities that application writers need to worry about.

Handling argc==0 in the kernel

Posted Feb 4, 2022 16:48 UTC (Fri) by sbaugh (guest, #103291) (2 responses)

>The first argument to execve() is the pathname being executed; that could be placed as argv[0].

Not when using execveat(fd, "", ..., AT_EMPTY_PATH)

Handling argc==0 in the kernel

Posted Feb 4, 2022 17:04 UTC (Fri) by adobriyan (subscriber, #30858)

d_path() can name any file.

Handling argc==0 in the kernel

Posted Feb 5, 2022 1:14 UTC (Sat) by mchapman (subscriber, #66589)

What is the process name -- as returned by prctl(PR_GET_NAME) -- when you do that? It seems like whatever it chooses would be an appropriate thing to default argv[0] to.
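That question is easy to check empirically. A C sketch of reading the name mchapman mentions, assuming Linux; `current_comm` is a hypothetical helper, and PR_GET_NAME fills a buffer of up to 16 bytes including the terminating NUL:

```c
#include <string.h>
#include <sys/prctl.h>

/* Read the calling thread's name ("comm"), which the kernel stores in at
 * most 16 bytes including the terminating NUL. Returns 0 on success. */
int current_comm(char out[16])
{
    memset(out, 0, 16);
    return prctl(PR_GET_NAME, (unsigned long)out, 0UL, 0UL, 0UL) == 0 ? 0 : -1;
}
```

Calling it at the start of a program launched via execveat(fd, "", ..., AT_EMPTY_PATH) would show what name the kernel picked on that system.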

Handling argc==0 in the kernel

Posted Jan 29, 2022 0:06 UTC (Sat) by ariadne (subscriber, #138312)

In the argv construction case, I suggested {bprm->filename, NULL}, which is even better than that. But the conversation shifted back to "let's just see if we can reject this entirely," so let's see how that plays out first.