
Handling argc==0 in the kernel

Posted Feb 10, 2022 21:21 UTC (Thu) by Jandar (subscriber, #85683)
In reply to: Handling argc==0 in the kernel by nybble41
Parent article: Handling argc==0 in the kernel

> The kernel specifically blocks the creation of a hard-link to a previously deleted (zero-refcount) inode.

If you can open the deleted /proc/pid/exe, why would linkat(fd, "", AT_FDCWD, "/path/for/file", AT_EMPTY_PATH); not work? A file descriptor obtained with open(".", O_TMPFILE | O_RDWR) has to have a zero st_nlink as well.

I have never done an open(..., O_TMPFILE) + linkat(fd, "", AT_FDCWD, ...), so this may be a misconception.
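
For reference, the pattern in question looks roughly like the sketch below. This is only a minimal illustration: the file name "named_file" is made up, error handling is minimal, and the AT_EMPTY_PATH form of linkat() has historically required CAP_DAC_READ_SEARCH, which is why open(2) also documents the /proc/self/fd alternative shown here as a fallback.

#define _GNU_SOURCE          /* for O_TMPFILE and AT_EMPTY_PATH */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Anonymous file in the current directory; st_nlink starts at 0. */
    int fd = open(".", O_TMPFILE | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "hello\n", 6);

    /* Give the file a name. The AT_EMPTY_PATH form needs CAP_DAC_READ_SEARCH;
       otherwise fall back to linking through the magic /proc symlink. */
    if (linkat(fd, "", AT_FDCWD, "named_file", AT_EMPTY_PATH) < 0) {
        char path[64];
        snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, path, AT_FDCWD, "named_file", AT_SYMLINK_FOLLOW) < 0)
            perror("linkat");
    }
    close(fd);
    return 0;
}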


Handling argc==0 in the kernel

Posted Feb 11, 2022 0:29 UTC (Fri) by nybble41 (subscriber, #55106)

I also find the restrictions a bit surprising, but Linus's response[0] when someone proposed an flink() system call[1]—which would have been more-or-less equivalent to hard-linking from /proc/$PID/fd/$N, with the primary use-case involving creating links to previously unlinked files—was a flat rejection: "there is no way in HELL we can do this securely without major other incursions." I believe the result of this discussion was a number of new restrictions on hard-linking files via /proc.

There are various things you can still do with /proc/$PID/fd/$N which may prove surprising from a security point of view. For example, if a file descriptor is opened read-only and passed to another user over a socket, or inherited by a child process, then normally the recipient can only read from that FD, even if they would have had write access to the original file. However, the target process inheriting that file descriptor can open /proc/self/fd/$N *for writing* and modify the contents of the file, even without access to the original path, so long as permissions on the file itself are compatible:

user$ mkdir my_dir
user$ chmod 0700 my_dir
user$ cd my_dir
user$ echo test > testfile
user$ chmod 0666 testfile
user$ exec 3< testfile
user$ su -c 'su nobody -s /bin/bash' # can't use sudo here as by default it would close the FD
nobody$ ls -l /proc/self/fd/3
lr-x------ 1 nobody nogroup 64 Feb 10 00:00 /proc/self/fd/3 -> /home/user/my_dir/testfile
nobody$ cat /home/user/my_dir/testfile # no access via my_dir
cat: /home/user/my_dir/testfile: Permission denied
nobody$ cat /proc/self/fd/3
test
nobody$ echo attempt1 1>&3 # writing to the FD itself fails, because it was opened read-only
bash: echo: write error: Bad file descriptor
nobody$ echo attempt2 > /proc/self/fd/3 # but this succeeds!
nobody$ cat /proc/self/fd/3
attempt2
nobody$ mkdir /tmp/other_dir && cd /tmp/other_dir
nobody$ ln -L /proc/self/fd/3 testfile_link # this works despite /proc/sys/fs/protected_hardlinks because we have read and write access to the file
nobody$ ls -l testfile_link
-rw-rw-rw- 3 user user 9 Feb 10 00:00 testfile_link
nobody$ echo attempt3 > testfile_link
nobody$ cat testfile_link
attempt3
nobody$ exit
user$ cat testfile # confirm that the original file was changed
attempt3
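
For completeness, the re-open step that the transcript performs with shell redirection looks roughly like this from C. This is a minimal sketch only: it assumes fd 3 was inherited read-only as in the transcript above, and the string written is just illustrative.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* fd 3 was inherited read-only; re-open it through the magic /proc
       symlink. The permission check here is against the file's own mode,
       not the original open mode and not the (inaccessible) directory. */
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", 3);

    int wfd = open(path, O_WRONLY | O_TRUNC);
    if (wfd < 0) { perror("open"); return 1; }

    write(wfd, "attempt2\n", 9);
    close(wfd);
    return 0;
}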

The overall moral of this example is that if you're relying on restrictive permissions on a parent directory to protect otherwise-writable files, it may not work out quite the way you hoped.

[0] https://marc.info/?l=linux-kernel&m=104973707408577&...

[1] https://marc.info/?l=linux-kernel&m=104965452917349

