Process ids again [LWN.net]

Process ids again

Posted Jun 6, 2025 16:51 UTC (Fri) by AClwn (subscriber, #131323) [Link] (10 responses)

OpenBSD randomizes its PIDs to reduce predictability and to avoid leaking information (i.e. preventing users from subtracting PIDs to determine how many processes had been created in the interim). Randomized PIDs would be incompatible with a no-reuse guarantee because you'd have to store (and search!) a table of previously-used PIDs.
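To make the storage cost concrete, here is a minimal C sketch of handing out random PIDs under a strict no-reuse rule in a toy 16-bit PID space. It is not OpenBSD's allocator. The bitmap, the xorshift PRNG and every name in it are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PID space: 1 .. PID_SPACE-1 (0 is never handed out). */
#define PID_SPACE 65536u

/* One bit per PID ever handed out: the "table of previously-used PIDs". */
static uint8_t used_pids[PID_SPACE / 8];
static uint32_t handed_out;

/* xorshift32, a stand-in for a real random source (NOT cryptographically secure). */
static uint32_t rng_state = 2463534242u;
static uint32_t next_random(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rng_state = x;
    return x;
}

static bool pid_is_used(uint32_t pid) {
    return (used_pids[pid / 8] >> (pid % 8)) & 1u;
}

static void mark_pid_used(uint32_t pid) {
    used_pids[pid / 8] |= (uint8_t)(1u << (pid % 8));
}

/* Draw random PIDs until one has never been used. Every draw costs a table
   lookup, and draws get rejected more often as the space fills up.
   Returns 0 once every PID has been used. */
static uint32_t alloc_random_pid(void) {
    if (handed_out == PID_SPACE - 1)
        return 0;
    for (;;) {
        uint32_t pid = next_random() % PID_SPACE;
        if (pid != 0 && !pid_is_used(pid)) {
            mark_pid_used(pid);
            handed_out++;
            return pid;
        }
    }
}
```

The table grows with the size of the PID space (here 8 KiB for 2^16 PIDs), which is why this approach does not scale to a 64-bit space.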

The obvious retort is that Linux doesn't randomize PIDs and it never will, so the only things you lose by extending PIDs to 64 bits are (1) a little bit of space wherever they're stored and (2) an entire class of PID-reuse security vulnerabilities, and that this is a pretty good tradeoff. I have nothing to say to that; I just wanted to mention PID randomization.

Process ids again

Posted Jun 7, 2025 5:45 UTC (Sat) by epa (subscriber, #39769) [Link] (1 responses)

I suppose if you had a big enough space for the ids you could randomize and still have them in monotone order. Just leave random-sized gaps in the sequence. 64 bits might be enough that you could do that and still never run out of ids.

Process ids again

Posted Jun 22, 2025 9:10 UTC (Sun) by l0kod (subscriber, #111864) [Link]

FYI, Landlock IDs follow this approach. One important difference is that we only need a bijection (i.e. Landlock IDs are not used to get a reference but only to identify an existing object). From the commit description, Landlock IDs have these properties:
- They are unique during the lifetime of the running system thanks to
the 64-bit values: at worst, 2^60 - 2*2^32 useful IDs.
- They are always greater than 2^32 and must then be stored in 64-bit
integer types.
- The initial ID (at boot time) is randomly picked between 2^32 and
2^33, which limits collisions in logs across different boots.
- IDs are sequential, which enables users to order them.
- IDs may not be consecutive but increase with a random 2^4 step, which
limits side channels.
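The properties listed above can be sketched in a few lines of C. This is not the Landlock code: it assumes the "random 2^4 step" means a step drawn uniformly from 1 to 16, and it uses splitmix64 as a self-contained, non-secure stand-in for a kernel random source. It is single-threaded; a real implementation would need atomic updates.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64: deterministic stand-in for a real random source (NOT secure). */
static uint64_t seed = 0x9e3779b97f4a7c15ULL;
static uint64_t random_u64(void) {
    uint64_t z = (seed += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

static uint64_t next_id;

/* Boot-time starting point: random in [2^32, 2^33), so IDs never fit in 32 bits
   and logs from different boots are unlikely to overlap. */
static void id_init(void) {
    next_id = (UINT64_C(1) << 32) + (random_u64() & ((UINT64_C(1) << 32) - 1));
}

/* Return an ID, then advance by a random step in [1, 16]: IDs stay strictly
   increasing (orderable) but are not consecutive. */
static uint64_t id_next(void) {
    uint64_t id = next_id;
    next_id += 1 + (random_u64() & 0xf);
    return id;
}
```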

For more details, see https://git.kernel.org/torvalds/c/d9d2a68ed44bbae598a81cb...

Unique randomized wide PIDs

Posted Jun 8, 2025 9:20 UTC (Sun) by jreiser (subscriber, #11027) [Link] (6 responses)

> Randomized PIDs would be incompatible with a no-reuse guarantee because you'd have to store (and search!) a table of previously-used PIDs.

Um, no. A Linear Feedback Shift Register (LFSR) based on a primitive polynomial guarantees uniqueness over its entire period, which is the maximal 2**N - 1. Just initialize it to a random point in its sequence.
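A minimal C sketch of such an LFSR, using a 16-bit register so the full period can be checked exhaustively. The feedback mask 0xB400 (the polynomial x^16 + x^14 + x^13 + x^11 + 1) is a well-known maximal-length choice; a 32- or 64-bit PID version would need a different primitive polynomial, which is not given here. Note that the next value is a fixed function of the current one, which is the predictability problem raised in the reply.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR with feedback mask 0xB400.
   From any non-zero state it visits all 2^16 - 1 non-zero values
   exactly once before returning to the starting state. */
static uint16_t lfsr_next(uint16_t state) {
    uint16_t lsb = state & 1u;
    state >>= 1;
    if (lsb)
        state ^= 0xB400u;
    return state;
}
```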

Unique randomized wide PIDs

Posted Jun 8, 2025 10:03 UTC (Sun) by dezgeg (subscriber, #92243) [Link] (5 responses)

Would that really help with anything though, given the sequence is still entirely predictable?

Unique randomized wide PIDs

Posted Jun 8, 2025 18:11 UTC (Sun) by bmenrigh (subscriber, #63018) [Link] (4 responses)

Yeah, an LFSR wouldn’t be secure. However, a custom block cipher (where the block size matches the PID size) eliminates the predictability issue. Just generate a random key at startup.
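A minimal C sketch of the idea: encrypt a plain counter with a 32-bit block cipher, so the outputs are unique (the cipher is a bijection) but not sequential. The cipher here is a toy 4-round Feistel network with a made-up round function; it illustrates the permutation property only and is NOT cryptographically secure. In practice the key would be random per boot and the cipher a vetted design.

```c
#include <assert.h>
#include <stdint.h>

#define ROUNDS 4

/* Made-up round function; a Feistel network is invertible whatever this is. */
static uint16_t round_fn(uint16_t half, uint32_t round_key) {
    uint32_t x = (half ^ round_key) * 0x9E3779B1u;
    return (uint16_t)(x >> 16);
}

/* Map a counter value to a pseudo-random but unique 32-bit ID. */
static uint32_t id_encrypt(uint32_t counter, const uint32_t key[ROUNDS]) {
    uint16_t left = (uint16_t)(counter >> 16), right = (uint16_t)(counter & 0xFFFFu);
    for (int i = 0; i < ROUNDS; i++) {
        uint16_t tmp = right;
        right = (uint16_t)(left ^ round_fn(right, key[i]));
        left = tmp;
    }
    return ((uint32_t)left << 16) | right;
}

/* Inverse mapping: run the rounds backwards. Its existence is what
   guarantees that two different counters never produce the same ID. */
static uint32_t id_decrypt(uint32_t id, const uint32_t key[ROUNDS]) {
    uint16_t left = (uint16_t)(id >> 16), right = (uint16_t)(id & 0xFFFFu);
    for (int i = ROUNDS - 1; i >= 0; i--) {
        uint16_t tmp = left;
        left = (uint16_t)(right ^ round_fn(left, key[i]));
        right = tmp;
    }
    return ((uint32_t)left << 16) | right;
}
```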

Unique randomized wide PIDs

Posted Jun 9, 2025 12:00 UTC (Mon) by bluca (subscriber, #118303) [Link] (3 responses)

Although it is mildly amusing watching the rediscovery of UUIDs in this thread (hint: it's what Windows uses), it is unnecessary: with the new kernel the pidfd inode number is guaranteed unique per boot, so we have a solution for this now (there's also the boot uuid, so the combination of boot uuid + pidfd inode id gives a universal unique identifier, if that is needed). PIDs can never be changed, as all existing software will break. No, it's not just a matter of "recompiling", existing binaries need to work too with new kernels, so that's not an option. But we are starting to expose pidfd inodes in more places now, so I guess they'll slowly take over for new functionality.
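A minimal C sketch of building the "boot uuid + pidfd inode" identifier described above. It assumes a Linux kernel new enough for pidfd_open(2) and for pidfds backed by pidfs (otherwise the inode number is not unique, as discussed further down the thread); the key format "<boot_id>:<inode>" and the helper names are hypothetical. Opening a pidfd from a bare PID is only race-free if the caller is the parent and has not yet reaped the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* same number on most architectures */
#endif

/* Hypothetical key format: "<boot_id>:<inode>". Returns 0, or -1 if buf is too small. */
static int format_process_key(char *buf, size_t len, const char *boot_id,
                              unsigned long long ino) {
    int n = snprintf(buf, len, "%s:%llu", boot_id, ino);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Combine this boot's UUID with the inode number of a pidfd for `pid`. */
static int process_key(pid_t pid, char *buf, size_t len) {
    char boot_id[64];
    FILE *f = fopen("/proc/sys/kernel/random/boot_id", "r");
    if (!f)
        return -1;
    int ok = fgets(boot_id, sizeof boot_id, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    boot_id[strcspn(boot_id, "\n")] = '\0';

    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;
    struct stat st;
    int rc = fstat(pidfd, &st);
    close(pidfd);
    if (rc < 0)
        return -1;
    return format_process_key(buf, len, boot_id, (unsigned long long)st.st_ino);
}
```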

Unique randomized wide PIDs

Posted Jun 12, 2025 8:48 UTC (Thu) by donald.buczek (subscriber, #112892) [Link] (2 responses)

However, I wonder how userspace can easily determine whether a pidfd inode number comes from a system that guarantees uniqueness.

Unique randomized wide PIDs

Posted Jun 12, 2025 11:15 UTC (Thu) by bluca (subscriber, #118303) [Link] (1 responses)

There was a way, I forget the details, but on systems where it's not unique it's a fixed/hardcoded/well-known inode number, because it's an anonymous inode rather than from pidfs? Details are fuzzy so I might be getting this wrong

Unique randomized wide PIDs

Posted Jun 14, 2025 9:45 UTC (Sat) by donald.buczek (subscriber, #112892) [Link]

I found that you can use fstatfs() on the file descriptor and check whether f_type == PID_FS_MAGIC (0x50494446, "PIDF"). On older systems it is ANON_INODE_FS_MAGIC (0x09041934).

Note that although pidfd_open(2) says opening a "/proc/[PID]" directory is an alternative way to get a PID file descriptor, this is only half true: you can use such a file descriptor with the pidfd_* calls, but it is a different type of file descriptor, with f_type == PROC_SUPER_MAGIC (0x9fa0), and you can't use its inode number as a unique process identifier.
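The check described above, as a minimal C sketch. The magic values are the ones quoted in the comment; they are defined locally under hypothetical names in case the system's <linux/magic.h> predates PID_FS_MAGIC.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/vfs.h>

#define MY_PID_FS_MAGIC         0x50494446L /* "PIDF": pidfd backed by pidfs */
#define MY_ANON_INODE_FS_MAGIC  0x09041934L /* older kernels: anonymous inode */
#define MY_PROC_SUPER_MAGIC     0x9fa0L     /* fd obtained by opening /proc/[PID] */

enum pidfd_kind { PIDFD_PIDFS, PIDFD_ANON_INODE, PIDFD_PROCFS, PIDFD_UNKNOWN };

static enum pidfd_kind classify_fs_type(long f_type) {
    switch (f_type) {
    case MY_PID_FS_MAGIC:        return PIDFD_PIDFS;
    case MY_ANON_INODE_FS_MAGIC: return PIDFD_ANON_INODE;
    case MY_PROC_SUPER_MAGIC:    return PIDFD_PROCFS;
    default:                     return PIDFD_UNKNOWN;
    }
}

/* 1 if the inode number of `fd` can serve as a unique process identifier
   (pidfs-backed pidfd), 0 if not, -1 if fstatfs() fails. */
static int pidfd_inode_is_unique(int fd) {
    struct statfs sfs;
    if (fstatfs(fd, &sfs) < 0)
        return -1;
    return classify_fs_type((long)sfs.f_type) == PIDFD_PIDFS;
}
```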

I still wish processes had UUIDs.

Process ids again

Posted Jun 9, 2025 16:37 UTC (Mon) by dsfch (subscriber, #176007) [Link]

64-bit PIDs would be an ABI break, right? `pid_t` being 32-bit is all over glibc & Co. as well as in `<uapi/asm-generic/posix_types.h>`, so it's not just a trivial substitution?
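A minimal C illustration of why this is an ABI break and not just a recompile: any structure that embeds a pid changes layout when the pid widens, so an old binary and a new one disagree about where the following fields live. The struct names are hypothetical; the sizeof(pid_t) check reflects Linux, where pid_t is a 32-bit int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

_Static_assert(sizeof(pid_t) == 4, "pid_t is 32-bit on Linux");

/* Hypothetical record as compiled into an existing binary ... */
struct job_record_v1 {
    int32_t pid;
    uint32_t flags;
};

/* ... and the same record rebuilt with a hypothetical 64-bit pid. */
struct job_record_v2 {
    int64_t pid;
    uint32_t flags;
};

/* `flags` moves from byte 4 to byte 8, so old and new code misread each other. */
_Static_assert(offsetof(struct job_record_v1, flags) == 4, "old layout");
_Static_assert(offsetof(struct job_record_v2, flags) == 8, "new layout");
```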

Process ids again

Posted Jun 6, 2025 16:52 UTC (Fri) by bluca (subscriber, #118303) [Link]

Because it would most likely break everything left and right. With the pidfdfs (say that quickly 3 times in a row) inode we have something close enough.

Process ids again

Posted Jun 6, 2025 17:09 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

Long numbers SUCK for UI purposes. What would help is a _separate_ handle-type process identifier that is guaranteed to be unique and can be even 128-bit long.

Process ids again

Posted Jun 6, 2025 20:42 UTC (Fri) by warrax (subscriber, #103205) [Link] (4 responses)

If you ever do anything other than copy/paste PIDs when using the command line (which I assume you mean by UI), I'd be a bit worried. Actually... I think it'd actually be better if it was hard-enough-to-type so that it MUST be copy/pasted. Killing a mistyped PID *should* do nothing.

Process ids again

Posted Jun 6, 2025 20:43 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

I absolutely do stuff other than act as a copypaste bot. Including looking at logs, and correlating things across multiple sources. I almost never need to actually kill processes by ID.

Long IDs make that harder.

Process ids again

Posted Jun 7, 2025 5:42 UTC (Sat) by epa (subscriber, #39769) [Link] (2 responses)

On the other hand, right now if you are looking for a process id in a log file, you don’t have any guarantee that it’s the same process, since pids can be reused. Making them unique has to be better, even if it means the number might be longer. (You’d only get the big numbers on a long-running system that has forked vast numbers of processes, and in that case I suggest you’d particularly benefit from not having a number reused.)

Even in ordinary command line use like “see a process id in top and then kill it” there is a race condition and some danger if pids are not unique.

Process ids again

Posted Jun 7, 2025 5:53 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

> On the other hand, right now if you are looking for a process id in a log file, you don’t have any guarantee that it’s the same process

That's true, but in practice infrequent, outside of deliberate attacks.

Process ids again

Posted Jun 7, 2025 6:51 UTC (Sat) by iabervon (subscriber, #722) [Link]

I often use them to identify the lines from some process that are interspersed with lines from other related processes. Occasionally, the same PID is a different process later, but only after the last line that is from the same process. It's also useful in REST API server logs: different lines from serving the same request have the same PID, but even if there isn't any PID reuse, the same process generally goes on to serve another request, but not until it's done with the first one.

Process ids again

Posted Jun 6, 2025 17:38 UTC (Fri) by Nahor (subscriber, #51583) [Link] (7 responses)

> Why don't we move to 64-bit process ids

What do you give to a process that still expects a 32-bit pid when the value does not fit?
And if we provide a 64-bit API while keeping the 32-bit values for a while for backward compatibility, how long should we wait before switching? ...And in the interim, the problem persists, even for applications that did update.

With pidfs, you solve the problem for modern applications *now*, while keeping backward compatibility for older code forever (or until we choose to remove support for 32-bit PIDs).

> surely if time_t could become 64 bit, we can do the same for pid_t.

You do know how painful that switch was (and still is, since not every code base has been updated yet), right?
Remember that this is not just about updating the kernel API, but also about updating code wherever pids are used (internally in applications, or externally, e.g. storage, network, ...).

> they will never be used in the long tail of shell scripts and old code

If they won't be updated to pidfs, why do you believe they will be updated for 64-bit pids?

Process ids again

Posted Jun 7, 2025 2:48 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (3 responses)

PID namespaces are already a thing, so you could (in theory) migrate a system gradually rather than all at once. In practice, I'm not sure how useful that is to any distro that is not NixOS. But I suppose it probably does also work for Flatpak-shaped things?

> What do you give to a process that still expects a 32-bit pid when the value does not fit?

Under the assumption that we're migrating individual PID namespaces rather than a system-wide setting, if a process is in a PID namespace that uses 64-bit PIDs, it should have been migrated to the new API already (or else userspace should not have enabled 64-bit PIDs for this namespace). If it nevertheless asks for a 32-bit PID by calling into the old 32-bit interface, then it gets -ENOSYS or some equivalent, and probably crashes.

Process ids again

Posted Jun 7, 2025 19:29 UTC (Sat) by Nahor (subscriber, #51583) [Link] (2 responses)

> PID namespaces

That looks like a big big can of worms.

How do non-updated apps and updated ones mix? Say an updated shell trying to start an old app or vice-versa?
Or do you expect the user to use different launcher/shell and choose which to interact with depending on what type of apps they use? And have different binaries with different pid size for apps used in both (shell, launcher, UI, ssh, ...)?
How do apps communicate pids to each other if they are not in the same namespace? Say someone uses an updated "top" command and thus gets 64-bit pids, then tries to use the shell's builtin "kill" command, which still expects a 32-bit pid?

What/who decides what namespace to use? The kernel? The shell/launcher? The user? How does it/he/she know what namespace to use?

Namespaces work well if a whole ecosystem can be independent from everything else with respect to that namespace. They also work because only the values change, not the types: the binaries are the same (i.e. a shell in one namespace works just as well in another; it will just print different values for pids, or see different files, ...).

Process ids again

Posted Jun 8, 2025 3:16 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (1 responses)

The idea would be, broadly speaking, all namespaces are 64-bit capable, but by default they only generate PIDs in the 32-bit compatible range. You can use the new 64-bit API for everything and it always works, and you can use the old API if you're in a namespace that is limited to the 32-bit range. A 64-bit namespace may contain 32-bit children, but not vice-versa.

The answers to most of your more specific questions can be summarized as "the distro can do what it sees fit, and if it chooses to do nothing, then it continues to use 32-bit PIDs for everything indefinitely."

Process ids again

Posted Jun 8, 2025 3:20 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Addendum: Some of your questions are better answered by referring you to pid_namespaces(7), which see.

Process ids again

Posted Jun 7, 2025 5:38 UTC (Sat) by epa (subscriber, #39769) [Link] (2 responses)

A shell script needs no update for 64-bit process ids. It’s just a slightly longer string than before. A C program using pid_t may just need a recompile, though of course there will be some code putting a process id into an int (I imagine a compiler warning can mostly catch this).

Rewriting a C program to use pidfds is a much bigger task, and rewriting a shell script with them is essentially impossible.

Scripting languages like Perl, Python, and Tcl would usually just need the interpreter itself recompiled for 64-bit pids and existing scripts will work unchanged.

Process ids again

Posted Jun 7, 2025 18:58 UTC (Sat) by Nahor (subscriber, #51583) [Link]

> A shell script needs no update for 64-bit process ids

Most won't, but a script can still make assumptions about the pid size, e.g. that it has at most 10 digits.

> A C program using pid_t may just need a recompile

Keyword "may".
And even for the simple cases, this depends a lot on how 64-bit pids would be implemented, e.g. would it be a compilation flag? Or a "#define USE_PID64"? Or would it mean changing all the "pid_t"/"getpid()"/... to "pid64_t"/"getpid64()"/...?

And in the non-simple cases, the issues are the same for 64-bit pids and pidfds (apps using pids in a "smart" way will be badly broken, a pid64/pidfd cannot be passed as-is when communicating with apps that expect 32-bit pids, ...).

> I imagine a compiler warning can mostly catch this

Only in simple cases, maybe. And AFAIK, compilers currently will not complain when storing an int64_t in an int32_t without the "-Wconversion" flag (which is not enabled even by "-Wall -Wextra -pedantic"). And even "-Wconversion" will not complain if there is a cast involved. https://godbolt.org/z/z3jGbYb86
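A minimal C example of the two cases described above, with hypothetical function names. Per the comment, the implicit narrowing is only reported with -Wconversion, and the explicit cast silences even that; both lose the high bits of a wide pid. The exact value produced for an out-of-range input is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Implicit narrowing: warned about only under -Wconversion. */
static int32_t store_pid_implicit(int64_t pid) {
    return pid;
}

/* Explicit cast: no warning even with -Wconversion, same data loss. */
static int32_t store_pid_cast(int64_t pid) {
    return (int32_t)pid;
}
```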

> there will be some code putting a process id into an int

Or putting it in the low bits of an int64_t then use the high bits for something else.
Or assume that a struct containing a pid has a specific size. Or that fields in that struct after the pid are at specific offsets.
Or ... (don't underestimate what people do when they assume something will be true forever)...
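A minimal C sketch of the "low bits of an int64_t" trick mentioned above, with hypothetical names: it round-trips every 32-bit pid, then silently truncates a wider one while the tag survives, so nothing fails loudly.

```c
#include <assert.h>
#include <stdint.h>

/* Pid in the low 32 bits, an application-defined tag in the high 32 bits. */
static uint64_t pack(uint32_t tag, int64_t pid) {
    return ((uint64_t)tag << 32) | (uint32_t)pid;
}

static int64_t unpack_pid(uint64_t packed) { return (int64_t)(uint32_t)packed; }
static uint32_t unpack_tag(uint64_t packed) { return (uint32_t)(packed >> 32); }
```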

Basically, one can look at what happened during the transition from 32-bit to 64-bit platforms, the switch to large files (>4GB), and the Y2038 problem, to see all the issues that can arise.

> Rewriting a C program to use pidfds is a much bigger task

I'm not so sure. Since a pidfd is a plain int file descriptor (the same size as a pid_t), and depending on what the pid/pidfd is used for, updating could boil down to calling "pidfd_xyz()" instead of "xyz()", or passing an "XYZ_PIDFD" flag.
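For signals, the "pidfd_xyz() instead of xyz()" swap looks roughly like this minimal C sketch, with hypothetical helper names. pidfd_open(2) and pidfd_send_signal(2) are called through syscall(2) in case the libc lacks wrappers. Opening the pidfd from a PID is itself only race-free when the caller is the parent and has not yet reaped the child; clone() with CLONE_PIDFD avoids that window.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424 /* same number on most architectures */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Classic: if `pid` has exited and been reused, this signals the wrong process. */
static int signal_by_pid(pid_t pid, int sig) {
    return kill(pid, sig) == 0 ? 0 : -errno;
}

/* pidfd version: the fd pins one specific process, so later PID reuse
   cannot redirect the signal. Returns 0 or -errno. */
static int signal_by_pidfd(pid_t pid, int sig) {
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -errno;
    long rc = syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0);
    int err = (rc < 0) ? errno : 0;
    close(pidfd);
    return -err;
}
```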

For the rest, that can be a big task to fix in either case. For instance, if the problem is someone combining the pid with something else in an int64_t, then a pidfd will still work fine, while the pid64 will need a redesign.
Or people might make the same assumption that you did, that pid64 is just a recompilation, and then spend a lot of time tracking down bugs. With pidfds, they would spend time looking at each call site first, fixing the problems before they arise and need tracking.
Which one takes more time will depend on the application. Sometimes it's faster to think things through first, other times it's faster to just try and fix. This very much applies here IMHO.

Process ids again

Posted Jul 4, 2025 13:51 UTC (Fri) by judas_iscariote (guest, #47386) [Link]

> A C program using pid_t may just need a recompile

You are assuming a properly, carefully written program. There is still code out there that assumes pids are 16-bit and stores them in a ushort. There is incorrect casting, there is code not using pid_t at all... I mean, there is a lot of buggy software out there.

Process ids again

Posted Jun 7, 2025 19:05 UTC (Sat) by donald.buczek (subscriber, #112892) [Link] (6 responses)

> Why don't we move to 64-bit process ids, and guarantee that they are not reused except after a reboot

Or a UUID for every process. While pidfs inode numbers (or a hypothetical 64bit pid) are good for the lifetime of the system, you sometimes want to persist identifiers for longer than that, for example in a database or on another system. We have this problem for a cluster queuing system where the job controller is designed to be restartable and has to identify which of its jobs, which are stored in a sql database, are still alive and which aren't. Our current solution is so ugly that I don't want to mention it and our next solution might be to consider boot_id + pidfs inode number. Any better ideas someone?

Process ids again

Posted Jun 7, 2025 19:37 UTC (Sat) by snajpa (subscriber, #73467) [Link] (5 responses)

I think if you keep adding more context from /proc/<pid>, you should arrive at sufficiently reliable input for some good-enough hash function. The start time of a process can't be changed by the process itself, AFAIK. Just that along with the pid could do the trick, IMHO. Save the hash when the task is started, maybe assert that it hasn't changed during a dev-build shutdown... that's roughly what I would do.
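A minimal C sketch of reading the start time this proposal relies on: field 22 of /proc/<pid>/stat, in clock ticks since boot. The (pid, starttime) pair could then be fed to a hash. The helper names are hypothetical, and as the reply below points out, tick granularity means this pair is not a reliable unique identifier against a deliberate attacker.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Extract field 22 (starttime) from one /proc/<pid>/stat line. The comm field
   (field 2) may contain spaces and parentheses, so parsing starts after the
   LAST ')'. Returns 0 on success, -1 on a malformed line. */
static int parse_starttime(const char *stat_line, unsigned long long *out) {
    const char *p = strrchr(stat_line, ')');
    if (!p)
        return -1;
    p++; /* p now points at the space before field 3 (state) */
    for (int field = 3; field < 22; field++) {
        p = strchr(p + 1, ' '); /* advance to the space before the next field */
        if (!p)
            return -1;
    }
    char *end;
    unsigned long long v = strtoull(p + 1, &end, 10);
    if (end == p + 1)
        return -1;
    *out = v;
    return 0;
}

static int read_starttime(pid_t pid, unsigned long long *out) {
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_starttime(line, out) : -1;
}
```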

Process ids again

Posted Jun 9, 2025 12:14 UTC (Mon) by bluca (subscriber, #118303) [Link] (4 responses)

> Start time of the process can't be changed by the process itself AFAIK. Just that along with the pid could do the trick, IMHO.

It very much doesn't, so-much-so that relying on that combination for uniqueness caused several CVEs in the past. The start time is not granular enough, and attackers are able to cause a PID + start time clash at their leisure. This is why PIDFDs exist, and we use them when we need to uniquely identify processes for any security-relevant reason (and also more and more non-security-relevant too)

Process ids again

Posted Jun 9, 2025 20:10 UTC (Mon) by snajpa (subscriber, #73467) [Link] (3 responses)

Are you able to link simply just a single CVE that proves it isn't sufficient in the real practical world? You know, hash algos are bound to have collisions too, yet we use them. Taking an argument to an extreme isn't helpful.

Besides, how are you going to use pidfds in this specific case you are replying to? Much confidence in your reply, let's see if you can back that confidence up with something.

Process ids again

Posted Jun 9, 2025 20:26 UTC (Mon) by bluca (subscriber, #118303) [Link] (2 responses)

You can start from CVE-2019-6133 and continue from there.

The combination of pidfd inode id plus boot uuid can uniquely identify a process across machines/reboots/everything, so it is suitable for that use case.

Process ids again

Posted Jun 10, 2025 16:11 UTC (Tue) by snajpa (subscriber, #73467) [Link] (1 responses)

I'm gonna solve this by muting you, as your whole reaction is just to prove... you can't read or fit into context of what you're replying to, I can only feel your need to be right every single time we interact here. Context be damned, right...

Process ids again

Posted Jun 10, 2025 16:13 UTC (Tue) by bluca (subscriber, #118303) [Link]

Suit yourself. You asked for a CVE, it's been provided. You asked for a solution for a problem, it's been provided. If you can't handle receiving answers, maybe stop asking questions?