
Slowing the flow of core-dump-related CVEs

By Jonathan Corbet
June 6, 2025
The 6.16 kernel will include a number of changes to how the kernel handles the processing of core dumps for crashed processes. Christian Brauner explained his reasons for doing this work as: "Because I'm a clown and also I had it with all the CVEs because we provide a **** API for userspace". The handling of core dumps has indeed been a constant source of vulnerabilities; with luck, the 6.16 work will result in rather fewer of them in the future.

The problem with core dumps

A core dump is an image of a process's data areas — everything except the executable text; it can be used to investigate the cause of a crash by examining a process's state at the time things went wrong. Once upon a time, Unix systems would routinely place a core dump into a file called core in the current working directory when a program crashed. The main effects of this practice were to inspire system administrators worldwide to remove core files daily via cron jobs, and to make it hazardous to use the name core for anything you wanted to keep. Linux systems can still create core files, but are usually configured not to.

An alternative that is used on some systems is to have the kernel launch a process to read the core dump from a crashing process and, presumably, do something useful with it. This behavior is configured by writing an appropriate string to the core_pattern sysctl knob. A number of distributors use this mechanism to set up core-dump handlers that phone home to report crashes so that the guilty programs can, hopefully, be fixed.
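
As a concrete illustration, a pipe-style handler is registered by writing a string like the following (the handler path here is hypothetical; "%p" and "%s" are the standard specifiers for the crashing process's PID and the signal number):

    |/usr/local/bin/core-handler %p %s

The leading "|" tells the kernel to treat the rest of the string as a command line to execute, with the core dump supplied on the handler's standard input.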

This is the "**** API" referred to by Brauner; it indeed has a number of problems. For example, the core-dump handler is launched by the kernel as a user-mode helper, meaning that it runs fully privileged in the root namespace. That, needless to say, makes it an attractive target for attackers. There are also a number of race conditions that emerge from this design that have led to vulnerabilities of their own.

See, for example, this recent Qualys advisory describing a vulnerability in Ubuntu's apport tool and the systemd-coredump utility, both of which are designed to process core dumps. In short, an attacker starts by running a setuid binary and forcing it to crash at an opportune moment. While the core-dump handler is being launched (a step that the attacker can delay in various ways), the crashed process is killed outright with a SIGKILL signal, then quickly replaced by another process with the same process ID. The core-dump handler will then begin to examine the core dump from the crashed process, but with the information from the replacement process.

That process is running in its own attacker-crafted namespace, with some strategic environmental changes. In this environment, the core-dump handler's attempt to pass the core-dump socket to a helper can be intercepted; that allows said process to gain access to the file descriptor from which the core dump can be read. That, in turn, gives the attacker the ability to read the (original, privileged) process's memory, happily pillaging any secrets found there. The example given by Qualys obtains the contents of /etc/shadow, which is normally unreadable, but it seems that SSH servers (and the keys in their memory) are vulnerable to the same sort of attack.

Interested readers should consult the advisory for a much more detailed (and coherent) description of how this attack works, as well as information on some previous vulnerabilities in this area. The key takeaways, though, are that core-dump handlers on a number of widely used distributions are vulnerable to this attack, and that reusable integer IDs as a way to identify processes are just as much of a problem as the pidfd developers have been saying over the years.

Toward a better API

The solution to this kind of race condition is to give the core-dump handler a way to know that the process it is investigating is, indeed, the one that crashed. The 6.16 kernel contains two separate changes toward that goal. The first is this patch from Brauner adding a new format specifier ("%F") for the string written to core_pattern. This specifier will cause the core-dump handler to be launched with a pidfd identifying the crashed process installed as file descriptor number three. Since it is a pidfd, it will always refer to the intended process and cannot be fooled by process-ID reuse.
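
A minimal sketch of a handler using that descriptor, assuming the behavior described above (the core dump still arrives on standard input, as with any pipe-style handler):

    /* Sketch: core-dump helper launched via a core_pattern
     * containing %F; the kernel installs a pidfd for the
     * crashed process as file descriptor 3. */
    #include <poll.h>
    #include <unistd.h>

    int main(void)
    {
        int pidfd = 3;  /* provided by the kernel via %F */

        /* The pidfd always refers to the process that actually
         * crashed; it can be used with waitid(P_PIDFD, ...),
         * pidfd_send_signal(), and friends.  Here we just wait
         * for the process to go away: a pidfd polls readable
         * when the process exits. */
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
        poll(&pfd, 1, -1);

        /* ... read and process the core dump from stdin ... */
        return 0;
    }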

This change makes it relatively easy to adapt core-dump handlers to avoid the most recently identified vulnerabilities; it has already been backported to a recent set of stable kernels. But it does not change the basic nature of the core_pattern API, which still requires the launch of a new, fully privileged process to handle each crash. It is, instead, a workaround for one of the worst problems with that API.

The longer-term fix is this series from Brauner, which was also merged for 6.16. It adds a new syntax to core_pattern instructing the kernel to write core dumps to an existing socket; a user-space handler can bind to that socket and accept a new connection for each core dump that the kernel sends its way. The handler must be privileged to bind to the socket, but it remains an ordinary process rather than a kernel-created user-mode helper, and the process that actually reads core dumps requires no special privileges at all. So the core-dump handler can bind to the socket, then drop its privileges and sandbox itself, closing off a number of attack vectors.
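
Roughly, the handler side might look like this sketch, assuming the "@"-prefix syntax from the series for pointing core_pattern at an AF_UNIX socket (the path is hypothetical, and error handling is omitted):

    /* Sketch: bind the core-dump socket while privileged, then
     * drop privileges and accept one connection per crash. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/run/core-catcher.socket",
                sizeof(addr.sun_path) - 1);

        int s = socket(AF_UNIX, SOCK_STREAM, 0);
        unlink(addr.sun_path);
        bind(s, (struct sockaddr *)&addr, sizeof(addr));
        listen(s, 8);

        /* ... drop privileges and sandbox here ... */

        for (;;) {
            int conn = accept(s, NULL, NULL);
            /* read() the core dump from conn; see the next
             * sketch for verifying the peer. */
            close(conn);
        }
    }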

Once a new connection has been made, the handler can obtain a pidfd for the crashed process using the SO_PEERPIDFD request for getsockopt(). Once again, the pidfd will refer to the actual crashed process, rather than something an attacker might want the handler to treat like the crashed process. The handler can pass the new PIDFD_INFO_COREDUMP option to the PIDFD_GET_INFO ioctl() command to learn more about the crashed process, including whether the process is, indeed, having its core dumped. There are, in other words, a couple of layers of defense against the sort of substitution attack demonstrated by Qualys.
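
In code, that verification might look like the following sketch, assuming the 6.16 uapi names from <linux/pidfd.h> (conn is the accepted connection from the sketch above):

    /* Sketch: check that the peer really is dumping core. */
    #include <linux/pidfd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>

    static int peer_is_dumping(int conn)
    {
        int pidfd;
        socklen_t len = sizeof(pidfd);

        /* Obtain a pidfd for the peer; immune to PID reuse. */
        if (getsockopt(conn, SOL_SOCKET, SO_PEERPIDFD,
                       &pidfd, &len) < 0)
            return 0;

        struct pidfd_info info = { .mask = PIDFD_INFO_COREDUMP };
        if (ioctl(pidfd, PIDFD_GET_INFO, &info) < 0)
            return 0;

        /* Only trust the connection if the kernel confirms
         * that this process is having its core dumped. */
        return (info.mask & PIDFD_INFO_COREDUMP) &&
               (info.coredump_mask & PIDFD_COREDUMPED);
    }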

The end result is a system for handling core dumps that is more efficient (since there is no need to launch new helper processes each time) and which should be far more resistant to many types of attacks. It may take some time to roll out to deployed systems, since this change seems unlikely to be backported to the stable kernels (though distributors may well choose to backport it to their own kernels). But, eventually, this particular source of CVEs should become rather less productive than it traditionally has been.

Index entries for this article
Kernel: Releases/6.16
Kernel: Security/Vulnerabilities



Process ids again

Posted Jun 6, 2025 15:11 UTC (Fri) by epa (subscriber, #39769) [Link] (33 responses)

> then quickly replaced by another process with the same process ID
Argh. Why don't we move to 64-bit process ids, and guarantee that they are not reused except after a reboot? There are some fields expecting a smaller value, but surely if time_t could become 64 bit, we can do the same for pid_t. As it stands pretty much any use of process ids has this race condition. A lot of effort has gone into pidfds, but they will never be used in the long tail of shell scripts and old code.

Process ids again

Posted Jun 6, 2025 16:51 UTC (Fri) by AClwn (subscriber, #131323) [Link] (10 responses)

OpenBSD randomizes its PIDs to reduce predictability and to avoid leaking information (i.e. preventing users from subtracting PIDs to determine how many processes had been created in the interim). Randomized PIDs would be incompatible with a no-reuse guarantee because you'd have to store (and search!) a table of previously-used PIDs.

The obvious retort is that Linux doesn't randomize PIDs and it never will, so the only things you lose by extending PIDs to 64 bits are (1) a little bit of space wherever they're stored and (2) an entire class of PID-reuse security vulnerabilities, and that this is a pretty good tradeoff. I have nothing to say to that; I just wanted to mention PID randomization.

Process ids again

Posted Jun 7, 2025 5:45 UTC (Sat) by epa (subscriber, #39769) [Link] (1 responses)

I suppose if you had a big enough space for the ids you could randomize and still have them in monotone order. Just leave random-sized gaps in the sequence. 64 bits might be enough that you could do that and still never run out of ids.

Process ids again

Posted Jun 22, 2025 9:10 UTC (Sun) by l0kod (subscriber, #111864) [Link]

FYI, Landlock IDs follow this approach. One important difference is that we only need a bijection (i.e. Landlock IDs are not used to get a reference but only to identify an existing object). From the commit description, Landlock IDs have these properties:
- They are unique during the lifetime of the running system thanks to
the 64-bit values: at worst, 2^60 - 2*2^32 useful IDs.
- They are always greater than 2^32 and must then be stored in 64-bit
integer types.
- The initial ID (at boot time) is randomly picked between 2^32 and
2^33, which limits collisions in logs across different boots.
- IDs are sequential, which enables users to order them.
- IDs may not be consecutive but increase with a random 2^4 step, which
limits side channels.

For more details, see https://git.kernel.org/torvalds/c/d9d2a68ed44bbae598a81cb...
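
A toy allocator with roughly those properties might look like this (purely illustrative, not the Landlock code):

    /* Toy sketch: unique, sequential, non-consecutive 64-bit IDs.
     * Start at a random point in [2^32, 2^33), then advance by a
     * random step of 1..16 per allocation. */
    #include <stdint.h>
    #include <sys/random.h>

    static uint64_t next_id;    /* protected by a lock in real code */

    void id_init(void)
    {
        uint64_t r;
        getrandom(&r, sizeof(r), 0);
        next_id = (1ULL << 32) + (r & ((1ULL << 32) - 1));
    }

    uint64_t id_alloc(void)
    {
        uint8_t step;
        getrandom(&step, sizeof(step), 0);
        next_id += (step & 0xF) + 1;    /* random 2^4 step, never zero */
        return next_id;
    }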

Unique randomized wide PIDs

Posted Jun 8, 2025 9:20 UTC (Sun) by jreiser (subscriber, #11027) [Link] (6 responses)

> Randomized PIDs would be incompatible with a no-reuse guarantee because you'd have to store (and search!) a table of previously-used PIDs.

Um, no. A Linear Feedback Shift Register (LFSR) that is based on an irreducible polynomial guarantees uniqueness over its entire period, which is nearly 2**N. Just initialize it to a random point in its sequence.
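
For illustration, here is a 64-bit Galois LFSR using a commonly cited maximal-length tap mask; every nonzero state appears exactly once per period of 2**64 - 1:

    /* Sketch: step a maximal-length 64-bit Galois LFSR.  The
     * state must be initialized to a nonzero random value. */
    #include <stdint.h>

    uint64_t lfsr_next(uint64_t s)
    {
        uint64_t lsb = s & 1;

        s >>= 1;
        if (lsb)
            s ^= 0xD800000000000000ULL; /* taps 64, 63, 61, 60 */
        return s;
    }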

Unique randomized wide PIDs

Posted Jun 8, 2025 10:03 UTC (Sun) by dezgeg (subscriber, #92243) [Link] (5 responses)

Would that really help with anything though, given the sequence is still entirely predictable?

Unique randomized wide PIDs

Posted Jun 8, 2025 18:11 UTC (Sun) by bmenrigh (subscriber, #63018) [Link] (4 responses)

Yeah, an LFSR wouldn't be secure. However, a custom block cipher (where the block size matches the PID size) eliminates the predictability issue. Just generate a random key at startup.
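
As a toy illustration of the idea: a Feistel construction is a permutation for any round function, so distinct counter values always map to distinct IDs, while the key makes the sequence hard to predict. This sketch is nowhere near cryptographically serious:

    /* Toy sketch: encrypt a 32-bit counter with a keyed 4-round
     * Feistel permutation to produce unique, unpredictable PIDs.
     * Illustrative only; do not rely on this for security. */
    #include <stdint.h>

    static uint16_t round_fn(uint16_t half, uint64_t key, int r)
    {
        uint32_t x = half ^ (uint32_t)(key >> (16 * r));

        x *= 0x9E3779B1u;   /* multiplicative mixing */
        x ^= x >> 15;
        return (uint16_t)x;
    }

    uint32_t permute_id(uint32_t counter, uint64_t key)
    {
        uint16_t l = counter >> 16, r = counter & 0xFFFF;

        for (int i = 0; i < 4; i++) {
            uint16_t t = l ^ round_fn(r, key, i);
            l = r;
            r = t;
        }
        return ((uint32_t)l << 16) | r;
    }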

Unique randomized wide PIDs

Posted Jun 9, 2025 12:00 UTC (Mon) by bluca (subscriber, #118303) [Link] (3 responses)

Although it is mildly amusing watching the rediscovery of UUIDs in this thread (hint: it's what Windows uses), it is unnecessary: with the new kernel the pidfd inode number is guaranteed unique per boot, so we have a solution for this now (there's also the boot uuid, so the combination of boot uuid + pidfd inode id gives a universal unique identifier, if that is needed). PIDs can never be changed, as all existing software will break. No, it's not just a matter of "recompiling", existing binaries need to work too with new kernels, so that's not an option. But we are starting to expose pidfd inodes in more places now, so I guess they'll slowly take over for new functionality.

Unique randomized wide PIDs

Posted Jun 12, 2025 8:48 UTC (Thu) by donald.buczek (subscriber, #112892) [Link] (2 responses)

However, I wonder how userspace can easily determine whether a pidfd inode number comes from a system that guarantees uniqueness.

Unique randomized wide PIDs

Posted Jun 12, 2025 11:15 UTC (Thu) by bluca (subscriber, #118303) [Link] (1 responses)

There was a way, I forget the details, but on systems where it's not unique it's a fixed/hardcoded/well-known inode number, because it's an anonymous inode rather than from pidfdfs? Details are fuzzy so I might be getting this wrong

Unique randomized wide PIDs

Posted Jun 14, 2025 9:45 UTC (Sat) by donald.buczek (subscriber, #112892) [Link]

I found that you can use fstatfs() on the file descriptor and see if f_type == PID_FS_MAGIC (0x50494446, "PIDF"). On an older system it is ANON_INODE_FS_MAGIC (0x09041934).

Note that, although pidfd_open(2) says opening a "/proc/[PID]" directory would be an alternative way to get a PID file descriptor, this is only half true: you can use such a file descriptor with pidfd_* calls, but it is a different type of file descriptor, with f_type == PROC_SUPER_MAGIC (0x9fa0), and you can't use the inode number from that kind of file descriptor as a unique process identifier.
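
A sketch of that check (PID_FS_MAGIC may be absent from older <linux/magic.h> headers, in which case the value quoted above can be defined manually):

    /* Sketch: does this pidfd come from pidfs (unique inode
     * numbers) or from the legacy anonymous-inode scheme? */
    #include <linux/magic.h>
    #include <sys/vfs.h>

    #ifndef PID_FS_MAGIC
    #define PID_FS_MAGIC 0x50494446     /* "PIDF" */
    #endif

    int pidfd_inode_is_unique(int pidfd)
    {
        struct statfs sfs;

        if (fstatfs(pidfd, &sfs) < 0)
            return 0;
        return sfs.f_type == PID_FS_MAGIC;
    }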

I still wish processes had UUIDs.

Process ids again

Posted Jun 9, 2025 16:37 UTC (Mon) by dsfch (subscriber, #176007) [Link]

64-bit PIDs would be an ABI break, right? `pid_t` being 32-bit is all over glibc & Co. as well as in `<uapi/asm-generic/posix_types.h>`, so it's not just a trivial substitution?

Process ids again

Posted Jun 6, 2025 16:52 UTC (Fri) by bluca (subscriber, #118303) [Link]

Because it would most likely break everything left and right. With the pidfdfs (say that quickly 3 times in a row) inode we have something close enough.

Process ids again

Posted Jun 6, 2025 17:09 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

Long numbers SUCK for UI purposes. What would help is a _separate_ handle-type process identifier that is guaranteed to be unique and can even be 128 bits long.

Process ids again

Posted Jun 6, 2025 20:42 UTC (Fri) by warrax (subscriber, #103205) [Link] (4 responses)

If you ever do anything other than copy/paste PIDs when using the command line (which I assume you mean by UI), I'd be a bit worried. Actually... I think it'd be better if they were hard enough to type that they MUST be copy/pasted. Killing a mistyped PID *should* do nothing.

Process ids again

Posted Jun 6, 2025 20:43 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

I absolutely do stuff other than act as a copypaste bot. Including looking at logs, and correlating things across multiple sources. I almost never need to actually kill processes by ID.

Long IDs make that harder.

Process ids again

Posted Jun 7, 2025 5:42 UTC (Sat) by epa (subscriber, #39769) [Link] (2 responses)

On the other hand, right now if you are looking for a process id in a log file, you don’t have any guarantee that it’s the same process, since pids can be reused. Making them unique has to be better, even if it means the number might be longer. (You’d only get the big numbers on a long-running system that has forked vast numbers of processes, and in that case I suggest you’d particularly benefit from not having a number reused.)

Even in ordinary command line use like “see a process id in top and then kill it” there is a race condition and some danger if pids are not unique.

Process ids again

Posted Jun 7, 2025 5:53 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

> On the other hand, right now if you are looking for a process id in a log file, you don’t have any guarantee that it’s the same process

That's true, but in practice infrequent, outside of deliberate attacks.

Process ids again

Posted Jun 7, 2025 6:51 UTC (Sat) by iabervon (subscriber, #722) [Link]

I often use them to identify the lines from some process that are interspersed with lines from other related processes. Occasionally, the same PID is a different process later, but only after the last line that is from the same process. It's also useful in REST API server logs: different lines from serving the same request have the same PID, but even if there isn't any PID reuse, the same process generally goes on to serve another request, but not until it's done with the first one.

Process ids again

Posted Jun 6, 2025 17:38 UTC (Fri) by Nahor (subscriber, #51583) [Link] (7 responses)

> Why don't we move to 64-bit process ids

What do you give to a process that still expects a 32-bit pid when the value does not fit?
And if we provide a 64-bit API while keeping the 32-bit values for a while for backward compatibility, how long should we wait before switching? ...And in the interim, the problem persists, even for applications that did update.

With pidfds, you solve the problem for modern applications *now*, while keeping backward compatibility for older code forever (or until we choose to remove support for 32-bit PIDs).

> surely if time_t could become 64 bit, we can do the same for pid_t.

You do know how painful that switch was (and still is, since not every code base has been updated yet), right?
Remember that this is not just about updating the kernel API, but also updating code wherever pids are used (internally in applications, or externally, i.e. storage, network, ...)

> they will never be used in the long tail of shell scripts and old code

If they won't be updated to use pidfds, why do you believe they will be updated for 64-bit pids?

Process ids again

Posted Jun 7, 2025 2:48 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (3 responses)

PID namespaces are already a thing, so you could (in theory) migrate a system gradually rather than all at once. In practice, I'm not sure how useful that is to any distro that is not NixOS. But I suppose it probably does also work for Flatpak-shaped things?

> What do you give to a process that still expects a 32-bit pid when the value does not fit?

Under the assumption that we're migrating individual PID namespaces rather than a system-wide setting, if a process is in a PID namespace that uses 64-bit PIDs, it should have been migrated to the new API already (or else userspace should not have enabled 64-bit PIDs for this namespace). If it nevertheless asks for a 32-bit PID by calling into the old 32-bit interface, then it gets -ENOSYS or some equivalent, and probably crashes.

Process ids again

Posted Jun 7, 2025 19:29 UTC (Sat) by Nahor (subscriber, #51583) [Link] (2 responses)

> PID namespaces

That looks like a big big can of worms.

How do non-updated apps and updated ones mix? Say an updated shell trying to start an old app, or vice versa?
Or do you expect the user to use a different launcher/shell and choose which to interact with depending on what type of apps they use? And to have different binaries with different pid sizes for apps used in both (shell, launcher, UI, ssh, ...)?
How do apps communicate pids with each other if they are not in the same namespace? Say someone uses an updated "top" command and thus gets 64-bit pids, then tries to use the shell's builtin "kill" command, which is still expecting 32-bit pids?

What/who decides what namespace to use? The kernel? The shell/launcher? The user? How does it/he/she know what namespace to use?

Namespaces work well if a whole ecosystem can be independent from everything else wrt that namespace. They also work because only the values change, not the types; the binaries are the same (i.e. a shell in one namespace works just as well in another, it will just print different values for pids, or see different files, ...)

Process ids again

Posted Jun 8, 2025 3:16 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (1 responses)

The idea would be, broadly speaking, all namespaces are 64-bit capable, but by default they only generate PIDs in the 32-bit compatible range. You can use the new 64-bit API for everything and it always works, and you can use the old API if you're in a namespace that is limited to the 32-bit range. A 64-bit namespace may contain 32-bit children, but not vice-versa.

The answers to most of your more specific questions can be summarized as "the distro can do what it sees fit, and if it chooses to do nothing, then it continues to use 32-bit PIDs for everything indefinitely."

Process ids again

Posted Jun 8, 2025 3:20 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

Addendum: Some of your questions are better answered by referring you to pid_namespaces(7), which see.

Process ids again

Posted Jun 7, 2025 5:38 UTC (Sat) by epa (subscriber, #39769) [Link] (2 responses)

A shell script needs no update for 64-bit process ids. It’s just a slightly longer string than before. A C program using pid_t may just need a recompile, though of course there will be some code putting a process id into an int (I imagine a compiler warning can mostly catch this).

Rewriting a C program to use pidfds is a much bigger task, and rewriting a shell script with them is essentially impossible.

Scripting languages like Perl, Python, and Tcl would usually just need the interpreter itself recompiled for 64-bit pids and existing scripts will work unchanged.

Process ids again

Posted Jun 7, 2025 18:58 UTC (Sat) by Nahor (subscriber, #51583) [Link]

> A shell script needs no update for 64-bit process ids

Most won't, but a script can still make assumptions about the pid size, e.g. that it is at most 10 digits long.

> A C program using pid_t may just need a recompile

Keyword "may".
And even for the simple cases, this depends a lot on how 64-bits pids would be implemented, e.g. would this be a compilation flag? Or a "#define USE_PID64"? Or would this be changing all the "pid_t"/"getpid()"/... to "pid64_t"/"getpid64()"/...?

And in the non-simple cases, the issues are the same for 64-bit pids and pidfds (apps using pids in a "smart" way will be majorly broken, a pid64/pidfd cannot be passed as-is when communicating with 32-bit pid apps, ...).

> I imagine a compiler warning can mostly catch this

Only in simple cases, maybe. And AFAIK, currently, compilers will not complain when storing an int64_t in an int32_t without the "-Wconversion" flag (which is not enabled even when using "-Wall -Wextra -pedantic"). And even "-Wconversion" will not complain if there is a cast involved. https://godbolt.org/z/z3jGbYb86

> there will be some code putting a process id into an int

Or putting it in the low bits of an int64_t and using the high bits for something else.
Or assume that a struct containing a pid has a specific size. Or that fields in that struct after the pid are at specific offsets.
Or ... (don't underestimate what people do when they assume something will be true forever)...

Basically, one can look at what happened during the transition from 32-bit to 64-bit platforms, the switch to large files (>4GB), and the Y2038 problem, to see all the possible issues that can arise.

> Rewriting a C program to use pidfds is a much bigger task

I'm not so sure. Since a pidfd is just an int (the same underlying type as a pid_t), and depending on what pid/pidfd are used for, updating could boil down to calling "pidfd_xyz()" instead of "xyz()", or passing a "XYZ_PIDFD" flag.

For the rest, that can be a big task to fix in either case. For instance, if the problem is someone combining the pid with something else in an int64_t, then a pidfd will still work fine, while a pid64 will need a redesign.
Or people might make the same assumption that you did, that pid64 is just a recompilation, then spend a lot of time tracking down bugs; while with pidfds, they would spend time looking at each call site first, fixing the problems before they arise and need tracking.
Which one takes more time will depend on the application. Sometimes it's faster to think things through first, sometimes it's faster to just try and fix. That very much applies here IMHO.

Process ids again

Posted Jul 4, 2025 13:51 UTC (Fri) by judas_iscariote (guest, #47386) [Link]

> A C program using pid_t may just need a recompile

You are assuming a carefully written program... there is still code out there that assumes pids are 16-bit and stores them in a ushort. There is incorrect casting, there is code not using pid_t at all... I mean, there is a lot of buggy software out there...

Process ids again

Posted Jun 7, 2025 19:05 UTC (Sat) by donald.buczek (subscriber, #112892) [Link] (6 responses)

> Why don't we move to 64-bit process ids, and guarantee that they are not reused except after a reboot

Or a UUID for every process. While pidfs inode numbers (or a hypothetical 64-bit pid) are good for the lifetime of the system, you sometimes want to persist identifiers for longer than that, for example in a database or on another system. We have this problem for a cluster queuing system where the job controller is designed to be restartable and has to identify which of its jobs, which are stored in an SQL database, are still alive and which aren't. Our current solution is so ugly that I don't want to mention it, and our next solution might be to consider boot_id + pidfs inode number. Any better ideas, someone?
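
For what it's worth, a rough sketch of that last idea, assuming a pidfs-era kernel where fstat() on a pidfd returns the per-boot-unique inode number:

    /* Sketch: persistent process identifier built from the boot
     * UUID plus the pidfd inode number. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    int process_id_string(pid_t pid, char *buf, size_t len)
    {
        char boot_id[64];
        FILE *f = fopen("/proc/sys/kernel/random/boot_id", "r");

        if (!f)
            return -1;
        if (!fgets(boot_id, sizeof(boot_id), f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        boot_id[strcspn(boot_id, "\n")] = '\0';

        int pidfd = syscall(SYS_pidfd_open, pid, 0);
        if (pidfd < 0)
            return -1;

        struct stat st;
        fstat(pidfd, &st);      /* st_ino is unique per boot */
        close(pidfd);

        snprintf(buf, len, "%s:%llu", boot_id,
                 (unsigned long long)st.st_ino);
        return 0;
    }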

Process ids again

Posted Jun 7, 2025 19:37 UTC (Sat) by snajpa (subscriber, #73467) [Link] (5 responses)

I think if you keep adding more context from /proc/<pid>, you should arrive at sufficiently reliable input for some good-enough hash function. The start time of the process can't be changed by the process itself AFAIK. Just that, along with the pid, could do the trick, IMHO. Save the hash when the task is started, maybe assert that it hasn't changed while doing a dev-build shutdown... roughly what I would do

Process ids again

Posted Jun 9, 2025 12:14 UTC (Mon) by bluca (subscriber, #118303) [Link] (4 responses)

> Start time of the process can't be changed by the process itself AFAIK. Just that along with the pid could do the trick, IMHO.

It very much doesn't, so much so that relying on that combination for uniqueness has caused several CVEs in the past. The start time is not granular enough, and attackers are able to cause a PID + start-time clash at their leisure. This is why pidfds exist, and we use them when we need to uniquely identify processes for any security-relevant reason (and more and more non-security-relevant ones too)

Process ids again

Posted Jun 9, 2025 20:10 UTC (Mon) by snajpa (subscriber, #73467) [Link] (3 responses)

Are you able to link just a single CVE that proves it isn't sufficient in the real, practical world? You know, hash algos are bound to have collisions too, yet we use them. Taking an argument to an extreme isn't helpful.

Besides, how are you going to use pidfds in this specific case you are replying to? Much confidence in your reply, let's see if you can back that confidence up with something.

Process ids again

Posted Jun 9, 2025 20:26 UTC (Mon) by bluca (subscriber, #118303) [Link] (2 responses)

You can start from CVE-2019-6133 and continue from there.

The combination of pidfd inode id plus boot uuid can uniquely identify a process across machines/reboots/everything, so it is suitable for that use case.

Process ids again

Posted Jun 10, 2025 16:11 UTC (Tue) by snajpa (subscriber, #73467) [Link] (1 responses)

I'm gonna solve this by muting you, as your whole reaction is just to prove... you can't read or fit into context of what you're replying to, I can only feel your need to be right every single time we interact here. Context be damned, right...

Process ids again

Posted Jun 10, 2025 16:13 UTC (Tue) by bluca (subscriber, #118303) [Link]

Suit yourself. You asked for a CVE, it's been provided. You asked for a solution for a problem, it's been provided. If you can't handle receiving answers, maybe stop asking questions?

More efficient?

Posted Jun 6, 2025 16:27 UTC (Fri) by Nahor (subscriber, #51583) [Link] (11 responses)

> The end result is a system for handling core dumps that is more efficient (since there is no need to launch new helper processes each time)

Having a process running all the time (and thus using RAM) for the (hopefully) rare crash is not what I would call a "more efficient" use of resources.

Either way is probably efficient enough (a helper on standby should use very little in the way of resources, while starting a new process is likely very cheap compared to handling a core dump), but I don't know which would be "more efficient", much less whether it's significantly so (*).

(*) and if it is, it's probably context dependent too, i.e. if one is memory-bound or CPU+IO bound

More efficient?

Posted Jun 6, 2025 16:51 UTC (Fri) by bluca (subscriber, #118303) [Link] (10 responses)

More efficient?

Posted Jun 6, 2025 17:37 UTC (Fri) by Nahor (subscriber, #51583) [Link] (9 responses)

What of it? Sure, systemd is already running anyway, so it wouldn't use more resources (nitpick: arguably it does, since it will have one more socket to track), but it then needs to start a new process to handle a crash, like core_pattern did, so it's no more efficient than before.

The only case I can imagine where the new method would be more efficient is if one already has a crash-handler daemon and uses a core_pattern helper to pass the data from the kernel to the daemon. In this case the helper can now be skipped and the core dump can go directly from the kernel to the daemon. But I doubt this is common usage, if it even exists anywhere, since it would combine the worst of both methods.

More efficient?

Posted Jun 6, 2025 17:54 UTC (Fri) by bluca (subscriber, #118303) [Link] (6 responses)

> but it then needs to start a new process to handle a crash, like core_pattern did, so it's no more efficient than before.

It is, as there's one fewer process. It's already socket-activated: the umh only receives the core, it doesn't do any analysis, as that's dangerous. The umh forwards the core to a different socket-activated process, which is run at minimum privilege and does the analysis.
Now we can remove the middleman.

> But I doubt this is common usage, if it even exists anywhere, since it would combine the worst of both methods.

It's the most common usage (whether via apport or systemd-coredump or something else); as the article says, just writing files around from the kernel is really bad, and only legacy (or manual) setups do that.

More efficient?

Posted Jun 6, 2025 19:25 UTC (Fri) by Nahor (subscriber, #51583) [Link] (5 responses)

I guess we are not arguing at the same level.

If your crash manager already has a persistent process, like systemd does, then yes, you become more efficient. Having a persistent process was a sunk cost for them, since systemd already has one for monitoring services. But the gain in efficiency for them comes from their implementation choices and from the kernel API now matching their usage better; it does not come from a more efficient API. I'm arguing the latter, you're arguing the former.

More efficient?

Posted Jun 6, 2025 20:49 UTC (Fri) by bluca (subscriber, #118303) [Link] (4 responses)

> If your crash manager already has a persistent process, like systemd does

It doesn't

More efficient?

Posted Jun 7, 2025 0:41 UTC (Sat) by Nahor (subscriber, #51583) [Link] (3 responses)

>> If your crash manager already has a persistent process, like systemd does
>
> It doesn't

Uh? Systemd has a persistent process, it's called "systemd" daemon, aka "init", aka PID 1. What do you think monitors the various systemd units, including sockets?

More efficient?

Posted Jun 7, 2025 11:46 UTC (Sat) by bluca (subscriber, #118303) [Link] (2 responses)

Oh, really? I had no idea! /s

It's not an _extra_ one as you implied. There's no extra cost, it's already there for other purposes.

More efficient?

Posted Jun 9, 2025 10:04 UTC (Mon) by paulj (subscriber, #341) [Link] (1 responses)

Every additional feature/thing that systemd has to watch for consumes another little bit of memory, though, and is probably another thing that sits in a list or other data structure that has to be scanned through regularly.

Worth it? For me, I'd say yes, and I doubt anyone could notice one additional feature, but you can't dismiss the argument others have on the basis that there are no extra resources used.

More efficient?

Posted Jun 9, 2025 10:48 UTC (Mon) by bluca (subscriber, #118303) [Link]

Once again: there are no extra resources used. There is already a systemd-coredump.socket, and there always was, as it's trivial to verify.

With this feature it's now kernel -> socket, instead of kernel -> usermode helper -> socket.

More efficient?

Posted Jun 7, 2025 8:41 UTC (Sat) by james (subscriber, #1325) [Link] (1 responses)

> but it then needs to start a new process to handle a crash, like core_pattern did, so it's no more efficient than before.
Is this really the sort of efficiency we should be optimising for? I mean, by definition we've just had a program crash: this should not be a common occurrence!

More efficient?

Posted Jun 7, 2025 15:23 UTC (Sat) by Nahor (subscriber, #51583) [Link]

> Is this really the sort of efficiency we should be optimising for?

Probably not, but the article does assert that the new API is "more efficient (since there is no need to launch new helper processes each time)". I question that (at least as a generalization; it is true for the particular case of systemd, because systemd-coredump's main part is already socket-based, so now it can just drop the core_pattern helper part).

Get the kernel out of this business

Posted Jun 7, 2025 9:51 UTC (Sat) by quotemstr (subscriber, #45331) [Link] (2 responses)

Crash dumping is something userspace can do all by itself --- consider breakpad and crashpad. Wire up a signal handler; make direct system calls to fork and execve a helper; ptrace your parent. Now you can dump the process without the kernel's help, the usual way, with normal privileges. The whole kernel-driven core dump mechanism is more trouble than it's worth. libc can provide a default handler that approximates what we have today if we really need it.
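
A heavily simplified sketch of that pattern (the helper path is hypothetical, and real implementations such as crashpad take far more care with async-signal-safety and re-entrancy):

    /* Sketch: on a fatal signal, fork and exec an out-of-process
     * helper; the helper can ptrace(PTRACE_SEIZE, getppid(), ...)
     * and dump its parent, which is parked in waitpid(). */
    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void crash_handler(int sig)
    {
        pid_t child = fork();   /* async-signal-safe */

        if (child == 0) {
            char *argv[] = { (char *)"crash-helper", NULL };
            execve("/usr/local/bin/crash-helper", argv, NULL);
            _exit(127);
        }
        if (child > 0)
            waitpid(child, NULL, 0);    /* wait for the dump */

        signal(sig, SIG_DFL);           /* then crash for real */
        raise(sig);
    }

    int main(void)
    {
        signal(SIGSEGV, crash_handler);
        *(volatile int *)0 = 1;         /* demo crash */
    }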

Get the kernel out of this business

Posted Jun 7, 2025 16:31 UTC (Sat) by jwadams (subscriber, #123485) [Link] (1 responses)

Getting this right completely in process is extremely hard to do (using only async-safe interfaces, possibly racing with other signal handlers trying to do the same thing, etc), and has a good chance of destroying the state you were trying to report.

Get the kernel out of this business

Posted Jun 7, 2025 23:30 UTC (Sat) by quotemstr (subscriber, #45331) [Link]

> Getting this right completely in process is extremely hard

Yet it's done. Every major Android app uses some kind of in-process, signal-based crash reporting system and it works fine. You barely have to do anything in an async-signal-safe way: just use direct system calls like in https://github.com/linux-on-ibm-z/linux-syscall-support/b..., vfork, and execve a crash handler that ptraces its parent and runs normal signal-unsafe code at its leisure to dump the crashing process --- probably to a format better than traditional core dumps, like minidump.

Speaking of minidumps: Windows crash dumps are all-userspace and work just fine.

At the most, I'd approve of the Linux kernel having a mechanism to signal some kind of registered crash daemon over IPC when another process crashes. This way, the crashing process doesn't need a signal handler or any async-signal-safe code at all. Linux should just delete all the code that produces actual core dumps and delegate the dirty work to userspace.

> possibly racing with other signal handlers

Safely sharing signal handlers is a problem all its own. Besides: futex works fine even in async-signal-safe code.

