LWN: Comments on "Reliable user-space stack traces with SFrame" https://lwn.net/Articles/932209/ This is a special feed containing comments posted to the individual LWN article titled "Reliable user-space stack traces with SFrame". en-us Sat, 04 Oct 2025 15:27:23 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932918/ himi
<div class="FormattedComment"> I wonder if it'd be practical to generate the SFrame data from the DWARF data on the fly in userspace, and then present it to the kernel - probably a lot of work the first time a particular executable is processed, but it'd be fairly easy to cache it.<br> <p> I think it'd be pretty similar to the dlopen() scenario, except that instead of just pointing at existing SFrame data for the object it'd generate the data from another source first.<br> </div>
Thu, 25 May 2023 02:53:45 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932766/ lathiat
<div class="FormattedComment"> Not sure if this makes you feel better or worse, but Polar Signals/Parca are doing DWARF unwinding in BPF for continuous profiling:<br> <a href="https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-based-stack-walking-using-ebpf/">https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-...</a><br> <a href="https://news.ycombinator.com/item?id=33788794">https://news.ycombinator.com/item?id=33788794</a><br> </div>
Wed, 24 May 2023 05:09:33 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932760/ eklitzke
<div class="FormattedComment"> Obviously care needs to be taken with the code you write in a signal handler, but that doesn't mean signal handlers aren't useful, and they're definitely not only useful for a crashing process.
At the company I work for we use setitimer with ITIMER_PROF. In the SIGPROF signal handler we unwind the stack by following frame pointers, up to 48 frames deep, and write the result into a fixed-size ring buffer, so we have the last ~10s of profile data in memory at all times. None of this requires using stdio or memory allocation or anything else unsafe. There is some slightly tricky locking logic for reading/writing the ring buffer (when we dump profiles from the buffer we need to make sure it doesn't race with the signal handler), but it isn't rocket science.<br> </div>
Tue, 23 May 2023 23:32:33 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932757/ nevets
<div class="FormattedComment"> Fair enough, but the point was that ORC was mostly a proof of concept that this works, and it works well. Live kernel patching depends heavily on accurate stack traces, and it uses ORC unwinding for that. The point was that SFrame uses the same concept. But it all really depends on what your definition of "based on" is.<br> </div>
Tue, 23 May 2023 20:50:39 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932740/ ibhagat
<div class="FormattedComment"> In theory, DWARF-based EH_Frame information is a superset of the information in SFrame (the former has information to restore complete register state as well).<br> </div>
Tue, 23 May 2023 17:34:08 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932738/ ibhagat
<div class="FormattedComment"> <span class="QuotedText">&gt; SFrame is based on ORC;</span><br> <p> The commonality between SFrame and ORC is that both encode the stack offsets directly.
But beyond that, there are enough divergences between the two formats to make them quite different: SFrame is generated by the toolchain, supports AMD64 and AArch64 (AAPCS64), and has compactness-related optimizations in its on-disk representation; ORC is designed for the kernel stack-tracing use case.<br> <p> Just saying... "SFrame is based on ORC" can be misleading.<br> </div>
Tue, 23 May 2023 17:29:19 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932730/ SLi
<div class="FormattedComment"> I assume DWARF would normally contain enough information to construct SFrames without recompiling?<br> </div>
Tue, 23 May 2023 16:38:31 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932712/ nix
<div class="FormattedComment"> <span class="QuotedText">&gt; Now, if we were to do the unwinding in a signal handler instead of hard coding SFrame</span><br> <p> I'm not sure what this means. The mechanism for unwinding (in-kernel, copies to userspace, whatever) is orthogonal to the format being used (DWARF, SFrame, ORC): they can presumably all be unwound using code running in many contexts. They're just formats after all.<br> <p> But... in general, in a signal handler you can't do anything useful involving the process you're running inside -- in particular you can't use stdio or allocate memory, and more or less arbitrary locks might be taken out, and that's when nothing has gone wrong: if you're backtracing, quite often it's because all hell has broken loose and the program might be in any state at all.
glibc removed the machinery that gave (fp-based) backtraces on stack-protector failure for a reason.<br> <p> One attractive-sounding alternative suggested at a past LPC is to use a coredump handler: that is given an image of as much or as little of the process as you wish to configure (this stuff is customizable in /proc) and can do whatever it wants, because it's a completely separate process that nothing has gone wrong with, which isn't in a signal handler and has no unexpected locks or half-completed mallocs fouling things up. But a signal handler? The more you do with signals, the more pain you'll eventually be in, and that goes double if the process is halfway through crashing!<br> </div>
Tue, 23 May 2023 13:29:43 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932711/ nix
<div class="FormattedComment"> Just a note: the SFrame format is not based on ORC at all (it has rather different goals: simplicity of reading is there, but compactness is valued more highly than in ORC, plus of course it's targeting representing stacks for all of userspace rather than purely the kernel). The implementation happens to use the same in-kernel API, but that's because it has to: that's the API the in-kernel users are using.<br> </div>
Tue, 23 May 2023 13:24:08 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932682/ nevets
<div class="FormattedComment"> It's not specific to dlopen().
It's just that dlopen() is probably the best-known case and the easiest way to explain the issue.<br> </div>
Tue, 23 May 2023 11:31:20 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932676/ atnot
<div class="FormattedComment"> To answer my question, here is a paper on using malicious debug info to take over a process during DWARF stack unwinding: <a href="https://static.usenix.org/event/woot11/tech/final_files/Oakley.pdf">https://static.usenix.org/event/woot11/tech/final_files/O...</a><br> </div>
Tue, 23 May 2023 09:59:54 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932675/ atnot
<div class="FormattedComment"> DWARF is indeed incredibly complex. It is basically a Turing-complete stack machine that can run arbitrary computations. It has to be able to do this to be able to reverse arbitrary compiler optimizations for debugging. For a trivial example, while your code may declare a pointer, the compiler might actually store it as an offset from another pointer instead. In that case some simple addition will do, but this gets a lot more complicated when you stack layers of optimizations, computations that end up being eliminated due to inlining, VLAs, varargs, etc. Answering what the value of a variable is accurately might require substantial emulation of your code.<br> <p> I don't know how much of this is needed to only do unwinding, but the idea of DWARF in the kernel is a very spooky prospect to me.<br> </div>
Tue, 23 May 2023 09:52:20 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932673/ Sesse
<div class="FormattedComment"> Uninformed guess: One would assume that the big difference then is whether you unwind when the sample comes (i.e., in the kernel; SFrame), or whether you snapshot the entire top of the stack to unwind later (i.e., in userspace; DWARF).
The DWARF standard is complex enough that I don't think anyone really wants to parse it in the privileged context of the kernel, even if one happens to use only a subset of it. Plus it can be in a separate dbgsym file, which has all sorts of other implications around needing userspace helpers?<br> </div>
Tue, 23 May 2023 08:40:21 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932671/ izbyshev
<div class="FormattedComment"> <span class="QuotedText">&gt; Other outstanding problems include the need to handle dlopen(), which maps executable text from another file into a range of the calling process's memory.</span><br> <p> Why would this problem be specific to dlopen()? ISTM it's the same for any dynamically-linked executables (even if they don't use dlopen()). Dynamic linking happens in user space, so the kernel currently learns about libraries only indirectly (by seeing them mmap'ed for execution).<br> </div>
Tue, 23 May 2023 07:29:50 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932670/ roc
<div class="FormattedComment"> You can already emit partial DWARF debuginfo that supports stack unwinding but not other debugging features. Has anyone compared SFrame to that?<br> <p> I'm worried that people who want to build binaries with full debugging information or just stack traces with parameter values are going to have to build even *bigger* binaries with both DWARF and SFrame information.<br> </div>
Tue, 23 May 2023 07:25:46 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932666/ quotemstr
<div class="FormattedComment"> Ah, so close.
Now, if we were to do the unwinding in a signal handler instead of hard coding SFrame, as I've proposed previously, we'd at last have fully general stack unwinding after all.<br> </div>
Tue, 23 May 2023 03:14:08 +0000

Reliable user-space stack traces with SFrame https://lwn.net/Articles/932659/ brenns10
<div class="FormattedComment"> It's worth noting that SFrame is renamed from CTF Frame, which caught me off guard.<br> <p> This is such an exciting project: it's the "have your cake and eat it, too" approach to stack unwinding. No extra code generated for frame pointers, no wasted register or icache. But still reliable unwinding without relying on the full DWARF debuginfo.<br> <p> Hopefully this becomes standard along with CTF for lightweight introspection. Programs may want to unwind their own stack or examine the layout of data structures, so there are already good use cases. What's more, debuggers can do a lot with a symbol table, a reliable unwinder, and the basic information about types provided by CTF. While DWARF is better suited for development tasks, these smaller formats could fill a niche for basic diagnostics in production environments where debuginfo isn't available.<br> </div>
Tue, 23 May 2023 00:32:07 +0000