
Compact formats for debugging—and more

By Jake Edge
February 16, 2026

LPC

At the 2025 Linux Plumbers Conference in Tokyo, Stephen Brennan gave a presentation on the debuginfo format, which contains the symbols and other information needed for debugging, along with some alternatives. Debuginfo files are large and, he believes, are a bit scary to customers because of the "debug" in their name. By rethinking debuginfo and the tools that use it, he hopes that free-software developers "can add new, interesting capabilities to tools that we are already using or build new interesting tools".

He works on the sustaining-engineering team at Oracle, which means that, unlike many in the room, he is mainly concerned with "fixing bugs in old released products" rather than adding new features to the latest kernel. Fixing bugs in customers' production kernels has "its own set of challenges". It has given him some insight into the needs of enterprise-kernel users, as well, which is what led him to conclude that debuginfo is not well-liked in that world.

Debuginfo

He introduced debuginfo with a few examples of using GDB on C-language "hello world" binaries built in different ways. Using the strip utility on a binary produced by GCC results in something that is not really debuggable—it lacks symbols and other information so that breakpoints cannot be set, for example. That is kind of self-inflicted; skipping the strip produces a binary with some debugging information, so breakpoints can be set, but it lacks line numbers and other data that would allow single-stepping. Normally, GDB would step by setting a breakpoint at the start of the next line of code, but it lacks the information needed to do so.
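The "next line" that GDB needs comes from the DWARF line table, which maps instruction addresses to source lines. As a rough illustration (not GDB's actual algorithm), here is a Python sketch that works on a hypothetical, already-decoded line table. Real DWARF encodes the table as a compact state-machine program, and this sketch ignores branches and calls, which a real debugger must handle:

```python
import bisect
from typing import NamedTuple, Optional


class LineRow(NamedTuple):
    address: int  # first instruction address covered by this row
    line: int     # source line those instructions came from


# Hypothetical decoded line table for one small function, sorted by address.
LINE_TABLE = [
    LineRow(0x1139, 3),
    LineRow(0x1141, 4),
    LineRow(0x1150, 5),
    LineRow(0x115A, 6),
]


def line_for_pc(table: list[LineRow], pc: int) -> Optional[int]:
    """Return the source line whose instructions contain pc (None if before the table)."""
    addresses = [row.address for row in table]
    i = bisect.bisect_right(addresses, pc) - 1
    return table[i].line if i >= 0 else None


def next_line_breakpoint(table: list[LineRow], pc: int) -> Optional[int]:
    """Address of the first later row that starts a different source line."""
    current = line_for_pc(table, pc)
    for row in table:
        if row.address > pc and row.line != current:
            return row.address
    return None  # no later line in this table
```

With the program stopped at 0x1145 (line 4), the temporary breakpoint would go at 0x1150, where line 5 starts. A binary built without "-g" has no such table, so there is nothing to look up.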

As most people already know, he said, using the "-g" option for GCC will add DWARF debugging information to the binary, which will allow "the full fat GDB debugging experience". For example, setting a breakpoint on a function will show the source file and line number rather than just an address; hitting the breakpoint will show the line of code from the source as well. In addition, arguments are shown by name with their values. GDB can also interpret various complex types, such as structures and unions.


While none of that is surprising to most, it demonstrates what he sees as the classical approach to debugging: "you get nothing until you use -g and then you get everything". Meanwhile, distributions build their packages with DWARF information, but most ship that information in separate "debuginfo" packages because "DWARF is really big". In practice, that means regular binaries on Linux systems have minimal debugging information, similar to his second example.
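One way to see the difference between these builds is to look at which ELF sections a binary carries: .symtab holds the symbol table that strip removes, while DWARF lives in sections such as .debug_info. The following is a minimal Python sketch that lists section names. It assumes a little-endian ELF64 file and skips corner cases (such as extended section numbering) that a real tool like readelf handles:

```python
import struct

# Little-endian ELF64 file header and section header layouts.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")


def section_names(data: bytes) -> list[str]:
    """Return the name of every section header in a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    ehdr = ELF64_EHDR.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]
    # Not handled: shnum == 0 / shstrndx == SHN_XINDEX (extended numbering).
    shdrs = [ELF64_SHDR.unpack_from(data, shoff + i * shentsize) for i in range(shnum)]
    # sh_offset and sh_size (fields 4 and 5) of the section-name string table.
    str_off, str_size = shdrs[shstrndx][4], shdrs[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for shdr in shdrs:
        start = shdr[0]  # sh_name is an offset into the string table
        end = strtab.index(b"\0", start)
        names.append(strtab[start:end].decode())
    return names


def debug_summary(data: bytes) -> dict[str, bool]:
    """Report which kinds of debugging information the image carries."""
    names = set(section_names(data))
    return {
        "symtab": ".symtab" in names,                # removed by strip
        "dwarf": ".debug_info" in names,             # present with -g
        "minidebuginfo": ".gnu_debugdata" in names,  # compressed symbols
    }
```

Running debug_summary() on a binary from a distribution package, and on the same program built locally with "-g", is a quick way to see what was split out into the debuginfo package.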

When users encounter a crash, the typical, though perhaps a bit dated, suggestion is to install a debuginfo package. Then they can run a debugger, generate a report, or send a full core dump to a support person for diagnosis. There are now some better tools, including debuginfod and various helpful crash-reporting and handling tools that he encourages people to look into. But in Brennan's experience, it often comes down to convincing customers to install debuginfo packages—something they are allergic to, at least in the enterprise-kernel world.

But he has a gripe with the name "debuginfo" for two reasons. First, it is misleading because that information can be used for more than just standard debugging with GDB. An application may have a need to unwind its own stack or examine its types at run time, for example. The term is also not specific about what kind of information it provides; it encompasses many different kinds of information about the program and its types, variables, source code, and even macro definitions. He is not proposing that some kind of alternative term be adopted, but noted that, in practice, it is simply a shorthand for DWARF information.

Introspection

There are facilities for run-time introspection of code in many high-level languages. He noted that Java has ways to inspect running code and that Python "would be a hilarious example of just how much you can do" with the ability to look at dictionaries of global and local variables, inspect everything in a class, unwind the stack, and more. Those facilities effectively use "debuginfo", but they do not call it that. C has only limited inspection options, such as backtrace(), and compilers can do some introspection for array-bounds checking and other things, but that ability "completely disappears after compile-time".
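For contrast with C, here is a small Python sketch of the kind of run-time introspection Brennan described: walking the call stack and reading each frame's local variables through the standard inspect module. The functions inner() and outer() are just illustrative callers; nothing here needs a separate debuginfo package, because the interpreter keeps this information around anyway:

```python
import inspect


def describe_stack():
    """Return (function name, sorted local variable names) for each caller frame."""
    result = []
    frame = inspect.currentframe().f_back  # start at our caller
    try:
        while frame is not None:
            result.append((frame.f_code.co_name, sorted(frame.f_locals)))
            frame = frame.f_back
    finally:
        del frame  # drop the frame reference promptly
    return result


def inner(x):
    y = x * 2
    return describe_stack()


def outer():
    count = 3
    return inner(count)
```

Calling outer() yields ("inner", ["x", "y"]) followed by ("outer", ["count"]) and then whatever frames called outer(); getting the same from a C program requires DWARF or one of the compact formats.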

The Linux kernel is an unusual C application because it has quite a bit of introspection support. It has stack-unwinding metadata built in; it can also look up its symbols using kallsyms. Beyond that, it has a type system, BPF Type Format (BTF), available as well.

There is a spectrum of things that he considers to be debuginfo, ranging from the standard debugging information, such as DWARF, "to maybe weirder things to consider debuginfo, but that kind of fit the bill". After DWARF comes Compact Type Format (CTF) and BTF, which provide type information. SFrame and ORC are next; both are aimed at stack unwinding, but ORC is only available for x86-64. ELF symbol tables round out the standard formats.

Moving into the weirder end is kallsyms, which is used by various tools. Something that the Fedora project does, which he really likes, is to create an ELF section (.gnu_debugdata) with a compressed set of debugging symbols that can be used with GDB, the Python-based drgn debugger, and others. Two other oddball sources of debuginfo would be the last branch record (LBR) hardware feature and frame pointers.
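According to GDB's documentation of this "MiniDebugInfo" feature, the .gnu_debugdata section holds an LZMA (xz) compressed ELF object that mainly contains a symbol table. A minimal Python sketch of unpacking it, assuming the raw section bytes have already been extracted (for example with objcopy's --dump-section option):

```python
import lzma

ELF_MAGIC = b"\x7fELF"


def unpack_minidebuginfo(section: bytes) -> bytes:
    """Decompress raw .gnu_debugdata contents into the embedded ELF image."""
    elf = lzma.decompress(section, format=lzma.FORMAT_XZ)
    if not elf.startswith(ELF_MAGIC):
        raise ValueError(".gnu_debugdata did not contain an ELF image")
    return elf
```

The result is an ordinary (small) ELF file, so it can be written out and examined with tools such as readelf.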

[Slide]

He put up a slide (slides), shown above, that summarized the kinds of information that are contained in the different formats. Obviously, if DWARF is available, it covers pretty much everything, he said, but it is not available in some environments. For those, "you can kind of pick and choose a few of these other things on the right and piece together something that might be useful for you".

There are some "warning" signs in the slide, which he briefly touched on. For example, macro definitions are only available in DWARF if extra flags (-gdwarf -g3) are passed to GCC and BTF only has information on functions and per-CPU variables, not all variables. The latter is something he plans to work on changing.

Case studies

He then moved on to describe "a few case studies, historically, of how compact formats are useful in different Linux applications and tools" with an eye toward future ideas for using those formats. He started with the venerable ps utility, which at one time worked by reading /dev/kmem (literally, kernel memory as a file), as he learned from an LWN article about the removal of that interface. It would root around in the task structures in memory to pull out the things that it wanted to report, "which is, honestly, pretty smart, pretty cool, [and] a little bit dangerous". It required that ps have setuid-root privileges and it might need to be rebuilt any time the kernel's data structures changed. Now ps just reads information out of the /proc filesystem, which is far superior.
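A modern ps reads a documented text interface instead. Here is a minimal Python sketch of parsing one process's /proc/<pid>/stat; only the first four fields are handled, and the full layout is described in the proc(5) man pages. The one subtlety is the command name, which is wrapped in parentheses and may itself contain spaces or parentheses:

```python
def parse_proc_stat(text: str) -> dict:
    """Parse pid, comm, state, and ppid from a /proc/<pid>/stat line.

    comm may contain spaces or ')', so split on the last ')' rather
    than on whitespace alone.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    fields = text[close_paren + 1:].split()
    return {
        "pid": int(text[:open_paren]),
        "comm": text[open_paren + 1:close_paren],
        "state": fields[0],
        "ppid": int(fields[1]),
    }
```

On a Linux system, parse_proc_stat(open("/proc/self/stat").read()) describes the Python process itself, with no special privileges and no knowledge of kernel structure layouts.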

While it makes sense to have a dedicated interface for ps, there is other information locked inside the kernel where adding a user-space interface is not really called for, he said, which is where something like BTF or CTF could be used. The kernel's BPF developers had a problem similar to that of the older ps, in that BPF code needed to be rebuilt for the target kernel because the details of a data structure might have changed, but they solved it another way. In order to support compile once, run everywhere (CO-RE) for BPF, BTF is used to provide structure offsets for adjusting a BPF program to the target kernel; that eliminates the need for a compiler on the target and allows a single BPF binary to run on multiple kernel versions.
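CO-RE itself is implemented by the BPF toolchain and libbpf, which rewrite field accesses using the target kernel's BTF. The following Python analogy only shows the idea: the ctypes structures TaskV1 and TaskV2 are hypothetical stand-ins for one kernel structure in two kernel versions, and the layout information ctypes keeps plays the role of BTF:

```python
import ctypes


# Two hypothetical layouts of the "same" kernel structure in two kernel versions.
class TaskV1(ctypes.Structure):
    _fields_ = [("state", ctypes.c_long),
                ("pid", ctypes.c_int),
                ("comm", ctypes.c_char * 16)]


class TaskV2(ctypes.Structure):
    # The newer kernel added a field, shifting pid and comm.
    _fields_ = [("state", ctypes.c_long),
                ("flags", ctypes.c_uint),
                ("pid", ctypes.c_int),
                ("comm", ctypes.c_char * 16)]


# Old-ps style: the offset is fixed at build time against TaskV1.
BUILD_TIME_PID_OFFSET = TaskV1.pid.offset


def read_pid_fixed(memory: bytes) -> int:
    return ctypes.c_int.from_buffer_copy(memory, BUILD_TIME_PID_OFFSET).value


def read_field_relocated(memory: bytes, target_type, field: str, ctype=ctypes.c_int):
    # CO-RE-like: look up the offset in the running kernel's type info at load time.
    offset = getattr(target_type, field).offset
    return ctype.from_buffer_copy(memory, offset).value
```

Handed a TaskV2 image, the fixed-offset reader silently returns the flags field instead of pid, which is the failure mode that forced ps to be rebuilt; the relocated reader keeps working as long as a field with that name still exists.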

Another interesting user of compact debuginfo is the drgn programmable debugger, which has a focus on the kernel. It normally uses DWARF, but work has been done to enable "DWARFless debugging" with drgn. For example, kallsyms support was added in December 2024 and stack unwinding using ORC from x86-64 kernel core images (/proc/vmcore) was added in April 2025. Using CTF (which is available in Oracle kernels) is under review and Brennan is working on BTF support; he is hopeful that CTF and BTF can converge since they are already quite similar.
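Without DWARF, kallsyms supplies names and addresses but no sizes or types. Here is a minimal Python sketch of the address-to-symbol lookup that this enables, assuming the text format of /proc/kallsyms (address, type letter, name, optional module); it is not how drgn implements its kallsyms support:

```python
import bisect


def parse_kallsyms(text: str) -> list[tuple[int, str]]:
    """Parse 'address type name [module]' lines into (address, name), sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def symbolize(symbols: list[tuple[int, str]], addr: int):
    """Map addr to 'name+0xoffset' using the closest preceding symbol, or None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    # kallsyms records no symbol sizes, so an address beyond the end of
    # the last function is still attributed to it.
    return f"{name}+{addr - base:#x}"
```

On a real system, unprivileged readers of /proc/kallsyms may see all-zero addresses, depending on the kptr_restrict setting.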

VMCOREINFO is a 4KB ELF section that contains only the limited amount of information about the kernel needed to construct a smaller dump file. It was not one of the entries on his list, but he thinks that VMCOREINFO is a good example of how to think about compact formats. The makedumpfile utility is used to make the small dump file from a kernel memory image in /proc/vmcore by filtering out unneeded data. It needs some basic symbol and type information, which can come from DWARF, "but that's a pain to use", especially in a kdump environment, "where there's limited memory, limited ... everything, honestly". VMCOREINFO is a tiny fraction of the size of the DWARF information.
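VMCOREINFO is plain text: KEY=VALUE lines, with entries such as SYMBOL(name), SIZE(type), and OFFSET(type.member). A minimal Python sketch of the kind of lookup table a tool like makedumpfile can build from it, assuming symbol addresses are hexadecimal and the other numbers decimal; the SAMPLE values are made up:

```python
import re

# SYMBOL(name)=hexaddr, SIZE(type)=n, OFFSET(type.member)=n, and so on.
ENTRY = re.compile(r"^(SYMBOL|SIZE|OFFSET|LENGTH|NUMBER)\((.+)\)=(\S+)$")


def parse_vmcoreinfo(text: str) -> dict[str, dict]:
    info: dict[str, dict] = {
        k: {} for k in ("SYMBOL", "SIZE", "OFFSET", "LENGTH", "NUMBER", "other")}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            kind, name, value = match.groups()
            info[kind][name] = int(value, 16 if kind == "SYMBOL" else 10)
        elif "=" in line:
            key, value = line.split("=", 1)
            info["other"][key] = value  # e.g. OSRELEASE, PAGESIZE, kept as text
    return info


# Made-up sample in the expected shape; real data comes from the crashed kernel.
SAMPLE = """OSRELEASE=6.12.0
PAGESIZE=4096
SYMBOL(init_uts_ns)=ffffffff82a12340
SIZE(page)=64
OFFSET(page.flags)=0
NUMBER(PG_lru)=4
"""
```

A handful of entries like these, rather than full DWARF, is enough for a filter to find and interpret struct page data when deciding which memory to keep.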

Ideas for the future

Allowing makedumpfile to access kallsyms and BTF would provide ways to exclude more memory, such as GPU buffers, from a dump file. It would also mean that things like user-space stack memory could be added to the dump so that process stack traces could be examined. Brennan was working on adding that support when Tao Liu pointed out his own patch set, which does much the same thing; Brennan said that they plan to work together on the feature. A new version of Liu's patches was posted in mid-January 2026.

His final slide consisted of some "things that I was spitballing when I came up with these slides"; the intent was to try to get others thinking about better debugging tooling. For example, he noted that GDB and drgn can both produce nicely formatted output of structures in memory in a way that is useful to a developer, rather than just a hex dump. Perhaps it makes sense to add a new printk() format specifier that would use the BTF information, which could be helpful while developing and debugging. That could be extended to user space, as well, so that output from applications would use type information to pretty-print structures.
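Brennan's printk() idea is only a proposal. As a rough illustration of type-driven formatting versus a hex dump, here is a Python sketch in which ctypes structure definitions stand in for BTF; Point and Window are hypothetical types:

```python
import ctypes


class Point(ctypes.Structure):  # hypothetical example types
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


class Window(ctypes.Structure):
    _fields_ = [("id", ctypes.c_uint), ("origin", Point), ("title", ctypes.c_char * 8)]


def hexdump(obj: ctypes.Structure) -> str:
    """What you get without type information: raw bytes."""
    return bytes(obj).hex(" ")


def pretty(obj: ctypes.Structure, indent: int = 0) -> str:
    """Render a structure field by field, in C99 designated-initializer style."""
    pad = "  " * indent
    lines = [f"({type(obj).__name__}){{"]
    for field in obj._fields_:
        name = field[0]
        value = getattr(obj, name)
        if isinstance(value, ctypes.Structure):
            rendered = pretty(value, indent + 1)
        elif isinstance(value, ctypes.Array):
            rendered = repr(list(value))
        else:
            rendered = repr(value)  # ints, and bytes for char arrays
        lines.append(f"{pad}  .{name} = {rendered},")
    lines.append(f"{pad}}}")
    return "\n".join(lines)
```

For Window(id=7, origin=Point(3, 4), title=b"term"), hexdump() produces an anonymous run of bytes, while pretty() names every field, including those of the nested Point.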

Another area that could be addressed is converting enum values to strings; it could be done via some kind of option to the compiler, which is, of course, open source, so he should simply write some code to do it, Brennan said. He also suggested combining kallsyms and BTF in the kernel as they currently carry a lot of the same information, but have separate string tables, so combining them would save space. In general, there is a lot of overlap between the two, so "we could probably combine them in interesting ways to further compact the formats".
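To make the space argument concrete, here is a toy Python sketch of a deduplicating string table shared by two consumers. The name lists are hypothetical, and neither format's real encoding is this simple (kallsyms, for example, compresses its names with a token table), so the byte counts only illustrate the principle:

```python
def build_strtab(names: list[str]) -> tuple[bytes, dict[str, int]]:
    """Build a NUL-separated string table storing each distinct name once.

    Offset 0 holds the empty string, following the ELF and BTF convention.
    """
    blob = bytearray(b"\0")
    offsets = {"": 0}
    for name in names:
        if name not in offsets:
            offsets[name] = len(blob)
            blob += name.encode() + b"\0"
    return bytes(blob), offsets


# Hypothetical name lists; real kallsyms and BTF tables are far larger.
kallsyms_names = ["schedule", "do_exit", "vfs_read"]
btf_names = ["schedule", "vfs_read", "task_struct", "pid"]

separate = len(build_strtab(kallsyms_names)[0]) + len(build_strtab(btf_names)[0])
shared = len(build_strtab(kallsyms_names + btf_names)[0])
```

With these lists, the two separate tables take 62 bytes and the shared one 43, because "schedule" and "vfs_read" are stored only once.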

The "perf mem" and "perf c2c" commands are used to look at memory accesses and cache sharing on a system, but their output is address-based. Instead, they could use type information to say: "This is a slab address and it has this type object and I can tell you that that's the offset of this field in the kernel." That would help in finding problems like false sharing, for example.
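As a sketch of the kind of lookup Brennan described, the following Python code turns a sampled address into a type and field name and groups fields by cache line. It assumes the address has already been matched to a slab object's base address and type (the part that would need kernel type information); Counters is a hypothetical type, ctypes layouts stand in for BTF, and 64-byte cache lines are assumed:

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes


class Counters(ctypes.Structure):
    # Hypothetical slab-allocated object type.
    _fields_ = [("reads", ctypes.c_uint64),
                ("lock", ctypes.c_uint32),
                ("writes", ctypes.c_uint64)]


def field_at(struct_type, offset: int):
    """Name of the field covering byte `offset`, or None for padding."""
    for field in struct_type._fields_:
        desc = getattr(struct_type, field[0])
        if desc.offset <= offset < desc.offset + desc.size:
            return field[0]
    return None


def describe_access(addr: int, obj_base: int, struct_type):
    """Turn a raw sampled address into (type name, field name, offset)."""
    offset = addr - obj_base
    return struct_type.__name__, field_at(struct_type, offset), offset


def fields_by_cache_line(struct_type, obj_base: int) -> dict[int, list[str]]:
    """Group fields by the cache line they start in (spanning fields are not split)."""
    lines: dict[int, list[str]] = {}
    for field in struct_type._fields_:
        desc = getattr(struct_type, field[0])
        lines.setdefault((obj_base + desc.offset) // CACHE_LINE, []).append(field[0])
    return lines
```

If reads and writes are updated by different CPUs, finding them in the same cache line is the false-sharing pattern in question; with type information, a report could name those fields instead of printing raw addresses.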

He concluded by noting that "DWARF is really excellent, if you have it, definitely use it for debugging", but if not, there are options that can provide various pieces of that information. The compact formats can be used for more than debugging and can provide introspection features that bring those capabilities from higher-level languages to C. He believes there is a lot of room to rethink the tools that are being used in light of the availability of these other sources of information, which can lead to a more user-friendly experience.

The YouTube video of the talk is available for those interested.

[ I would like to thank our travel sponsor, the Linux Foundation, for assistance with my travel to Tokyo for Linux Plumbers Conference. ]


Subsets of DWARF?

Posted Feb 16, 2026 20:33 UTC (Mon) by dave_malcolm (subscriber, #15013)

Would it be useful/possible to retain DWARF as the format, but define some subsets of the standard that cater to specific use-cases? (e.g. "DWARF, but only what's needed for stack tracing"). Perhaps with tooling to strip the "full" DWARF down to the subset?

That might avoid the need to write readers and writers for all these various formats, and be more flexible, so that if you decide that you do want a bit more information, you opt-in to that, rather than having to extend/invent yet another format.

Subsets of DWARF?

Posted Feb 17, 2026 9:21 UTC (Tue) by dottedmag (subscriber, #18590)

Frankly, DWARF is so ridiculous that having a simpler format with a simple parser is better than trying to finesse a profile.

Subsets of DWARF?

Posted Feb 17, 2026 17:53 UTC (Tue) by quotemstr (subscriber, #45331)

What makes DWARF "ridiculous"? The variable-length integer stuff is a bit silly, but I'm not seeing the fundamental problem with expressing transformation of memory and registers to semantic program information as a tiny bytecode. I worry that things like SFrame and ORC overfit for the program structures and calling conventions we have today and might make it harder for people to invent kinds of program these newer debugging formats can't express.

Subsets of DWARF?

Posted Feb 17, 2026 18:25 UTC (Tue) by wahern (subscriber, #37304)

> Would it be useful/possible to retain DWARF as the format, but define some subsets of the standard that cater to specific use-cases? (e.g. "DWARF, but only what's needed for stack tracing").

GCC and clang already have options for controlling how much debuginfo to emit. See, e.g., -g1, "Level 1 produces minimal information, enough for making backtraces in parts of the program that you don’t plan to debug. This includes descriptions of functions and external variables, and line number tables, but no information about local variables." https://gcc.gnu.org/onlinedocs/gcc-15.2.0/gcc/Debugging-O...

> Perhaps with tooling to strip the "full" DWARF down to the subset?

The strip(1) utility supports various options for controlling what to strip.

These options have always been there. But most people consider debuginfo and related aspects of building binaries a kind of black magic, and few bother to learn about it and how to use the tools. This is an area where even seasoned hackers with encyclopedic knowledge of programming arcana tend to copy+paste incantations. I count myself in that camp, especially earlier in my career.

Renaming "debuginfo"

Posted Feb 17, 2026 9:00 UTC (Tue) by hailfinger (subscriber, #76962)

I agree that the name "debuginfo" may scare some people. "De-bug" implies the presence of bugs. We need innovative ways to make things sound better, and "debuginfo" is on that list of things needing better names.

Some vendors are very well-versed in friendly naming:
- "backdoor" -> "device debug interface"
- "keylogger" -> "telemetry"
- "needs connection to the vendor for log-in" -> "cloud-enabled with additional security"

What about renaming "debuginfo" to "analysis-helper" or "inspection-tools" or similar?

Renaming "debuginfo"

Posted Feb 17, 2026 9:14 UTC (Tue) by jengelh (subscriber, #33263)

Analysis and Introspection data, or AI for short. That'll sell it.

Renaming "debuginfo"

Posted Feb 17, 2026 12:54 UTC (Tue) by vasvir (subscriber, #92389)

>Analysis and Introspection data, or AI for short. That'll sell it.

You forgot the data. We should also add System because it sounds cool to the older people.

Analysis and Introspection Data System: AIDS that should nail it.

Renaming "debuginfo"

Posted Feb 22, 2026 14:47 UTC (Sun) by marcH (subscriber, #57642)

> > Debuginfo files are large and, he believes, are a bit scary to customers because of the "debug" in their name.

> I agree that the name "debuginfo" may scare some people. "De-bug" implies the presence of bugs. (sarcastic, I think)

I don't understand what could be "scary" either... Yes they are big, so it makes sense to be "opt-in". But what is "scary" about those files?

BTW one very useful yet often missed purpose of a debugger is: discovering and learning about new (parts of) source code. Exploring "live" stack traces is a bit like touring a museum with a personal guide versus none: it can help you instantly focus on the specific parts that interest you as opposed to semi-randomly wandering and getting lost. Getting lost can still be fun in a museum - in source code less frequently so.

This is performed with a debugger but it's not fixing bugs... should we rename debuggers to "explorers"? /s

Renaming "debuginfo"

Posted Feb 22, 2026 15:47 UTC (Sun) by excors (subscriber, #95769)

I figure it's referring to the traditional attitude that you have a clear distinction between debug builds and release builds. Debug builds have maximum debug information and assert() and no optimisation and are never deployed to production; release builds are optimised and stripped and asserts are disabled. If your debug build is orders of magnitude too slow to run your test cases and the release build is impossible to get stack traces from, tough luck, those are your only two options. Crossing the streams is unnatural and new and therefore scary.

In e.g. Java and Python, reflection is a standard part of the language and not a "debug" feature, so nobody is going to be concerned about having type information in their production builds, and programs can rely on functionality that uses that information. That doesn't work with the C attitude that debug information is only for debug builds (as the terminology suggests), and that's the attitude he's trying to challenge in this talk.

Renaming "debuginfo"

Posted Feb 23, 2026 3:00 UTC (Mon) by marcH (subscriber, #57642)

That makes sense, thanks! Maybe this was more obvious in the video - I only read the LWN article.

Renaming "debuginfo"

Posted Feb 23, 2026 11:06 UTC (Mon) by excors (subscriber, #95769)

For avoidance of doubt, what I posted was mostly my own supposition - I watched part of the video and it didn't really say any more than the LWN article on this point. But the talk was explicitly criticising the "prevailing mental model" of all-or-nothing debuginfo (in which some enterprise customers find the 'all' option "scary" for unspecified reasons), and promoting a more fine-grained understanding that ideally shouldn't be tied to the misleading and misunderstood word "debuginfo", so I'm moderately confident in my interpretation of that.


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds