The Compact C Type Format in the GNU toolchain

By Jonathan Corbet
August 6, 2019

The Compact C Type Format (CTF) is a way of representing information about a binary program; it can be seen as a simpler alternative to the widely used DWARF format. While CTF has been around for some years, it has not seen much use in the Linux world. According to Elena Zannoni, who talked about CTF at the 2019 Open Source Summit Japan, that situation may be about to change; work is underway to bring CTF support to the GNU tools shipped universally with Linux systems.

Compiling a program into its binary form discards a lot of information found in the source code; that information can be needed when the time comes to track down a bug in the compiled program. To facilitate this work, compilers create debugging information that records the names and types of the variables used by a program, along with function names, the line numbers in the source program, and more; this information is then stored in one of many formats. DWARF is by far the most commonly used format on Unix-like systems, but it is not the only one.

Given the dominance of DWARF, one might wonder why anyone would want to work on alternatives. One problem is that DWARF is complex; rather than containing straightforward information about a program, a DWARF entry is essentially a program in its own right that can be run to generate the needed information. That makes DWARF flexible, but it's also complicated and verbose; the DWARF data associated with a program can be huge. That size means that, on most systems, the DWARF data for the installed programs is relegated to "debuginfo" packages that are not even present unless the owner has gone out of their way to install them.

CTF was created out of a desire to be able to perform most debugging tasks even in the absence of debuginfo packages, and to be able to do so in a simpler and faster way. DWARF can also expose a lot of information about the program source that some companies might wish to keep to themselves; CTF contains a lot less unneeded information. The CTF format was first created for the Solaris system, but has been used with the Linux DTrace port since 2012. There is, she said, nothing DTrace-specific about CTF, though. It is also available on FreeBSD and macOS.

The key difference between CTF and DWARF, perhaps, is that CTF limits itself to modeling the type system and managing the mapping from symbol-table entries to specific types. DWARF is much more ambitious, she said, modeling everything relating to the C language and how it maps to the hardware. CTF's simplicity means it can omit location lists, stack machines, and a lot of other machinery.

Bringing CTF to Linux requires contributing a lot of code upstream. One piece of the puzzle is adding support to GCC; the new -gt option will cause the compiler to generate CTF data. Getting that data into the final executable requires support in the binutils package as well; this includes enhancing the linker as well as adding support to tools like objdump and readelf. Work is also being done to get CTF support into the GDB debugger. Zannoni showed some sample output from the size utility; the CTF data (stored in an ELF section named .ctf) required about 213KB of space, as compared to over 4MB for the DWARF data for the same program. DWARF data for the kernel requires 1.6GB; the CTF data fits in just under 7MB.

CTF and DWARF data can coexist in the same ELF file, she said, since the CTF data has its own dedicated section. The CTF data is naturally smaller, but the format also includes compression to reduce the size requirements further. The result is that this data, unlike DWARF information, need not be stripped to get the executable file down to a reasonable size. Thus, while DWARF data is normally shipped in separate debuginfo packages, CTF data is easily included in the binary package and can be always available.

Internally, CTF data is stored in a structure called a "container" or a "dictionary". Each dictionary contains a header and a number of subsections dedicated to data like function information, variable information, types, and a string table for names not already present in the ELF string table. The header starts with a magic number (useful for determining the endianness of the rest of the data), a version number, and a set of flags. Version one is the original Solaris CTF, while version two was created during the porting of DTrace to Linux. It mostly increases a number of limits found in the first version. The third version is still being defined; it will include a number of header changes among other things. There is even a fourth version in an "initial planning stage". The intent is to keep this data ABI compatible, though, she said.
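
The role of the magic number can be shown with a short sketch. It assumes a header that begins with a 16-bit magic value followed by one-byte version and flags fields; that layout, the 0xdff2 value, and every toy_ name below are illustrative assumptions rather than quotations from the format definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed magic value, used here for illustration only. */
#define TOY_CTF_MAGIC 0xdff2u

enum toy_byte_order { TOY_NATIVE, TOY_SWAPPED, TOY_NOT_CTF };

/* Hypothetical mirror of the first bytes of a CTF header. */
struct toy_preamble {
    uint16_t magic;   /* identifies the data and reveals its byte order */
    uint8_t version;  /* format version */
    uint8_t flags;    /* format flags */
};

/* Swap the two bytes of a 16-bit value. */
static uint16_t toy_swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * Read the magic in the host's byte order. A direct match means the data
 * was written with the host's endianness; a match after swapping means it
 * was written with the opposite one; anything else is not CTF.
 */
static enum toy_byte_order toy_detect_order(const unsigned char *buf, size_t len)
{
    uint16_t magic;

    if (len < sizeof(struct toy_preamble))
        return TOY_NOT_CTF;
    memcpy(&magic, buf, sizeof magic);
    if (magic == TOY_CTF_MAGIC)
        return TOY_NATIVE;
    if (toy_swap16(magic) == TOY_CTF_MAGIC)
        return TOY_SWAPPED;
    return TOY_NOT_CTF;
}
```

A reader that gets TOY_SWAPPED back would byte-swap every multi-byte field in the rest of the dictionary before using it.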

Returning to linker support, Zannoni noted that GCC will place a single .ctf section in each object file it creates. The linker then has to take these sections and merge them into a larger section, removing any duplicate information. There is a potential problem, though, in that different object files may define conflicting objects using the same names. When this happens, the linker will create a child dictionary associated with a specific translation unit for the conflicting data. Most of the time, though, a linked executable will contain one large shared CTF dictionary and perhaps a small number of tiny subdictionaries.
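
The merging policy described above can be approximated with a toy model. This is only a sketch under simplifying assumptions: types are compared by name and by a textual definition, whereas a real deduplicator has to compare whole type graphs. All names are hypothetical and none of this is the linker's actual code.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TYPES 16

/* A type reduced to its name and a textual definition. */
struct toy_type {
    const char *name;
    const char *definition;
};

/* A dictionary reduced to a fixed-size list of types. */
struct toy_dict {
    struct toy_type types[TOY_MAX_TYPES];
    int count;
};

static const struct toy_type *toy_lookup(const struct toy_dict *d, const char *name)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->types[i].name, name) == 0)
            return &d->types[i];
    return NULL;
}

static int toy_add(struct toy_dict *d, struct toy_type t)
{
    if (d->count >= TOY_MAX_TYPES)
        return -1;
    d->types[d->count++] = t;
    return 0;
}

/*
 * Merge one translation unit's types. New names go into the shared
 * dictionary, exact duplicates are dropped, and a type whose name is
 * already taken by a different definition goes into this translation
 * unit's child dictionary. Returns the number of conflicts, or -1 if a
 * dictionary is full.
 */
static int toy_merge_tu(struct toy_dict *shared, struct toy_dict *child,
                        const struct toy_type *tu_types, int n)
{
    int conflicts = 0;

    for (int i = 0; i < n; i++) {
        const struct toy_type *seen = toy_lookup(shared, tu_types[i].name);

        if (seen == NULL) {
            if (toy_add(shared, tu_types[i]) < 0)
                return -1;
        } else if (strcmp(seen->definition, tu_types[i].definition) != 0) {
            if (toy_add(child, tu_types[i]) < 0)
                return -1;
            conflicts++;
        }
        /* Same name and same definition: a duplicate, so nothing is added. */
    }
    return conflicts;
}
```

Merging two units that both define "struct foo" differently leaves the first definition in the shared dictionary and puts the second into that unit's child, which matches the "one large dictionary plus a few tiny ones" outcome described above.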

There is a libctf library being added to the binutils package that implements the ability to read and write CTF data; it is used by the compiler, the linker, and the debugger. This library, along with the readelf and objdump changes, was merged into the binutils trunk in May; the linker changes have been posted but need some more work before they can be merged. The hope is that all of the CTF support will land in binutils for the upcoming 2.33 release.

The GCC patches have been posted a few times; they too are being modified in response to review comments. One piece that has not yet been posted is link-time optimization support, but it is coming soon. With luck, she said, all of this support will be merged in time for the GCC 10 release due in 2020. GDB support is also under discussion on the project mailing list; the GDB 8.4 release is being targeted for this work.

Zannoni closed with a look at where things go from here. There is a set of discussions planned for the Toolchains microconference at the 2019 Linux Plumbers Conference. There are, evidently, still optimizations that can be made to further reduce the size of CTF data. There is also a fairly significant gap in that backtrace support is not yet present for CTF. An expansion to languages other than C is on the horizon. Then, there is that perennial lowest priority for development teams: documentation. The "specification" for the format lives in a C header file for now; that will clearly need to change in the future.

[Your editor thanks the Linux Foundation for supporting his travel to the event.]

CTF's simplicity means it can omit location lists, stack machines, and a lot of other machinery.

Posted Aug 6, 2019 19:18 UTC (Tue) by scientes (guest, #83068) [Link] (4 responses)

What does this mean for the user experience? I really can't imagine a gdb session that doesn't understand the stack. And without location lists there is no understanding of variable names?

some future CTF API ideas

Posted Aug 6, 2019 20:13 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

It is true that CTF doesn't know what scopes are, and that it will almost certainly never grasp local variables (unlike global-scope variables, which it understands already) -- but the backtrace section should be able to grasp what function parameters are, their names, types, and all but the oddest examples of locations. The idea is that you can get enough knowledge for 'bt' to work, and that you can then dig around in the values of the function parameters that the backtrace can show you. It's not a full debugger, but then it'll also be much smaller than DWARF and always present. (Of course, the backtrace section is still being designed, so all of this may change...)

The "always present" part allows for some very neat things to happen in future which seem to belong to languages much more dynamic than C. There are so many that this comment box is far too small to contain them, but here is one small example. An API I've been vaguely thinking of adding to libctf, which requires no extensions to the current CTF file format, is this:

void *ctf_get_value_from_function (ctf_file_t *fp, const char *symbol, void *root, const char *path);

This is basically XPath for the C type system. You dig out the CTF section of a shared library via a ctf_open() call (giving you 'fp' above), find out the return value and args of the function "foo" you want to call by asking CTF, call it (perhaps via ffcall or dlsym) and stuff the return value in a suitable variable (call it "bar"), then you can call

ctf_get_value_from_function (fp, "foo", bar, ".memb_a.u_b.c");

which would look up the type of the return value of foo(), take 'bar' to point to that return value, and traverse the chain of structure/union members memb_a, then u_b from the resulting structure or union, then c from that, returning the value of c as the return value of ctf_get_value_from_function(). The above would be the equivalent of doing the second line of this at compile time:

some_type *bar = foo();
bar->memb_a.u_b.c

or perhaps

bar->memb_a->u_b.c

or anything similar.

Of course, you *can* do this now with shared header files at compile time -- but with this hypothetical new API, libctf would let you do it at *runtime*. No need to be a debugger and literally two or three lines of code. And obviously this syntax could be extended to support other languages once CTF supported them, and you could still call it from C perfectly well -- and you can't do *that* with shared header files at compile time. And it would always work without needing to get the user to install huge debuginfo packages, and with no perceptible delay.
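
As a rough illustration of the member-path traversal described above, here is a toy version that walks hand-written type descriptors instead of real CTF data. Everything in it is an assumption made for the sketch: the toy_ descriptor structures, the example types, and the toy_get_value() name. It is not libctf code and it does not implement the proposed ctf_get_value_from_function().

```c
#include <stddef.h>
#include <string.h>

struct toy_type;

/* One structure or union member: name, byte offset, and member type. */
struct toy_member {
    const char *name;
    size_t offset;
    const struct toy_type *type;
};

/* A type: scalars have no members. */
struct toy_type {
    const char *name;
    const struct toy_member *members;
    size_t n_members;
};

/*
 * Follow a path such as ".memb_a.u_b.c" starting at 'root', which must
 * point to an object of type 'type'. Returns a pointer to the final
 * member, or NULL if any component of the path does not exist.
 */
static void *toy_get_value(const struct toy_type *type, void *root, const char *path)
{
    char *p = root;

    while (*path == '.') {
        const char *name = ++path;
        size_t len = strcspn(name, ".");
        const struct toy_member *found = NULL;

        for (size_t i = 0; i < type->n_members; i++) {
            const struct toy_member *m = &type->members[i];

            if (strlen(m->name) == len && strncmp(m->name, name, len) == 0) {
                found = m;
                break;
            }
        }
        if (found == NULL)
            return NULL;
        p += found->offset;
        type = found->type;
        path = name + len;
    }
    return *path == '\0' ? p : NULL;
}

/* Example C types, plus descriptors standing in for what CTF would supply. */
union u_b_t { int c; float f; };
struct memb_a_t { char tag; union u_b_t u_b; };
struct some_type { long id; struct memb_a_t memb_a; };

static const struct toy_type toy_int = { "int", NULL, 0 };
static const struct toy_member toy_u_b_members[] = {
    { "c", offsetof(union u_b_t, c), &toy_int },
};
static const struct toy_type toy_u_b = { "union u_b_t", toy_u_b_members, 1 };
static const struct toy_member toy_memb_a_members[] = {
    { "u_b", offsetof(struct memb_a_t, u_b), &toy_u_b },
};
static const struct toy_type toy_memb_a = { "struct memb_a_t", toy_memb_a_members, 1 };
static const struct toy_member toy_some_members[] = {
    { "memb_a", offsetof(struct some_type, memb_a), &toy_memb_a },
};
static const struct toy_type toy_some_type = { "struct some_type", toy_some_members, 1 };
```

Given a struct some_type object bar, toy_get_value(&toy_some_type, &bar, ".memb_a.u_b.c") returns the address of bar.memb_a.u_b.c. The sketch only handles directly embedded members; following pointers, as in the bar->memb_a->u_b.c variant, would additionally need the descriptors to record pointer types.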

(btw, I would recommend nobody try using the linker patches Jon has linked to here. They crash the assembler. New patches should land soon, and I'll follow up here when they do. This stuff is very bleeding-edge right now and debugging is ongoing. My TODO list is huge.)

some future CTF API ideas

Posted Aug 11, 2019 14:05 UTC (Sun) by scientes (guest, #83068) [Link]

> It is true that CTF doesn't know what scopes are, and that it will almost certainly never grasp local variables

What about with Zig, where you still have local scope, but where variables never shadow each other?

some future CTF API ideas

Posted Aug 12, 2019 5:18 UTC (Mon) by alison (subscriber, #63752) [Link] (1 responses)

Doesn't eBPF via uprobes provide functionality similar to ctf_get_value_from_function() at runtime already? I guess that gdb has the advantage of not requiring SUID.

some future CTF API ideas

Posted Aug 27, 2019 20:05 UTC (Tue) by k8to (guest, #15413) [Link]

In production, I find that no SUID doesn't help much if you still need ptrace.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 19:25 UTC (Tue) by clugstj (subscriber, #4020) [Link] (48 responses)

If all it understands is "C", it's a non-starter for everyone except masochists.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 19:48 UTC (Tue) by nix (subscriber, #2304) [Link]

The plan is to grow support for more languages over time -- but this is obviously a lot of work, since every language has a different type system! (And the ones people are most likely to want, like C++, are the most work. C'est la vie...)

There is certainly nothing fundamentally *stopping* us growing support for more languages.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 20:37 UTC (Tue) by wahern (subscriber, #37304) [Link] (46 responses)

If CTF annotations are built into Debian and RPM binary packages by default (as opposed to being stripped as is the default for DWARF), and especially if made a default in GCC and clang, then we could have pervasive, [relatively] type-safe FFI in various languages. C is the lingua franca of FFI for not only historical but also practical reasons. CTF being limited to describing C-like type systems isn't a significant limitation in that regard, while the potential utility is immense.

What we need is a critical mass of CTF annotated binaries such that a few projects can make the leap of depending on automagic, type-safe FFI. Then it becomes self-reinforcing. Once CTF is reliably pervasive, who knows what could come next in terms of support for high-order type systems. It's taking that first step that is critical.

If I had my way I'd make built-in DWARF annotations mandatory, but economy of space concerns always seem to win the day even though engineers spend countless hours, days, and even years of time struggling to diagnose and debug production issues because of the lack of annotations.[1] I see nothing but upsides with CTF.

[1] Yes, you can package DWARF annotations on the side, but it's complicated and there's too much friction involved, especially with all the new half-baked build systems out there. Even when projects are capable of doing it they don't because people systematically underestimate the costs of missing debug symbols.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 21:24 UTC (Tue) by nix (subscriber, #2304) [Link] (45 responses)

> If CTF annotations are built into Debian and RPM binary packages by default

They are explicitly not marked as debugging sections in the linker and are even kept when --strip-unneeded is used, for that exact reason :) If CTF is stripped out, it's useless. You might as well use DWARF in that case.

btw, in re the sizes above: the 1.6GiB -> 7MiB figure is for the old non-ld-toolchain deduplicator: the deduplicator in the linker patches above doesn't really deserve the name because it does no cross-TU deduplication whatsoever. A quick check with the GNU ld patches I have in preparation now (still with no deduplicator) shows GNU ld itself clocking in at:

section                 size      addr
.interp                   28   4195040
.note.gnu.build-id        36   4195068
.note.ABI-tag             32   4195104
.gnu.hash                224   4195136
.dynsym                 6384   4195360
.dynstr                 3914   4201744
.gnu.version             532   4205658
.gnu.version_r           112   4206192
.rela.dyn                408   4206304
.rela.plt               5784   4206712
.init                     26   4214784
.plt                    3872   4214816
.plt.got                   8   4218688
.text                 289042   4218704
.fini                      9   4507748
.rodata              1229488   4509696
.eh_frame_hdr           6804   5739184
.eh_frame              43248   5745992
.tbss                      8   5794512
.init_array                8   5794512
.fini_array                8   5794520
.data.rel.ro             768   5794528
.dynamic                 496   5795296
.got                      32   5795792
.got.plt                1952   5795840
.data                   4604   5797792
.bss                    6416   5802400
.comment                  91         0
.debug_aranges          2320         0
.debug_info          1732570         0
.debug_abbrev          57088         0
.debug_line           241926         0
.debug_str            138861         0
.debug_loc            563108         0
.debug_ranges          50320         0
.ctf                  180737   5817008
Total                4571264

So with a dreadful deduplicator that I have spent literally no effort on at all, the CTF is a bit smaller than the .text, and perhaps 10% the size of the DWARF. It will only go down as the deduplicator is written, as the file format improves and as better compressors are added (lzma support seems likely, since binutils can already use it for .gnu_debugdata). If CTF ends up adding more than 1% to the size of executables once all this is done I will be quite surprised. Throwing the old kernel-type-focused deduplicator at this file produces CTF 58207 bytes long, already a radical reduction. I expect the ld deduplicator, once I write it, to do a better job.
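
The ratios can be checked directly from the listing above. The sketch below simply sums the seven .debug_* section sizes and compares them with the .ctf and .text sizes; the numbers come from the listing, while the helper names are hypothetical.

```c
#include <stddef.h>

/* Sizes of the .debug_* sections from the listing, in listing order. */
static const unsigned long toy_dwarf_sizes[] = {
    2320, 1732570, 57088, 241926, 138861, 563108, 50320
};

/* Add up an array of section sizes. */
static unsigned long toy_sum(const unsigned long *sizes, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return total;
}

/* Express 'part' as a percentage of 'whole'. */
static double toy_percent(unsigned long part, unsigned long whole)
{
    return 100.0 * (double)part / (double)whole;
}
```

With these inputs the DWARF sections total 2,786,193 bytes, so the 180,737-byte .ctf section is about 6.5% of the DWARF and about 63% of the 289,042-byte .text section. The 58,207 bytes produced by the older deduplicator would be about 2.1% of the DWARF.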

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 21:42 UTC (Tue) by josh (subscriber, #17465) [Link] (6 responses)

I'd love to have a much smaller debugging format available.

But meanwhile, I don't look forward to figuring out a whole new set of incantations to strip it out, for applications and systems that *do* have real space considerations to deal with. I'd still like to have support for "separate debug symbols", and then have the option of much smaller debug symbols.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 22:53 UTC (Tue) by nix (subscriber, #2304) [Link] (5 responses)

The idea is to make it so small that you don't consider stripping it out. If you're really so short of space that saving 1% of the executable size will save you, you're going to die in a month *anyway* from routine binary growth.

If people start "saving space" by forcibly stripping this section out, it's useless. It's meant to become as reliably present as, say, the .dynsym section is today. Again, this is *not* debugging information: it's *introspection* information. It's not just meant to be used to find problems in programs that have gone wrong: it's meant for C programs to be able to see their own types, and the types of programs they are interacting with, at runtime, as part of their normal operation. That's not just something that's useful for debugging.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:05 UTC (Tue) by roc (subscriber, #30627) [Link] (1 responses)

Given you don't support backtraces or C++ yet, it seems premature to assume that 1% number.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:42 UTC (Tue) by nix (subscriber, #2304) [Link]

Yeah, I was explicitly not including stuff I haven't designed yet and have no data on the size of :)

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:22 UTC (Tue) by josh (subscriber, #17465) [Link] (1 responses)

Binary growth is *not* inevitable or routine. And I certainly hope there's an option to disable or strip *anything* that adds bytes to the binary.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:46 UTC (Tue) by nix (subscriber, #2304) [Link]

There is! objcopy. Most sections of the binary that add bytes to it cannot be stripped via an explicit strip option because doing so would inevitably break things. Do you really *want* an explicit option to strip out .text.cold?! (Sure, .ctf is not remotely comparable to part of the text section: but if it does start to get used for e.g. plugin interoperation, it would probably become comparable to .dynsym. You don't usually want to strip that out, either.)

In my experience, binary growth is both inevitable and routine: it even happens when constant effort is imposed to prevent it: the most you can do is slow it down, unless you want to eschew all new features forever. Even busybox grows slowly over time. Even that paragon of the optimizer's art, Elite, grew until it could no longer have known bugs fixed because fixing them would take a few bytes that just weren't there any more. Everything grows.

(Note that the .ctf section is *not* a loaded section. So it won't slow down program startup, make it use more memory or address space, or anything like that. It's loaded as needed by the things that use it.)

Run-time Introspection capability using the Compact C Type Format in the GNU toolchain

Posted Mar 23, 2024 6:08 UTC (Sat) by adityagurajada (guest, #170330) [Link]

Hi. In March 2024 I came across this older discussion, from July 2019, about using the Compact C Type Format.

I am particularly interested in the comments of this responder who says: "... this is *not* debugging information: it's *introspection* information. ... it's meant for C programs to be able to see their own types, and the types of programs they are interacting with, at runtime, as part of their normal operation."

I've been thinking and doodling a lot on how to build C type-system information into a C program, so that one can query for that information at run time and do things like build automated print methods to pretty-print run-time structures (as an example).

This responder has hit the nail on the head. I'm wondering if you have any tooling that makes this possible. Or, if you would like to get connected off-line, outside this forum, to brainstorm the things necessary to build such a kind of infrastructure.

(How do we share email-IDs via this forum? I'm new to this group.)

Thanks in advance, --AdityA

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 21:55 UTC (Tue) by roc (subscriber, #30627) [Link] (37 responses)

> They are explicitly not marked as debugging sections in the linker and are even kept in when --strip-unneeded for that exact reason :)

That's a neat trick, but inevitably it will lead to stripping tools getting --strip-really-unneeded options, which just means more complexity for the ecosystem going forward.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 22:55 UTC (Tue) by nix (subscriber, #2304) [Link] (36 responses)

I don't see why. We don't have a --strip-dynsym-and-C++-rtti-and-drop-.eh_frame option, because programs use .eh_frame and .dynsym and RTTI and break if they're not present, and they're not very large: people who really want to strip such things use objcopy, and expect things to break horribly except in very special circumstances. I hope, in time, for CTF to be useful enough *outside* debugging contexts that this becomes true of CTF as well.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:02 UTC (Tue) by roc (subscriber, #30627) [Link] (35 responses)

Indeed, almost every binary containing .eh_frame and .dynsym and RTTI breaks if they're not present so it almost never makes sense to strip them, and it never has.

OTOH it will be a long time, if ever, before almost every binary requires CTF to be present. In the meantime people will want to strip CTF and you won't be able to stop tools adding support for that and people configuring their builds to do so by default.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:54 UTC (Tue) by nix (subscriber, #2304) [Link] (34 responses)

objcopy works. However, frankly, my attitude to people who try to rip random things out of binaries to save a few bytes in any but the most extreme embedded environments these days is to wonder if they're stuck in the 80s. Even if every binary in /usr/bin was a gigabyte in size we would *still* have huge oceans of untouched space on most current disks.

Fundamentally, there's a *reason* strip(1) doesn't strip CTF by default: it should hardly save any space and it rips out something that offers facilities not otherwise available. The format will be useless if it's stripped out routinely, and it should be small enough that *most* people don't bother. People who need to hunt for every last byte and are willing to use obscure options to do so probably both have a reason and are used to coping with the resulting breakage. (It will certainly break Objective Caml programs to strip out non-loaded sections that you don't recognise, for instance.)

(However... if you really want separated debugging information, we *do* have a CTF archive format that is specifically intended for sticking big piles of CTF into for later mmapping out. If people really want separated debug info, we could in theory arrange to dump all the CTF on the system into a .ctfa, and remove items from the CTFA on package uninstallation, and have libctf know to look there to pick it up -- or just look in /usr/lib/ctf/ -- a tree like /usr/lib/debug/ -- or whatever. My worry is that if you did that, people would soon say oh let's put it in a separate package! And now it's never present and it's useless. Having CTF in a separate file is not really a problem, though it doesn't buy you anything that I can see. Having it in a separate *package* that is not installed when the package is... that's a problem. That's what makes life so hard for systemwide debuggers now: the DWARF is never there.)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:11 UTC (Wed) by roc (subscriber, #30627) [Link] (33 responses)

I think it's much more important for your goals to convince distro vendors to cooperate with you than to play tricks with header flags pretending CTF is not really debug info.

> That's what makes life so hard for systemwide debuggers now: the DWARF is never there.

It's not super hard to have debuggers automatically fetch and use system debuginfo packages. Pernosco does this. Even Fedora's gdb gets you most of the way there by telling you the command you need to run. We don't need new formats to solve this particular problem. (OK, to tell the truth, there is one other problem that needs to be fixed: you need an archive of all debuginfo for all versions of packages so you can debug the non-latest version of a package. We've built that for Pernosco too.)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:30 UTC (Wed) by nix (subscriber, #2304) [Link] (15 responses)

> I think it's much more important for your goals to convince distro vendors to cooperate with you than to play tricks with header flags pretending CTF is not really debug info.

It's not. It's type introspection info. It's no more debug info than C++ RTTI is. Programs can perfectly well introspect their own types without being debuggers in any sense.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:40 UTC (Wed) by roc (subscriber, #30627) [Link] (12 responses)

You're right, that's fair.

However, you are pretending it is *needed* when for most binaries it currently is not.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 10:52 UTC (Wed) by nix (subscriber, #2304) [Link] (11 responses)

I'm not talking about *now* -- and I'm also not suggesting that most binaries should be built with this. They probably shouldn't except if the distro knows it has tools that can use CTF on arbitrary programs (that's why you need a non-default compiler option, -gt, to turn CTF emission on), or unless the tool uses CTF to introspect itself. What I'm suggesting is that if you *have* built something with it, you probably did so because you *need* it -- and if you strip the CTF out of the binary there are literally zero libraries in existence that will know how to get at it unless you explicitly specify the path to the stripped-out CTF. And why would anyone compile with CTF only to render it immediately useless?

I have... painful experiences here. Back when we were converting DWARF to CTF at kernel link time and linking it into kernel modules, we had to actually *hack RPM at build time* via PATH shuffling and patching of /usr/lib/rpm/find-debuginfo.sh to even make it possible for RPM to not just strip out all non-loaded sections on the grounds that they must be unnecessary, no matter what size they were or whether RPM had never seen them before, including ripping out all the CTF that we'd just gone to some lengths to link in.

To me that just seems like unwise behaviour on the part of a packaging system. RPM didn't know what that section was: why was it removing it? It might have been necessary. It *was* necessary for what we were doing, and RPM just removed it without so much as a by-your-leave. So... guess why strip(1) doesn't remove CTF? I don't want anyone who's actually using CTF to have to go through anything like that again just so they can package their own software without it being randomly broken by the packaging system.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 11:45 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

That seems like a reasonable argument.

But it also seems like that would apply to DWARF debuginfo too. Why ask the compiler to generate DWARF if you're going to strip it out? Yet here we are.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 12:19 UTC (Wed) by nix (subscriber, #2304) [Link]

I can only guess how this became the default.

I'd guess that debuginfo, in a world where debuggers are a special thing that is explicitly run by human beings when things go wrong, is something huge that is only *needed* when things go wrong, when there will be a human around who can install the necessary big packages. But you never want to compile something without any debug info for use in a production environment because if things go wrong you then have no debuginfo to use to diagnose it! So -g -O2 has become a sort of de facto standard for CFLAGS.

Of course the "you only need it when things go wrong" attitude has now been retarding the development of always-on systemwide debugging tools for something like fifteen years; but nobody wants to add extensive debuginfo shrinking machinery because it will slow down the link for something that is only rarely needed. It seems to me that the only *reason* debuginfo is only rarely needed is that tools that use debuginfo routinely cannot be developed because it can never be relied on to be present, because it is too big... it's a vicious circle.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 12:19 UTC (Wed) by mjw (subscriber, #16740) [Link] (8 responses)

> I have... painful experiences here. Back when we were converting DWARF to CTF at kernel link time and linking it into kernel modules, we had to actually *hack RPM at build time* via PATH shuffling and patching of /usr/lib/rpm/find-debuginfo.sh to even make it possible for RPM to not just strip out all non-loaded sections on the grounds that they must be unnecessary, no matter what size they were or whether RPM had never seen them before, including ripping all the CTF that we'd just gone to some lengths to link in.

> To me that just seems like unwise behaviour on the part of a packaging system. RPM didn't know what that section was: why was it removing it? It might have been necessary. It *was* necessary for what we were doing, and RPM just removed it without so much as a by-your-leave.

I might be responsible for that. But it is simply that RPM follows normal ELF rules for stripping [*] (unless you define macros to give find-debuginfo.sh additional arguments [**]). In general any non-allocated section can be stripped away (or put into a separate .debug file). Because that simply means that the section isn't needed at runtime.
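
The rule being debated here can be written down in a few lines. The sketch below models a section as a name plus its flags and contrasts the plain "not allocated, so strippable" test with a variant that consults a keep-list. The 0x2 bit is the value ELF assigns to SHF_ALLOC; the structures, function names, and the keep-list idea are hypothetical and are not taken from strip, eu-strip, or RPM.

```c
#include <stddef.h>
#include <string.h>

/* Same bit value that ELF uses for SHF_ALLOC. */
#define TOY_SHF_ALLOC 0x2u

/* A section reduced to the two properties the rule looks at. */
struct toy_section {
    const char *name;
    unsigned flags;
};

/* Plain rule: a section that is not allocated may be stripped. */
static int toy_strippable(const struct toy_section *s)
{
    return (s->flags & TOY_SHF_ALLOC) == 0;
}

/* Variant: never strip a section whose name is on the keep-list. */
static int toy_strippable_keeping(const struct toy_section *s,
                                  const char *const *keep, size_t n_keep)
{
    for (size_t i = 0; i < n_keep; i++)
        if (strcmp(s->name, keep[i]) == 0)
            return 0;
    return toy_strippable(s);
}
```

Under the plain rule a non-allocated .ctf section is treated exactly like .debug_info; with ".ctf" on the keep-list it survives while the DWARF sections are still removed.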

> So... guess why strip(1) doesn't remove CTF? I don't want anyone who's actually using CTF to have to go through anything like that again just so they can package their own software without it being randomly broken by the packaging system.

Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

But I do like CTF and I do hope it will become the default one day. Not to replace DWARF (it should be a companion to that), but to replace .gnu_debugdata [***]. Which is used by various tools now to have an "extra symbol table".

So let's talk about how to integrate this with RPM/elfutils/systemtap/etc. Maybe on the elfutils and/or binutils mailing list?

[*] http://www.linker-aliens.org/blogs/ali/entry/how_to_strip...
[**] https://gnu.wildebeest.org/blog/mjw/2017/06/30/fedora-rpm...
[***] https://fedoraproject.org/wiki/Features/MiniDebugInfo

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:48 UTC (Wed) by nix (subscriber, #2304) [Link]

> In general any non-allocated section can be stripped away (or put into a separate .debug file). Because that simply means that the section isn't needed at runtime.

Well, it means it isn't needed by the executable loader. I was *forced* to make the .ctf section non-loadable by internal constraints in ld. Roughly: you cannot have an allocated section whose size is not known before bfd_elf_final_link(); the symtab and strtab are not laid out until halfway through that function; CTF needs to know the offsets of all strings in the strtab and the order of symbols; *and* it's compressed, so its content affects its size. By extension, the section may not be allocated. That doesn't mean it's not going to be used by programs at runtime. It is. (Well, assuming anyone uses it at all. :) ). They load it out of the binary as needed using BFD.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:55 UTC (Wed) by nix (subscriber, #2304) [Link] (6 responses)

> Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

I guess that means CTF will always be stripped out of RPMs, and libctf and the CTF format by extension will be useless on RPM systems. This seems unfortunate. Is it really so hard to mark .ctf sections as not stripped? If it takes more than a couple of lines, something seems to me to be wrong.

(This is not the only tool missing support right now, of course: gold can't link CTF sections either. But I plan to add that and I did also plan to submit changes to elfutils to stop eu-strip throwing the section out. I'm rather unhappy to discover that this is pre-emptively rejected.)

> But I do like CTF and I do hope it will become the default one day. Not to replace DWARF (it should be a companion to that), but to replace .gnu_debugdata [***]. Which is used by various tools now to have an "extra symbol table".

That won't work, I'm afraid. CTF does not contain a symbol table, since that would be a waste of space since ELF already has one. Instead, it relies on the ELF symtab. Its function and data object sections are 1:1 ordered in the same order as the ELF symtab (basically, you traverse the ELF symtab and whenever you pass another function symbol, you match it to a function info section entry: whenever you pass another data symbol, you match it to another data object section entry). This saves quite a lot of space: data object section entries in particular are only four bytes each (one type ID).
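
As a rough illustration of the symtab-ordered matching described above, here is a minimal sketch. It assumes a heavily simplified symbol table (name and ELF symbol type only) and made-up CTF entries; the function name `match_symbols` is hypothetical, and this is not libctf's actual code, which has to deal with details omitted here.

```python
# Simplified model: a symtab is a list of (name, symbol_type) pairs.
STT_OBJECT = 1  # ELF symbol type for data objects
STT_FUNC = 2    # ELF symbol type for functions
STT_FILE = 4    # ELF symbol type for source-file markers


def match_symbols(symtab, data_entries, func_entries):
    """Pair each data/function symbol with the next unused CTF entry,
    walking the symtab in order (hypothetical helper)."""
    data_iter = iter(data_entries)
    func_iter = iter(func_entries)
    matched = {}
    for name, sym_type in symtab:
        if sym_type == STT_FUNC:
            matched[name] = next(func_iter)
        elif sym_type == STT_OBJECT:
            matched[name] = next(data_iter)
        # Any other symbol type consumes no CTF entry.
    return matched


symtab = [
    ("main.c", STT_FILE),
    ("counter", STT_OBJECT),
    ("main", STT_FUNC),
    ("limit", STT_OBJECT),
]
# Data object entries are just type IDs; the function entry is a stand-in.
result = match_symbols(symtab, data_entries=[7, 9], func_entries=["int (void)"])
```

Because the pairing is purely positional, no symbol names or indexes need to be stored on the CTF side, which is where the space saving comes from.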

(To deal with the problems of dynamic symbol tables getting stripped out of binaries, Solaris defined .ldynsym, which appears to be much what .gnu_debugdata is, only it's just a symbol table rather than a whole LZMA-compressed ELF object containing a symbol table.)

Plus of course there's not much chance of CTF becoming the default if you insist on stripping it out of executables so nothing that needs it can ever find it. ;)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:28 UTC (Wed) by mjw (subscriber, #16740) [Link] (5 responses)

> I guess that means CTF will always be stripped out of RPMs, and libctf and the CTF format by extension will be useless on RPM systems. This seems unfortunate. Is it really so hard to mark .ctf sections as not stripped? If it takes more than a couple of lines, something seems to me to be wrong.

It shouldn't be hard to keep it, if a package or distro decides that is the thing they want.
For example, Rust packages do something like:
%global _find_debuginfo_opts --keep-section .rustc

So all we need to do is define some macro that packages can set for find-debuginfo.sh to do "the right thing" and then a package or distro can decide to make that the default.
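
The stripping rule under discussion can be sketched roughly as follows. This is a simplified model, not eu-strip's real logic (real strip tools also retain a few non-allocated sections they need themselves); the function `sections_after_strip` and its `keep` parameter are hypothetical stand-ins for the effect of an option like `--keep-section`.

```python
SHF_ALLOC = 0x2  # ELF section flag: section occupies memory at run time


def sections_after_strip(sections, keep=()):
    """Return the names of sections that survive stripping: allocated
    sections, plus any named explicitly in `keep` (hypothetical model)."""
    return [name for name, flags in sections
            if flags & SHF_ALLOC or name in keep]


# (name, sh_flags) pairs; .ctf and .debug_info are non-allocated.
sections = [(".text", 0x6), (".data", 0x3), (".ctf", 0x0), (".debug_info", 0x0)]

default = sections_after_strip(sections)
with_keep = sections_after_strip(sections, keep={".ctf"})
```

With no keep list, `.ctf` goes the same way as `.debug_info`; naming it in the keep list is all that is needed for it to survive.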

> I did also plan to submit changes to elfutils to stop eu-strip throwing the section out. I'm rather unhappy to discover that this is pre-emptively rejected.

That is not my intention. Note that I am not a native English speaker. My apologies if I seem to come over as negative.

> CTF does not contain a symbol table, since that would be a waste of space since ELF already has one. Instead, it relies on the ELF symtab. Its function and data object sections are 1:1 ordered in the same order as the ELF symtab (basically, you traverse the ELF symtab and whenever you pass another function symbol, you match it to a function info section entry: whenever you pass another data symbol, you match it to another data object section entry). This saves quite a lot of space: data object section entries in particular are only four bytes each (one type ID).
>
> (To deal with the problems of dynamic symbol tables getting stripped out of binaries, Solaris defined .ldynsym, which appears to be much what .gnu_debugdata is, only it's just a symbol table rather than a whole LZMA-compressed ELF object containing a symbol table.)

OK. So how do you deal with .symtab being stripped away by default then?
Would it be an idea to adopt the .ldynsym from Solaris?
.gnu_debugdata was defined before we had compressed ELF sections in the standard.
Now that we have it maybe we should make .symtab a compressed section?
https://gnu.wildebeest.org/blog/mjw/2016/01/13/elf-libelf...

> Plus of course there's not much chance of CTF becoming the default if you insist on stripping it out of executables so nothing that needs it can ever find it. ;)

Really, I don't understand why you think that is my intention. I might not fully understand all details yet. But I am actually interested in making CTF into something useful.

Will you be at the GNU Tools Cauldron in Montréal, Canada next week?
It might be easier to talk some ideas over in person.
https://gcc.gnu.org/wiki/cauldron2019

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:32 UTC (Wed) by mjw (subscriber, #16740) [Link] (1 responses)

> Will you be at the GNU Tools Cauldron in Montréal, Canada next week?

Sorry, next month. (September 12 to 15, 2019)

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:33 UTC (Fri) by nix (subscriber, #2304) [Link]

Ah. I thought the timing you gave was strange for the Cauldron! That *overlaps* with LPC so in the absence of a teleporter or military jet to get from Lisbon to Montreal in no time... :/

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 0:51 UTC (Fri) by himi (subscriber, #340) [Link] (1 responses)

> Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

This can be read as meaning "no version of eu-strip will ever have the special magic", rather than what I believe you meant: "any eu-strip you find in the world right now will not have the necessary special magic".

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:21 UTC (Fri) by nix (subscriber, #2304) [Link]

Oh. Yes, that is how I parsed it. I completely understand that *existing* strip tools will strip this out. This just means that if you upgrade binutils to a CTF-generating version, you'd have to upgrade elfutils to one that doesn't strip it out as well. Given that you're already stuck having to upgrade the compiler in synchrony too to get this stuff to work, adding one extra package doesn't sound like an intolerable administrative burden. (I hope.)

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:32 UTC (Fri) by nix (subscriber, #2304) [Link]

> %global _find_debuginfo_opts --keep-section .rustc

... now why didn't I think of that? Probably because, back when I was doing this, I had *multiple* sections to deal with, with names like .ctf.*, so telling other tools what the sections were called would have been quite painful. We have our own internal container format now precisely to avoid this sort of problem, so we could use this quite well.

> That is not my intention. Note that I am not a native English speaker. My apologies if I seem to come over as negative.

Sorry, I completely misparsed your sentence! (See my comment a couple of hops down). Phew, that had me panicking a bit for a moment. :)

> Now that we have it maybe we should make .symtab a compressed section?

Compressed sections in GNU ld at least seem a bit ad hoc. I think you'd need to do quite a lot of work to bfd_elf_final_link and environs to make it possible to have allocated sections that other sections depend upon that are also compressed: every existing section with content that changes after layout time (earlier in bfd_elf_final_link than strtab / symtab layout time) either has unchanging size or is non-allocated (and even there, there are fairly dreadful hacks around .zdebug, which I'm afraid I made a little bit worse with .ctf :) ).

I'm not sure .symtab would compress terribly well, either -- it has a lot of fields with "ID-like" content that only appears once, and thus compresses rather badly. (CTF goes to some lengths to avoid content like this for just that reason). The strtab would certainly compress better, but I can see why you don't compress it -- you don't want to impose decompression costs on the whole strtab on every execution.

> Really, I don't understand why you think that is my intention.

A really terrible misparsing of an ambiguous sentence on my part, and you know how hard it is to find alternate meanings of a sentence when you've already fixated on one that is dreadful :) Sorry!

> Will you be at the GNU Tools Cauldron in Montréal, Canada next week?

Alas, I'm going to LPC instead. I'm spending the next two weeks listening to chamber music in the North York moors...

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 3:44 UTC (Wed) by josh (subscriber, #17465) [Link] (1 responses)

And today, people regularly compile out C++ RTTI.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 10:19 UTC (Wed) by khim (subscriber, #9252) [Link]

And THAT is how CTF should be handled, too. If you *really* don't want it, ask the compiler not to produce it. Most developers wouldn't care.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 15:53 UTC (Wed) by luto (guest, #39314) [Link] (16 responses)

IMO what we need for locating debuginfo is a standard API by which a debugger can ask the distro to locate a debug info file. Distro patches to gdb are annoying, and they also preclude things other than gdb from reliably finding debug info files.

I would love for the kernel to be able to drop something in /usr/lib/debuginfo.d/find_vdso_debuginfo.sh, for example.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 15:59 UTC (Wed) by fuhchee (guest, #40059) [Link] (15 responses)

Some of us are working on just such a thing, and will present our progress at the GNU Cauldron in Montreal next month.

https://sourceware.org/git/?p=elfutils.git;a=tree;f=dbgse...

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 16:34 UTC (Wed) by luto (guest, #39314) [Link] (3 responses)

I peeked a bit. Will this also handle debuginfo files that are directly installed on the filesystem?

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 16:42 UTC (Wed) by fuhchee (guest, #40059) [Link] (2 responses)

Yup!

% dbgserver -F /path/to/base/directory

should find executables / debuginfo / corresponding sources

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 16:53 UTC (Wed) by luto (guest, #39314) [Link] (1 responses)

How does this end up working on a normal desktop or server? Is the idea that there would be a systemwide dbgserver instance, perhaps socket activated, or maybe several instances?

As an admin, I would much rather *not* have a systemwide daemon for this, since that implies a path by which one user can attempt to attack another user or the system as a whole. I'd rather each user could, on demand, start up their own instance of whatever libraries and programs are needed to make debugging work. Nothing here should require any form of privilege.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 17:47 UTC (Wed) by fuhchee (guest, #40059) [Link]

> Is the idea that there would be a systemwide dbgserver instance, perhaps socket activated, or maybe several instances?

Any of the above.

> As an admin, I would much rather *not* have a systemwide daemon for this, since that implies a path by which one user can attempt to attack another user or the system as a whole.

Fair enough, though DoS is perhaps the worst of the possible attacks.

> Nothing here should require any form of privilege.

Right.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 18:05 UTC (Wed) by madscientist (subscriber, #16861) [Link] (6 responses)

I hope that non-packaged / generic sources for debuginfo are being considered / allowed for. I've really wanted to start stripping out debuginfo from our binaries on my development team since they're so huge, but we don't create RPM or DEB files. When I propose using external debuginfo people are not happy about the developer overhead (even scripted) so we've never done it.

If it were possible to easily "register" debuginfo files created through Jenkins or some other build service without having to turn them into distro packages, then allow tools (i.e., GDB) to download them more or less invisibly when needed, that would be really nice.

Also is this limited to just debuginfo files?

The other big problem we have is cores being generated on remote systems which are using system libraries other than the local ones: in this situation we need to obtain the remote system's libc.so and other necessary system libraries. It would be really, really nice if we could register shared libraries from different systems, perhaps indexed via a hash of some kind, then have GDB automatically download them as well.

Of course, before this can be done we need to ensure that the core file contains enough information about the shared libraries to perform the lookup, which I doubt it does today, so this requires more work in other places... however it would be good if the design of this service was able to be extended in this way in the future if/when it becomes feasible. For example in our system we use Google coredumper library rather than the kernel to dump cores and this allows us to add "notes" into the generated core file, so we could take advantage of this without Linux kernel support.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 19:18 UTC (Wed) by fuhchee (guest, #40059) [Link] (5 responses)

Compiling buildids into your binaries is enough for this widget to find system shared libraries & their debuginfo (and possibly their sources), and serve them to a remote debugger.

It seems like we're all thinking roughly alike. Exciting times!

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 19:41 UTC (Wed) by madscientist (subscriber, #16861) [Link] (4 responses)

Hm... we must be talking about something different. Let me be more clear.

I compile a program on my build system (I use a sysroot to ensure that it links against system libraries old enough that it can run "anywhere"). I send my program out to run tests on some other system running some random distribution completely different from the one it was built on, which is using a different GNU libc, etc. Maybe Travis, or AWS, or just a local test farm.

It fails and a core is generated. To debug that core I need my program, the debuginfo from my program (if the program is stripped), the core file, and the system libraries from the system it was running on when the core is generated.

I can't see any way that a buildid compiled into my binary can be sufficient to retrieve the runtime system libraries.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 19:52 UTC (Wed) by fuhchee (guest, #40059) [Link] (3 responses)

The runtime system shared libraries have their own buildids, and the relevant ELF note sections should show up in the core dump. From those buildids, the relevant binaries / debuginfo can be found.
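
To make the build-ID lookup concrete: separate debuginfo is conventionally installed under a `.build-id` directory, keyed by the hex build ID split after its first two characters. The sketch below only computes that path; the function name and the default root are assumptions for illustration, and it says nothing about how dbgserver itself is implemented.

```python
def build_id_debug_path(build_id_hex, root="/usr/lib/debug"):
    """Map a hex build ID to the conventional .build-id debuginfo path
    (hypothetical helper; `root` is an assumed default)."""
    build_id_hex = build_id_hex.lower()
    if len(build_id_hex) < 4 or any(c not in "0123456789abcdef" for c in build_id_hex):
        raise ValueError(f"not a usable hex build ID: {build_id_hex!r}")
    # First two hex digits name the directory, the rest names the file.
    return f"{root}/.build-id/{build_id_hex[:2]}/{build_id_hex[2:]}.debug"


path = build_id_debug_path("ABCDEF0123")
```

Since every shared library carries its own build ID, the same mapping works for a core dump's libraries as for the main executable.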

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 20:22 UTC (Wed) by madscientist (subscriber, #16861) [Link] (2 responses)

I see. So dbgserver_find_executable() is intended to be used with shared libs as well? Or is this part not quite complete?

I think it would also be helpful if the client interface provided separate lookup and download methods rather than forcing them to both be a single method (there can be a simplified "do both" method as well if wanted). I can easily imagine situations where we want to know whether a given buildid exists on the server without actually downloading it.

For example, suppose I have a suite of test servers running random environments; during test runs a core is generated. I want to know if the program under test and/or system libraries for this system already exist in the debug server or not: I just want to look them up but not download them. If they don't exist perhaps I'll include them along with the core file when I bundle up the build results. If they do exist I don't need to add them.

Or perhaps I have an automated way for the test system to upload binaries and/or system libraries that aren't already on the debug server (I understand that upload is not in scope for this project and would need some other process) but I don't want to bother uploading things that I already have so I need to be able to check.
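
The "check before you upload" workflow described in the last two paragraphs might look something like this. It is a sketch against an in-memory stand-in: `DebugStore`, `exists`, `download`, `upload` and `upload_missing` are all hypothetical names, not part of dbgserver's client interface.

```python
class DebugStore:
    """In-memory stand-in for a debug server, keyed by build ID (hypothetical)."""

    def __init__(self):
        self._files = {}

    def exists(self, build_id):
        # Lookup only: answers "do you have it?" without transferring data.
        return build_id in self._files

    def download(self, build_id):
        return self._files[build_id]

    def upload(self, build_id, data):
        self._files[build_id] = data


def upload_missing(store, local_files):
    """Upload only the files the store does not already hold;
    return the build IDs that were actually sent."""
    sent = []
    for build_id, data in local_files.items():
        if not store.exists(build_id):
            store.upload(build_id, data)
            sent.append(build_id)
    return sent


store = DebugStore()
store.upload("aa11", b"libc")
sent = upload_missing(store, {"aa11": b"libc", "bb22": b"my-program"})
```

Only the build ID crosses the wire for files the server already has, which is the saving being asked for.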

A simple program that uses the client interface to look up and/or download files would be very useful, as an example if nothing else (and probably for people who would like to add scripting to systems where it's not so simple to recode them to use it).

Cheers!

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:09 UTC (Wed) by fuhchee (guest, #40059) [Link] (1 responses)

> I see. So dbgserver_find_executable() is intended to be used with shared libs as well?

Yes.

> I think it would also be helpful if the client interface provided separate lookup and download methods

Will consider that ... though there may be better ways to service the needs you outline. Deduplication at upload time should be easy too. Re. optimizing packaging of core dumps ... not sure how much sense that makes. The core dump recipient could consult the same debuginfo servers too; or you could preemptively package all the files. Will think on it more.

> A simple program that uses the client interface to look up and/or download files would be very useful

It just appeared in the repo! We employ only the most talented psychics and keyboard monks.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:29 UTC (Wed) by madscientist (subscriber, #16861) [Link]

> Deduplication at upload time should be easy too

If you mean deduplication by the server that's probably helpful but it's a lot of wasted effort to upload 10's or 100's of MB of libraries, binaries, etc., only to have it tossed on the floor as duplicate. Consider a build farm with 200 systems, which are upgraded via apt-get update or whatever at random intervals so they have different system libraries, different program instances, etc... having every system upload all its files for every core even though the system libraries might only change once every few weeks or less seems like overkill.

> Re. optimizing packaging of core dumps ... not sure how much sense that makes. The core dump recipient could consult the same debuginfo servers too; or you could preemptively package all the files.

For this I wasn't thinking that the dbgserver code would do that, I was thinking about scripting that users are using with their test clients to bundle results of failures so they can be uploaded to a test server for further investigation. Our current scripting already preemptively packages all the files: what I'd like to be able to do is detect when some/all of these items are not needed and skip that to reduce the size of uploaded artifacts.

When you're talking about moving content into/out of AWS or other cloud providers, the amount of data sent over the network directly equates to $$ spent and reducing it is always welcome.

Thanks for working on this, it'll be very cool!

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:01 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Ooh this is very nice, very nice indeed.

I wonder if it can handle more than just debuginfo... musing about having libctf automatically launch dbgserver queries for missing CTF sections now -- so people can have separated CTF if they really want *and* it is as if it were always present. Best of all worlds! For that matter they can also do both -- perhaps an option at CTF generation time which automatically emits separated CTF *if* its size passes some threshold, or some percentage threshold of the total binary size, or the .text size, or something. Of course then you'd have to arrange for the dbgserver to see it, but presumably whatever method is used for separated debuginfo would work for this too.

The Compact C Type Format in the GNU toolchain

Posted Aug 8, 2019 12:28 UTC (Thu) by fuhchee (guest, #40059) [Link]

> I wonder if it can handle more than just debuginfo... musing about having libctf automatically launch dbgserver queries for missing CTF sections now -- so people can have separated CTF if they really want

It should be a small amount of extra effort to extend it sideways to a 'ctf' sibling to 'debuginfo'.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:26 UTC (Wed) by thoughtpolice (subscriber, #87455) [Link] (1 responses)

This is very interesting. We have a tool for NixOS that basically does this, except it is based on a FUSE filesystem and a content-addressable packaging system: https://github.com/edolstra/dwarffs

Essentially, every version of every package has a unique hash. We build a reverse mapping from the buildids of the binaries in a package to its unique hash, and upload that metadata along with the package to the package server. We then patch GDB (and elfutils) to look in a specific directory for debug info. This directory is a FUSE filesystem, and when any tool tries to look in `.build-id/...` for the debug info -- it does a query to the package server, obtaining the unique package ID containing the symbols, and transparently installs them through the package manager. It is effectively a version of Microsoft Symbol Server, which is basically what people want, from what I can tell...

Perhaps we could replace dwarffs with something like dbgserver, however. Or integrate them so there's a single UX. We could for instance, perhaps replace the client tooling with a separate "backend" for our case, and the tools can all just work around that instead...

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 23:25 UTC (Wed) by fuhchee (guest, #40059) [Link]

Yup, was aware of your server! We wanted something more http flavoured and a little more distro-independent. Joining forces would of course be wonderful!

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 20:21 UTC (Tue) by nix (subscriber, #2304) [Link] (7 responses)

Note that the CTF format used on FreeBSD, Solaris, and MacOS is *different* from this CTF, an older cousin if you will: it is stored in an ELF section with a different name, it has fairly harsh range limits, and there are some things it cannot represent, not just nonportable things but some perfectly normal C like enum bitfields. Right now the two formats are incompatible and no tools can read both, but my hope in future is for libctf to become a sort of 'format oracle': not only will it be able to read MacOS/FreeBSD CTF and write it out as the latest version of the CTF described here, but it will also be able to read in the CTF described here and write it out as older formats, assuming the limitations of those older formats allow it (e.g. trying to write a dictionary with 2^17 types in it as a FreeBSD-compatible CTF must fail because it can't hold that many).
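
The "must fail" case at the end can be sketched as a simple range check before writing. Everything here is illustrative: `check_downgrade` and `FormatLimitError` are hypothetical names, and `PLACEHOLDER_LIMIT` is a made-up number, not the real limit of any CTF version.

```python
class FormatLimitError(ValueError):
    """Raised when a dictionary cannot be represented in the target format."""


# Placeholder only: NOT the actual type-count limit of any CTF format.
PLACEHOLDER_LIMIT = 2**15


def check_downgrade(type_count, max_types):
    """Return True if `type_count` fits in a format holding at most
    `max_types` types, else raise FormatLimitError (hypothetical check)."""
    if type_count > max_types:
        raise FormatLimitError(
            f"{type_count} types exceed the target format's limit of {max_types}"
        )
    return True


ok = check_downgrade(100, PLACEHOLDER_LIMIT)
```

A real converter would need a check like this for every limited field, not just the type count.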

(But of course this is only a plan and right now this does not exist.)

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 21:38 UTC (Tue) by markjdb (guest, #94056) [Link] (6 responses)

Is there a specification somewhere for the updated format? I recently wrote a new libctf and DWARF->CTF converter for FreeBSD with the aim of not having to run ctfconvert on each CU before linking. The limits you mention are indeed quite annoying and I'd be interested in adopting a newer CTF version if someone's already thought through the details.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:00 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

There will be a spec by the end of the month -- and I'll add the plans for v4 into it as well so people can tell what is planned on the format front. v3 is definitely less compact than FreeBSD's format, and v4 will get that compactness back again for all but very large projects that really need the increased type range.

If we could cooperate on future improvements, I'd be very happy! (But I fear that, as usual, code reuse is difficult because libctf is in the GNU toolchain, so it's GPL... how annoying, a planned format oracle that its principal beneficiaries will probably refuse to use :( but that, I suppose, is why the format needs documentation as well.)

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:47 UTC (Tue) by markjdb (guest, #94056) [Link]

I would be happy with "just" documentation. :)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 9:27 UTC (Wed) by movement (subscriber, #871) [Link] (3 responses)

Any reason you didn't use illumos's newer CTF tools, which also don't need a separate ctfconvert step for each .o?

The Compact C Type Format in the GNU toolchain

Posted Aug 8, 2019 5:03 UTC (Thu) by markjdb (guest, #94056) [Link] (2 responses)

They still work by generating CTF for each CU, and then pairwise merging the CTF files. It is faster to generate the type graph directly from DWARF, or at least without using CTF as an intermediate representation.

The Compact C Type Format in the GNU toolchain

Posted Aug 8, 2019 8:53 UTC (Thu) by movement (subscriber, #871) [Link]

I see, thanks.

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:35 UTC (Fri) by nix (subscriber, #2304) [Link]

One of our motivations for getting this stuff in GCC is that it's even faster to not generate the DWARF at all, particularly as it is often so large that it overflows caches and now you're incurring disk write time for all that stuff, which if you only want the CTF is completely wasted. (Of course you can build both in if you want to, but you don't *have* to.)

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 20:26 UTC (Tue) by nix (subscriber, #2304) [Link] (7 responses)

btw, my next priority after getting the linker working, even before writing a proper deduplicator for the types in the .ctf section, is documentation. This *matters*.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 21:53 UTC (Tue) by roc (subscriber, #30627) [Link] (6 responses)

When writing a spec please please make sure the spec includes everything a consumer needs to know to parse all relevant versions.

I.e. don't do the DWARF thing where the latest version of the spec describes only the latest version of the format, so to write a tool that handles DWARF 2-5 you have to read multiple specs and combine them in your head.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:05 UTC (Tue) by nix (subscriber, #2304) [Link] (5 responses)

I'm probably going to do the spec in ANTINEWS format: the latest version, and then earlier versions defined in terms of their differences from the next higher version. This seems like a joke when Emacs does it but I think for file formats it might make considerable sense and allow more flexibility than the whole-new-spec-every-time approach DWARF takes. (Also, note that if you have libctf available, in time you won't need to go through agony parsing every possible version yourself: you can just ask libctf to convert the CTF dictionary into a version you understand, and it'll tell you if it can't. Not that libctf can *do* that yet, but that's planned. It already silently upconverts from older versions into the latest at open time, so consumers only need to understand the latest version and can rely on libctf for older versions. Not back to FreeBSD/Solaris versions yet, mind you: that too is coming.)
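
One way to picture "older versions defined by their differences from the next one, upgraded at open time" is a chain of single-step upgrade functions. This is only a sketch: the dictionary layout and the differences between versions (`variables`, `type_id_bits`) are invented for illustration and do not describe real CTF versions or libctf internals.

```python
LATEST = 3


def _v1_to_v2(d):
    out = dict(d, version=2)
    out.setdefault("variables", [])  # invented: a field that first appears in v2
    return out


def _v2_to_v3(d):
    out = dict(d, version=3)
    out["type_id_bits"] = 32         # invented: a v3-only property
    return out


# Each entry describes only the step from one version to the next.
UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade_to_latest(d):
    """Apply single-step upgrades until the dictionary is at LATEST."""
    while d["version"] < LATEST:
        d = UPGRADES[d["version"]](d)
    return d


old = {"version": 1, "types": ["int"]}
new = upgrade_to_latest(old)
```

A consumer that only understands the latest layout never sees `old`; it is handed `new` regardless of which version was on disk.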

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:16 UTC (Tue) by roc (subscriber, #30627) [Link] (4 responses)

ANTINEWS definitely sounds better than what DWARF does!

If CTF takes off people are definitely going to write consumers that don't use libctf. For one thing, I do not trust C libraries to parse possibly-malicious data so we would write a Rust library if we added CTF support to Pernosco. And inevitably there will be people who don't like libctf for *reasons* and do their own thing.

> It already silently upconverts from older versions into the latest at open time

Hmm, so in that case it parses the entire file up-front? That seems like a regression from DWARF.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:06 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

> If CTF takes off people are definitely going to write consumers that don't use libctf. For one thing, I do not trust C libraries to parse possibly-malicious data so we would write a Rust library if we added CTF support to Pernosco.

Errr... if your CTF is malicious, the binary is malicious. Isn't the malicious executable code likely to be a bigger worry? (But I would rejoice at the thought of a Rust libctf. It's just a shame binutils libctf wouldn't really be able to use it: I doubt Rust works on mingw. :P )

> Hmm, so in that case it parses the entire file up-front? That seems like a regression from DWARF.

It has to, as a consequence of the decision to not spend space on indexes or identifiers. On loading, we have to sweep through the type section and construct a mapping of type ID -> file offset, intern type names in various hashes etc (we sweep through the two symbol-related sections too, constructing a symbol number -> file offset mapping). Given that we have to sweep through the file to decompress it anyway, the thing is almost always already in L2 cache in any case so the cost is in the noise. Back when I had a deduplicator so the files were small, I literally could not measure the cost of doing this, nor the cost of doing aggressive upgrading of the entire file. And it's far simpler than doing it lazily.
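
The open-time sweep can be illustrated with a toy format. The record layout below (one kind byte, a two-byte little-endian payload length, then the payload) is invented for the example and is not the CTF encoding; the point is only that with variable-length records and implicit, positional type IDs, the ID -> offset mapping has to be built by walking the whole section once.

```python
import struct

HEADER = "<BH"                      # invented layout: kind byte + payload length
HEADER_SIZE = struct.calcsize(HEADER)


def index_types(section):
    """Walk the records in order and map type ID -> byte offset.
    Type IDs are implicit: the first record is 1, the next 2, and so on."""
    offsets = {}
    pos = 0
    type_id = 1
    while pos < len(section):
        _kind, length = struct.unpack_from(HEADER, section, pos)
        offsets[type_id] = pos
        pos += HEADER_SIZE + length  # skip header and payload
        type_id += 1
    return offsets


section = (
    struct.pack(HEADER, 1, 4) + b"\0" * 4    # type 1: 4-byte payload
    + struct.pack(HEADER, 2, 0)              # type 2: no payload
    + struct.pack(HEADER, 3, 8) + b"\0" * 8  # type 3: 8-byte payload
)
index = index_types(section)
```

The alternative, storing an explicit offset table in the file, would avoid the sweep at the cost of the space the format is trying to save.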

In any case, one of the planned revisions for v4 is a proper intermediate representation inside libctf whereby we unpack individual types into their largest possible form on the fly for internal processing, then pack them back down into their maximally compact form for storage (even just in memory), which lets us eliminate a lot of ugly redundancy in libctf. With this in place, we can shift from aggressive upgrading to literally storing in whatever old format people want. (But we'll still need to sweep through the file at open time to construct that mapping. And this is all months away in any case.)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:24 UTC (Wed) by roc (subscriber, #30627) [Link] (2 responses)

> Errr... if your CTF is malicious, the binary is malicious. Isn't the malicious executable code likely to be a bigger worry?

Not necessarily. For example you might want to run `objdump` on a possibly-malicious executable and not get owned.

Alternatively, you might have a debugger that runs the debuggee code in a sandbox but where you want to process debuginfo outside that sandbox. This is our situation.

> It has to, as a consequence of the decision to not spend space on indexes or identifiers.

Hmm. You should probably highlight this tradeoff, because it is significant.

> Back when I had a deduplicator so the files were small, I literally could not measure the cost of doing this, nor the cost of doing aggressive upgrading of the entire file.

OK but how big was that binary? Things that work well for reasonable-sized programs don't necessarily work well for Firefox or Chromium. If you want people to use CTF to generate FFI glue at runtime, for example, even a small startup penalty is going to cause people to look for alternatives.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:55 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

> OK but how big was that binary? Things that work well for reasonable-sized programs don't necessarily work well for Firefox or Chromium. If you want people to use CTF to generate FFI glue at runtime, for example, even a small startup penalty is going to cause people to look for alternatives.

Let's try it for an enterprise Linux kernel (because I've got one sitting here waiting). The kernel splits its CTF unusually: let's try the output of the old deduplicator, vmlinux.ctf (types only used by the core kernel) plus its parent shared_ctf.ctf (types used by more than one module, or by at least one module and the core kernel). Put together these are 1509340 bytes compressed, 4267753 bytes uncompressed. With a good deduplicator you need a *big* program for that, though no doubt a C++-capable CTF would find Chromium to be just such.

A thousand cats:

1.26user 0.33system 0:01.52elapsed 104%CPU (0avgtext+0avgdata 3320maxresident)k

A thousand uncompresses (done by hacking libctf to abort on error and free everything immediately after uncompressing). Unsurprisingly gunzip is not free:

34.42user 3.64system 0:38.03elapsed 100%CPU (0avgtext+0avgdata 9472maxresident)k

A thousand dumps of the CTF header redirected to /dev/null (which roughly involves open, decompress, and sweep for indexes etc, do almost no work, close):

35.28user 2.97system 0:38.23elapsed 100%CPU (0avgtext+0avgdata 9468maxresident)k

That's in the noise: if it costs anything, the indexing costs well under 1% of the cost of decompression. And since doing this sort of thing increases the efficiency of compression, it may in the end *save* time as well as space. (I also tried this with an old-format file: the transparent upgrade pass was also in the noise.)
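
For anyone who wants to check the arithmetic, here it is worked through in Python using the figures quoted above. Taking user plus system time as the measure is an assumption of this sketch (elapsed time gives much the same answer), and with single runs the difference is small enough that it may be mostly measurement noise.

```python
RUNS = 1000

# Figures copied from the timing output above (seconds for 1000 runs).
uncompress_cpu = 34.42 + 3.64     # user + system: decompress only
header_dump_cpu = 35.28 + 2.97    # user + system: open, decompress, sweep, close

extra_cpu = header_dump_cpu - uncompress_cpu
extra_fraction = extra_cpu / uncompress_cpu
extra_ms_per_open = extra_cpu / RUNS * 1e3

# Sizes of vmlinux.ctf plus shared_ctf.ctf, in bytes.
compressed, uncompressed = 1509340, 4267753
ratio = compressed / uncompressed

print(f"extra CPU for the sweep: {extra_cpu:.2f}s over {RUNS} opens "
      f"({extra_fraction:.2%} of decompression, {extra_ms_per_open:.2f}ms each)")
print(f"compressed size is {ratio:.1%} of uncompressed")
```

That comes to about 0.19 seconds of extra CPU time across all thousand opens, or roughly half a percent of the decompression cost.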

Note that the CTF link section merging machinery almost entirely resides in libctf and is intended to be reusable by other projects: it's not ld-specific, and you're not restricted to doing CTF merging the exact same way ld does it. Things like Chromium and Firefox might well elect to postprocess themselves and split up their CTF differently, yielding smaller CTF dictionaries customized for their use. (Right now, you can choose to split along boundaries different from translation unit boundaries, lumping TUs together into bigger units, and you can choose an alternative conflict-resolution strategy where rather than placing all types in one big dictionary unless they conflict, we place all types in per-TU subdictionaries unless they are used by more than one TU: so the parent TU gets a lot smaller. The linker doesn't use any of this stuff yet, but in time it might grow options controlling some of this. There's no point yet since most of that depends on a good deduplicator. The one I haven't written yet. :) )
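
A toy Python sketch of the two conflict-resolution strategies just described. This is not the libctf link API: the `split` function, the `min_users` parameter and the string "definitions" are all hypothetical, and a real linker compares whole type graphs rather than strings. With `min_users=1`, every unconflicted type goes into the shared parent; with `min_users=2`, a type goes there only if more than one TU uses it.

```python
from collections import defaultdict

def split(tus, min_users):
    """tus: {tu_name: {type_name: definition}}.
    Returns (parent, children): the shared dictionary and per-TU leftovers."""
    defs, users = defaultdict(set), defaultdict(set)
    for tu, types in tus.items():
        for name, definition in types.items():
            defs[name].add(definition)
            users[name].add(tu)
    # A name is shared only if all TUs agree on it and enough TUs use it.
    parent = {name: next(iter(d)) for name, d in defs.items()
              if len(d) == 1 and len(users[name]) >= min_users}
    children = {tu: {n: d for n, d in types.items() if n not in parent}
                for tu, types in tus.items()}
    return parent, children

tus = {
    "a.c": {"size_t": "unsigned long",
            "struct foo": "{int x;}",
            "struct a_only": "{char c;}"},
    "b.c": {"size_t": "unsigned long",
            "struct foo": "{long x;}"},      # conflicts with a.c's struct foo
}

parent_default, children_default = split(tus, min_users=1)
parent_small, children_small = split(tus, min_users=2)
print(sorted(parent_default), sorted(parent_small))
```

In this example `struct a_only` lands in the parent under the first strategy but stays with `a.c` under the second, which is the sense in which the shared dictionary gets smaller.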

... also of course we'd need clang support for CTF generation and gold and lld support for .ctf section merging *and* C++ support for CTF before Chromium or Firefox would become likely users. That's some way off, I think.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 1:05 UTC (Wed) by roc (subscriber, #30627) [Link]

Those are certainly encouraging results.

Meaningful comparison of CTF and DWARF

Posted Aug 6, 2019 21:32 UTC (Tue) by roc (subscriber, #30627) [Link] (11 responses)

It would be helpful for someone to explain why CTF is more compact than DWARF at representing the same information.

The article touches on that, but not in a way that makes sense to me:
> a DWARF entry is essentially a program in its own right that can be run to generate the needed information. That makes DWARF flexible, but it's also complicated and verbose;

DWARF is complicated and verbose, but DWARF location expressions (the Turing-complete part) are not the problem, because in practice almost all of them are very simple. Also, most location expressions describe the locations of local variables, which CTF completely ignores.

Typically a lot of DWARF data is present to support stack walking of code without frame pointers, and to walk stacks of inlined functions. (Given CTF backtrace support is "still being designed", that makes size comparisons currently not very meaningful.) A very important question is, if you configured a compiler to emit just enough DWARF to represent what CTF represents today, or after backtrace support is implemented in CTF, how would the DWARF compare to CTF in size, and what would be the cause(s) of that?

The article also says
> The CTF data is naturally smaller, but the format also includes compression to reduce the size requirements further.

DWARF supports compression too --- in at least three different ways, unfortunately --- so no advantage for CTF there.

> The linker then has to take these sections and merge them into a larger section, removing any duplicate information. There is a potential problem, though, in that different object files may define conflicting objects using the same names.

This could be a relevant difference. Today's linkers aren't merging duplicate DWARF information across object files. However, compilers use tricks to try to reduce the duplicate information they emit into object files, and there are tools like dwz to merge duplicate DWARF information after linking. So comparing CTF sizes vs DWARF sizes should include a comparison with dwz-processed DWARF. (Also note that there has been significant resistance in LLD at least to proposals to implement DWARF deduplication in the linker, so it's not a given that this approach will actually become ubiquitous.)

From reading https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01711.html I see that CTF can reference strings in the main ELF symbol table. That's a good idea but it's not clear it's responsible for big size wins (and DWARF could easily be extended to do that).

The fact that CTF doesn't support C++ yet is a problem. In my experience DWARF size for C++ and Rust is a much bigger problem than for C because of their much more complex type systems and much larger mangled identifiers, so comparisons of DWARF vs CTF for those languages are more important. Also those languages may well require a significant increase in complexity of CTF, so it seems to me it would make sense to make sure C++ is supported before promulgating a version of CTF that tools are expected to keep compatibility with.

For C++ and other reasons the dream that CTF will be small enough to be "always present" may founder. The relevant comparison is the size of CTF vs the size of the rest of the ELF binary; it seems likely to me that there will always be people who see an advantage in stripping all kinds of debug info. Identifying isomorphic type subgraphs for merging is likely to slow down the link, and that too will encourage people to disable CTF.

Don't get me wrong, I think DWARF is a terrible format that could be greatly improved in compactness and in other ways, and it would be great if CTF can be that, but as a debugger developer the last thing I want is an XKCD 927 situation. In particular it seems that if we get an ecosystem of tools that depend on CTF, then for tasks like debugging with local variable values which are explicitly outside the scope of CTF, we'll have to build and link binaries with both DWARF and CTF, which doesn't sound good.

I apologize if the issues I've raised are answered in online documentation; I searched, but other than ctfout.h I couldn't find the talk or other relevant information.

Meaningful comparison of CTF and DWARF

Posted Aug 6, 2019 23:39 UTC (Tue) by nix (subscriber, #2304) [Link] (10 responses)

What an excellent comment! This sort of thing is why I subscribe to LWN :)

> A very important question is, if you configured a compiler to emit just enough DWARF to represent what CTF represents today, or after backtrace support is implemented in CTF, how would the DWARF compare to CTF in size, and what would be the cause(s) of that?

That's a very interesting question. I'm not aware of any way to configure or indeed modify any extant compiler to do that without fairly major surgery, alas :(

> DWARF supports compression too --- in at least three different ways, unfortunately --- so no advantage for CTF there.

Actually, there is. CTF has been designed with compression in mind from the start, and a number of its design decisions are aimed to increase compressibility: e.g. the almost complete eschewing of self-identifiers (in favour of implied identifiers via array offsets) and self-description. Self-identifiers are killers for compression ratios because they are all unique, and that means incompressible.
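
A synthetic illustration of that last point, in Python with zlib. The record layout is hypothetical and this is not a measurement of CTF or DWARF. It only shows the general effect: the same repetitive records compress much worse once each one is prefixed with its own unique ID.

```python
import struct
import zlib

# Hypothetical fixed-size "type records": a kind and a referenced type index.
# The pattern is deliberately repetitive, as type tables often are.
records = [struct.pack("<HH", i % 7, i % 13) for i in range(10000)]

# Variant A: the ID is implied by position, so nothing extra is stored.
implicit = b"".join(records)

# Variant B: every record is prefixed with its own unique 32-bit ID.
explicit = b"".join(struct.pack("<I", i) + rec
                    for i, rec in enumerate(records))

implicit_z = len(zlib.compress(implicit, 9))
explicit_z = len(zlib.compress(explicit, 9))

print(f"implicit IDs: {len(implicit)} -> {implicit_z} bytes")
print(f"explicit IDs: {len(explicit)} -> {explicit_z} bytes")
```

The explicit variant is bigger before compression, as expected, but the more telling number is how much bigger it remains afterwards: the unique prefixes interrupt every repeated run the compressor could otherwise have matched.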

> I see that CTF can reference strings in the main ELF symbol table. That's a good idea but it's not clear it's responsible for big size wins

The size wins are mostly that function and data symbols don't need their names to be repeated. The remaining space in CTF string tables is mostly full of structure member names, which do indeed rarely appear in the ELF strtab. We save a bit more space by sorting the strtab. I tried saving more space by slicing CTF strtab entries up on case changes and underscores (combining them back where every user of any of those chunks used all of them) and pointing to those chunks of strings via a "strtab chunks table". It saved about 50-80% of the strtab space... when uncompressed. When compressed, it cost about 10%. So I ripped it out again. Modern compressors are good. But it is possible that when really long identifiers are involved, as in C++, this chunking trick may work better. I still have the code and can try that out when the time comes. Perhaps we can save space by compressing all of the CTF except for the table of chunk indexes, so as not to pollute the compression dictionary...

> Identifying isomorphic type subgraphs for merging is likely to slow down the link

The deduplicator I had before was very slow, but I think I now have an algorithm that is O(n) in the number of nodes, i.e. not too bad. (But that remains to be seen, since it's not written yet and the algorithm may be broken). We will soon be able to use multiple threads in GNU ld too, which should help as well (the algorithm is parallelizable).

> In particular it seems that if we get an ecosystem of tools that depend on CTF, then for tasks like debugging with local variable values which are explicitly outside the scope of CTF, we'll have to build and link binaries with both DWARF and CTF, which doesn't sound good.

It is very much our intention to collaborate with the rest of the debugging ecosystem. Right now CTF isn't interacting with any of these things because it is very new, consists entirely of sharp edges and it is evolving rapidly. But there has already been internal discussion about where to put CTF in the future once it settles down a bit, and DWARF has obviously come up as a useful umbrella project. This LWN article is arguably part of that: increasing awareness...

We could perhaps have CTF augment DWARF in some future spec revision, so DWARF could optionally drop some of its type representations when CTF is present and use CTF's instead, much as CTF drops some of its strings and uses the ELF strtab's instead: having DWARF augment CTF is probably impractical because it is usually not present. (A future version of CTF, obviously. This version is definitely transitional: as usual, when you start to upstream things, you get really good review comments from all sorts of people, and those trigger changes...)

Meaningful comparison of CTF and DWARF

Posted Aug 7, 2019 0:03 UTC (Wed) by roc (subscriber, #30627) [Link] (9 responses)

> That's a very interesting question. I'm not aware of any way to configure or indeed modify any extant compiler to do that without fairly major surgery, alas :(

I think it would not be very difficult to prototype a DWARF post-processing tool to do this. (If it was very difficult, that would be interesting!)

I really think this ought to be done for size comparisons to be meaningful. If it were possible to make some simple changes to DWARF to reduce its space usage, and configure DWARF producers to emit a subset of DWARF that covers the functionality of CTF with not much more size, I think there would be a compelling argument to just do that instead of CTF.

> CTF has been designed with compression in mind from the start, and a number of its design decisions are aimed to increase compressibility: e.g. the almost complete eschewing of self-identifiers

Can you identify the specific DWARF design decisions that make it compress poorly?

> The deduplicator I had before was very slow, but I think I now have an algorithm that is O(n) in the number of nodes, i.e. not too bad.

Does it handle recursive data structures? I actually implemented this for a C static analysis tool many many years ago, and it was pretty hard to find all possible duplicates efficiently. To find all possible duplicates, you pretty much have to start by optimistically assuming all types with the same name are equivalent and then split those equivalence classes when you find they aren't, cascading the splits backwards along your type membership graph.
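
To make the optimistic approach concrete, here is a small Python sketch. It is not the algorithm from the tool I wrote, nor anything in libctf; the `dedup_classes` function and the node encoding are hypothetical. It uses naive refinement to a fixed point rather than a worklist that cascades splits backwards, so each round is linear but there can be many rounds.

```python
def dedup_classes(types):
    """types: {node_id: (name, shape, [referenced node_ids])}.
    Returns {node_id: class_label}; equal labels mean 'mergeable'."""
    # Optimistic start: everything with the same name is one class.
    cls = {n: name for n, (name, _shape, _refs) in types.items()}
    while True:
        # A node's signature is its current class, its own shape, and the
        # current classes of everything it refers to.
        sig = {n: (cls[n], shape, tuple(cls[r] for r in refs))
               for n, (_name, shape, refs) in types.items()}
        ids = {}                      # renumber signatures to small ints
        new_cls = {n: ids.setdefault(s, len(ids)) for n, s in sig.items()}
        # Refinement only ever splits classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_cls.values())) == len(set(cls.values())):
            return new_cls
        cls = new_cls

# Three TUs each define a self-referential 'struct list'; the third one
# uses 'long' for the value, so it must not be merged with the others.
types = {}
for tu, value_type in (("1", "int"), ("2", "int"), ("3", "long")):
    types["val" + tu] = (value_type, "base", [])
    types["list" + tu] = ("struct list", "struct:v,next",
                          ["val" + tu, "ptr" + tu])
    types["ptr" + tu] = ("struct list *", "pointer", ["list" + tu])

cls = dedup_classes(types)
print(cls["list1"] == cls["list2"], cls["list1"] == cls["list3"])
```

Note how the split of `struct list` in the third TU forces a second round that also splits the pointer type referring to it: that is the backwards cascade, and it is what makes recursive structures work without any special casing.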

You can of course give up on finding all possible duplicates, but that would raise the question of how well your deduplicator works on C++ and complex applications.

Of course, a deduplicator for DWARF would face the same issues. I wonder what dwz does...

> We will soon be able to use multiple threads in GNU ld too, which should help as well (the algorithm is parallelizable).

Good. Though, when measuring overhead, it would be more meaningful in the context of a linker that's actually fast, e.g. LLD rather than GNU ld.

> We could perhaps have CTF augment DWARF in some future spec revision, so DWARF could optionally drop some of its type representations when CTF is present and use CTF's instead

That sounds pretty complicated unfortunately.

Meaningful comparison of CTF and DWARF

Posted Aug 7, 2019 12:33 UTC (Wed) by nix (subscriber, #2304) [Link] (8 responses)

> I really think this ought to be done for size comparisons to be meaningful.

I agree! Anyone want to try? :)

> If it were possible to make some simple changes to DWARF to reduce its space usage, and configure DWARF producers emit a subset of DWARF that covers the functionality of CTF with not much more size, I think there would be a compelling argument to just do that instead of CTF.

I'd think we'd need simpler ways to read it as well: better libraries that do what libctf does, but for DWARF. libdwfl is very nice but it doesn't give you a view of the C language's type system, or any language's type system: it gives you a view of DWARF, and you have to dig around in the DWARF spec yourself to figure out how to interpret that. Even for the simple cases with no interpreter this is fiddly and often has spec-version variation (the various tools I wrote never gained DWARF 4 type signature support, for instance, because it was just too much work that gained me *nothing*, just running to stay in place).

And when you get to questions that honestly in most languages usually have really simple answers like "what is the offset of this structure member?" either you suddenly need a full DWARF exprloc interpreter, which libdwfl did *not* provide last I saw, or (far more commonly) one just looks to see what one's personal compiler usually emits for offsets and jams in an expectation that that is there. So now you have a tool that only works for some DWARF versions and only for the output of some compilers. And that is the *norm* for programs trying to use DWARF to do type introspection that aren't dedicated full-featured debuggers written by people who've been immersed in this stuff for years. I can't imagine doing it in under a thousand lines, even with libdwfl's help.

It should not be this hard! The actual norm is that people don't bother trying to introspect into the C type system because it's too difficult if they have to use DWARF, and resort to preprocessor kludges or shared headers or out-of-band communication methods, none of which do remotely as good a job as a nice compact easy-to-read type description would do. CTF is not *just* about compactness. It's about letting normal mortals get useful results out, too.

(And, yes, obviously adding toolchain support for CTF means that CTF only works for the output of compilers with that support, too. It is possible to convert DWARF to CTF as well, and there are multiple tools that do it: but this does mean that you need to generate DWARF first, and if you don't otherwise want it this slows down the link really quite a lot, *on top* of any time hit you take to deduplicate the resulting CTF.)

Meaningful comparison of CTF and DWARF

Posted Aug 7, 2019 22:44 UTC (Wed) by roc (subscriber, #30627) [Link] (3 responses)

> And when you get to questions that honestly in most languages usually have really simple answers like "what is the offset of this structure member?"

I agree implementing a full DWARF location expression interpreter for field offsets is sad. AFAIK the only case in practice where non-constant location expressions are used is C++ virtual base classes. That raises the question: will CTF support C++ virtual base classes? And if so, then how?

If I were you I would simply add C++ virtual base classes as a feature to CTF and ensure CTF encodes enough information for tools to find the VBC offsets at runtime.

But of course we could also effectively retrofit that approach to DWARF by adding a requirement that member location expressions have restricted forms, and ensuring DWARF producers follow that requirement.

> So now you have a tool that only works for some DWARF versions and only for the output of some compilers

It's certainly a problem if consumers independently come up with their own definitions of the allowed forms of member locations. If consumers work together to standardize a single definition, it's no worse than defining a new format or a new version of an existing format. In fact, it's better, assuming you standardize a definition that matches what tools already do.

Meaningful comparison of CTF and DWARF

Posted Aug 9, 2019 11:40 UTC (Fri) by nix (subscriber, #2304) [Link] (2 responses)

> I agree implementing a full DWARF location expression interpreter for field offsets is sad. AFAIK the only case in practice where non-constant location expressions are used is C++ virtual base classes. That raises the question: will CTF support C++ virtual base classes? And if so, then how?

The killer here for me is 'in practice'. That, again, means you are depending on what compilers happen to do now, with no guarantee at all that this will remain true in the future.

CTF doesn't support C++ yet. When it does, I won't consider it done until it supports all of it, even the tricky parts. The one bit I am wondering if I can somehow avoid having to do is ctf_lookup_by_name(), which takes a sort of crippled cut-down C identifier and parses it to yield the terminal type node you're interested in. Doing that with C++ name lookup rules seems likely to be a flaming nightmare... but then maybe the rules don't need to be perfect: they're already very far from perfect even for C and nobody ever noticed in the 17-year lifespan of libctf so far. A lot of programs seem likely to do a lot of type node navigation without wanting libctf to help with parsing anyway, taking the names of individual types and parsing them themselves rather than trusting libctf to do it.

Meaningful comparison of CTF and DWARF

Posted Aug 9, 2019 12:03 UTC (Fri) by roc (subscriber, #30627) [Link] (1 responses)

> That, again, means you are depending on what compilers happen to do now, with no guarantee at all that this will remain true in the future.

We could guarantee this will remain true in the future by promulgating and enforcing a rule saying it should. This is no different in principle to promulgating rules about how CTF should be implemented in tools. In fact it's easier, because current code doesn't have to change.

> When it does, I won't consider it done until it supports all of it, even the tricky parts.

Then I think you ought to do it now, because it will impact the design of CTF quite a bit.

For example, the actual worst thing about virtual base classes is that "what is the offset of this field within this type?" is no longer statically knowable. It depends on inspecting the run-time value of the object. This will have significant impact on your library API.

Meaningful comparison of CTF and DWARF

Posted Aug 9, 2019 22:47 UTC (Fri) by nix (subscriber, #2304) [Link]

I think getting a spec working, getting the linker stuff in etc is far more important, and much faster to do, than the enormous job that is adding C++ support, sorry. It's on the todo list, but it is definitely *not* at the top.

(The library API will need to grow for C++ support in any case.)

Meaningful comparison of CTF and DWARF

Posted Aug 7, 2019 23:00 UTC (Wed) by roc (subscriber, #30627) [Link] (3 responses)

> I agree! Anyone want to try? :)

I would, but no-one's going to pay me to do it, so I probably won't. It's also perhaps not the right time for someone to do it, since CTF is a moving target with backtraces and C++ etc not being supported yet.

I'm quite worried about how this situation is going to evolve over time, especially for situations where full debuginfo is needed. You have said that somehow DWARF could/should be extended so that it can reuse the information contained in the CTF section. That sounds hard and complicated. It also sounds brittle for the future, because if successful, the scope of CTF is likely to increase. Users will find that CTF fits their needs except they just need one more bit of information that's present in DWARF and not CTF, and you will face enormous pressure to add that one little feature, because that's so much easier for those CTF users than switching to full DWARF. As the scope of CTF grows, the overlap with DWARF will grow and the size penalty will grow. I guess you could evolve the "DWARF reusing CTF" spec in lockstep but that would be a big burden on the ecosystem and likely won't happen since the CTF maintainers presumably don't much care about DWARF.

Meaningful comparison of CTF and DWARF

Posted Aug 9, 2019 11:45 UTC (Fri) by nix (subscriber, #2304) [Link] (2 responses)

What you say is true: scope creep is a problem in all software systems. It's just something to watch out for, I suppose. But note that CTF has an internal section layout system, so even if it does creep like hell there won't be a size penalty if the extra bits are in sections which are optional (there are many optional sections already). There would still be a complexity penalty, though, which matters a great deal to me.

I do want to keep CTF tightly focused on type introspection. The backtrace stuff is going to be in an optional CTF section and will only be generated if you ask for it: it hasn't had any implications for other, existing parts of CTF in any of my various half-designs so far. It just uses them. It could potentially be an entirely separate library, generating a separate ELF section, though given how painful it is to add more ELF sections I'm very strongly inclined to keep it inside the .ctf ELF section in a new CTF section instead, so the ELF machinery need not know it is there.

> and likely won't happen since the CTF maintainers presumably don't much care about DWARF

Well, I *am* the CTF maintainer right now and I do care about DWARF. :) It's hard to avoid caring about it since if you want to do debugging of anything with, say, scopes in it, DWARF is the right thing, not CTF.

Meaningful comparison of CTF and DWARF

Posted Aug 14, 2019 18:35 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

> I do want to keep CTF tightly focused on type introspection.

BTW, if it's about introspection... would it be possible to know whether a given structure should be passed on the stack or in registers? I mean something like this: compare foo, bar and baz. Especially bar and baz. These types have more-or-less identical properties... yet one is passed (and returned!) using registers and the other is passed (and returned!) on the stack.

Meaningful comparison of CTF and DWARF

Posted Aug 14, 2019 19:18 UTC (Wed) by excors (subscriber, #95769) [Link]

Looks like that's defined by http://itanium-cxx-abi.github.io/cxx-abi/abi.html#value-p... ? Working out whether a type is non-trivial seems non-trivial, though presumably it's easy for the C++ compiler, so it sounds like useful information to expose.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 21:38 UTC (Tue) by dw (subscriber, #12017) [Link] (1 responses)

Can anyone explain the relationship between this work and the (AFAIK still ongoing) work to add BTF support to the kernel? They seem to duplicate each other. As I understand it, in FreeBSD and Solaris CTF serves both roles, whereas in Linux we appear to be ending up with two very similar syntaxes, depending on whether you are inspecting userspace or the kernel.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:13 UTC (Wed) by nix (subscriber, #2304) [Link]

I don't think they are really duplicates. BTF is describing the BPF type system: maps maps maps. CTF is describing the C type system: it can already handle floats, for instance, and things like 'restrict', and I am definitely going to add everything in GNU C, including things like vector and fixed-point types.

(And, honestly, I see no reason why we can't translate one into the other anyway. If BTF can be translated from DWARF, I'm sure I can write a CTF->BTF translator :) ).

The Compact C Type Format in the GNU toolchain

Posted Jul 11, 2020 9:40 UTC (Sat) by nix (subscriber, #2304) [Link] (1 responses)

After a lot more false starts than I'd hoped, I posted the deduplicating linker upstream a week or so ago: <https://sourceware.org/pipermail/binutils/2020-June/11201...>. Git tree here: <https://github.com/oracle/binutils-gdb/commits/oracle/und...>. Compiler here: <https://github.com/oracle/gcc/tree/oracle/ctf-gen> (though one patch is needed on top of that which we'll be pushing soon, to get all types rather than just types with live global decls).

It's pretty fast and deduplicates pretty well :) except for GhostScript... I know how to fix that but it needs a file format rev, I think. I'll get to it :)

The Compact C Type Format in the GNU toolchain

Posted Sep 16, 2020 16:01 UTC (Wed) by nix (subscriber, #2304) [Link]

As another note, the deduplicating CTF linker was merged into binutils trunk a month or so back. (I should have noted it here back then, but forgot.)