|
|
Subscribe / Log in / New account

The Compact C Type Format in the GNU toolchain

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:40 UTC (Wed) by roc (subscriber, #30627)
In reply to: The Compact C Type Format in the GNU toolchain by nix
Parent article: The Compact C Type Format in the GNU toolchain

You're right, that's fair.

However, you are pretending it is *needed* when for most binaries it currently is not.


to post comments

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 10:52 UTC (Wed) by nix (subscriber, #2304) [Link] (11 responses)

I'm not talking about *now* -- and I'm also not suggesting that most binaries should be built with this. They probably shouldn't except if the distro knows it has tools that can use CTF on arbitrary programs (that's why you need a non-default compiler option, -gt, to turn CTF emission on), or unless the tool uses CTF to introspect itself. What I'm suggesting is that if you *have* built something with it, you probably did so because you *need* it -- and if you strip the CTF out of the binary there are literally zero libraries in existence that will know how to get at it unless you explicitly specify the path to the stripped-out CTF. And why would anyone compile with CTF only to render it immediately useless?

I have... painful experiences here. Back when we were converting DWARF to CTF at kernel link time and linking it into kernel modules, we had to actually *hack RPM at build time* via PATH shuffling and patching of /usr/lib/rpm/find-debuginfo.sh to even make it possible for RPM to not just strip out all non-loaded sections on the grounds that they must be unnecessary, no matter what size they were or whether RPM had never seen them before, including ripping all the CTF that we'd just gone to some lengths to link in.

To me that just seems like unwise behaviour on the part of a packaging system. RPM didn't know what that section was: why was it removing it? It might have been necessary. It *was* necessary for what we were doing, and RPM just removed it without so much as a by-your-leave. So... guess why strip(1) doesn't remove CTF? I don't want anyone who's actually using CTF to have to go through anything like that again just so they can package their own software without it being randomly broken by the packaging system.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 11:45 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

That seems like a reasonable argument.

But it also seems like that would apply to DWARF debuginfo too. Why ask the compiler to generate DWARF if you're going to strip it out? Yet here we are.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 12:19 UTC (Wed) by nix (subscriber, #2304) [Link]

I can only guess how this became the default.

I'd guess that debuginfo, in a world where debuggers are a special thing that is explicitly run by human beings when things go wrong, is something huge that is only *needed* when things go wrong, when there will be a human around who can install the necessary big packages. But you never want to compile something without any debug info for use in a production environment because if things go wrong you then have no debuginfo to use to diagnose it! So -g -O2 has become a sort of de facto standard for CFLAGS.

Of course the "you only need it when things go wrong" attitude has now been retarding the development of always-on systemwide debugging tools for something like fifteen years; but nobody wants to add extensive debuginfo shrinking machinery because it will slow down the link for something that is only rarely needed. It seems to me that the only *reason* debuginfo is only rarely needed is that tools that use debuginfo routinely cannot be developed because it can never be relied on to be present, because it is too big... it's a vicious circle.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 12:19 UTC (Wed) by mjw (subscriber, #16740) [Link] (8 responses)

> I have... painful experiences here. Back when we were converting DWARF to CTF at kernel link time and linking it into kernel modules, we had to actually *hack RPM at build time* via PATH shuffling and patching of /usr/lib/rpm/find-debuginfo.sh to even make it possible for RPM to not just strip out all non-loaded sections on the grounds that they must be unnecessary, no matter what size they were or whether RPM had never seen them before, including ripping all the CTF that we'd just gone to some lengths to link in.

> To me that just seems like unwise behaviour on the part of a packaging system. RPM didn't know what that section was: why was it removing it? It might have been necessary. It *was* necessary for what we were doing, and RPM just removed it without so much as a by-your-leave.

I might be responsible for that. But it is simply that RPM follows normal ELF rules for stripping [*] (unless you give define macros to give find-debuginfo.sh additional arguments [**]). In general any non-allocated section can be stripped away (or put into a separate .debug file). Because that simply means that the section isn't needed at runtime.

> So... guess why strip(1) doesn't remove CTF? I don't want anyone who's actually using CTF to have to go through anything like that again just so they can package their own software without it being randomly broken by the packaging system.

Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

But I do like CTF and I do hope it will become the default one day. Not to replace DWARF (it should be a companion to that), but to replace .gnu_debugdata [***]. Which is used by various tools now to have an "extra symbol table".

So lets talk how to integrate this with RPM/elfutils/systemtap/etc. Maybe on the elfutils and/or binutils mailinglist?

[*] http://www.linker-aliens.org/blogs/ali/entry/how_to_strip...
[**] https://gnu.wildebeest.org/blog/mjw/2017/06/30/fedora-rpm...
[***] https://fedoraproject.org/wiki/Features/MiniDebugInfo

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:48 UTC (Wed) by nix (subscriber, #2304) [Link]

In general any non-allocated section can be stripped away (or put into a separate .debug file). Because that simply means that the section isn't needed at runtime.
Well, it means it isn't needed by the executable loader. I was *forced* to make libctf non-loadable by internal constraints in ld (roughly, that you cannot simultaneously have an allocated section whose size is not known before bfd_elf_final_link() and that the symtab and strtab are not laid out until halfway through that function and that CTF needs to know the offsets of all strings in the strtab and the order of symbols, *and* it's compressed so its content affects its size: so by extension the section may not be allocated). That doesn't mean it's not going to be used by programs at runtime. It is. (Well, assuming anyone uses it at all. :) ). They load it out of the binary as needed using BFD.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:55 UTC (Wed) by nix (subscriber, #2304) [Link] (6 responses)

Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.
I guess that means CTF will always be stripped out of RPMs, and libctf and the CTF format by extension will be useless on RPM systems. This seems unfortunate. Is it really so hard to mark .ctf sections as not stripped? If it takes more than a couple of lines, something seems to me to be wrong.

(This is not the only tool missing support right now, of course: gold can't link CTF sections either. But I plan to add that and I did also plan to submit changes to elfutils to stop eu-strip throwing the section out. I'm rather unhappy to discover that this is pre-emptively rejected.)

But I do like CTF and I do hope it will become the default one day. Not to replace DWARF (it should be a companion to that), but to replace .gnu_debugdata [***]. Which is used by various tools now to have an "extra symbol table".
That won't work, I'm afraid. CTF does not contain a symbol table, since that would be a waste of space since ELF already has one. Instead, it relies on the ELF symtab. Its function and data object sections are 1:1 ordered in the same order as the ELF symtab (basically, you traverse the ELF symtab and whenever you pass another function symbol, you match it to a function info section entry: whenever you pass another data symbol, you match it to another data object section entry). This saves quite a lot of space: data object section entries in particular are only four bytes each (one type ID).

(To deal with the problems of dynamic symbol tables getting stripped out of binaries, Solaris defined .ldynsym, which appears to be much what .gnu_debugdata is, only it's just a symbol table rather than a whole LZMA-compressed ELF object containing a symbol table.)

Plus of course there's not much chance of CTF becoming the default if you insist on stripping it out of executables so nothing that needs it can ever find it. ;)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:28 UTC (Wed) by mjw (subscriber, #16740) [Link] (5 responses)

> I guess that means CTF will always be stripped out of RPMs, and libctf and the CTF format by extension will be useless on RPM systems. This seems unfortunate. Is it really so hard to mark .ctf sections as not stripped? If it takes more than a couple of lines, something seems to me to be wrong.

It shouldn't be hard to keep it, if a package or distro decides that is the thing they want.
For example rust packages do something like:
%global _find_debuginfo_opts --keep-section .rustc

So all we need to do is define some macro that packages can set for find-debuginfo.sh to do "the right thing" and then a package or distro can decide to make that the default.

> I did also plan to submit changes to elfutils to stop eu-strip throwing the section out. I'm rather unhappy to discover that this is pre-emptively rejected.

That is not my intention. Note that I am a not a native English speaker. My apologies if I seem to come over as negative.

> CTF does not contain a symbol table, since that would be a waste of space since ELF already has one. Instead, it relies on the ELF symtab. Its function and data object sections are 1:1 ordered in the same order as the ELF symtab (basically, you traverse the ELF symtab and whenever you pass another function symbol, you match it to a function info section entry: whenever you pass another data symbol, you match it to another data object section entry). This saves quite a lot of space: data object section entries in particular are only four bytes each (one type ID).
>
> (To deal with the problems of dynamic symbol tables getting stripped out of binaries, Solaris defined .ldynsym, which appears to be much what .gnu_debugdata is, only it's just a symbol table rather than a whole LZMA-compressed ELF object containing a symbol table.)

OK. So how do you deal with .symtab being stripped away by default then?
Would it be an idea to adopt the .ldynsym from Solaris?
.gnu_debugdata was defined before we had compressed ELF sections in the standard.
Now that we have it maybe we should make .symtab a compressed section?
https://gnu.wildebeest.org/blog/mjw/2016/01/13/elf-libelf...

> Plus of course there's not much chance of CTF becoming the default if you insist on stripping it out of executables so nothing that needs it can ever find it. ;)

Really, I don't understand why you think that is my intention. I might not fully understand all details yet. But I am actually interested in making CTF into something useful.

Will you be at the GNU Tools Cauldron in Montréal, Canada next week?
It might be easier to talk some ideas over in person.
https://gcc.gnu.org/wiki/cauldron2019

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:32 UTC (Wed) by mjw (subscriber, #16740) [Link] (1 responses)

> Will you be at the GNU Tools Cauldron in Montréal, Canada next week?

Sorry, next month. (September 12 to 15, 2019)

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:33 UTC (Fri) by nix (subscriber, #2304) [Link]

Ah. I thought the timing you gave was strange for the Cauldron! That *overlaps* with LPC so in the absence of a teleporter or military jet to get from Lisbon to Montreal in no time... :/

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 0:51 UTC (Fri) by himi (subscriber, #340) [Link] (1 responses)

Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.
This can be read as meaning "no version of eu-strip will ever have the special magic", rather than what I believe you meant: "any eu-strip you find in the world right now will not have the necessary special magic".

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:21 UTC (Fri) by nix (subscriber, #2304) [Link]

Oh. Yes, that is how I parsed it. I completely understand that *existing* strip tools will strip this out. This just means that if you upgrade binutils to a CTF-generating version, you'd have to upgrade elfutils to one that doesn't strip it out as well. Given that you're already stuck having to upgrade the compiler in synchrony too to get this stuff to work, adding one extra package doesn't sound like an intolerable administrative burden. (I hope.)

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:32 UTC (Fri) by nix (subscriber, #2304) [Link]

> %global _find_debuginfo_opts --keep-section .rustc

... now why didn't I think of that? Probably because when I was doing this back when I had *multiple* sections to deal with, with names like .ctf.*, so telling other things what the sections were called this time would have been quite painful. We have our own internal container format now precisely to avoid this sort of problem, so we could use this quite well.

> That is not my intention. Note that I am a not a native English speaker. My apologies if I seem to come over as negative.

Sorry, I completely misparsed your sentence! (See my comment a couple of hops down). Phew, that had me panicking a bit for a moment. :)

> Now that we have it maybe we should make .symtab a compressed section?

Compressed sections in GNU ld at least seem a bit ad hoc. I think you'd need to do quite a lot of work to bfd_elf_final_link and environs to make it possible to have allocated sections that other sections depend upon that are also compressed: every existing section with content that changes after layout time (earlier in bfd_elf_final_link than strtab / symtab layout time) either has unchanging size or is non-allocated (and even there, there are fairly dreadful hacks around .zdebug, which I'm afraid I made a little bit worse with .ctf :) ).

I'm not sure .symtab would compress terribly well, either -- it has a lot of fields with "ID-like" content that only appears once, and thus compresses rather badly. (CTF goes to some lengths to avoid content like this for just that reason). The strtab would certainly compress better, but I can see why you don't compress it -- you don't want to impose decompression costs on the whole strtab on every execution.

> Really, I don't understand why you think that is my intention.

A really terrible misparsing of an ambiguous sentence on my part, and you know how hard it is to find alternate meanings of a sentence when you've already fixated on one that is dreadful :) Sorry!

> Will you be at the GNU Tools Cauldron in Montréal, Canada next week?

Alas, I'm going to LPC instead. I'm spending the next two weeks listening to chamber music in the North York moors...


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds