The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:02 UTC (Tue) by roc (subscriber, #30627)
In reply to: The Compact C Type Format in the GNU toolchain by nix
Parent article: The Compact C Type Format in the GNU toolchain

Indeed, almost every binary containing .eh_frame and .dynsym and RTTI breaks if they're not present so it almost never makes sense to strip them, and it never has.

OTOH it will be a long time, if ever, before almost every binary requires CTF to be present. In the meantime people will want to strip CTF and you won't be able to stop tools adding support for that and people configuring their builds to do so by default.

The Compact C Type Format in the GNU toolchain

Posted Aug 6, 2019 23:54 UTC (Tue) by nix (subscriber, #2304) [Link] (34 responses)

objcopy works. However, frankly, my attitude to people who try to rip random things out of binaries to save a few bytes in any but the most extreme embedded environments these days is to wonder if they're stuck in the 80s. Even if every binary in /usr/bin was a gigabyte in size we would *still* have huge oceans of untouched space on most current disks.

Fundamentally, there's a *reason* strip(1) doesn't strip CTF by default: it should hardly save any space and it rips out something that offers facilities not otherwise available. The format will be useless if it's stripped out routinely, and it should be small enough that *most* people don't bother. People who need to hunt for every last byte and are willing to use obscure options to do so probably both have a reason and are used to coping with the resulting breakage. (It will certainly break Objective Caml programs to strip out non-loaded sections that you don't recognise, for instance.)

(However... if you really want separated debugging information, we *do* have a CTF archive format that is specifically intended for sticking big piles of CTF into for later mmapping out. If people really want separated debug info, we could in theory arrange to dump all the CTF on the system into a .ctfa, and remove items from the CTFA on package uninstallation, and have libctf know to look there to pick it up -- or just look in /usr/lib/ctf/ -- a tree like /usr/lib/debug/ -- or whatever. My worry is that if you did that, people would soon say oh let's put it in a separate package! And now it's never present and it's useless. Having CTF in a separate file is not really a problem, though it doesn't buy you anything that I can see. Having it in a separate *package* that is not installed when the package is... that's a problem. That's what makes life so hard for systemwide debuggers now: the DWARF is never there.)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:11 UTC (Wed) by roc (subscriber, #30627) [Link] (33 responses)

I think it's much more important for your goals to convince distro vendors to cooperate with you than to play tricks with header flags pretending CTF is not really debug info.

> That's what makes life so hard for systemwide debuggers now: the DWARF is never there.

It's not super hard to have debuggers automatically fetch and use system debuginfo packages. Pernosco does this. Even Fedora's gdb gets you most of the way there by telling you the command you need to run. We don't need new formats to solve this particular problem. (OK, to tell the truth, there is one other problem that needs to be fixed: you need an archive of all debuginfo for all versions of packages so you can debug the non-latest version of a package. We've built that for Pernosco too.)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:30 UTC (Wed) by nix (subscriber, #2304) [Link] (15 responses)

> I think it's much more important for your goals to convince distro vendors to cooperate with you than to play tricks with header flags pretending CTF is not really debug info.

It's not. It's type introspection info. It's no more debug info than C++ RTTI is. Programs can perfectly well introspect their own types without being debuggers in any sense.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 0:40 UTC (Wed) by roc (subscriber, #30627) [Link] (12 responses)

You're right, that's fair.

However, you are pretending it is *needed* when for most binaries it currently is not.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 10:52 UTC (Wed) by nix (subscriber, #2304) [Link] (11 responses)

I'm not talking about *now* -- and I'm also not suggesting that most binaries should be built with this. They probably shouldn't except if the distro knows it has tools that can use CTF on arbitrary programs (that's why you need a non-default compiler option, -gt, to turn CTF emission on), or unless the tool uses CTF to introspect itself. What I'm suggesting is that if you *have* built something with it, you probably did so because you *need* it -- and if you strip the CTF out of the binary there are literally zero libraries in existence that will know how to get at it unless you explicitly specify the path to the stripped-out CTF. And why would anyone compile with CTF only to render it immediately useless?

I have... painful experiences here. Back when we were converting DWARF to CTF at kernel link time and linking it into kernel modules, we had to actually *hack RPM at build time* via PATH shuffling and patching of /usr/lib/rpm/find-debuginfo.sh to even make it possible for RPM to not just strip out all non-loaded sections on the grounds that they must be unnecessary, no matter what size they were or whether RPM had never seen them before, including ripping all the CTF that we'd just gone to some lengths to link in.

To me that just seems like unwise behaviour on the part of a packaging system. RPM didn't know what that section was: why was it removing it? It might have been necessary. It *was* necessary for what we were doing, and RPM just removed it without so much as a by-your-leave. So... guess why strip(1) doesn't remove CTF? I don't want anyone who's actually using CTF to have to go through anything like that again just so they can package their own software without it being randomly broken by the packaging system.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 11:45 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

That seems like a reasonable argument.

But it also seems like that would apply to DWARF debuginfo too. Why ask the compiler to generate DWARF if you're going to strip it out? Yet here we are.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 12:19 UTC (Wed) by nix (subscriber, #2304) [Link]

I can only guess how this became the default.

I'd guess that debuginfo, in a world where debuggers are a special thing that is explicitly run by human beings when things go wrong, is something huge that is only *needed* when things go wrong, when there will be a human around who can install the necessary big packages. But you never want to compile something without any debug info for use in a production environment because if things go wrong you then have no debuginfo to use to diagnose it! So -g -O2 has become a sort of de facto standard for CFLAGS.

Of course the "you only need it when things go wrong" attitude has now been retarding the development of always-on systemwide debugging tools for something like fifteen years; but nobody wants to add extensive debuginfo shrinking machinery because it will slow down the link for something that is only rarely needed. It seems to me that the only *reason* debuginfo is only rarely needed is that tools that use debuginfo routinely cannot be developed because it can never be relied on to be present, because it is too big... it's a vicious circle.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 12:19 UTC (Wed) by mjw (subscriber, #16740) [Link] (8 responses)

> I have... painful experiences here. Back when we were converting DWARF to CTF at kernel link time and linking it into kernel modules, we had to actually *hack RPM at build time* via PATH shuffling and patching of /usr/lib/rpm/find-debuginfo.sh to even make it possible for RPM to not just strip out all non-loaded sections on the grounds that they must be unnecessary, no matter what size they were or whether RPM had never seen them before, including ripping all the CTF that we'd just gone to some lengths to link in.

> To me that just seems like unwise behaviour on the part of a packaging system. RPM didn't know what that section was: why was it removing it? It might have been necessary. It *was* necessary for what we were doing, and RPM just removed it without so much as a by-your-leave.

I might be responsible for that. But it is simply that RPM follows normal ELF rules for stripping [*] (unless you define macros to give find-debuginfo.sh additional arguments [**]). In general any non-allocated section can be stripped away (or put into a separate .debug file). Because that simply means that the section isn't needed at runtime.

> So... guess why strip(1) doesn't remove CTF? I don't want anyone who's actually using CTF to have to go through anything like that again just so they can package their own software without it being randomly broken by the packaging system.

Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

But I do like CTF and I do hope it will become the default one day. Not to replace DWARF (it should be a companion to that), but to replace .gnu_debugdata [***]. Which is used by various tools now to have an "extra symbol table".

So let's talk about how to integrate this with RPM/elfutils/systemtap/etc. Maybe on the elfutils and/or binutils mailing list?

[*] http://www.linker-aliens.org/blogs/ali/entry/how_to_strip...
[**] https://gnu.wildebeest.org/blog/mjw/2017/06/30/fedora-rpm...
[***] https://fedoraproject.org/wiki/Features/MiniDebugInfo

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:48 UTC (Wed) by nix (subscriber, #2304) [Link]

> In general any non-allocated section can be stripped away (or put into a separate .debug file). Because that simply means that the section isn't needed at runtime.

Well, it means it isn't needed by the executable loader. I was *forced* to make libctf non-loadable by internal constraints in ld (roughly, that you cannot simultaneously have an allocated section whose size is not known before bfd_elf_final_link() and that the symtab and strtab are not laid out until halfway through that function and that CTF needs to know the offsets of all strings in the strtab and the order of symbols, *and* it's compressed so its content affects its size: so by extension the section may not be allocated). That doesn't mean it's not going to be used by programs at runtime. It is. (Well, assuming anyone uses it at all. :) ). They load it out of the binary as needed using BFD.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:55 UTC (Wed) by nix (subscriber, #2304) [Link] (6 responses)

> Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

I guess that means CTF will always be stripped out of RPMs, and libctf and the CTF format by extension will be useless on RPM systems. This seems unfortunate. Is it really so hard to mark .ctf sections as not stripped? If it takes more than a couple of lines, something seems to me to be wrong.

(This is not the only tool missing support right now, of course: gold can't link CTF sections either. But I plan to add that and I did also plan to submit changes to elfutils to stop eu-strip throwing the section out. I'm rather unhappy to discover that this is pre-emptively rejected.)

> But I do like CTF and I do hope it will become the default one day. Not to replace DWARF (it should be a companion to that), but to replace .gnu_debugdata [***]. Which is used by various tools now to have an "extra symbol table".

That won't work, I'm afraid. CTF does not contain a symbol table, since that would be a waste of space since ELF already has one. Instead, it relies on the ELF symtab. Its function and data object sections are 1:1 ordered in the same order as the ELF symtab (basically, you traverse the ELF symtab and whenever you pass another function symbol, you match it to a function info section entry: whenever you pass another data symbol, you match it to another data object section entry). This saves quite a lot of space: data object section entries in particular are only four bytes each (one type ID).
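
The traversal described above can be sketched roughly as follows. This is only an illustration of the "walk the symtab, consume one entry per matching symbol" idea, not libctf's actual code: the Symbol class, the match_ctf_entries() helper, and the already-decoded func_info / data_objects lists are all hypothetical stand-ins, and the real format has additional rules about which symbols are skipped.

```python
from dataclasses import dataclass

# ELF symbol type codes from the ELF specification.
STT_OBJECT = 1
STT_FUNC = 2
STT_FILE = 4


@dataclass
class Symbol:
    name: str
    sym_type: int  # STT_* value, as decoded from st_info


def match_ctf_entries(symtab, func_info, data_objects):
    """Pair each function/data symbol with the next unused entry.

    func_info and data_objects are hypothetical, already-decoded lists
    standing in for the CTF function info and data object sections.
    """
    funcs = iter(func_info)
    objs = iter(data_objects)
    matched = {}
    for sym in symtab:
        if sym.sym_type == STT_FUNC:
            matched[sym.name] = next(funcs)
        elif sym.sym_type == STT_OBJECT:
            matched[sym.name] = next(objs)
        # Any other symbol type consumes no entry at all.
    return matched


symtab = [
    Symbol("main.c", STT_FILE),     # skipped: neither function nor data
    Symbol("counter", STT_OBJECT),
    Symbol("main", STT_FUNC),
    Symbol("limit", STT_OBJECT),
]
# Data object entries are just type IDs; the function entry is a placeholder.
result = match_ctf_entries(symtab, func_info=["int (void)"], data_objects=[7, 12])
print(result)
```

Because the pairing is purely positional, no names need to be stored on the CTF side, which is where the space saving comes from; it is also why the scheme depends on the ELF symtab still being there.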

(To deal with the problems of dynamic symbol tables getting stripped out of binaries, Solaris defined .ldynsym, which appears to be much what .gnu_debugdata is, only it's just a symbol table rather than a whole LZMA-compressed ELF object containing a symbol table.)

Plus of course there's not much chance of CTF becoming the default if you insist on stripping it out of executables so nothing that needs it can ever find it. ;)

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:28 UTC (Wed) by mjw (subscriber, #16740) [Link] (5 responses)

> I guess that means CTF will always be stripped out of RPMs, and libctf and the CTF format by extension will be useless on RPM systems. This seems unfortunate. Is it really so hard to mark .ctf sections as not stripped? If it takes more than a couple of lines, something seems to me to be wrong.

It shouldn't be hard to keep it, if a package or distro decides that is the thing they want.
For example rust packages do something like:
%global _find_debuginfo_opts --keep-section .rustc

So all we need to do is define some macro that packages can set for find-debuginfo.sh to do "the right thing" and then a package or distro can decide to make that the default.
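
As a rough model of the rule being discussed (drop every non-allocated section unless it is explicitly kept), here is a small sketch. It is a deliberate simplification and not eu-strip's real logic, which treats a number of sections specially; the sections_removed() helper and the sample section list are made up for illustration. Only the SHF_* flag values come from the ELF specification.

```python
SHF_WRITE = 0x1
SHF_ALLOC = 0x2      # section occupies memory while the program runs
SHF_EXECINSTR = 0x4


def sections_removed(sections, keep=()):
    """Return the names a 'strip all non-allocated sections' pass would drop.

    sections: iterable of (name, sh_flags) pairs.
    keep: names to preserve, playing the role of --keep-section.
    """
    keep = set(keep)
    return [name for name, flags in sections
            if not (flags & SHF_ALLOC) and name not in keep]


sections = [
    (".text", SHF_ALLOC | SHF_EXECINSTR),
    (".data", SHF_WRITE | SHF_ALLOC),
    (".debug_info", 0),
    (".ctf", 0),
]

print(sections_removed(sections))                 # default behaviour
print(sections_removed(sections, keep=[".ctf"]))  # with a keep-list
```

With the keep-list in place only .debug_info is dropped, which is the effect a distro-level macro would be aiming for.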

> I did also plan to submit changes to elfutils to stop eu-strip throwing the section out. I'm rather unhappy to discover that this is pre-emptively rejected.

That is not my intention. Note that I am not a native English speaker. My apologies if I seem to come over as negative.

> CTF does not contain a symbol table, since that would be a waste of space since ELF already has one. Instead, it relies on the ELF symtab. Its function and data object sections are 1:1 ordered in the same order as the ELF symtab (basically, you traverse the ELF symtab and whenever you pass another function symbol, you match it to a function info section entry: whenever you pass another data symbol, you match it to another data object section entry). This saves quite a lot of space: data object section entries in particular are only four bytes each (one type ID).
>
> (To deal with the problems of dynamic symbol tables getting stripped out of binaries, Solaris defined .ldynsym, which appears to be much what .gnu_debugdata is, only it's just a symbol table rather than a whole LZMA-compressed ELF object containing a symbol table.)

OK. So how do you deal with .symtab being stripped away by default then?
Would it be an idea to adopt the .ldynsym from Solaris?
.gnu_debugdata was defined before we had compressed ELF sections in the standard.
Now that we have it maybe we should make .symtab a compressed section?
https://gnu.wildebeest.org/blog/mjw/2016/01/13/elf-libelf...

> Plus of course there's not much chance of CTF becoming the default if you insist on stripping it out of executables so nothing that needs it can ever find it. ;)

Really, I don't understand why you think that is my intention. I might not fully understand all details yet. But I am actually interested in making CTF into something useful.

Will you be at the GNU Tools Cauldron in Montréal, Canada next week?
It might be easier to talk some ideas over in person.
https://gcc.gnu.org/wiki/cauldron2019

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:32 UTC (Wed) by mjw (subscriber, #16740) [Link] (1 responses)

> Will you be at the GNU Tools Cauldron in Montréal, Canada next week?

Sorry, next month. (September 12 to 15, 2019)

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:33 UTC (Fri) by nix (subscriber, #2304) [Link]

Ah. I thought the timing you gave was strange for the Cauldron! That *overlaps* with LPC so in the absence of a teleporter or military jet to get from Lisbon to Montreal in no time... :/

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 0:51 UTC (Fri) by himi (subscriber, #340) [Link] (1 responses)

> Sorry, RPM uses elfutils eu-strip, which will not have special magic to treat .ctf sections specially.

This can be read as meaning "no version of eu-strip will ever have the special magic", rather than what I believe you meant: "any eu-strip you find in the world right now will not have the necessary special magic".

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:21 UTC (Fri) by nix (subscriber, #2304) [Link]

Oh. Yes, that is how I parsed it. I completely understand that *existing* strip tools will strip this out. This just means that if you upgrade binutils to a CTF-generating version, you'd have to upgrade elfutils to one that doesn't strip it out as well. Given that you're already stuck having to upgrade the compiler in synchrony too to get this stuff to work, adding one extra package doesn't sound like an intolerable administrative burden. (I hope.)

The Compact C Type Format in the GNU toolchain

Posted Aug 9, 2019 11:32 UTC (Fri) by nix (subscriber, #2304) [Link]

> %global _find_debuginfo_opts --keep-section .rustc

... now why didn't I think of that? Probably because back when I was doing this I had *multiple* sections to deal with, with names like .ctf.*, so telling other things what the sections were called would have been quite painful. We have our own internal container format now precisely to avoid this sort of problem, so we could use this quite well.

> That is not my intention. Note that I am not a native English speaker. My apologies if I seem to come over as negative.

Sorry, I completely misparsed your sentence! (See my comment a couple of hops down). Phew, that had me panicking a bit for a moment. :)

> Now that we have it maybe we should make .symtab a compressed section?

Compressed sections in GNU ld at least seem a bit ad hoc. I think you'd need to do quite a lot of work to bfd_elf_final_link and environs to make it possible to have allocated sections that other sections depend upon that are also compressed: every existing section with content that changes after layout time (earlier in bfd_elf_final_link than strtab / symtab layout time) either has unchanging size or is non-allocated (and even there, there are fairly dreadful hacks around .zdebug, which I'm afraid I made a little bit worse with .ctf :) ).

I'm not sure .symtab would compress terribly well, either -- it has a lot of fields with "ID-like" content that only appears once, and thus compresses rather badly. (CTF goes to some lengths to avoid content like this for just that reason). The strtab would certainly compress better, but I can see why you don't compress it -- you don't want to impose decompression costs on the whole strtab on every execution.

> Really, I don't understand why you think that is my intention.

A really terrible misparsing of an ambiguous sentence on my part, and you know how hard it is to find alternate meanings of a sentence when you've already fixated on one that is dreadful :) Sorry!

> Will you be at the GNU Tools Cauldron in Montréal, Canada next week?

Alas, I'm going to LPC instead. I'm spending the next two weeks listening to chamber music in the North York moors...

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 3:44 UTC (Wed) by josh (subscriber, #17465) [Link] (1 responses)

And today, people regularly compile out C++ RTTI.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 10:19 UTC (Wed) by khim (subscriber, #9252) [Link]

And THAT is how CTF should be handled, too. If you *reaaaly* don't want it - ask the compiler not to produce it. Most developers wouldn't care.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 15:53 UTC (Wed) by luto (guest, #39314) [Link] (16 responses)

IMO what we need for locating debuginfo is a standard API by which a debugger can ask the distro to locate a debug info file. Distro patches to gdb are annoying, and they also preclude things other than gdb from reliably finding debug info files.

I would love for the kernel to be able to drop something in /usr/lib/debuginfo.d/find_vdso_debuginfo.sh, for example.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 15:59 UTC (Wed) by fuhchee (guest, #40059) [Link] (15 responses)

Some of us are working on just such a thing, and will present our progress at the GNU Cauldron in Montreal next month.

https://sourceware.org/git/?p=elfutils.git;a=tree;f=dbgse...

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 16:34 UTC (Wed) by luto (guest, #39314) [Link] (3 responses)

I peeked a bit. Will this also handle debuginfo files that are directly installed on the filesystem?

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 16:42 UTC (Wed) by fuhchee (guest, #40059) [Link] (2 responses)

Yup!

% dbgserver -F /path/to/base/directory

should find executables / debuginfo / corresponding sources

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 16:53 UTC (Wed) by luto (guest, #39314) [Link] (1 responses)

How does this end up working on a normal desktop or server? Is the idea that there would be a systemwide dbgserver instance, perhaps socket activated, or maybe several instances?

As an admin, I would much rather *not* have a systemwide daemon for this, since that implies a path by which one user can attempt to attack another user or the system as a whole. I'd rather if each user could, on demand, start up their own instance of whatever libraries and programs are needed to make debugging work. Nothing here should require any form of privilege.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 17:47 UTC (Wed) by fuhchee (guest, #40059) [Link]

> Is the idea that there would be a systemwide dbgserver instance, perhaps socket activated, or maybe several instances?

Any of the above.

>As an admin, I would much rather *not* have a systemwide daemon for this, since that implies a path by which one user can attempt to attack another user or the system as a whole.

Fair enough, though DoS is perhaps the worst of the possible attacks.

> Nothing here should require any form of privilege.

Right.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 18:05 UTC (Wed) by madscientist (subscriber, #16861) [Link] (6 responses)

I hope that non-packaged / generic sources for debuginfo are being considered / allowed for. I've really wanted to start stripping out debuginfo from our binaries on my development team since they're so huge, but we don't create RPM or DEB files. When I propose using external debuginfo people are not happy about the developer overhead (even scripted) so we've never done it.

If it were possible to easily "register" debuginfo files created through Jenkins or some other build service without having to turn them into distro packages, then allow tools (i.e., GDB) to download them more or less invisibly when needed, that would be really nice.

Also is this limited to just debuginfo files?

The other big problem we have is cores being generated on remote systems which are using system libraries other than the local ones: in this situation we need to obtain the remote system's libc.so and other necessary system libraries. It would be really, really nice if we could register shared libraries from different systems, perhaps indexed via a hash of some kind, then have GDB automatically download them as well.

Of course, before this can be done we need to ensure that the core file contains enough information about the shared libraries to perform the lookup, which I doubt it does today, so this requires more work in other places... however it would be good if the design of this service was able to be extended in this way in the future if/when it becomes feasible. For example in our system we use the Google coredumper library rather than the kernel to dump cores and this allows us to add "notes" into the generated core file, so we could take advantage of this without Linux kernel support.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 19:18 UTC (Wed) by fuhchee (guest, #40059) [Link] (5 responses)

Compiling buildids into your binaries is enough for this widget to find system shared libraries & their debuginfo (and possibly their sources), and serve them to a remote debugger.

It seems like we're all thinking roughly alike. Exciting times!

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 19:41 UTC (Wed) by madscientist (subscriber, #16861) [Link] (4 responses)

Hm... we must be talking about something different. Let me be more clear.

I compile a program on my build system (I use a sysroot to ensure that it links against sufficiently older system libraries that it can run "anywhere"). I send my program out to run tests on some other system running some random distribution completely different from the one it was built on, which is using a different GNU libc, etc. Maybe Travis, or AWS, or just a local test farm.

It fails and a core is generated. To debug that core I need my program, the debuginfo from my program (if the program is stripped), the core file, and the system libraries from the system it was running on when the core is generated.

I can't see any way that a buildid compiled into my binary can be sufficient to retrieve the runtime system libraries.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 19:52 UTC (Wed) by fuhchee (guest, #40059) [Link] (3 responses)

The runtime system shared libraries have their own buildids, and the relevant ELF note sections should show up in the core dump. From those buildids, the relevant binaries / debuginfo can be found.
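
For readers unfamiliar with how a buildid is stored, here is a minimal sketch of pulling one out of raw ELF note bytes. It assumes little-endian data and the usual 4-byte note alignment, and it works on a synthetic buffer rather than a real core file; find_build_id() is an illustrative helper, not part of dbgserver or elfutils. The note layout (namesz, descsz, type, then padded name and descriptor) and the NT_GNU_BUILD_ID value are the standard ones.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for build-ids, owner name "GNU"


def _align4(n):
    """Round n up to the next multiple of four."""
    return (n + 3) & ~3


def find_build_id(note_data):
    """Scan raw little-endian ELF note bytes; return the build-id as hex or None."""
    offset = 0
    while offset + 12 <= len(note_data):
        namesz, descsz, note_type = struct.unpack_from("<III", note_data, offset)
        offset += 12
        name = note_data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note_data[offset:offset + descsz]
        offset += _align4(descsz)
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


# Synthetic 20-byte build-id wrapped in a note header, for demonstration only.
sample_id = bytes.fromhex("0123456789abcdef0123456789abcdef01234567")
sample_note = (struct.pack("<III", 4, len(sample_id), NT_GNU_BUILD_ID)
               + b"GNU\0" + sample_id)
print(find_build_id(sample_note))
```

Once the hex string is in hand it can serve as the lookup key for the binary and its debuginfo, whichever service ends up answering the query.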

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 20:22 UTC (Wed) by madscientist (subscriber, #16861) [Link] (2 responses)

I see. So dbgserver_find_executable() is intended to be used with shared libs as well? Or is this part not quite complete?

I think it would also be helpful if the client interface provided separate lookup and download methods rather than forcing them to both be a single method (there can be a simplified "do both" method as well if wanted). I can easily imagine situations where we want to know whether a given buildid exists on the server without actually downloading it.

For example, suppose I have a suite of test servers running random environments; during test runs a core is generated. I want to know if the program under test and/or system libraries for this system already exist in the debug server or not: I just want to look them up but not download them. If they don't exist perhaps I'll include them along with the core file when I bundle up the build results. If they do exist I don't need to add them.

Or perhaps I have an automated way for the test system to upload binaries and/or system libraries that aren't already on the debug server (I understand that upload is not in scope for this project and would need some other process) but I don't want to bother uploading things that I already have so I need to be able to check.
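
The "check before bundling or uploading" idea from the last two paragraphs might look something like this. Everything here is hypothetical: server_has stands in for whatever lookup-only query a client interface might offer, and no such call is claimed to exist in dbgserver today.

```python
def files_to_bundle(local_files, server_has):
    """Select the files that still need to travel with the core.

    local_files: dict mapping build-id -> local path.
    server_has: callable(build_id) -> bool; a hypothetical lookup-only
                query that downloads nothing.
    """
    return {bid: path for bid, path in local_files.items()
            if not server_has(bid)}


# Stand-in for the server's index; a real client would ask over the network.
known_on_server = {"aa11", "bb22"}

local = {
    "aa11": "/lib/libc.so.6",
    "cc33": "/opt/app/bin/app",
}
missing = files_to_bundle(local, known_on_server.__contains__)
print(missing)
```

Only the file the server has never seen ends up in the bundle, which is the whole point of separating lookup from download.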

A simple program that uses the client interface to look up and/or download files would be very useful, as an example if nothing else (and probably for people who would like to add scripting to systems where it's not so simple to recode them to use it).

Cheers!

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:09 UTC (Wed) by fuhchee (guest, #40059) [Link] (1 responses)

> I see. So dbgserver_find_executable() is intended to be used with shared libs as well?

Yes.

> I think it would also be helpful if the client interface provided separate lookup and download methods

Will consider that ... though there may be better ways to service the needs you outline. Deduplication at upload time should be easy too. Re. optimizing packaging of core dumps ... not sure how much sense that makes. The core dump recipient could consult the same debuginfo servers too; or you could preemptively package all the files. Will think on it more.

> A simple program that uses the client interface to look up and/or download files would be very useful

It just appeared in the repo! We employ only the most talented psychics and keyboard monks.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 21:29 UTC (Wed) by madscientist (subscriber, #16861) [Link]

> Deduplication at upload time should be easy too

If you mean deduplication by the server that's probably helpful but it's a lot of wasted effort to upload 10s or 100s of MB of libraries, binaries, etc., only to have it tossed on the floor as duplicate. Consider a build farm with 200 systems, which are upgraded via apt-get update or whatever at random intervals so they have different system libraries, different program instances, etc... having every system upload all its files for every core even though the system libraries might only change once every few weeks or less seems like overkill.

> Re. optimizing packaging of core dumps ... not sure how much sense that makes. The core dump recipient could consult the same debuginfo servers too; or you could preemptively package all the files.

For this I wasn't thinking that the dbgserver code would do that, I was thinking about scripting that users are using with their test clients to bundle results of failures so they can be uploaded to a test server for further investigation. Our current scripting already preemptively packages all the files: what I'd like to be able to do is detect when some/all of these items are not needed and skip that to reduce the size of uploaded artifacts.

When you're talking about moving content into/out of AWS or other cloud providers, the amount of data sent over the network directly equates to $$ spent and reducing it is always welcome.

Thanks for working on this, it'll be very cool!

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:01 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

Ooh this is very nice, very nice indeed.

I wonder if it can handle more than just debuginfo... musing about having libctf automatically launch dbgserver queries for missing CTF sections now -- so people can have separated CTF if they really want *and* it is as if it were always present. Best of all worlds! For that matter they can also do both -- perhaps an option at CTF generation time which automatically emits separated CTF *if* its size passes some threshold, or some percentage threshold of the total binary size, or the .text size, or something. Of course then you'd have to arrange for the dbgserver to see it, but presumably whatever method is used for separated debuginfo would work for this too.

The Compact C Type Format in the GNU toolchain

Posted Aug 8, 2019 12:28 UTC (Thu) by fuhchee (guest, #40059) [Link]

> I wonder if it can handle more than just debuginfo... musing about having libctf automatically launch dbgserver queries for missing CTF sections now -- so people can have separated CTF if they really want

It should be a small amount of extra effort to extend it sideways to a 'ctf' sibling to 'debuginfo'.

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 22:26 UTC (Wed) by thoughtpolice (subscriber, #87455) [Link] (1 responses)

This is very interesting. We have a tool for NixOS that basically does this, except it is based on a FUSE filesystem and a content addressable packaging system: https://github.com/edolstra/dwarffs

Essentially, every version of every package has a unique hash. We build a reverse mapping from the buildids of the binaries in a package to its unique hash, and upload that metadata along with the package to the package server. We then patch GDB (and elfutils) to look in a specific directory for debug info. This directory is a FUSE filesystem, and when any tool tries to look in `.build-id/...` for the debug info -- it does a query to the package server, obtaining the unique package ID containing the symbols, and transparently installs them through the package manager. It is effectively a version of Microsoft Symbol Server, which is basically what people want, from what I can tell...
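
The `.build-id/...` lookup mentioned here follows the usual convention of splitting the hex buildid after its first two characters. A small sketch of that mapping, with the root directory as an assumption (dwarffs and individual distros may mount or install it elsewhere) and debuginfo_path() as a made-up helper name:

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def debuginfo_path(build_id, root="/usr/lib/debug"):
    """Map a hex build-id to the conventional .build-id/xx/rest.debug path."""
    build_id = build_id.lower()
    if len(build_id) < 3 or not set(build_id) <= HEX_DIGITS:
        raise ValueError(f"not a usable hex build-id: {build_id!r}")
    return (PurePosixPath(root) / ".build-id"
            / build_id[:2] / f"{build_id[2:]}.debug")


path = debuginfo_path("0123456789abcdef0123456789abcdef01234567")
print(path)
```

A FUSE filesystem only has to intercept lookups of this shape to know which buildid a tool is asking for.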

Perhaps we could replace dwarffs with something like dbgserver, however. Or integrate them so there's a single UX. We could for instance, perhaps replace the client tooling with a separate "backend" for our case, and the tools can all just work around that instead...

The Compact C Type Format in the GNU toolchain

Posted Aug 7, 2019 23:25 UTC (Wed) by fuhchee (guest, #40059) [Link]

Yup, was aware of your server! We wanted something more http flavoured and a little more distro-independent. Joining forces would of course be wonderful!