Updates to pahole
Arnaldo Carvalho de Melo spoke at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit about his work on Poke-a-hole (pahole), a program that has expanded greatly over the years, but which was relevant to the BPF track because it produces BPF Type Format (BTF) information from DWARF debugging information. He covered some small changes to the program, and then went into detail about the new support for data-type profiling. His slides include several examples.
BTF gradually evolves alongside BPF. Over time, Carvalho has added options to pahole to cope with the changes, but each new option makes pahole harder to use: it is sometimes difficult to know which flag or combination of flags a given invocation needs. So he recently added a --btf_features flag that takes a comma-separated list of features, consolidating the separate flags into one. Unknown features in the list are ignored, which could make emitting BTF with older versions of pahole less painful. During development, the --btf_features_strict flag can be used instead to catch misspelled feature names. The new approach has slightly simplified the Makefile that the kernel uses to generate BTF information by replacing conditional statements with a single static flag:
--btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func
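The lenient and strict handling of a feature list can be illustrated with a minimal Python sketch. This is not pahole's code: the function name `parse_btf_features` is hypothetical, the set of known features is simply copied from the example above, and treating strict mode as a hard error is an assumption of the sketch.

```python
# Hypothetical feature set, copied from the article's example flag;
# real pahole versions each know their own, version-dependent set.
KNOWN_FEATURES = {
    "encode_force", "var", "float", "enum64",
    "decl_tag", "type_tag", "optimized_func", "consistent_func",
}

def parse_btf_features(spec, strict=False):
    """Split a comma-separated feature list into (enabled, unknown)."""
    requested = [name.strip() for name in spec.split(",") if name.strip()]
    enabled = [name for name in requested if name in KNOWN_FEATURES]
    unknown = [name for name in requested if name not in KNOWN_FEATURES]
    if strict and unknown:
        # Assumption of this sketch: strict mode refuses unknown names outright.
        raise ValueError("unknown BTF features: " + ", ".join(unknown))
    return enabled, unknown
```

With the default lenient mode, a misspelling such as `enum46` is silently dropped from the enabled list; with `strict=True` the same input is rejected, which is the behavior that makes typos visible during development.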
Another recent change is support for reproducible builds. A developer had sent in a patch that disabled parallel BTF encoding because the output could differ between runs. Reproducible pahole output matters because the BTF information is embedded in the kernel image, so non-reproducible BTF meant non-reproducible kernel images. Carvalho has now added code to ensure that the parallel encoding threads emit information in the same order every time. New reproducibility tests confirm that users can have both parallel encoding and reproducible builds, with minimal performance overhead.
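The general idea of doing work in parallel while emitting results in a fixed order can be sketched in a few lines of Python. This illustrates the technique only and says nothing about how pahole implements it; `encode_unit` and `encode_all` are hypothetical names, and the "encoding" is a stand-in string operation.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_unit(unit):
    """Stand-in for the per-compilation-unit encoding work (hypothetical)."""
    name, types = unit
    return f"{name}:" + ",".join(sorted(types))

def encode_all(units, jobs=4):
    """Encode units in parallel, but emit the results in input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map yields results in the order of its inputs,
        # regardless of which worker thread finishes first.
        return list(pool.map(encode_unit, units))
```

Because the output order depends only on the input order, `encode_all(units, jobs=1)` and `encode_all(units, jobs=8)` return the same list, which is the property a reproducibility test would check.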
BPF has had a "call by BTF ID" mechanism for kfuncs for some time, but until now there was no way to see which kfuncs are available in a given kernel. Now, pahole emits declaration information for kfuncs (when the decl_tag_kfuncs feature is enabled), so interested code can iterate over all of the declarations. Carvalho has also been working on changing how BTF handles kernel modules. Right now, the debugging information in kernel modules references items in the kernel by BTF ID, so that debug information for the whole kernel doesn't need to be shipped with each module. This would not be a problem, except that the numbering changes with each build of the kernel, which normally means recompiling modules alongside the kernel. With a bit of extra effort, though, kernel modules can be built with ELF relocations in the generated BTF, so that they don't always need to be rebuilt. An implementation of that is almost ready to be merged, he said.
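Why numeric IDs go stale, and why a reference resolved at load time does not, can be shown with a toy model in Python. This is not the BTF format or the mechanism under development: the type lists, the position-based ID assignment, and the use of names as the symbolic reference are all assumptions made for illustration.

```python
def build_base(type_names):
    """Assign numeric IDs by position, standing in for per-build BTF IDs."""
    return {name: type_id for type_id, name in enumerate(type_names, start=1)}

def resolve(module_refs, base_ids):
    """Resolve a module's symbolic references against one kernel build."""
    return {name: base_ids[name] for name in module_refs}

# Two hypothetical kernel builds; the second one gained a type, shifting IDs.
build_a = build_base(["int", "task_struct", "sk_buff"])
build_b = build_base(["int", "list_head", "task_struct", "sk_buff"])

# A module built against build_a that hard-codes ID 2 for task_struct
# would find list_head at ID 2 in build_b.
stale_id = build_a["task_struct"]

# Resolving symbolically at load time yields the correct IDs for build_b.
relocated = resolve(["task_struct", "sk_buff"], build_b)
```

In this model `stale_id` is 2, which names `list_head` in the second build, while `relocated` maps `task_struct` to 3 and `sk_buff` to 4.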
Data-type profiling
BTF already provides an easy way for performance-monitoring tools like perf to trace which lines of code correspond to particular instructions. That doesn't always tell the full story, though. Modern CPUs have aggressive caching, and fetching values from memory is a serious performance hit; it can make sense to analyze performance issues in terms of the data a computation is interacting with, instead of the code being run.
Carvalho demonstrated two tools: perf mem, which displays the time spent accessing memory broken down by the individual members of each structure, and perf c2c, which tracks false sharing and cache evictions. For these tools to work, there needs to be some way to connect a memory access not just to a line of code, but to the particular type of the value involved. The original version of perf mem used DWARF debugging information to make that connection. Now, BTF has enough information to be used for that purpose as well. perf still prefers DWARF tables when they are available, but it does use BTF to display information on kfuncs.
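The kind of layout question that false-sharing analysis raises, namely which members of a structure share a cache line, can be sketched with Python's ctypes module. The structures and the helper `members_by_cache_line` are hypothetical, the 64-byte line size is an assumption (common on x86-64 but not universal), and the sketch assumes each structure starts on a cache-line boundary. It does not use perf or BTF.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes

class Stats(ctypes.Structure):
    # Hypothetical structure: two counters written by different CPUs.
    _fields_ = [
        ("rx_packets", ctypes.c_uint64),
        ("tx_packets", ctypes.c_uint64),
    ]

class PaddedStats(ctypes.Structure):
    # Same counters, with explicit padding pushing tx_packets to the next line.
    _fields_ = [
        ("rx_packets", ctypes.c_uint64),
        ("_pad", ctypes.c_uint8 * (CACHE_LINE - 8)),
        ("tx_packets", ctypes.c_uint64),
    ]

def members_by_cache_line(struct_type):
    """Group member names by the cache line their first byte falls in."""
    lines = {}
    for name, _ctype in struct_type._fields_:
        offset = getattr(struct_type, name).offset
        lines.setdefault(offset // CACHE_LINE, []).append(name)
    return lines
```

For `Stats`, both counters land in line 0, so writes from two CPUs would contend for the same line; for `PaddedStats`, `tx_packets` moves to line 1. A type-aware profiler can attribute the cost of such contention to the specific members involved, which a per-instruction view cannot do.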
Carvalho went into some detail about how perf handles disassembling programs; in short, it uses the Capstone library, with a fallback to objdump when Capstone is unavailable. Integration with objdump also means that it supports all of the architectures that are supported by GNU Binutils.
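The "preferred library, then external tool" fallback described here follows a common pattern, sketched below in Python. This is not perf's logic: `pick_disassembler` is a hypothetical helper that only checks whether the Capstone Python bindings are importable and whether an `objdump` binary is on the PATH.

```python
import importlib.util
import shutil

def pick_disassembler():
    """Return the first available backend name, or None if neither is found."""
    if importlib.util.find_spec("capstone") is not None:
        return "capstone"  # in-process library via its Python bindings
    if shutil.which("objdump") is not None:
        return "objdump"   # external program from GNU Binutils
    return None
```

The ordering encodes the preference: the in-process library is tried first, and the external program is only used when the library is missing.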
Daniel Borkmann asked whether these changes were already available, and Carvalho said that they were. He is continuing to work on improvements, but the basic functionality is usable now. José Marchesi asked whether it would cause a problem if the kernel changed to generate BTF directly, instead of being compiled with DWARF information and then using pahole to translate. Pahole has a BTF loader, Carvalho explained, so it would not need to change anything about how the tool is used. At that point, the session ran out of time.
Index entries for this article | |
---|---|
Conference | Storage, Filesystem, Memory-Management and BPF Summit/2024 |
Posted Jun 21, 2024 9:27 UTC (Fri)
by Sesse (subscriber, #53779)
Disassembly
- It can indeed use Capstone, but only on x86 (and perf is frequently not built against Capstone)
- It can link to libbfd, but libbfd is GPLv3 and perf is GPLv2, so a distro build won't support this
- It can fall back to objdump, but this is usually slower
- Finally, there is support for linking to libllvm in review, which would give basically the same support as libbfd but without the license woes
To make things extra complicated, perf tries to re-interpret the disassembler output and add its own tweaks, but it only supports a subset of instructions, so you can end up with weird inconsistencies around e.g. space after comma.