Comparing SystemTap and bpftrace

Posted Apr 20, 2021 1:21 UTC (Tue) by ringerc (subscriber, #3071)
Parent article: Comparing SystemTap and bpftrace

Some key operational distinctions exist too, especially for userspace tracing, which is where my interests lie.

The TL;DR is that in practice the effective, reliable use of any of these tools tends to require planning ahead: you need to install the necessary debuginfo, kernel headers, utilities, etc. before they age out of repositories. For best results you'll want a much newer kernel than the enterprise-y distro defaults too. So they work best in tightly controlled farms of machines where the people who care about tracing can control how the systems are installed and updated.

* SystemTap kmod mode needs kernel headers and prefers kernel debuginfo. Both age out of repos quickly.
* SystemTap dyninst runtime requires restarting the target so it can LD_PRELOAD, and is more limited than the kmod runtime.
* Effective eBPF userspace tracing in practice requires quite new kernels, so bpftrace, bcc, etc. are hard or impractical to apply to the older kernels common in the wild.
* SystemTap bpf runtime is limited by the same kernel version concerns as bcc etc *and* its own systemtap-specific limitations.
* SystemTap doesn't usually work on kernels newer than the systemtap release due to internal kernel API changes. It spews compile errors. So you usually need to get a newer systemtap to work with newer kernels.
* bpftrace and bcc don't currently handle detached DWARF debuginfo, and don't even handle binaries built with the x64 default -fomit-frame-pointer compile flag properly.
* The rich debuginfo based access to userspace state available in systemtap is mostly absent from bpf tooling targeting userspace, so access to your program state is very painful with bpf.

So on older kernels you can use systemtap, except you probably can't get the kernel headers and debuginfo installed. And you're generally not going to encounter newer bpf-friendly kernels in the wild on production systems unless you're managing your own clusters of systems. If you do have a newer kernel and want to use bpf, you get to fight with its lack of DWARF based unwinding and its primitive to nonexistent ability to understand userspace memory contents.

I find this intensely frustrating, as I get a great deal of value out of both tools in my own debugging and performance work. But systemtap sometimes breaks when I update my kernel on my laptop, and I need a bleeding-edge bcc for even some of the basic functionality I needed for simple userspace tracers.

Comparing SystemTap and bpftrace

Posted Apr 20, 2021 1:22 UTC (Tue) by ringerc (subscriber, #3071)

I split the details into a child comment so the main one wouldn't be too long.

SystemTap is widely available even for older systems, though the packaged versions are usually older so it's a bit of a pain to write scripts that work with them. It's easy to compile if you're allowed to install the needed toolchain and dependencies on the target and you have the time, but that adds to the hassle. Especially when you're not hands-on and you just want the other end (customer, or whatever) to run a tapscript for you.

Additionally, for its most fully featured and default runtime (kmod) SystemTap requires kernel headers and preferably debuginfo. These are frequently unavailable for whatever older kernel point release happens to be running on the target system at the time you need to run some tracing tools. Or at best you have to go digging manually through some archive of old packages that have aged out of the main repositories for the OS. The stap-prep tool can't usually find them for you. So to reliably use systemtap's kmod runtime you need to plan ahead and install kernel headers and debuginfo whenever you update the kernel, which nobody ever does. This drastically limits its practical utility.
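The "plan ahead" step could be partly automated with a check that runs after each kernel update. What follows is only a rough sketch: the function name `tracing_prereqs` is made up for illustration, and the debuginfo paths are assumptions based on common Fedora/RHEL and Debian/Ubuntu layouts, so they may not match a given distribution.

```python
import os


def tracing_prereqs(release, root="/"):
    """Report whether headers and debuginfo for kernel `release` exist under `root`."""

    def exists(*parts):
        return os.path.exists(os.path.join(root, *parts))

    return {
        # Conventional location of the kernel build tree (usually a symlink).
        "headers": exists("lib", "modules", release, "build"),
        # ASSUMPTION: debuginfo locations vary by distribution; these two
        # are the layouts commonly seen on Fedora/RHEL and Debian/Ubuntu.
        "debuginfo": (
            exists("usr", "lib", "debug", "lib", "modules", release, "vmlinux")
            or exists("usr", "lib", "debug", "boot", "vmlinux-" + release)
        ),
    }
```

Calling `tracing_prereqs(platform.release())` (with `import platform`) checks the running kernel. A `False` entry right after an update is the cue to install the matching packages while the repositories still carry them.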

But lots of eBPF features and helper functions are only available in much newer kernels. On widely deployed "enterprise" system kernels eBPF is basically useless for nontrivial userspace tracing and analysis. It is also quite fragile in the face of kernel version changes as soon as you step outside the canned tracepoints, and the set of helper functions is extremely limited.
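A first-pass way to see what a target kernel is likely to lack is to compare its release string against a table of minimum versions. This is a sketch under loud assumptions: the version numbers below are recalled from the bcc project's list of BPF features by kernel version and should be verified, the feature names are made up for illustration, and distribution kernels often backport features, so the upstream version number is only a hint.

```python
import re

# ASSUMPTION: minimum upstream kernel versions, recalled from the bcc
# documentation; verify before relying on them. Names are illustrative.
ASSUMED_MIN_KERNEL = {
    "kprobe_attach": (4, 1),
    "tracepoint_attach": (4, 7),
    "probe_read_user_helper": (5, 5),
}


def parse_release(release):
    """Extract (major, minor) from a string such as '4.18.0-305.el8.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2)))


def missing_features(release, minimums=ASSUMED_MIN_KERNEL):
    """Return the sorted feature names whose minimum version exceeds `release`."""
    have = parse_release(release)
    return sorted(name for name, need in minimums.items() if have < need)
```

For a release string like `3.10.0-1160.el7.x86_64` every entry in the table is reported as missing, while `5.10.0` reports none. A kernel with backports may do better than the version number suggests, so probing for the feature directly is more reliable where that is possible.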

Even if you can run your bpf scripts, your userspace stacks are going to look like "-" most of the time, because everything is compiled with -fomit-frame-pointer. AFAICS most bpf tools don't handle external DWARF debuginfo or use tools like libunwind to help them out. So you end up having to recompile with -fno-omit-frame-pointer and use unstripped binaries with debuginfo in the main binary. This basically means you can't do much tracing of packaged userspace binaries, which are the norm on production systems.
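To see why -fomit-frame-pointer hurts so much, it helps to look at what a frame-pointer unwinder does. The following is a toy model, not how any BPF helper is actually implemented: it assumes the usual x86-64 convention where the word at the frame pointer holds the caller's saved frame pointer and the next word holds the return address, and it represents memory as a plain dict with invented addresses.

```python
def walk_frame_pointers(memory, fp, max_depth=64):
    """Collect return addresses by following the saved frame-pointer chain.

    `memory` maps addresses to 8-byte values; `fp` is the frame-pointer
    register value at the moment the stack is sampled.
    """
    frames = []
    for _ in range(max_depth):
        # Stop at the end of the chain, or if fp points at unmapped memory.
        if fp == 0 or fp not in memory or fp + 8 not in memory:
            break
        frames.append(memory[fp + 8])  # return address into the caller
        next_fp = memory[fp]           # caller's saved frame pointer
        # The stack grows down, so a caller's frame must sit at a higher
        # address; anything else means the chain is corrupt.
        if next_fp != 0 and next_fp <= fp:
            break
        fp = next_fp
    return frames


# Invented three-frame stack built with frame pointers enabled.
stack = {
    0x7000: 0x7020, 0x7008: 0x401136,  # innermost frame
    0x7020: 0x7040, 0x7028: 0x401172,  # its caller
    0x7040: 0x0,    0x7048: 0x4011A0,  # outermost frame ends the chain
}
```

With frame pointers, `walk_frame_pointers(stack, 0x7000)` recovers all three return addresses. Without them the register is just another general-purpose register, so a call such as `walk_frame_pointers(stack, 0x2A)` finds nothing. Recovering the stack in that case needs the unwind tables from the DWARF debuginfo, which is the part most bpf tooling does not consult.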

SystemTap on the other hand will not only get you your userspace stacks using DWARF detached debuginfo, it'll now even talk to a debuginfod to download symbols for you during probe compilation. It'll walk userspace pointer chains, examine struct members, recursively print structs, handle unions and so much more using simple built-in syntax. So it's currently infinitely more powerful for userspace probing and analysis ...

... or it would be if only you could find and install the kernel headers.

SystemTap also has 'dyninst' and 'bpf' runtimes, which entirely avoid the need for kernel headers and can often be used without kernel debuginfo. But a considerable number of the built-in systemtap "tapsets" rely on embedded-C code written for kernelspace, which simply won't work for a dyninst or bpf tapscript. Or they rely on helper functions exported by the kmod runtime that are not implemented for the dyninst or bpf runtimes. So in practice most of your existing systemtap scripts won't work, and scripts are more difficult to write for the dyninst or bpf runtimes.

Additionally, the dyninst runtime requires that you wrap the target using LD_PRELOAD. So it's cool for development and QA work but for a production system it's often impractical, as you frequently want to non-intrusively trace an already-running server process.

This means you usually can't count on being able to apply eBPF, or SystemTap with any of its runtimes, to an arbitrary system you encounter in the wild.