
Tracepoints for the VFS?

By Jake Edge
April 18, 2025

LSFMM+BPF

Adding tracepoints to some kernel subsystems has been controversial—or disallowed—due to concerns about the user-space ABI that they might create. The virtual filesystem (VFS) layer has long been one of the subsystems that has not allowed any tracepoints, but that may be changing. At the 2025 Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM+BPF), Ted Ts'o led a discussion about whether the ABI concerns are outweighed by the utility of tracepoints for the VFS.

Ts'o began by noting that Al Viro, who has opposed VFS tracepoints over the years, was not present, but that VFS co-maintainer Christian Brauner was in attendance to give his opinions on the matter. Historically, there have been concerns about placing tracepoints at various places in the VFS, Ts'o said, such as for system calls like open() and rename(). One concern is about tracepoints in hot paths affecting performance, but he thinks that could be worked around by keeping the tracepoints at the system-call level. Another is that the tracepoints "might potentially constrain our implementation" because of the user-space interface question, with the powertop incident often cited as an example of the problem.

Today, developers and users seem to be less concerned about the ABI stability for tracepoints, he said. Many are using function tracing now to get the information they need from the VFS, which is even more implementation dependent; adding tracepoints will make things more usable and maintainable.

Brauner said that, in his mind, there is no real barrier to adding VFS tracepoints; the hot-path concern is valid, but that just means being careful. The VFS debug infrastructure, which is patterned on similar functionality in the memory-management subsystem, is queued up for 6.15. Adding tracepoints makes sense in that context, he said; it helps development. He does not believe there is a need to make them stable; "we are free to remove them, I think, we are free to move them around".

Ts'o said that system administrators also find tracepoints useful, not just developers. "If we put tracepoints, let's try to keep them stable; yeah we can change them, but in the ideal we don't".

Chuck Lever said that he has "been a gigantic proponent of tracepoints in the NFS subsystem". Some kind of observability is needed, but it is easy to go overboard and stick tracepoints in every function and on every return path. Since much of the filesystem community was gathered at the summit, he wanted to discuss what the goal of the effort should be. One possibility is to only add them to error paths; function tracing already exists and is easy to use for the arguments and return values. Determining which error return was taken would be a good use for tracepoints. In general, having a use case for a tracepoint, rather than just scattering them in the code, is important, he said.

David Howells cautioned that there is a need to ensure that new tracepoints do not themselves trigger tracepoints in, say, tracefs. Accessing the tracing information should not cause more tracepoints to fire. Along those lines, he suggested that some kind of filtering was needed to isolate the tracepoints for a particular filesystem. Tracepoints for the mount path should be fine, since those operations are not that frequent, but the kernel does a lot of reads and writes.

Brauner wondered if tracepoints for read and write were even all that interesting, but Howells said that the page flags of the buffers are. Brauner said that much of the filtering requested can already be done using bpftrace, kretprobes, and other kernel facilities. Ts'o said that the ext4 tracepoints have the dev_t of the filesystem being operated on as a parameter, so he can filter for a specific filesystem based on that value.
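As a rough illustration of the dev_t-based filtering that Ts'o described, the sketch below computes the kernel-internal device number, which packs the major number above 20 bits of minor number. The function names and the device numbers (major 8, minor 1) are hypothetical choices for the example, and the tracefs path in the final comment is an assumption about where tracefs is mounted on a given system.

```shell
# Kernel-internal dev_t: major number shifted above 20 minor bits
# (MINORBITS in include/linux/kdev_t.h).
kdev_encode() {
    # $1 = major, $2 = minor
    echo $(( ($1 << 20) | $2 ))
}

kdev_decode() {
    # $1 = kernel-internal dev_t value; prints "major,minor"
    echo "$(( $1 >> 20 )),$(( $1 & ((1 << 20) - 1) ))"
}

# Hypothetical device: major 8, minor 1.
dev=$(kdev_encode 8 1)
echo "dev == $dev"

# Assumed usage (needs root and a mounted tracefs; not run here):
#   echo "dev == $dev" > /sys/kernel/tracing/events/ext4/filter
```

The resulting expression is meant to be written to an event filter file so that only events from one filesystem are recorded; whether a given event exposes a `dev` field should be checked in that event's `format` file.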

Tracepoints are more useful than the alternatives, an attendee said, because probing a function only works if that function has not been inlined by the compiler or by a post-compilation optimization tool such as BOLT. "We need way more tracepoints", he said. Brauner said that they will also be useful in the mount path, since there are so many ways to get an EINVAL return code; he has a friend with a "script called 'why-did-mount-fail'", but there are so many inlined functions that it can be difficult to determine where the error came from.
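The EINVAL problem can be shown with a user-space analogy. The following is only a sketch in shell, not kernel code: `fake_mount`, `trace_err`, and the event name are hypothetical, and the checks are invented for illustration. Every failure path returns the same code (22, the value of EINVAL on Linux), so only the trace line tells the paths apart.

```shell
# Stand-in for a tracepoint placed on an error path.
trace_err() {
    # $1 = hypothetical event name, $2 = reason
    echo "trace: $1: $2" >&2
}

# Stand-in for a mount operation with several indistinguishable failures.
fake_mount() {
    # $1 = source, $2 = target, $3 = filesystem type (all hypothetical)
    [ -n "$1" ] || { trace_err fake_mount_einval "empty source"; return 22; }
    [ -n "$2" ] || { trace_err fake_mount_einval "empty target"; return 22; }
    case "$3" in
        ext4|xfs|btrfs) ;;
        *) trace_err fake_mount_einval "unknown fstype '$3'"; return 22 ;;
    esac
    return 0
}

# The caller sees only the status; the trace line on stderr gives the reason.
fake_mount /dev/sda1 /mnt nosuchfs || echo "exit status: $?"
```

Without the `trace_err` calls, the three failures would look identical to the caller, which is the situation a "why-did-mount-fail" script has to untangle.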

Adding VFS tracepoints seems non-controversial, someone said, to general agreement. The specifics of the tracepoints and where they are placed may be controversial, however. Brauner said that times have changed in the almost 15 years since the powertop problem and the "observability game" has changed as well.

Mathieu Desnoyers, who was the original developer of tracepoints back in 2008, noted that there was another concern expressed when the question of adding tracepoints was raised in the past: tracepoints can be misused as an execution-hijacking mechanism. For example, a rootkit could potentially use a tracepoint to alter the behavior of a system call in a running kernel. Several people noted that there are other kernel mechanisms that could be used for that purpose, however, without the need for any tracepoints. It does not seem like something to worry about at this point.

Those in the room certainly seemed to be in favor of adding VFS tracepoints and no real barriers to doing so were raised. One would guess that patches to start adding them will be posted before long.


Index entries for this article
Kernel: Filesystems/Virtual filesystem layer
Kernel: Tracing
Conference: Storage, Filesystem, Memory-Management and BPF Summit/2025


Don't we already have VFS tracepoints?

Posted Apr 19, 2025 5:13 UTC (Sat) by alison (subscriber, #63752)

What am I missing? Aren't these VFS syscall tracepoints?

$ sudo bpftrace -l | grep tracepoint:syscalls | grep read
tracepoint:syscalls:sys_enter_read
tracepoint:syscalls:sys_enter_readahead
tracepoint:syscalls:sys_enter_readlink
tracepoint:syscalls:sys_enter_readlinkat
tracepoint:syscalls:sys_enter_readv
. . .

$ uname -r
6.12.21-amd64

And there are kfuncs too.

Don't we already have VFS tracepoints?

Posted Apr 19, 2025 7:16 UTC (Sat) by kxxt (subscriber, #172895)

> What am I missing? Aren't these VFS syscall tracepoints?

These are syscall tracepoints, definitely not VFS tracepoints.

For example, if you make x32 or x86_32 syscalls on x86_64 Linux, they won't hit the syscall tracepoints you mentioned. But they will hit the VFS tracepoints (in the future).

Don't we already have VFS tracepoints?

Posted Apr 19, 2025 18:23 UTC (Sat) by iabervon (subscriber, #722)

Presumably, "[Ts'o] thinks that could be worked around by keeping the tracepoints at the system-call level" means that they're in functions implementing VFS operations that each do the work that one syscall does, but it's easy to interpret as meaning that they'd be in the code that exposes that functionality as syscalls.

Enable CONFIG_TRACEPOINTS only?

Posted Apr 20, 2025 19:12 UTC (Sun) by meyert (subscriber, #32097)

Is there a way to enable CONFIG_TRACEPOINTS only?

The commit 5f87f1121895dc09d2d1c1db5f14af6aa4ce3e94 seems to have removed the ability, for some reason.

The best I could come up with is:
- FTRACE = y
- ENABLE_DEFAULT_TRACERS = y (which selects TRACING which selects TRACEPOINTS, but has the side effect of also setting STACKTRACE = y)

Bug or feature?

Method tracepoints

Posted Apr 22, 2025 15:52 UTC (Tue) by willy (subscriber, #9762)

I wasn't able to be in the room for this one, sorry.

I would suggest that there's a win to be had in adding VFS tracepoints around the calls to filesystems, because we can then remove the tracepoints that individual filesystems have at the entry/exit points of those methods.

The problem will be those filesystem authors who insist on retaining their personal tracepoints because they don't want to change their workflow.


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds