Monitoring mount operations
Amir Goldstein kicked off a session on monitoring mounts at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit. In particular, there are problems when trying to efficiently monitor "a very large number of mounts in a mount namespace"; some user-space programs need an accurate view of the mount tree without having to constantly parse /proc/mounts or the like. There are a number of questions to be answered, including what the API should look like and what entity should be watched in order to get notifications of new mount operations.
It is trivial, he said, to add a notification for unmount events, but the corresponding event for a new mount is trickier, since it is not clear where, exactly, the watch for that event should be placed. It could be placed on the user or mount namespace of interest; another idea would be to choose a directory and monitor all of the mounts that happen on it and any of its subdirectories recursively. David Howells said that he has implemented something for getting mount notifications; the watch is placed on the mount namespace. Miklos Szeredi said that each namespace has its own mount tree and each mount has a 32-bit ID that gets assigned to it, but those cannot reliably be used to uniquely identify a particular mount because they can be reused during a given boot of the system. Howells said that he added a 64-bit counter that could be used for that purpose, though it will "eventually get reused" as well.
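Szeredi's point about ID reuse can be illustrated with a toy model. The sketch below is not how the kernel allocates mount IDs; both classes and their method names are hypothetical, chosen only to contrast an allocator that recycles released IDs with a counter that never hands out the same value twice.

```python
import itertools


class ReusingAllocator:
    """Toy allocator (hypothetical): always returns the lowest free ID."""

    def __init__(self):
        self._in_use = set()

    def allocate(self):
        candidate = 1
        while candidate in self._in_use:
            candidate += 1
        self._in_use.add(candidate)
        return candidate

    def release(self, ident):
        self._in_use.discard(ident)


class MonotonicAllocator:
    """Toy allocator (hypothetical): a counter that only ever goes up."""

    def __init__(self):
        self._counter = itertools.count(1)

    def allocate(self):
        return next(self._counter)

    def release(self, ident):
        # Nothing to do: released values are never handed out again.
        pass


def ids_after_remount(allocator):
    """Allocate an ID, release it (an unmount), then allocate again."""
    first = allocator.allocate()
    allocator.release(first)
    second = allocator.allocate()
    return first, second
```

With the recycling allocator, a watcher that remembered the first ID cannot tell that the second mount is a different object, since both carry the same number; with the monotonic counter the two values differ.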
Howells was asked about patches, which he said he had posted a while back. Szeredi pointed out that those patches were not for fanotify support, but were instead for the watch queue; it is the same general concept, however, he said. Christian Brauner thought that the notification piece should be separated from the fsinfo() effort.
The problem, Howells said, is that the notification queue can overflow, which means that events, such as mount and unmount operations, would get lost. Howells mentioned that currently tools have to parse (and poll) /proc/mounts in order to find out the status of mounts and unmounts, which is not particularly efficient. Brauner noted that he had invited Lennart Poettering to the talk, since systemd would be one of the eventual users of any new feature of this sort, so he asked Poettering about systemd's needs in this area.
Right now, systemd parses /proc/self/mountinfo, "which, of course, is terrible", Poettering said. He is not particularly concerned if events get dropped, as long as there is a way to figure out what has happened; some kind of unambiguous indication that events have been dropped coupled with an API for systemd to get the current status when it needs to do so would be ideal. He would like a facility that provides the immediate child mounts for a given mount along with mount-related events for those children. Howells said that Ian Kent had created a patch set implementing mount watching for systemd using fsinfo() and the watch queue notifications.
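As a rough illustration of what that parsing involves, the sketch below extracts the mount ID, parent ID, mount point, and filesystem type from text in the mountinfo format described in proc(5), then looks up the immediate children of a mount. The sample data is invented, and the names Mount, parse_mountinfo(), and immediate_children() are hypothetical; this is not systemd's code.

```python
from dataclasses import dataclass

# Invented sample in the mountinfo format: mount ID, parent ID, major:minor,
# root, mount point, options, optional fields, "-", fs type, source, options.
SAMPLE_MOUNTINFO = """\
22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
23 22 0:21 / /proc rw,nosuid shared:2 - proc proc rw
24 22 0:22 / /sys rw,nosuid shared:3 - sysfs sysfs rw
25 24 0:23 / /sys/fs/cgroup rw,nosuid shared:4 - cgroup2 cgroup2 rw
"""


@dataclass
class Mount:
    mount_id: int
    parent_id: int
    mount_point: str
    fs_type: str


def parse_mountinfo(text):
    """Parse mountinfo-formatted text into a list of Mount records."""
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The variable-length optional fields end at a lone "-";
        # the filesystem type is the first field after it.
        left, _, right = line.partition(" - ")
        fields = left.split()
        mounts.append(
            Mount(
                mount_id=int(fields[0]),
                parent_id=int(fields[1]),
                mount_point=fields[4],
                fs_type=right.split()[0],
            )
        )
    return mounts


def immediate_children(mounts, mount_id):
    """Return the mounts whose parent ID is mount_id."""
    return [m for m in mounts if m.parent_id == mount_id]
```

On a live system the input would come from reading /proc/self/mountinfo. Mount points containing spaces show up there with octal escapes such as \040, which this sketch leaves undecoded. Every change to the mount tree means re-reading and re-parsing the whole file, which is the cost Poettering was complaining about.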
Brauner asked if the feature needed to be added to fanotify for systemd's use, but Poettering said that he did not care which mechanism provided it. His main concern is getting notified when events are lost, so that systemd can take some action to update its state; it would be great if the lost-event notification narrowed down where in the mount tree the lost event(s) came from. For systemd's use case, it would be better to get events for a particular subtree, rather than the whole system, because it normally is only concerned with a subset of the full mount tree.
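The pattern Poettering described, tolerating dropped events as long as the loss is visible and the current state can be fetched again, might look something like the following sketch. It models no real kernel interface: LossyEventQueue, update_state(), and the snapshot callback are all hypothetical names, and the events are simple (kind, mount point) tuples invented for the example.

```python
from collections import deque


class LossyEventQueue:
    """Bounded queue (hypothetical) that drops events when full but says so."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()
        self.overflowed = False

    def push(self, event):
        if len(self._events) >= self.capacity:
            # The event is lost, but the loss is recorded unambiguously.
            self.overflowed = True
            return
        self._events.append(event)

    def drain(self):
        """Return (queued events, whether anything was lost) and reset."""
        events, lost = list(self._events), self.overflowed
        self._events.clear()
        self.overflowed = False
        return events, lost


def update_state(state, queue, snapshot):
    """Return the new set of mount points known to the consumer.

    snapshot is a callable returning the authoritative current mount points;
    it is only consulted when the queue reports that events were lost.
    """
    events, lost = queue.drain()
    if lost:
        # Incremental updates can no longer be trusted: resynchronize.
        return set(snapshot())
    new_state = set(state)
    for kind, mount_point in events:
        if kind == "mount":
            new_state.add(mount_point)
        elif kind == "unmount":
            new_state.discard(mount_point)
    return new_state
```

In the common case the consumer applies a handful of events cheaply; only after an overflow does it pay for a full rescan. The sketch ignores the race between taking the snapshot and the arrival of further events, which a real implementation would have to handle.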
Jeff Layton asked about the systemd use case for this information. Poettering responded that there are many systemd services that need to wait for mount activity of some form (e.g. at boot time, MySQL needs to wait for the filesystem where its files reside). Much of systemd's dependency processing for services depends on an accurate picture of the state of the system, including mounts.
Goldstein said that he was unsure how to report the occurrence of a tucked mount, which is a mechanism aimed at cleanly replacing an overlay mount. Brauner said that he was no longer "allowed to call it that"; there is another interpretation of that term, which he was unaware of until "friendly people on social media" pointed it out to him. They suggested using "beneath" to describe the type of mount. There is also, of course, the danger of mistyping the previous term, he said.
There was some discussion of a way to retrieve the immediate child mounts, as Poettering requested, but that will require a unique mount ID, Brauner said. After some roundabout discussion about mount-related APIs and the concerns that would need to be kept in mind, worries about a mount-ID overflow were raised. Layton pointed out that a 64-bit counter that gets incremented every nanosecond will take more than 500 years to overflow, so "we're never going to overflow at 64 bits".
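Layton's figure is easy to check. The short calculation below assumes a steady rate of one increment per nanosecond and a year of 365.25 days; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds


def years_until_wrap(bits, increments_per_second):
    """Years before a counter of the given width wraps at a steady rate."""
    return 2 ** bits / increments_per_second / SECONDS_PER_YEAR


# One increment per nanosecond is 1e9 increments per second.
years_64bit = years_until_wrap(64, 1e9)
```

The result is a bit more than 584 years, consistent with the "more than 500 years" mentioned in the session, and mounts are created far less often than once per nanosecond.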
There may be problems with exposing those 64-bit values to user-space programs that expect only a 32-bit mount ID, however. In fact, Poettering checked the systemd code and it "knows" that mount IDs are 32 bits in size. Howells said that the existing mount ID is "recycled, too small, people assume it is too small", so something new that is defined to be 64 bits is needed. Poettering suggested using UUIDs "and the problem goes away", to chuckles around the room. As time expired, things kind of trailed off; it is clear that there is more work needed before anything is likely to go upstream.
Index entries for this article | |
---|---|
Kernel | fanotify |
Kernel | Filesystems/Mounting |
Conference | Storage, Filesystem, Memory-Management and BPF Summit/2023 |
Posted May 24, 2023 15:29 UTC (Wed)
by intelfx (subscriber, #130118)
What's the "other" interpretation?
// here we go again...
Posted May 24, 2023 16:20 UTC (Wed)
by excors (subscriber, #95769)
Posted May 25, 2023 6:04 UTC (Thu)
by smurf (subscriber, #17840)
Given the absurd level of synonymal richness in the English language (check out the game "Synonymy") renaming this is stupid – esp. since we all know what "mount" can mean in the same context. Nobody's going to rename THAT.
Posted May 25, 2023 12:23 UTC (Thu)
by atnot (subscriber, #124910)
That all said, given the difficulty of unlearning things I would understand if the author voluntarily preferred to name it something else now to avoid having that association in his head while working on the feature ;)
Posted May 24, 2023 18:47 UTC (Wed)
by jkingweb (subscriber, #113039)
Posted May 24, 2023 21:09 UTC (Wed)
by bluca (subscriber, #118303)
Posted May 25, 2023 15:27 UTC (Thu)
by brauner (subscriber, #109349)
Posted May 24, 2023 21:25 UTC (Wed)
by shemminger (subscriber, #5739)
If the queue overflows, then the userspace application will see an error on the netlink listening socket.
Posted May 24, 2023 23:58 UTC (Wed)
by Fowl (subscriber, #65667)
Posted May 27, 2023 10:38 UTC (Sat)
by kaesaecracker (subscriber, #126447)
Handling overflow in a safe manner is really hard. The application can recover by rescanning the space it is listening to (i.e. all routes), but then synchronizing incoming changes with the current state gets messy.
I think a user space application could have a thread dedicated to listening to new messages and immediately copying them to another in-process queue for actual processing.