
Improving the reliability of file system monitoring tools (Collabora blog)

Gabriel Krisman Bertazi describes the new FAN_FS_ERROR event type added to the fanotify mechanism in 5.16.

This is why we worked on a new mechanism for closely monitoring volumes and notifying recovery tools and sysadmins in real-time that an error occurred. The feature, merged in kernel 5.16, won't prevent failures from happening, but will help reduce the effects of such errors by guaranteeing any listener application receives the message. A monitoring application can then reliably report it to system administrators and forward the detailed error information to whomever is unlucky enough to be tasked with fixing it.
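As a rough illustration of what a listener has to do with such an event, here is a minimal sketch in Python of decoding the error record from a buffer read off a fanotify file descriptor. It only covers parsing. The Python standard library has no fanotify binding, so obtaining the descriptor (fanotify_init(2) with FAN_REPORT_FID, then fanotify_mark(2) with FAN_MARK_FILESYSTEM and FAN_FS_ERROR) is assumed to happen elsewhere. The constants and struct layouts are copied from <linux/fanotify.h> as of 5.16. The function names and the synthetic sample event are hypothetical and exist only for illustration.

```python
import struct

# Constants copied from <linux/fanotify.h> (kernel 5.16 and later).
FAN_FS_ERROR = 0x00008000
FAN_EVENT_INFO_TYPE_ERROR = 5

# struct fanotify_event_metadata:
#   event_len, vers, reserved, metadata_len, mask, fd, pid
METADATA = struct.Struct("=IBBHQii")
# struct fanotify_event_info_header: info_type, pad, len
INFO_HEADER = struct.Struct("=BBH")
# Fields of struct fanotify_event_info_error after its header:
#   error, error_count
ERROR_BODY = struct.Struct("=iI")


def parse_fs_errors(buf):
    """Return (error, error_count) for every error record found in buf.

    buf is assumed to hold zero or more complete fanotify events,
    as returned by read(2) on a fanotify descriptor.
    """
    results = []
    offset = 0
    while offset + METADATA.size <= len(buf):
        (event_len, _vers, _reserved, metadata_len,
         mask, _fd, _pid) = METADATA.unpack_from(buf, offset)
        if event_len < METADATA.size or offset + event_len > len(buf):
            break  # truncated or malformed event: stop parsing
        end = offset + event_len
        if mask & FAN_FS_ERROR:
            # Variable-length info records follow the fixed metadata.
            info = offset + metadata_len
            while info + INFO_HEADER.size <= end:
                info_type, _pad, info_len = INFO_HEADER.unpack_from(buf, info)
                if info_len < INFO_HEADER.size:
                    break  # malformed record
                body = info + INFO_HEADER.size
                if (info_type == FAN_EVENT_INFO_TYPE_ERROR
                        and body + ERROR_BODY.size <= end):
                    results.append(ERROR_BODY.unpack_from(buf, body))
                info += info_len  # skip to the next record (e.g. the FID one)
        offset = end
    return results


def build_sample_event(error, error_count):
    """Build a synthetic FAN_FS_ERROR event (hypothetical test helper)."""
    info = (INFO_HEADER.pack(FAN_EVENT_INFO_TYPE_ERROR, 0,
                             INFO_HEADER.size + ERROR_BODY.size)
            + ERROR_BODY.pack(error, error_count))
    # vers=3 is FANOTIFY_METADATA_VERSION; fd=-1 is FAN_NOFD.
    meta = METADATA.pack(METADATA.size + len(info), 3, 0,
                         METADATA.size, FAN_FS_ERROR, -1, 0)
    return meta + info


if __name__ == "__main__":
    import errno
    sample = build_sample_event(errno.EIO, 2)
    for error, count in parse_fs_errors(sample):
        print(f"filesystem error {error}, error_count {count}")
```

A real monitoring tool would presumably feed each decoded record to whatever alerting channel it uses; records of other types, such as the file-handle record that accompanies the error, are simply skipped here.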



Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 6:48 UTC (Tue) by pabs (subscriber, #43278) [Link] (12 responses)

I hope the desktops like GNOME get support for this feature. Currently having your rootfs go read-only is quite bewildering as things stop working and saving data is not possible but there is no notification about the situation, like there is when your disk fills up.

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 8:19 UTC (Tue) by Wol (subscriber, #4433) [Link] (9 responses)

Well, I suspect the correct answer to that is that root is meant to be read-only under normal circumstances, and /var should be another volume.

But then, if you're a desktop user, that's sysadmin details you don't want to worry about ...

Cheers,
Wol

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 10:23 UTC (Tue) by kronat (guest, #117266) [Link] (2 responses)

> But then, if you're a desktop user, that's sysadmin details you don't want to worry about ...

Please, go on and write documentation patches (which I'll gladly read to learn how to properly set up my system -- I'd probably need another volume for /usr as well if I ever want to update packages) instead of wasting your time replying in such a patronizing tone.

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 11:05 UTC (Tue) by Wol (subscriber, #4433) [Link]

Well, you're unusual to start with! Someone who ACTUALLY READS documentation!? Whatever next!

And no, I don't mean to be patronising, but the majority of (l)users are people who (1) just want their computer to work, and (2) don't want / don't have time to play setting up their system "just right".

I'm one of those power users who knows just enough to be really dangerous :-)

Cheers,
Wol

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 13:14 UTC (Tue) by elel (guest, #100484) [Link]

I'll have to agree with the sentiment about not wanting to have to worry about those details. Even as someone who knows some sysadmin I'd much prefer not to have to mess with partitions or track down strange faults.

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 12:12 UTC (Tue) by ovitters (guest, #27950) [Link]

> But then, if you're a desktop user, that's sysadmin details you don't want to worry about ...

If you're using Linux at home (I guess a lot are) then you're automatically a sysadmin as well. Despite maybe not wanting that role. I'd prefer that I don't need to worry too much and that the distribution takes care of most things. Obviously that doesn't work that easily in practice due to a distribution needing to handle different types of users.

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 14:56 UTC (Tue) by eduperez (guest, #11232) [Link]

> Well, I suspect the correct answer to that is that root is meant to be read-only under normal circumstances, and /var should be another volume.
> But then, if you're a desktop user, that's sysadmin details you don't want to worry about ...

In many scenarios, the user and the sysadmin are the same person, and installing new software is considered "normal circumstances".

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 23:15 UTC (Tue) by gerdesj (subscriber, #5446) [Link] (3 responses)

Wol, how much of root should be r/o? Just the top level? I think there are bigger fish to fry. For starters, having root go r/o is an early sign of damage on a fs - I've abused enough VMs to be familiar with this. Then you'll need a lot more mounts for all the other bits and pieces, including /root which really ought to be available no matter what (for root) so that there is a local place to store stuff in extremis.

I've never seen a distro do a r/o / either. It's just too much of a fiddle. Start down that path and you'll be doing things like maintaining a set of hard links with immutable flags set on them to stop the baddies instead of tripwire type solutions and other mad ideas. You can always play with SE Linux and co instead to get the desired effect too.

That's for desk/lap tops. Your servers/containers/etc are a different kettle of bits.

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 16, 2022 8:27 UTC (Wed) by geert (subscriber, #98403) [Link]

I never made / read-only, but I did make /usr read-only. In the old days hard drives were small, and all of /, /usr, /var, /usr/local, /export/home1, /export/home2 (/home on autofs), ... were put on separate partitions. And after a while a bunch more, when you ran out of space.

Hence I had to remount /usr read-write before doing "apt-get upgrade", and read-only again afterwards.

P.S. I'm not that old, as I'm mentioning /var ;-)

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 16, 2022 9:32 UTC (Wed) by ganneff (subscriber, #7069) [Link]

I actually had a server system (two, clustered with corosync/pacemaker) running with / fully ro. Only /home, some subdirs (extra mountpoints) of /root, and /var were rw.
It was a login cluster for a server network, back when we even used grsecurity patches to lock things down some more, including forbidding the remounting of things.

Worked pretty well, most of the time. Annoying to update (reboot into special rw-mode that didn't start services, update, reboot back to normal ro). Recently replaced with a new set of VMs and no more ro.

Sure, it is nothing that one wants as standard; the point is that it is actually *not* hard at all to have a system run that way. And yes, it obviously depends on the exact usage of the system, as always.
(Server, not Desktop, that is).

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 17, 2022 10:00 UTC (Thu) by daniels (subscriber, #16193) [Link]

> I've never seen a distro do a r/o / either. It's just too much of a fiddle.

Apart from Fedora Silverblue, ChromeOS, SteamOS, etc.

> Start down that path and you'll be doing things like maintaining a set of hard links with immutable flags set on them to stop the baddies instead of tripwire type solutions and other mad ideas.

You're describing OSTree, used by Silverblue and SteamOS as well as many others.

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 15, 2022 22:03 UTC (Tue) by bartoc (guest, #124262) [Link] (1 responses)

Unfortunately, this work might not get there: consumer SATA drives basically never even report errors to the controlling computer until the damage is already done (they just correct them, and say they wrote with no error, how helpful of them!).

I think NVMe disks are a bit better, and, to be honest, if you need spinning disks SAS disks are usually less than $50 more than SATA disks. I guess you do need the HBA though.

(Note, ime even enterprise SATA disks don't tend to report errors very well)

Improving the reliability of file system monitoring tools (Collabora blog)

Posted Mar 16, 2022 0:28 UTC (Wed) by pabs (subscriber, #43278) [Link]

I thought this work was about filesystem errors, not about storage device errors, which are covered by the support for SMART in various places, although I think for GNOME you have to install smart-notifier to get SMART notifications. Filesystem errors can be caused by Linux kernel bugs or by storage device errors or silent storage device corruption (which happened to me), so they are a superset of storage device errors.


Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds