Trading off safety and performance in the kernel

Posted May 14, 2015 17:13 UTC (Thu) by marcH (subscriber, #57642)
In reply to: Trading off safety and performance in the kernel by neilbrown
Parent article: Trading off safety and performance in the kernel

> Any app worthy of the name will have called 'fsync' before giving you a visual indication that the save has completed. emacs certainly does. [...]
> And of course, any real app would have auto-saved every few minutes so even in a disaster you wouldn't lose more than a few minutes work.

"Don't break userspace" - even userspace bugs.

Posted May 14, 2015 22:31 UTC (Thu) by neilbrown (subscriber, #359)

> "Don't break userspace" - even userspace bugs.

If userspace needs the kernel to call sync before crashing then the user-space is already broken. Systems can crash without entering suspend first.

But that is a big "if". Are there actually any non-trivial apps which don't save their data properly?

Posted May 21, 2015 21:58 UTC (Thu) by Wol (subscriber, #4433)

> Are there actually any non-trivial apps which don't save their data properly?

Okay, it's not Linux, but ... MS Word?

(Maybe it's changed, but I OFTEN lose data if I'm working on a document and it crashes; often it's the attempted auto-save that causes the crash :-(

Cheers,
Wol

Posted May 23, 2015 16:20 UTC (Sat) by anton (subscriber, #25547)

> If userspace needs the kernel to call sync before crashing then the user-space is already broken.

No, the file system is broken.

> Are there actually any non-trivial apps which don't save their data properly?

No, there are just file systems (e.g., ext4) which do not provide decent guarantees and use this kind of rhetoric to justify their poor behaviour. I expect that pretty much all non-trivial applications do not always jump through all the hoops that some developers of file systems expect of them; that's because they have no good way to test that they meet the expectations of these file system developers, and most application developers probably have many more urgent things to care about.
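For readers wondering what "the hoops" look like in practice: one commonly described recipe on POSIX systems for replacing a file so that a crash leaves either the old or the new version is to write a temporary file, fsync() it, rename() it over the original, and then fsync() the containing directory. The C sketch below shows that sequence under stated assumptions: `save_durably`, `write_all` and the ".tmp" naming are hypothetical, not taken from any application or file system mentioned in this thread, and the caller is assumed to pass the file's directory separately.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` (a file inside directory `dir`) with
 * `buf`. The intent is that after a crash the file holds either the old or
 * the new contents, not a truncated or empty one.
 */
int save_durably(const char *dir, const char *path, const char *buf, size_t len)
{
    char tmp[4096];
    int r = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (r < 0 || (size_t)r >= sizeof tmp)
        return -1;

    /* 1. Write the new contents to a temporary file in the same directory. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 2. fsync() before rename(), so the data is on disk before the name points at it. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. rename() atomically replaces the old name with the new file. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 4. fsync() the directory so that the rename itself is made durable. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

The fixed ".tmp" suffix means two processes saving the same file at once would collide; a real program would more likely create the temporary file with mkstemp(). Four system calls per save, each with its own failure path, is part of why application developers may not get every step right.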

Anyway, one example of a broken file system losing data of a popular application (including the autosave files that the application produces regularly) is here.

Posted May 25, 2015 6:52 UTC (Mon) by neilbrown (subscriber, #359)

> No, the file system is broken.

That makes no sense.

I agree that "If a filesystem needs the kernel to call sync before crashing then the filesystem is already broken" with the understanding that "needs" means "needs in order to protect the data that it is responsible for."
Data that has not yet been written to the filesystem is certainly not the filesystem's responsibility.
Data that has been written but hasn't been the subject of 'fsync' is also not completely the filesystem's responsibility (unless you mount with '-o sync').
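To make the distinction above concrete: write() normally returns as soon as the data is in the page cache, and an application takes on durability itself either by calling fsync() or by opening the file with O_SYNC, which is roughly the per-file counterpart of mounting with '-o sync'. The C sketch below shows both; the function names `append_then_fsync` and `append_osync` are hypothetical, and a short write is simply treated as failure to keep it small.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * (a) Plain write() followed by fsync(): write() returns once the data is
 * in the page cache; fsync() asks the kernel to push it to stable storage.
 */
int append_then_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/*
 * (b) O_SYNC: each write() completes only after the data and the file's
 * metadata have been transferred to the underlying storage, so no
 * separate fsync() call is needed.
 */
int append_osync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Neither variant on its own guarantees that the directory entry of a newly created file survives a crash; the Linux fsync(2) man page notes that an explicit fsync() on a file descriptor for the directory is needed for that.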

> one example of a broken file system losing data of a popular application

That is a filesystem from decades ago. Yes it was broken, no question. Linux filesystems aren't like that. All non-trivial Linux filesystems do journalling of metadata, which is much safer than synchronous metadata updates. I cannot promise they are all 100% bug free in every release, but I am certain that calling 'sync' in the suspend path isn't going to usefully fix any bug that they might have.

It also sounds like that "popular application", which was emacs, wasn't calling 'fsync' as it should and as it certainly now does.
Yes - bugs should be fixed. But let's not scatter "sys_sync" calls around and pretend that fixes them.

Posted May 25, 2015 9:00 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

> That is a filesystems from decades ago. Yes it was broken, no question. Linux filesystems aren't like that. All non-trivial Linux filesystems do journalling of metadata, which is much safer than synchronous metadata updates. I cannot promise they are all 100% bug free in every release, but I am certain that calling 'sync' in the suspend path isn't going to usefully fix any bug that they might have.

As if on cue, today my laptop corrupted my filesystem during suspend/resume. I started synchronizing several large (~200 GB) directories with lots of small files from our local network and then totally forgot about it. Then I closed the laptop's lid and went home.

Resume failed (yet again) and after reboot my BTRFS filesystem refused to mount.