Trading off safety and performance in the kernel

Posted May 13, 2015 2:40 UTC (Wed) by neilbrown (subscriber, #359)
In reply to: Trading off safety and performance in the kernel by dlang
Parent article: Trading off safety and performance in the kernel

> Far better to generate some extra heat for a little bit than losing hours of data because it didn't get flushed out.

What "hours of data" are you talking about?

Dirty pages get flushed after about 30 seconds, so if you don't 'sync' before suspend, then the most you could lose if resume fails is data that was written by an app in the 30 seconds before suspend.
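For anyone who wants to check the "about 30 seconds" figure on their own machine, here is a minimal sketch. It assumes a Linux system that exposes the usual writeback sysctls under /proc/sys/vm (dirty_expire_centisecs, commonly 3000, and dirty_writeback_centisecs, commonly 500); the helper names are made up for illustration.

```c
#include <stdio.h>

/* Convert a centisecond count, as stored in the vm.dirty_* sysctls,
 * to seconds. */
double centisecs_to_seconds(long centisecs)
{
    return centisecs / 100.0;
}

/* Read a single integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_long_from_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print how old dirty data must be before it is written back, and how
 * often the flusher threads wake up. Values that cannot be read are
 * silently skipped. */
void print_writeback_timing(void)
{
    long expire, interval;
    if (read_long_from_file("/proc/sys/vm/dirty_expire_centisecs", &expire) == 0)
        printf("dirty data expires after %.1f s\n", centisecs_to_seconds(expire));
    if (read_long_from_file("/proc/sys/vm/dirty_writeback_centisecs", &interval) == 0)
        printf("flusher wakes every %.1f s\n", centisecs_to_seconds(interval));
}
```

Calling print_writeback_timing() from main() prints whatever the running kernel is configured with; a distro or admin may well have changed the defaults.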

I really don't think that is any significant data. Any app that cares about data will 'fsync' at an appropriate time. Data that isn't fsynced doesn't really matter, e.g. logs.

Is there really *any* important data that becomes at-risk because of this change?

I think the greatest risk of data loss when resume fails is data in some application that hasn't been written to the filesystem yet, like the file you are in the middle of editing. That data isn't helped by sys_sync at all. It requires pre-suspend notifications to apps so they can auto-save.

Trading off safety and performance in the kernel

Posted May 13, 2015 6:59 UTC (Wed) by marcH (subscriber, #57642)

> so if you don't 'sync' before suspend, then the most you could lose if resume fails is data that was written by an app in the 30 seconds before suspend.

How long can it take to sync that little?

Trading off safety and performance in the kernel

Posted May 13, 2015 7:21 UTC (Wed) by neilbrown (subscriber, #359)

> How long can it take to sync that little?

My "Open Phoenux" phone (www.gta04.org) runs a fairly ordinary Debian distro and sometimes has lots of kernel logging enabled.

When the logging is enabled there is a very obvious lag on the way to suspend, such that trying to wake the phone again takes an annoyingly long time (though probably less than 2 seconds).

I realise that might not be a common circumstance, but I'm also sure I'm not the only one who has a flash of insight the moment that I suspend the phone (or close the laptop lid, or submit the comment or seal the envelope).

But the point isn't really how long it takes. The point is that the 'sync' call really doesn't belong there. The benefit it provides is much more superstitious than scientific. If wanted, a sync-before-suspend is trivially performed in user-space, and if not wanted it currently requires a code edit to disable.
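To make "trivially performed in user-space" concrete, here is a minimal sketch, not what any particular distro's suspend tooling actually does. It assumes the standard /sys/power/state interface and root privileges on a real system; sync_then_suspend is a hypothetical name, and the control file is passed in as a parameter so the function can be tried against an ordinary file without suspending anything.

```c
#include <stdio.h>
#include <unistd.h>

/* Flush dirty data, then ask the kernel to suspend by writing a sleep
 * state name (e.g. "mem") to the given control file.
 * On a real system state_path would be /sys/power/state.
 * Returns 0 on success, -1 on failure. */
int sync_then_suspend(const char *state_path, const char *state)
{
    sync();  /* user-space policy: flush before suspending */

    FILE *f = fopen(state_path, "w");
    if (!f)
        return -1;
    int rc = (fputs(state, f) >= 0) ? 0 : -1;
    if (fclose(f) != 0)  /* the write reaches the kernel on flush/close */
        rc = -1;
    return rc;
}
```

A caller that wants the sync would use sync_then_suspend("/sys/power/state", "mem"); one that doesn't simply skips the sync() line, which is the policy choice being argued for.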

Trading off safety and performance in the kernel

Posted May 14, 2015 15:53 UTC (Thu) by marcH (subscriber, #57642)

> But the point isn't really how long it takes.

Yes it is, otherwise this entire discussion would not even exist.

No matter how you look at it, if syncing takes more than a few seconds on a system that regularly syncs every 30 seconds anyway, then there is something seriously weird or at the least very unusual about it.

> The benefit it provides is much more superstitious than scientific.

Except the suspend experience sometimes feels as safe as crossing the Atlantic in the 19th century. There are genuine reasons why so many people carry their Windows or Linux laptop open around the office despite safety rules (not Macs interestingly enough).

Power management has always been complex and does not look like it's getting simpler any time soon.

> If wanted, a sync-before-suspend is trivially performed in user-space,

Why not, if this solves a problem for 1% of users or 1% of the time?

Why bother with the change and disruption if it's only for 0.001%?

Trading off safety and performance in the kernel

Posted May 14, 2015 22:27 UTC (Thu) by neilbrown (subscriber, #359)

> Yes it is, otherwise this entire discussion would not even exist.

I'll be more precise. The point isn't the amount of time it takes, it is the fact that it takes any time at all.

You seem to be saying that it can't take enough time to bother you, and I suspect you are correct. Len Brown, by submitting the patch, is saying that it *does* take enough time to bother him. Are you saying he is wrong?

> Power management has always been complex and does not look like it's getting simpler any time soon.

Undoubtedly true. This has no effect on whether placing a 'sys_sync' at that point in the code actually provides any benefit. At all.

> Why bother with the change and disruption if it's only for 0.001%?

Laying aside for the moment that 1% of 1% is 0.01%, these numbers are meaningless.

The vast majority of users get sync called (at least) twice on suspend - once by some user-space tooling and once by the kernel. Most (possibly all) distros already do this.
So removing the sys_sync call from suspend in the kernel is already only going to affect few of the users that you are probably thinking of as the 99.99%.

But one of the drivers for this change is, apparently, android. I think the user-base there is a little more than 0.001%.

Trading off safety and performance in the kernel

Posted May 15, 2015 5:19 UTC (Fri) by marcH (subscriber, #57642)

> You seem to be saying that it can't take enough time to bother you, and I suspect you are correct. Len Brown, by submitting the patch, is saying that it *does* take enough time to bother him.

Whatever two particular individuals experience does not matter much, can we please go back to statistics?

> Are you saying he is wrong?

I really wonder where that comes from... are you related maybe? :-)

> Undoubtedly true. This has no effect on whether placing a 'sys_sync' at that point in the code actually provides any benefit. At all.

The connection is: reliability is inversely related to complexity and users simply want their data saved before their computer susp...crashes. See various horror stories in the other comments.

> Laying aside for the moment that 1% of 1% is 0.01%,

Agreed! Let's also lay aside that 0.1% of 1% is 0.001%. And a few others?!

> these numbers are meaningless.

They're semi-random examples, but not completely meaningless. The very simple point I was trying to drive (and hoping not to have to detail) is just: the kernel is never going to please every single use case for every single user. Proof: there are practically no hardware devices shipping with a totally unpatched mainline kernel. The mainline only has code that has a significant number of actual users. So I think we all agree it's all about how [un]common this or that use case is. Statistics and trade-offs.

> So removing the sys_sync call from suspend in the kernel is already only going to affect few of the users that you are probably thinking of as the 99.99%.
> But one of the drivers for this change is, apparently, android.

Thanks for the info and also the reminder; I got distracted by the laptop stories filling almost the entire comments space.

Trading off safety and performance in the kernel

Posted May 15, 2015 18:23 UTC (Fri) by tialaramex (subscriber, #21167)

> There are genuine reasons why so many people carry their Windows or Linux laptop open around the office despite safety rules (not Macs interestingly enough).

In my case it's company policy that VPN sessions don't survive a suspend. When you open the laptop back up, even after 30 seconds, the VPN client reminds you of the policy and prompts you to start over. You need to find your RSA dongle, go through the authentication again, reconnect, you'll get a new IP address and so of course all existing connections are dropped.

My company is a complete disaster for IT policy, but then, so are thousands of other large employers around the world. So this is a real scenario, even though it's an unnecessary and obnoxious one.

Trading off safety and performance in the kernel

Posted May 15, 2015 20:35 UTC (Fri) by marcH (subscriber, #57642)

> In my case it's company policy that VPN sessions don't survive a suspend.

Well I feel sorry for you but that's not our case; suspend issues are why they do it here.

Trading off safety and performance in the kernel

Posted May 18, 2015 19:32 UTC (Mon) by mathstuf (subscriber, #69389)

> There are genuine reasons why so many people carry their Windows or Linux laptop open around the office despite safety rules (not Macs interestingly enough).

Am I the only person who disables suspend-on-lid-close? For all OS variants. Never did like that behavior…

Trading off safety and performance in the kernel

Posted May 16, 2015 15:52 UTC (Sat) by ghane (guest, #1805)

neilbrown wrote:
> But the point isn't really how long it takes. The point is that the 'sync' call really doesn't belong there. The benefit it provides is much more superstitious than scientific.

Back in the late 80s, I was taught to type:
sync ; sync ; sync

before a shutdown or reboot. I assume this was so that the SVR5 kernel would know I really wanted to sync.

Trading off safety and performance in the kernel

Posted May 16, 2015 20:14 UTC (Sat) by dlang (guest, #313)

From what I understand, the issue was that a single sync could return to the command line before the data was actually on disk, but a second one couldn't start until the first finished.

that still doesn't justify three invocations, but would justify two.

Trading off safety and performance in the kernel

Posted May 16, 2015 20:42 UTC (Sat) by neilbrown (subscriber, #359)

Yes, the "sync" semantics are "wait for any pending writeout to complete, then start writeout on any dirty data", so 2 is sensible and 3 is superstitious.

http://pubs.opengroup.org/onlinepubs/7908799/xsh/sync.html

This is part of why calling sys_sync() once in the suspend path is wrong (twice has been suggested), though I'm not certain if the Linux implementation exactly matches the specification.

Trading off safety and performance in the kernel

Posted May 17, 2015 4:21 UTC (Sun) by neilbrown (subscriber, #359)

> Yes, the "sync" semantics are "wait for any pending writeout to complete, then start writeout on any dirty data",

Actually, I'll have to backtrack on this. I can find no evidence in historical Unix, all the way up to 4.3BSD, to suggest that the 'sync' system call would wait. It just initiated IO. So maybe calling it 3 times makes sense.

Linux (roughly) followed that approach until Linux 1.3.20. That version introduced the change:

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -228,7 +228,7 @@ int fsync_dev(dev_t dev)
 
 asmlinkage int sys_sync(void)
 {
-       sync_dev(0);
+       fsync_dev(0);
        return 0;
 }

To give the fuller context:

void sync_dev(dev_t dev)
{
        sync_buffers(dev, 0);
        sync_supers(dev);
        sync_inodes(dev);
        sync_buffers(dev, 0);
}

int fsync_dev(dev_t dev)
{
        sync_buffers(dev, 0);
        sync_supers(dev);
        sync_inodes(dev);
        return sync_buffers(dev, 1);
}

asmlinkage int sys_sync(void)
{
        fsync_dev(0);
        return 0;
}

The second arg to sync_buffers() says whether it should 'wait'. So fsync_dev() waits, sync_dev() doesn't.

Exactly what is waited for, and when, is hard to track. I'd need to read a lot more code to see what things sys_sync waits for today. It does wait for something, but before 1.3.20, and all through "Unix", it certainly didn't wait for everything.

Trading off safety and performance in the kernel

Posted May 13, 2015 15:35 UTC (Wed) by zblaxell (subscriber, #26385)

> How long can it take to sync that little?

A class 4 SD/MMC card inside a laptop that has 16GB of RAM: one hour, four minutes.

A network filesystem that is broken because the network interface is down due to the suspend event: forever.

The latter case is the reason why I've not had that SYS_sync() line in my kernels for almost 10 years.

Trading off safety and performance in the kernel

Posted May 13, 2015 16:53 UTC (Wed) by pizza (subscriber, #46)

> A class 4 SD/MMC card inside a laptop that has 16GB of RAM: one hour, four minutes.

Come now, this use case is rather facetious.

Or are you seriously saying that someone who will configure a system with 16GB of RAM (presumably for performance) will be so cheap as to use a class 4 SD card for backing storage? (a 16GB UHS-1 card can be easily found for $11!)

> A network filesystem that is broken because the network interface is down due to the suspend event: forever.

Heh. Given that I literally haven't had this happen in literally over a decade, this makes me wonder what broken-ass distro you're using that included such screwed up suspend scripts. (And for the record, I do this multiple times a day with both CIFS and NFS mounts)

Then again, you did say that you replaced your distro's suspend/resume scripts with your own stuff, so I suppose you only have yourself to blame.

> The latter case is the reason why I've not had that SYS_sync() line in my kernels for almost 10 years.

Good for you; you worked around your own mistakes...Do you want a cookie or something?

Trading off safety and performance in the kernel

Posted May 13, 2015 19:29 UTC (Wed) by zblaxell (subscriber, #26385)

> Or are you seriously saying that someone who will configure a system with 16GB of RAM (presumably for performance) will be so cheap as to use a class 4 SD card for backing storage?

I'm saying that someone on a client site will use whatever horrible SD card they are given to transfer a crapton of data, then unexpectedly have to complete a suspend/resume cycle because they need to pack up the laptop so they can move to another building for a meeting with someone important who is only available now--not in an hour, when it's finally possible to remove the SD card without data loss.

This is a common case for me. I've modified both kernel and userspace to mitigate the problem, but anyone using default Linux distro and kernel behavior is screwed.

> Heh. Given that I literally haven't had this happen in literally over a decade, this makes me wonder what broken-ass distro you're using that included such screwed up suspend scripts. (And for the record, I do this multiple times a day with both CIFS and NFS mounts)

Debian gave a choice of two bad behaviors: fail to suspend, or forcibly umount filesystems to avoid suspend blocking on sync() (or read, or any other filesystem operation for that matter).

What I want is for the filesystem to stay mounted. If the suspend/resume is for a short walk to another building, there is no need to disrupt userspace with a umount. Any reads or writes in progress can be completed after resume using the network filesystem client's existing code for dealing with ordinary network interruptions. There should be no blocking code paths on suspend for such filesystems *at all*.

Trading off safety and performance in the kernel

Posted May 14, 2015 16:06 UTC (Thu) by marcH (subscriber, #57642)

> If the suspend/resume is for a short walk to another building,

Is there a user interface to express the difference between short walks versus long commute home + change of network?

Among the many things NFS was really not designed for, mobility must be very near the top of the list.

All traditional network filesystems suck and are on their way to the dustbin of history since they ignored Fallacy of Distributed Computing number 1 (and a few others). NFS just sucks more.

Trading off safety and performance in the kernel

Posted May 13, 2015 19:17 UTC (Wed) by dlang (guest, #313)

> What "hours of data" are you talking about?
>
> Dirty pages get flushed after about 30 seconds, so if you don't 'sync' before suspend, then the most you could lose if resume fails is data that was written by an app in the 30 seconds before suspend.

so you save the document you've worked for hours on, and close the lid.

the data was only written by the app a few seconds before suspending, but it represents hours of data for the user.

Trading off safety and performance in the kernel

Posted May 13, 2015 22:30 UTC (Wed) by neilbrown (subscriber, #359)

> the data was only written by the app a few seconds before suspending, but it represents hours of data for the user.

Any app worthy of the name will have called 'fsync' before giving you a visual indication that the save has completed. emacs certainly does.

If you close the lid before getting that visual notification, then you only have yourself to blame. In that case a sys_sync in the suspend path may not help anyway as the app may not have finished writing.

And of course, any real app would have auto-saved every few minutes so even in a disaster you wouldn't lose more than a few minutes work.

There is definitely a place to call fsync (rarely sys_sync) to make sure data is safe. The suspend path is not that place.
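As a rough illustration of calling fsync at the right place, here is a sketch of a "save" routine that only reports success once fsync has returned. It is not taken from emacs or any other real editor: save_durably and its parameters are hypothetical, it assumes a POSIX system, it does not retry on EINTR, and it skips the fsync of the containing directory that a fully careful implementation would add after the rename.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write len bytes to tmp_path, fsync the file, then rename it over
 * final_path. Returns 0 only after fsync and rename both succeeded,
 * so the caller can safely show "saved" to the user; -1 otherwise. */
int save_durably(const char *final_path, const char *tmp_path,
                 const void *data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {              /* handle short writes */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    if (fsync(fd) != 0) {           /* the step that makes the save durable */
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }
    if (rename(tmp_path, final_path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

With something like this in the application, a lid close after the "saved" indication has nothing left for a sys_sync in the suspend path to rescue.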

Trading off safety and performance in the kernel

Posted May 14, 2015 17:13 UTC (Thu) by marcH (subscriber, #57642)

> Any app worthy of the name will have called 'fsync' before giving you a visual indication that the save has completed. emacs certainly does. [...]
> And of course, any real app would have auto-saved every few minutes so even in a disaster you wouldn't lose more than a few minutes work.

"Don't break userspace" - even userspace bugs.

Trading off safety and performance in the kernel

Posted May 14, 2015 22:31 UTC (Thu) by neilbrown (subscriber, #359)

> "Don't break userspace" - even userspace bugs.

If userspace needs the kernel to call sync before crashing then the user-space is already broken. Systems can crash without entering suspend first.

But that is a big "if". Are there actually any non-trivial apps which don't save their data properly?

Trading off safety and performance in the kernel

Posted May 21, 2015 21:58 UTC (Thu) by Wol (subscriber, #4433)

> Are there actually any non-trivial apps which don't save their data properly?

Okay, it's not linux, but ... MS Word ?

(maybe it's changed, but I OFTEN lose data if I'm working on a document and it crashes - often it's the attempted auto-save that causes the crash :-(

Cheers,
Wol

Trading off safety and performance in the kernel

Posted May 23, 2015 16:20 UTC (Sat) by anton (subscriber, #25547)

> If userspace needs the kernel to call sync before crashing then the user-space is already broken.

No, the file system is broken.

> Are there actually any non-trivial apps which don't save their data properly?

No, there are just file systems (e.g., ext4) which do not provide decent guarantees and use this kind of rhetoric to justify their poor behaviour. I expect that pretty much all non-trivial applications do not jump all the time through all the hoops that some developers of file systems expect of them; that's because they have no good way to test that they meet the expectations of these file system developers, and most application developers probably have many more urgent things to care about.

Anyway, one example of a broken file system losing data of a popular application (including the autosave files that the application produces regularly) is here.

Trading off safety and performance in the kernel

Posted May 25, 2015 6:52 UTC (Mon) by neilbrown (subscriber, #359)

> No, the file system is broken.

That makes no sense.

I agree that "If a filesystem needs the kernel to call sync before crashing then the filesystem is already broken" with the understanding that "needs" means "needs in order to protect the data that it is responsible for."
Data that has not yet been written to the filesystem is certainly not the filesystem's responsibility.
Data that has been written but hasn't been the subject of 'fsync' is also not completely the filesystem's responsibility (unless you mount with '-o sync').

> one example of a broken file system losing data of a popular application

That is a filesystem from decades ago. Yes it was broken, no question. Linux filesystems aren't like that. All non-trivial Linux filesystems do journalling of metadata, which is much safer than synchronous metadata updates. I cannot promise they are all 100% bug free in every release, but I am certain that calling 'sync' in the suspend path isn't going to usefully fix any bug that they might have.

It also sounds like that "popular application", which was emacs, wasn't calling 'fsync' as it should and as it certainly now does.
Yes - bugs should be fixed. But let's not scatter "sys_sync" calls around and pretend that fixes them.

Trading off safety and performance in the kernel

Posted May 25, 2015 9:00 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

> That is a filesystem from decades ago. Yes it was broken, no question. Linux filesystems aren't like that. All non-trivial Linux filesystems do journalling of metadata, which is much safer than synchronous metadata updates. I cannot promise they are all 100% bug free in every release, but I am certain that calling 'sync' in the suspend path isn't going to usefully fix any bug that they might have.

As if on cue, today my laptop corrupted my filesystem during suspend/resume. I started synchronization of several large (~200G) directories with lots of small files from our local network and then totally forgot about it. Then I closed the laptop's lid and went home.

Resume failed (yet again) and after reboot my BTRFS filesystem refused to mount.