Trading off safety and performance in the kernel
Posted May 13, 2015 1:32 UTC (Wed) by zblaxell (subscriber, #26385)
In reply to: Trading off safety and performance in the kernel by dlang
Parent article: Trading off safety and performance in the kernel
...because it's *not* a problem.
Really, it's not. It's been years since I had a healthy laptop run out of battery. They last for hours at full load and days on suspend.
> Far better to generate some extra heat for a little bit than losing hours of data because it didn't get flushed out.
No, it's not better.
If the sync takes longer than 20 seconds, the suspend fails completely and the laptop stays on (unless you've set up your ACPI scripts to forcibly kill the power at that point).
While the laptop is on, it's damaging its battery, reducing the charge it can hold *forever*. This also conveniently breaks the battery charge estimation function, so you get to be surprised when your battery abruptly shuts down at "40% charge" in the future.
There's no "hours of uncommitted data" either. There's one filesystem commit interval at most. If you're sane that's not more than 30 seconds or so. If you're not sane, you can configure laptop-mode-tools to run sync() from userspace.
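For anyone who does want the userspace sync, here is a minimal sketch of the idea: flush first, then ask the kernel to suspend. The function name `sync_then_suspend`, the 20-second bound on the sync, and the state-file argument are assumptions made for illustration; this is plain shell, not laptop-mode-tools configuration.

```shell
#!/bin/sh
# Sketch only: flush dirty data from userspace, then request suspend-to-RAM.
# sync_then_suspend and its arguments are hypothetical names for this example.

# sync_then_suspend [STATE_FILE] [SYNC_TIMEOUT_SECONDS]
sync_then_suspend() {
    state_file=${1:-/sys/power/state}
    sync_timeout=${2:-20}

    # timeout(1) from coreutils gives up on a sync that takes too long.
    # We still go on to suspend rather than leave the machine running.
    timeout "$sync_timeout" sync ||
        echo "sync did not finish within ${sync_timeout}s" >&2

    # Writing "mem" asks the kernel for suspend-to-RAM (needs root).
    echo mem > "$state_file"
}
```

Called with no arguments as root, it would write to /sys/power/state; passing another path makes it possible to try the logic without suspending anything.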
Posted May 13, 2015 2:01 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
I've had resume problems with ALL laptops that I had. Including MacBook Pro with OS X. Linux and Windows laptops tend to be even crashier.
Posted May 13, 2015 3:02 UTC (Wed)
by pizza (subscriber, #46)
Until you leave it in your backpack for *four* days instead of three, thanks to a long weekend.
Or the battery gets jostled. Or the battery isn't so healthy any more (do you get a nice email when it crosses that threshold?) Or you suspend it when the battery is relatively low. Or it doesn't resume properly because something plugged in was unplugged. Or one of many, many, many failure modes.
Or when the "low battery" threshold wakes the system up and the thing dies hard when trying to write the buffers back out.
Every single one of these situations has happened to me. (Funnily enough, in my experience, Linux suspend/resume is actually *more* reliable than Windows on the last couple of Tier-1 laptops I've owned.)
I absolutely *want* any dirty buffers to be flushed to disk and the filesystems synced into a safe state before a system suspends. More than want; that is a hard requirement. Data loss is never acceptable, but when it's so damn easily preventable there is simply no excuse.
Posted May 13, 2015 6:14 UTC (Wed)
by tpo (subscriber, #25713)
The single most frequent cause of "loss of data" here is that I work without the laptop being attached to power for some reason, then notice that the battery is nearly empty and close the lid.
At some point my laptop will wake up by itself and try to suspend to disk, which doesn't work here and never has, and it will then run out of power while hanging in that state.
I had managed to disable this behavior somehow in the past, but some well-meaning part of the system switched it on again and I am currently unwilling to spend my time finding out how to disable it again.
And the loss for me isn't usually the "data" but the state and context of the desktop: what shells did I have open with what content? Which files was I editing? Which applications were running? This can be more than annoying when I had my laptop prepared with all the stuff I needed open and in the right place to do a presentation, only to find out when opening the laptop in front of people that it's dead.
Of course, syncing file buffers out to disk or not will not change anything with respect to the problem described in the last paragraph.
And it doesn't help that XFce's power indicator is too badly designed for me to be able to notice that the battery is running too low.
Posted May 13, 2015 13:40 UTC (Wed)
by jospoortvliet (guest, #33164)
Posted May 13, 2015 16:03 UTC (Wed)
by zblaxell (subscriber, #26385)
I fired my distro's ACPI event-handling code. Several years ago it started being not merely useless, but an active source of failure. After several rounds of patches that consisted only of deletions, I gave up and replaced the entire thing with:
#!/bin/sh
I have "find out where the acpi-support package is hiding today and kill it" on my to-do list for every dist-upgrade because the machine can be physically damaged if I don't.
Posted May 13, 2015 19:56 UTC (Wed)
by kleptog (subscriber, #1183)
Aah, so that's what happens. So what I need is a script that does: if suspend fails and laptop lid is closed, start playing an alarm at maximum volume. And cuts power if no response within a few minutes.
Much better than hearing a loud whirring noise a few hours later and then pulling an overcooked laptop out of your bag.
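A rough sketch of such a script, under stated assumptions: a failed suspend is taken to be a failed write to the state file, the lid state is read from a file like /proc/acpi/button/lid/*/state (the exact path varies by machine), and the alarm and power-off commands are placeholders supplied by the caller. All function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: react when a suspend attempt fails while the lid is closed.
# Function names and the alarm/power-off commands are placeholders.

# lid_is_closed LID_STATE_FILE: succeeds if the file reports a closed lid.
lid_is_closed() {
    grep -q 'closed' "$1" 2>/dev/null
}

# guarded_suspend STATE_FILE LID_STATE_FILE ALARM_CMD POWEROFF_CMD GRACE_SECONDS
guarded_suspend() {
    state_file=$1 lid_file=$2 alarm_cmd=$3 poweroff_cmd=$4 grace=$5

    # A failed suspend shows up as a failed write to the state file.
    if ( echo mem > "$state_file" ) 2>/dev/null; then
        return 0
    fi

    # Suspend failed. With the lid open the user can see that; stop here.
    lid_is_closed "$lid_file" || return 1

    # Lid closed: make noise, give someone time to open it, then cut power.
    $alarm_cmd
    sleep "$grace"
    if lid_is_closed "$lid_file"; then
        $poweroff_cmd
    fi
    return 1
}
```

The commands are passed in rather than hard-coded because what counts as "an alarm at maximum volume" and a safe way to cut power both depend on the machine.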
> There's no "hours of uncommitted data" either. There's one filesystem commit interval at most.
Well, not everything is saved on disk. If you have a document open that isn't saved then sync won't help anyway. It'd be great if there was a way to announce to running programs that the system is being suspended and to dump state, but that doesn't exist or isn't widely supported. Currently suspend (for me) is primarily a way to avoid the startup time. It's not reliable enough to rely on.
Mind you, I just found a basic-pm-debugging.txt in the kernel documentation which describes steps that can be used to debug issues. My current problem is that ext4 is trying to read a directory inode on resume while the disk is not ready, and it remounts the rootfs readonly. The machine is then essentially unrecoverable (neither su nor sudo work with a readonly fs).
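For reference, that document works through the /sys/power/pm_test file, which (on kernels built with CONFIG_PM_DEBUG) makes a suspend request stop at a chosen stage and then resume. A minimal sketch of one test cycle follows; the function name `pm_test_suspend` and the optional directory argument are assumptions for illustration, and the level names are the ones I understand the document to list.

```shell
#!/bin/sh
# Sketch only: run one partial suspend cycle at a given pm_test level.
# pm_test_suspend is a hypothetical helper; it needs root on a real system.

# pm_test_suspend LEVEL [POWER_DIR]
pm_test_suspend() {
    level=$1
    power_dir=${2:-/sys/power}

    case $level in
        freezer|devices|platform|processors|core) ;;
        *) echo "unknown pm_test level: $level" >&2; return 2 ;;
    esac

    # Select the test level, then request suspend-to-RAM.
    echo "$level" > "$power_dir/pm_test" &&
        echo mem > "$power_dir/state"
    status=$?

    # Restore normal behaviour so that the next suspend is a real one.
    echo none > "$power_dir/pm_test"
    return $status
}
```

Starting at `freezer` and working towards `core` narrows down which stage of suspend or resume misbehaves.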
Posted Jun 12, 2015 17:32 UTC (Fri)
by bluefoxicy (guest, #25366)
In the 20 seconds before the suspend times out, at raw interface speed:
SATA 3 6Gb/s: 15 gigabytes of recently-written dirty data to write
SATA 1 1.5Gb/s: 3.75 gigabytes
ATA100 100MB/s: 2 gigabytes
I'm pretty sure this is a non-issue for any hardware made since 2003. /proc/meminfo shows 624 kB of dirty pages on a big ass database server, 2712 kB on a busy Web server in a cluster. It's rare to have several gigabytes of unflushed data just hanging around in memory; I've never seen more than a few megabytes.
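The arithmetic behind those figures, and the comparison against /proc/meminfo, can be sketched as below. It assumes the 20-second suspend window mentioned earlier in the thread and raw line rates given in Mbit/s (so a 100 MB/s link would be entered as 800). The function names are made up for this example, and real devices will be slower than the link.

```shell
#!/bin/sh
# Sketch only: how much dirty data fits through the link before the timeout?

# flush_budget_mb RATE_MBIT_PER_S SECONDS
# Upper bound in megabytes at raw line rate (ignores encoding overhead).
flush_budget_mb() {
    echo $(( $1 * $2 / 8 ))
}

# dirty_kb [MEMINFO_FILE]: the "Dirty:" value from /proc/meminfo, in kB.
dirty_kb() {
    awk '/^Dirty:/ { print $2 }' "${1:-/proc/meminfo}"
}

# fits_in_window RATE_MBIT_PER_S SECONDS [MEMINFO_FILE]
# Succeeds if the current dirty data is within the budget
# (treats 1 MB as 1000 kB, which is close enough for an estimate).
fits_in_window() {
    budget_kb=$(( $(flush_budget_mb "$1" "$2") * 1000 ))
    [ "$(dirty_kb "$3")" -le "$budget_kb" ]
}

# Example: flush_budget_mb 6000 20    # 15000 MB, the SATA 3 figure
```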
Posted Jun 12, 2015 17:50 UTC (Fri)
by raven667 (subscriber, #5198)
Your examples include a read heavy, write little web server and a db server which is probably explicitly flushing every IO to disk so that there is little data to write back in either case, neither of which is representative of how a laptop is used. It's easy to create a bunch of buffered writes, by copying a DVD image or compiling software or copying memory to disk for suspend, and on a laptop you may delay writes longer than normal to keep the disk subsystem in a low power state for as long as possible, leading to a storm of activity.
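How long buffered writes may sit in memory is controlled by the vm.dirty_expire_centisecs and vm.dirty_writeback_centisecs sysctls, which laptop power-saving setups may raise. A small sketch for inspecting them follows; the helper name `writeback_seconds` and the optional directory argument are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: print a vm.dirty_*_centisecs knob in whole seconds.

# writeback_seconds NAME [VM_DIR]
writeback_seconds() {
    vm_dir=${2:-/proc/sys/vm}
    # The kernel stores these values in hundredths of a second.
    echo $(( $(cat "$vm_dir/$1") / 100 ))
}

# Example: writeback_seconds dirty_expire_centisecs
```

A large value here means more dirty data can accumulate between flushes, which is the situation described above.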
Trading off safety and performance in the kernel
Oh, what a BS.
echo mem > /sys/power/state