
Now what about shutdown?

Posted Sep 22, 2008 22:21 UTC (Mon) by Felix_the_Mac (guest, #32242)
Parent article: LPC: Booting Linux in five seconds

This is fantastic; it really throws down the gauntlet to every distribution.
Don't miss the video that Arjan has posted on YouTube.

If we can boot in 5 seconds, surely we should be able to shut down in less?



Now what about shutdown?

Posted Sep 22, 2008 22:48 UTC (Mon) by drag (guest, #31333) [Link] (4 responses)

Well, if the applications are designed correctly, you can reduce the shutdown time to a few milliseconds.

Applications should be designed so that the power can be cut at any time without losing data. Then you could have a 'shutdown' thread in the kernel that does the equivalent of (in pseudo-shell):

killall * && sync && acpi-poweroff

----------

For example: say you're editing a file and Vim catches a shutdown message; it wouldn't bother you with the details. It would simply double-check that the last change you made was committed to a temporary file on disk (which should have already been done if you were away long enough to navigate to the shutdown button) and just die.

Next time you start Vim, you go right back to your edit. It mostly does this already.
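A minimal sketch of that kind of crash-safe save, in C (the names and the simplified error handling are mine for illustration, not Vim's actual code): write the new contents to a temporary file, fsync it, then rename it over the original, so a crash at any point leaves either the old file or the new one, never a torn mix:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write buf to path so that a crash or power cut leaves either the
 * old contents or the new ones -- never a partially written file. */
int save_atomically(const char *path, const char *buf, size_t len)
{
    char tmp[4096];
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len ||  /* push data to the kernel */
        fsync(fd) != 0) {                       /* ...and out to the media */
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);
    return rename(tmp, path);   /* atomic replace on POSIX filesystems */
}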

I can send a killall -9 epiphany-browser, and when I open the browser again it starts me off right where it left off. That's what it does now.

The same goes for OO.org and most other decent applications I use. They don't need some complicated shutdown procedure: just tell them to die, and then pull the power.

Just crash.

Posted Sep 23, 2008 1:16 UTC (Tue) by dmarti (subscriber, #11625) [Link]

Val Henson did a related LWN piece on Crash-only software. You have to write crash recovery anyway, so why not "crash" every time?

Now what about shutdown?

Posted Sep 23, 2008 1:16 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (2 responses)

Val Henson wrote an article about that for LWN: Link.

The observation was that it often takes less time to crash and restart a program than it does to shut it down cleanly. So why not always crash it?

Now what about shutdown?

Posted Sep 23, 2008 7:31 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

Because programs that crash are not necessarily in a consistent state, so crash recovery can fail where ordinary bootups do not. (Of course, if you can make crash recovery reliable with something like application-level journalling, your point stands, unless, as with databases, the post-crash bootup is really expensive. But that's probably a rare case.)
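Application-level journalling of that sort can be quite small. A rough sketch in C, assuming a simple key=value state model (the record format is invented for illustration): append every change to a log and fsync the record, then rebuild state on startup by replaying the log, so recovery converges on the last record that made it to disk:

#include <stdio.h>
#include <unistd.h>

/* Append one key=value change; once fsync returns, the record will
 * survive a crash or power cut. */
void journal_append(FILE *log, const char *key, const char *value)
{
    fprintf(log, "%s=%s\n", key, value);
    fflush(log);
    fsync(fileno(log));
}

/* Replay the log on startup.  Later records win, and a record that
 * was only half-written when the crash hit simply fails to parse
 * and is not replayed. */
void journal_replay(FILE *log,
                    void (*apply)(const char *key, const char *value))
{
    char key[128], value[256];
    rewind(log);
    while (fscanf(log, "%127[^=]=%255[^\n]\n", key, value) == 2)
        apply(key, value);
}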

Other fun features for "crashable" apps

Posted Sep 25, 2008 14:59 UTC (Thu) by dmarti (subscriber, #11625) [Link]

Programs that come back in a messed-up state after a crash make baby Jesus cry. The vi clone "elvis" had good recovery very early, and we have enough disk space now to do it even for very large media files.

A user-action log that an app could replay might have other uses, too. There's deep undo and bug reporting, of course, and automatic macro writing by identifying common steps might be useful. There was even a legal case a few years ago where a composer couldn't prove he had created a certain audio file, because he couldn't put the sliders of his GUI audio app back on exactly the right pixel.

Now what about shutdown?

Posted Sep 23, 2008 3:11 UTC (Tue) by arjan (subscriber, #36785) [Link] (4 responses)

So far we've found one bottleneck in shutdown: all the dirty pagecache data needs to be flushed to disk before we can power off, and some of the SSDs and disks out there are just not very fast. This can easily take several seconds.

Now what about shutdown?

Posted Sep 25, 2008 15:13 UTC (Thu) by dmarti (subscriber, #11625) [Link] (3 responses)

If the app really needed that data it would have called fsync a while ago -- so just halt. Applications are going to have to be able to handle coming back after a kernel panic or a kicked-out power cord anyway.

Now what about shutdown?

Posted Sep 26, 2008 2:24 UTC (Fri) by njs (subscriber, #40338) [Link] (2 responses)

Uh... so if I hit "save" while using my app, that app should always fsync before returning? That sounds awful. (I guess the alternative is that we just throw away the files that users thought they had saved, but that seems suboptimal.)

IIRC emacs (used to?) do this by default, and until I disabled that it was unusable on a laptop, because hitting C-x C-s blocked everything for a few seconds waiting for the drive to spin up. ELISP SMASH

When is "save" really "maybe save"?

Posted Sep 26, 2008 15:05 UTC (Fri) by dmarti (subscriber, #11625) [Link] (1 responses)

Applications could be smart about this. The answer might be something like: fsync on save if 100 user actions or 10s of CPU time have been spent on the file since the last save. Or fsync on save if the file has gone from a broken state (invalid HTML, spelling errors, audio that clips, program that won't compile) to a fixed state.

(And you could always do the fsync in a separate thread or process, so the app is responsive again.)
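A sketch of that last idea (the names are mine, and error handling is trimmed): hand a duplicated file descriptor to a worker thread, which pays for the flush while the caller returns immediately:

#include <pthread.h>
#include <unistd.h>

static void *fsync_worker(void *arg)
{
    int fd = (int)(long)arg;
    fsync(fd);              /* may block for seconds on a sleeping disk */
    close(fd);
    return NULL;
}

/* Called after write(); returns without blocking the caller, which
 * hands over a duplicated fd it no longer uses. */
void fsync_in_background(int fd)
{
    pthread_t t;
    pthread_create(&t, NULL, fsync_worker, (void *)(long)dup(fd));
    pthread_detach(t);
}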

When is "save" really "maybe save"?

Posted Sep 26, 2008 21:43 UTC (Fri) by njs (subscriber, #40338) [Link]

fsync doesn't mean "hey kernel actually write this to disk", it means "write this to disk *now*" (and in practice, "write everything to disk *now*", because our filesystems are not that great). If you're running it async, then you get exactly the same semantics as a plain write. The kernel already guarantees that stuff will be written to disk within n seconds, with n a tweakable parameter that's important for power saving. I guess I just don't see the advantages of moving that to a million per-app settings, all using an inappropriate interface.
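That "n seconds" is the kernel's writeback deadline, exposed on Linux as the sysctl vm.dirty_expire_centisecs (in hundredths of a second; the default is 3000, i.e. 30 seconds). A trivial way to check it:

#include <stdio.h>

int main(void)
{
    /* vm.dirty_expire_centisecs: how old dirty pagecache data may get
     * before the flusher threads write it back. */
    FILE *f = fopen("/proc/sys/vm/dirty_expire_centisecs", "r");
    int cs;
    if (f && fscanf(f, "%d", &cs) == 1)
        printf("dirty data is written back after %d.%02d seconds\n",
               cs / 100, cs % 100);
    if (f)
        fclose(f);
    return 0;
}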

If the goal is to reach a point where we can throw away a lot of data instead of flushing it to disk at shutdown, then this approach is making a classic mistake: it's trying to mark everything that *does* need to go to disk, and hoping that eventually everything will be marked and we'll be able to flip the switch and throw everything else away. The better approach is to mark stuff that isn't important, like some fcntl to request "power-loss semantics" for writes to some file; then you could get some win immediately, and expand it incrementally over time.
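To make the shape of that concrete, a purely hypothetical sketch; no such fcntl command exists in Linux, and both the name and the number are invented here:

#include <fcntl.h>

/* Hypothetical: not a real Linux fcntl command. */
#define F_SET_VOLATILE  1025

/* "I can rebuild this file's contents; on shutdown or power loss,
 * feel free to throw away any unflushed writes to it." */
int mark_discardable(int fd)
{
    return fcntl(fd, F_SET_VOLATILE, 1);
}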

I doubt this is easy or important enough to actually get the coordinated effort needed to implement it, but that's how I'd do it...

Now what about shutdown?

Posted Sep 23, 2008 17:09 UTC (Tue) by s0f4r (guest, #52284) [Link]

So far we've seen large jitter in shutdown times, varying from 3 to well over 10 seconds. Shutdown is a really hard case for optimization: we have no idea which threads are going to be woken up and perform work when we killall -9 everything, so it's extremely unpredictable.

So we try to issue a sync() as early as we possibly can to remove some of this load, which seems to help.
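A sketch of that shutdown sequence in C, assuming a privileged process and omitting details such as remounting filesystems read-only:

#include <signal.h>
#include <unistd.h>

void fast_shutdown(void)
{
    sync();              /* start flushing dirty data as early as possible */
    kill(-1, SIGKILL);   /* the "killall -9" step: every process we may
                            signal, except ourselves and init */
    sync();              /* catch anything dirtied after the first sync */
    /* reboot(RB_POWER_OFF) would follow here */
}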

