
Garrett: ext4, application expectations and power management

Garrett: ext4, application expectations and power management

Posted Mar 15, 2009 20:19 UTC (Sun) by drag (guest, #31333)
In reply to: Garrett: ext4, application expectations and power management by kasperd
Parent article: Garrett: ext4, application expectations and power management

I was thinking a bit more about it.

Basically people want 3 levels of data integrity in applications (paraphrasing what you and other people are saying):

1. High priority: Write data _now_. All data is safe in case of system failure.

2. Normal priority: Ensure no corruption of existing data in case of system failure.

3. Low priority: temporary data that will get used for a session. No requirements for preserving data in case of system failure.

Ext4 (as it existed) can only provide 1 or 3, but not 2.
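
To make the three levels concrete, here is a rough sketch of how an application might ask for each one using ordinary POSIX-style calls from Python. The function names are made up for illustration. Level 2 is written the way many applications do it today (write a temporary file, rename it over the old one, no fsync), which is the case where the outcome after a crash depends on what the filesystem chooses to order.

```python
import os
import tempfile


def write_durable(path, data):
    """Level 1: do not return until the data has been pushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's userspace buffer
        os.fsync(f.fileno())   # ask the kernel to write the file to stable storage


def replace_without_sync(path, data):
    """Level 2: write a new copy, then rename it over the old file.

    The rename is atomic as seen by running processes, but without an
    fsync() it is up to the filesystem whether the new data reaches the
    disk before the rename does.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # hypothetical temp-file naming
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def write_scratch(path, data):
    """Level 3: plain buffered write, no guarantees across a crash."""
    with open(path, "wb") as f:
        f.write(data)
```

The only difference between level 1 and level 3 here is the explicit fsync(). Level 2 has no call of its own, which is the point being made above.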



Garrett: ext4, application expectations and power management

Posted Mar 15, 2009 20:32 UTC (Sun) by smoogen (subscriber, #97) [Link] (3 responses)

Of the file systems, is ext3 the only one that gives that promise (and only by accident, as an unintended consequence)?

XFS would seem not to, and neither would btrfs (going by the original blog post). I don't know about the various versions of reiserfs, or about JFS.

I am not saying the 'promise' is not important, but file-system developers should be aware that it is something people want, as opposed to what the developers think people should expect :)

Garrett: ext4, application expectations and power management

Posted Mar 15, 2009 23:41 UTC (Sun) by drag (guest, #31333) [Link] (2 responses)

Ya. It seems to me that Ext3 only works that way by accident.

But it seems that for consumer devices this sort of behavior could actually be a fundamental design improvement over the way file systems have traditionally worked, and could be advertised as an actual selling point (that is, being able to keep promise #2 reliably).

Garrett: ext4, application expectations and power management

Posted Mar 16, 2009 16:12 UTC (Mon) by jspaleta (subscriber, #50639) [Link]

I've always wondered.. how many of the more important or more impactful improvements in technology in the long view of history were simply uncharacteristically happy accidents versus premeditated "design" decisions.

-jef

Garrett: ext4, application expectations and power management

Posted Mar 19, 2009 23:28 UTC (Thu) by jzbiciak (guest, #5246) [Link]

Not really by accident. I believe the necessary dependence is established by the "data=ordered" mount option. That's pretty much what we need to fix this issue: Make sure that the data is on the disk before you write the updated metadata.

That doesn't mean you need to flush things to the disk early. It just means that things have to happen in a particular order.
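
A toy model may make the ordering point clearer. This is not how the ext3 journal is implemented; it just enumerates what could be on disk after a crash when one data write and one metadata write are pending. The function names and the "data"/"meta" labels are invented for the sketch.

```python
import itertools


def crash_states(ops, ordered):
    """Yield every on-disk state a crash could leave behind.

    ops is a list of pending writes, each "data" or "meta". Without an
    ordering rule the disk may persist them in any order; with ordered=True,
    "meta" is never written before "data".
    """
    for perm in itertools.permutations(ops):
        if ordered and perm.index("data") > perm.index("meta"):
            continue  # this write order is forbidden in ordered mode
        # A crash can happen after any prefix of the writes has completed.
        for n in range(len(perm) + 1):
            yield frozenset(perm[:n])


def can_corrupt(ordered):
    """True if a crash can leave new metadata on disk without its data."""
    return any(s == {"meta"} for s in crash_states(["data", "meta"], ordered))
```

In this model can_corrupt(False) is true and can_corrupt(True) is false, and nothing was forced to disk any earlier to get there. Only the order changed.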

The three levels of write priority

Posted Mar 16, 2009 7:41 UTC (Mon) by rvfh (guest, #31018) [Link]

I like the three levels of commit priority you set, and I would rather this were an open() option than an application's decision to call fsync() (and when would it call it in case 2?)

1. O_COMMITQUICK commit to disk every 5 seconds
2. O_COMMITNORMAL commit to disk every 30 seconds
3. O_COMMITLAZY commit to disk only if need be, or maybe after 300 seconds
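
No such open() flags exist, so the following is only a userspace approximation of the idea, assuming the intervals listed above. The class name and the injectable clock are hypothetical, and the sketch only checks the timer when the application writes, whereas a real kernel option would commit on its own schedule.

```python
import os
import time

# Hypothetical names taken from the proposal above; these are not real flags.
COMMIT_INTERVALS = {
    "O_COMMITQUICK": 5,
    "O_COMMITNORMAL": 30,
    "O_COMMITLAZY": 300,
}


class TimedCommitFile:
    """Append to a file and fsync() once the policy's interval has elapsed."""

    def __init__(self, path, policy, clock=time.monotonic):
        self._f = open(path, "ab")
        self._interval = COMMIT_INTERVALS[policy]
        self._clock = clock
        self._last_commit = clock()
        self.commits = 0  # number of fsync() calls issued so far

    def write(self, data):
        self._f.write(data)
        if self._clock() - self._last_commit >= self._interval:
            self.commit()

    def commit(self):
        self._f.flush()
        os.fsync(self._f.fileno())
        self._last_commit = self._clock()
        self.commits += 1

    def close(self):
        self._f.close()
```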

Just my 0.02€

Garrett: ext4, application expectations and power management

Posted Mar 16, 2009 9:58 UTC (Mon) by mjthayer (guest, #39183) [Link] (3 responses)

I have asked this a couple of times but not yet got a good answer. I presume that the kernel knows what has been written back and what has not. Can't it optionally keep its own log - either in a file on the filesystem or in pre-allocated blocks on a swap device - where it writes details of any transaction which the target filesystem won't write back within a certain maximum timeframe? When the filesystem does do the writeback, the transaction can be purged from the log. This could be enabled or disabled for the entire system, regardless of what filesystems are in use, and would not require Ted to add code he doesn't like.

Garrett: ext4, application expectations and power management

Posted Mar 16, 2009 10:52 UTC (Mon) by MathFox (guest, #6104) [Link] (2 responses)

Michael, yes, the kernel could do it, but such a log would have to be written to disk... But then it would be more efficient to write that log directly to the file system.
You'll create similar issues wrt. performance and commit intervals with a kernel-based log, but with the added overhead of writing data twice.

Garrett: ext4, application expectations and power management

Posted Mar 16, 2009 10:54 UTC (Mon) by mjthayer (guest, #39183) [Link] (1 responses)

Would that apply even if the blocks for the log were reserved in advance and their location known to the kernel?

Garrett: ext4, application expectations and power management

Posted Mar 16, 2009 10:55 UTC (Mon) by mjthayer (guest, #39183) [Link]

I will answer my own question - presumably yes, because the kernel can't assume that the filesystem does a simple block to disk mapping.

Garrett: ext4, application expectations and power management

Posted Mar 18, 2009 17:19 UTC (Wed) by rich0 (guest, #55509) [Link]

I agree with your points. And putting fsyncs all over the place in applications is not very helpful.

My mythtv backend (which does a lot of other stuff as well) used to have lots of problems with ivtv buffer overruns. It turns out that mythtv uses a fairly small cache, and when it writes to disk it does an fsync on every write. That means that the disk write cache is almost constantly getting flushed and the ability of the kernel to re-order writes is compromised, which then causes I/O waits when the system is busy with other stuff as well.

When I increased the buffer moderately and got rid of the fsync everything worked great. So, if I lose power maybe I might lose an extra 10 seconds of the TV show I was recording. However, before the fix I was getting glitches in the video all the time due to overruns.
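
Roughly, the change amounts to something like the sketch below. This is not MythTV's code; the function and its parameters are hypothetical, and it only counts how many fsync() calls each policy issues for the same stream of data.

```python
import os


def record(chunks, path, buffer_size, sync_each_write):
    """Write chunks to path; return how many fsync() calls were issued."""
    syncs = 0
    pending = bytearray()
    with open(path, "wb") as f:
        for chunk in chunks:
            pending += chunk
            if len(pending) >= buffer_size:
                f.write(pending)
                pending.clear()
                if sync_each_write:
                    f.flush()             # drain Python's buffer first
                    os.fsync(f.fileno())  # force the kernel to write it out now
                    syncs += 1
        if pending:
            f.write(pending)  # final partial buffer
    return syncs
```

With a small buffer and sync_each_write=True, every buffer-full of video becomes a forced flush. With a larger buffer and sync_each_write=False the same bytes end up in the file, and the kernel is left to schedule the writeback.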

The role of the OS should be to allow applications to indicate the sensitivity of data and then the OS should figure out how to balance contention for the disk taking into account this kind of weighting. Applications should not be micro-managing the disk cache - that defeats the ability of the kernel to optimize the cache.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds