
Wishful thinking


Posted Mar 16, 2009 5:20 UTC (Mon) by alonz (subscriber, #815)
In reply to: Wishful thinking by bojan
Parent article: Garrett: ext4, application expectations and power management


> Yeah, it would be nice if the semantics of rename were defined
> that way, wouldn't it? Alas, they are not.

Unfortunately, POSIX doesn't actually define any relation between operations on file contents and those on file metadata. File-system developers prefer to pretend that this means the operations should be independent; most application developers prefer the interpretation that any file-system operations (whether they refer to data or metadata) are part of the same “transaction space”.

Personally, I believe the application developers have the saner perspective here, and file-system developers are taking the narrow view. But then, I'm a systems engineer, I like broad views :).



Wishful thinking

Posted Mar 16, 2009 5:33 UTC (Mon) by bojan (subscriber, #14302)

POSIX rename defines that processes should see a consistent picture (which ext4 provides, with or without patches). Text from the rename man page:

> If newpath already exists it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.

Unfortunately, there is no room for interpretation here. The requirements are clearly set.
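To make the distinction concrete, here is a minimal sketch of the replace-via-rename pattern this thread is arguing about. It assumes a POSIX system, where `os.rename` maps onto rename(2). The `atomic_replace` name and the `.tmp-` prefix are made up for illustration. The two fsync calls are an assumption about what an application has to add if it also wants the new contents to survive a crash; the man page text quoted above only promises that other processes never find newpath missing.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) using a temp file plus rename.

    Hypothetical helper, POSIX assumed. Readers see either the old file
    or the new one, never a missing path.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise rename cannot replace the target in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Assumption: we want the data on disk before the name changes.
            # rename itself says nothing about when file contents are written.
            os.fsync(tmp.fileno())
        os.rename(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also flush the directory entry so the rename is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Dropping both fsync calls still gives the visibility guarantee from the man page; what is lost is any promise about which contents are on disk after a crash.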

Wishful thinking

Posted Mar 16, 2009 7:04 UTC (Mon) by njs (subscriber, #40338)

> Unfortunately, POSIX doesn't actually define any relation between operations on file contents and those on file metadata. File-system developers prefer to pretend that this means the operations should be independent

There are a lot of misunderstandings about how filesystems actually work in these threads... hopefully I won't add to them :-)

But I think it's more like: POSIX doesn't actually define any relation between operations, whether on file contents or on file metadata or both. File-system developers tend to create a linear ordering on file metadata changes because that makes it easier to implement filesystems that can survive a crash without destroying your whole partition, but they prefer not to impose any other ordering guarantees, because when they do, the users whine about how unbearably slow the filesystem is. (Also, they've never made those guarantees before, and somehow computers have worked.)

In particular, note that when it comes to crash recovery, unless you use data=journal, there is no "transaction space" for data writes at all. You may find that any arbitrary subset of your writes has completed, and some may have completed only partially (say, only the middle of your write buffer has made it to disk), and so on. That's just how it works.
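For what it's worth, one common way applications cope with partially completed writes is to frame each record with a length and a checksum, so that a torn record can be detected when the file is read back. The sketch below is only an illustration of that idea, not something proposed in this thread. The `frame` and `read_valid_records` names and the record layout are made up, and it simply stops at the first damaged record rather than trying to recover anything after it.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte little-endian payload length,
# 4-byte CRC32 of the payload, then the payload itself.
HEADER = struct.Struct("<II")


def frame(payload):
    """Wrap a bytes payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(blob):
    """Return the payloads that precede the first truncated or corrupt record."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(blob):
        length, checksum = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        payload = blob[start:start + length]
        # A short payload means the write was cut off; a checksum mismatch
        # means only part of the buffer (or stale data) reached the disk.
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break
        records.append(payload)
        offset = start + length
    return records
```

For example, `read_valid_records(frame(b"one") + frame(b"two")[:-1])` yields only the first record, because the second one is truncated.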

What we're seeing here is some very limited ordering guarantees being added in for particular heuristically defined sequences of operations, where it turns out they don't hurt performance much. But apps that rely on those guarantees will still be broken when running on any other filesystem. And that's going to bite the folks who develop the next round of filesystems, because they won't know what random non-standard guarantees apps will expect them to provide.

