
Better get the basics straight


Posted Mar 17, 2009 21:12 UTC (Tue) by job (guest, #670)
Parent article: Better than POSIX?

This discussion is made harder by statements like: "any application developer who wants to be sure that data has made it to persistent storage". That is not the question here. Delayed allocation is good, for performance reasons. But delaying data longer than the corresponding metadata creates problems with atomicity.

I would have appreciated it if the article had mentioned sed in-place editing or shell scripts using mktemp && mv as examples of potential problems. After all, no one expects to be forced to call sync after that, do they? Calling most shells examples of broken user space applications is a bit of a stretch.



Better get the basics straight

Posted Mar 17, 2009 22:42 UTC (Tue) by butlerm (subscriber, #13312) [Link] (6 responses)

I agree. I am coming around to the position that all self-respecting
filesystems should provide ordered (but not necessarily durable) renames by
default.

Then, as suggested by a commenter elsewhere, we could add something like a
"rename_unordered" function for those relatively unusual cases where a user
is willing to risk severe data loss to get better performance. In addition,
for portability reasons, we should have an option so that an application
can discover whether or not an fsync is required to get ordered renames.
fsync is very expensive when you don't need (synchronous) durability
semantics.


Better get the basics straight

Posted Mar 17, 2009 22:53 UTC (Tue) by quotemstr (subscriber, #45331) [Link] (5 responses)

POSIX has a mechanism to do precisely that:
SYNOPSIS
  long fpathconf(int fildes, int name);
  long pathconf(const char *path, int name);

DESCRIPTION

The fpathconf() and pathconf() functions shall determine the current value of a configurable limit or option (variable) that is associated with a file or directory.

I don't see a problem with adding platform-specific values for name: if a conscientious application asks for _LINUX_SAFE_RENAME on a system that doesn't even know it exists, pathconf will just return -1 and the application will say, "oh, okay. I need to use fsync."

If the POSIX people ever get around to standardizing safe semantics for rename, then Linux's pathconf can just support both the original and the standard name.
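
A rough sketch of what such a check might look like in an application. _LINUX_SAFE_RENAME does not exist anywhere today, so the constant below (HYPOTHETICAL_PC_SAFE_RENAME, with an arbitrary value) and the helper name rename_needs_fsync are purely illustrative. The only real behaviour relied on is that pathconf() returns -1 for a name it does not recognise.

```c
#include <assert.h>
#include <unistd.h>

/* HYPOTHETICAL: stands in for the proposed _LINUX_SAFE_RENAME.
 * No such pathconf() name exists today; the value is arbitrary. */
#define HYPOTHETICAL_PC_SAFE_RENAME 0x7fff0001

/* Returns 0 if the filesystem holding `path` reports ordered renames,
 * 1 if the caller should fsync() the file before rename(). */
int rename_needs_fsync(const char *path)
{
    assert(path != NULL);

    long v = pathconf(path, HYPOTHETICAL_PC_SAFE_RENAME);
    if (v > 0)
        return 0;   /* option is known here and reported as enabled */

    /* -1 (name unknown, or any other failure) or 0: stay conservative. */
    return 1;
}
```

On current systems the name is unknown, so rename_needs_fsync() returns 1 and the application falls back to fsync, which is exactly the fallback described above.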

Better get the basics straight

Posted Mar 18, 2009 1:00 UTC (Wed) by bojan (subscriber, #14302) [Link] (2 responses)

You'll probably need this on a per-FS basis, because you could mount an FS on Linux that doesn't support this, although the system may generally support it.

Better get the basics straight

Posted Mar 18, 2009 1:04 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (1 responses)

> You'll probably need this on a per-FS basis

Of course it's per-filesystem. That's the whole point of using pathconf instead of sysconf.
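
To make the distinction concrete, here is a small sketch using two existing, standard names (_PC_NAME_MAX and _SC_PAGESIZE). The helper names are made up for illustration. pathconf takes a path, so its answer can differ from one mounted filesystem to the next; sysconf takes no path and can only give one answer for the whole system.

```c
#include <assert.h>
#include <unistd.h>

/* Per-path answer: depends on the filesystem that holds `path`. */
long name_max_for(const char *path)
{
    assert(path != NULL);
    return pathconf(path, _PC_NAME_MAX);
}

/* System-wide answer: there is no path argument to vary it by. */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

Calling name_max_for() on two different mount points may return two different values, which is the property a per-filesystem rename option would need.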

Better get the basics straight

Posted Mar 18, 2009 1:11 UTC (Wed) by bojan (subscriber, #14302) [Link]

OOPS! My bad, sorry :-(

fpathconf

Posted Mar 18, 2009 4:41 UTC (Wed) by butlerm (subscriber, #13312) [Link]

Thanks for posting that, that is very helpful.

Better get the basics straight

Posted Mar 19, 2009 7:27 UTC (Thu) by job (guest, #670) [Link]

That would be one way of dealing with different filesystem semantics.

I would hope that the operation "write data to file" gets less complex, not more. There is already a little dance of calls to be made (Ted writes about it). If we add application-level logic to handle the fact that some filesystems expect fsync on the directory, some only on the file, and some manage without, it becomes more complex still, and that in tens of thousands of applications.
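
For reference, one common form of that dance looks roughly like the sketch below. It is not necessarily the exact sequence Ted describes, and replace_file and its dir/name split are made up for illustration: write a temporary file in the same directory, fsync it, rename it over the target, then fsync the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Atomically replace dir/name with `len` bytes from `data`.
 * Returns 0 on success, -1 on failure. mkstemp() creates the file with
 * mode 0600; a real tool would also restore the old mode and owner. */
int replace_file(const char *dir, const char *name,
                 const char *data, size_t len)
{
    char tmp[4096], dst[4096];
    size_t off = 0;
    int fd, dfd, rc, n;

    assert(dir != NULL && name != NULL && data != NULL);
    n = snprintf(tmp, sizeof tmp, "%s/.%s.XXXXXX", dir, name);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof dst)
        return -1;

    fd = mkstemp(tmp);               /* 1. temp file in the same directory */
    if (fd < 0)
        return -1;

    while (off < len) {              /* 2. write everything, retry on EINTR */
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        off += (size_t)w;
    }
    if (fsync(fd) != 0)              /* 3. force the data to storage */
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, dst) != 0)       /* 4. atomic switch to the new contents */
        goto fail;

    dfd = open(dir, O_RDONLY);       /* 5. make the rename itself durable */
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    if (rc != 0 && errno == EINVAL)  /* directory fsync unsupported here */
        rc = 0;
    close(dfd);
    return rc;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

Steps 3 and 5 are the ones whose necessity varies by filesystem, and they are also the ones that cost latency.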

But this is only vaguely related to the data ordering issue. In an interactive program or where performance is critical you may not want to wait until data is committed to disk. Latency kills.

sed and shell

Posted Mar 18, 2009 5:13 UTC (Wed) by xoddam (subscriber, #2322) [Link]

Indeed. I use and maintain scripts using mktemp and mv on a regular basis. Some of them *also* use sed -i (shudder).

This is *the* way to achieve atomicity on Unix. It always was.

We didn't use to have journaling filesystems and we never used to expect anything at all to work after a crash. Crashes happened and they often meant hours of work, possibly reinstalling everything.

To discover that ext3 data=ordered just kept on working after a crash was a real eye-opener for me. I never realised before that such robustness was possible (just like I never realised prior to Linux that I could afford my own Unix box) and I am not about to relinquish it! I'm sure I speak for many users here. It *was* an unexpected nice-to-have when we first got it; but it has become a solid requirement.

Delayed allocation without an implicit write barrier before renaming a newly-written file virtually guarantees data loss after a crash with existing applications. It is therefore a regression from the status quo, albeit to something somewhat better than the status of a few years back.

Kudos and thanks to Ted for implementing this must-have write barrier (and also for reducing the chances that unsafe truncate-and-write hits the inevitable race condition, though IMO he's quite right that applications doing that are broken).

I just wish he wouldn't keep insisting that fsync at application level is the right way to achieve what we want.

POSIX and fsync have nothing to do with it (any journaling filesystem provides much more than POSIX), nor do application authors who forgot to think about recovery after an OS crash. A journaling filesystem *can* guarantee atomic renames, so it *should*, for the sake of users' sanity, not for the sake of a standards document.

Standards can be updated to follow best practice. They often do.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds