Better get the basics straight
Posted Mar 17, 2009 21:12 UTC (Tue) by job (guest, #670)Parent article: Better than POSIX?
I would have appreciated it if the article had mentioned sed in-place editing or shell scripts using mktemp && mv as examples of potential problems. After all, no one expects to be forced to call sync after that, do they? Calling most shells examples of broken user-space applications is a bit of a stretch.
Posted Mar 17, 2009 22:42 UTC (Tue)
by butlerm (subscriber, #13312)
[Link] (6 responses)
Then, as suggested by a commenter elsewhere, we could add something like a
Posted Mar 17, 2009 22:53 UTC (Tue)
by quotemstr (subscriber, #45331)
[Link] (5 responses)
I don't see a problem with adding platform-specific values for
If the POSIX people ever get around to standardizing safe semantics for
Posted Mar 18, 2009 1:00 UTC (Wed)
by bojan (subscriber, #14302)
[Link] (2 responses)
Posted Mar 18, 2009 1:04 UTC (Wed)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
Of course it's per-filesystem. That's the whole point of using
Posted Mar 18, 2009 1:11 UTC (Wed)
by bojan (subscriber, #14302)
[Link]
Posted Mar 18, 2009 4:41 UTC (Wed)
by butlerm (subscriber, #13312)
[Link]
Posted Mar 19, 2009 7:27 UTC (Thu)
by job (guest, #670)
[Link]
I would hope that the operation "write data to file" gets less complex, not more. There is already a little dance of calls to be made (Ted writes about it). If we add logic on the application level to handle that some filesystems expect fsync on the directory, some on only the file and some manage without, it becomes even more so. In tens of thousands of applications.
But this is only vaguely related to the data ordering issue. In an interactive program, or where performance is critical, you may not want to wait until data is committed to disk. Latency kills.
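For reference, the "little dance" looks roughly like the following. This is only a sketch under assumptions: the names replace_file_durably and sync_parent_dir are made up for illustration and come from neither the article nor any library, and the temporary file is created by mkstemp with mode 0600, so the original file's permissions are not preserved.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: flush the directory entry for `path` by calling
 * fsync() on its parent directory. */
static int sync_parent_dir(const char *path)
{
    char buf[4096];
    int n = snprintf(buf, sizeof buf, "%s", path);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;
    int dfd = open(dirname(buf), O_RDONLY); /* dirname() may modify buf */
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}

/* Hypothetical helper: replace `path` with `len` bytes from `data`.
 * Returns 0 on success, -1 on failure. */
int replace_file_durably(const char *path, const void *data, size_t len)
{
    /* The temporary file sits next to the target so that both are on the
     * same filesystem; rename() is only atomic within one filesystem. */
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }
    if (fsync(fd) != 0) /* data reaches the disk before the rename */
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, path) != 0) /* atomically swap in the new contents */
        goto fail;
    return sync_parent_dir(path); /* make the rename itself durable */

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

The two fsync calls are exactly the part that costs latency; dropping them leaves the plain write-then-rename pattern that sed and shell scripts rely on.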
Posted Mar 18, 2009 5:13 UTC (Wed)
by xoddam (subscriber, #2322)
[Link]
This is *the* way to achieve atomicity on Unix. It always was.
We didn't use to have journaling filesystems and we never used to expect anything at all to work after a crash. Crashes happened and they often meant hours of work, possibly reinstalling everything.
To discover that ext3 data=ordered just kept on working after a crash was a real eye-opener for me. I never realised before that such robustness was possible (just like I never realised prior to Linux that I could afford my own Unix box) and I am not about to relinquish it! I'm sure I speak for many users here. It *was* an unexpected nice-to-have when we first got it; but it has become a solid requirement.
Delayed allocation without an implicit write barrier before renaming a newly-written file virtually guarantees data loss after a crash with existing applications. It is therefore a regression from the status quo, albeit to something somewhat better than the status of a few years back.
Kudos and thanks to Ted for implementing this must-have write barrier (and also for making unsafe truncate-and-write less likely to hit the inevitable race condition, though IMO he's quite right that applications doing that are broken).
I just wish he wouldn't keep insisting that fsync at application level is the right way to achieve what we want.
POSIX and fsync have nothing to do with it (any journaling filesystem provides much more than POSIX), nor do application authors who forgot to think about recovery after an OS crash. A journaling filesystem *can* guarantee atomic renames, so it *should*, for the sake of users' sanity, not for the sake of a standards document.
Standards can be updated to follow best practice. They often do.
Better get the basics straight
filesystems should provide ordered (but not necessarily durable) renames by default.
"rename_unordered" function for those relatively unusual cases where a user is willing to risk severe data loss to get better performance. In addition, for portability reasons, we should have an option so that an application can discover whether or not an fsync is required to get ordered renames.
fsync is very expensive when you don't need (synchronous) durability semantics.
POSIX has a mechanism to do precisely that:
SYNOPSIS
#include <unistd.h>

long fpathconf(int fildes, int name);
long pathconf(const char *path, int name);
DESCRIPTION
The fpathconf() and pathconf() functions shall determine the current value of a configurable limit or option (variable) that is associated with a file or directory.
name: if a conscientious application asks for _LINUX_SAFE_RENAME on a system that doesn't even know it exists, pathconf will just return -1 and the application will say, "oh, okay. I need to use fsync."
rename, then Linux's pathconf can just support both the original and the standard name.
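A rough sketch of how an application might use such a query. Note the assumptions: no _LINUX_SAFE_RENAME name exists in Linux or POSIX, so HYPOTHETICAL_PC_SAFE_RENAME below is a stand-in with an arbitrary placeholder value, and needs_fsync_before_rename is an invented helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* HYPOTHETICAL: stands in for the _LINUX_SAFE_RENAME name proposed in this
 * thread. No such pathconf() name exists; the value is an arbitrary
 * placeholder picked so that the sketch compiles. */
#define HYPOTHETICAL_PC_SAFE_RENAME 0x7fff

/* Returns 0 if the filesystem holding `path` reports ordered renames,
 * 1 if the application should call fsync() before rename(). */
int needs_fsync_before_rename(const char *path)
{
    long v = pathconf(path, HYPOTHETICAL_PC_SAFE_RENAME);

    /* pathconf() returns -1 both for a name it does not recognise
     * (errno set to EINVAL) and for an unsupported option. Either way
     * the conservative answer is "use fsync". */
    if (v > 0)
        return 0;
    return 1;
}
```

On any current system the query fails and the function falls back to "fsync required", which is the graceful degradation described above.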
Better get the basics straight
You'll probably need this on a per-FS basis
pathconf instead of sysconf.
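To illustrate the distinction with names that do exist today: pathconf takes a path, so its answer can differ between mounted filesystems, while sysconf takes no path and answers for the whole system. The wrapper names max_name_length and page_size are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Per-path query: the result depends on the filesystem that holds `path`.
 * Returns -1 if the limit is indeterminate or the call fails. */
long max_name_length(const char *path)
{
    return pathconf(path, _PC_NAME_MAX);
}

/* System-wide query: there is no path argument, so the answer cannot vary
 * from one filesystem to another. */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

A rename-ordering property belongs in the first category, since two filesystems mounted on the same machine can behave differently.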
Better get the basics straight
fpathconf
Better get the basics straight
sed and shell