Atomicity
Posted Sep 17, 2009 8:17 UTC (Thu) by Nicolas.Boulay (guest, #59722)
Parent article: POSIX v. reality: A position on O_PONIES
rename() is a kind of trick to minimise the problem of empty files after a power failure.
But what an application writer really wants is a fast file system that provides _atomicity_: he wants either the previous state of the file or the new content from the last sys_write(), and nothing else.
Where fsync() is used today, I think what we really need is an fdone(), which would be a kind of "wait for the transaction to complete" instead of "flush everything quickly".
If fdone() takes too long, I could use threads. If fdone() takes time, that is for the sake of bandwidth optimisation. One great Linux optimisation for systems without important data is to map fsync() to a void (no-op) function; then everything flies :)
Is it costly to have the behavior of open()/write()/rename() for a single sys_write()?
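For readers who have not seen it, the open()/write()/rename() sequence referred to above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name `atomic_write` is made up, the code assumes a POSIX system (the directory fsync does not work on Windows), and whether the old-or-new guarantee survives a crash still depends on the file system, which is exactly the point under discussion.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` with `data` (bytes) using temp file + rename.

    `atomic_write` is a hypothetical helper name, not an existing API.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the rename
    # never has to cross a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the kernel to push the new data to stable storage
            # before the new name becomes visible.
            os.fsync(f.fileno())
        # rename(): concurrent readers see the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself is recorded (POSIX assumption).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Dropping the two fsync() calls gives the faster variant whose behaviour after a power failure is what the parent article argues about.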
Posted Sep 25, 2009 3:39 UTC (Fri)
by xoddam (guest, #2322)
Rename is *not* a 'kind of trick'. By specification, it is guaranteed to be atomic in the face of concurrent readers on a working system. Unfortunately the specification has nothing to say about it with respect to unclean shutdown.
Extending the atomicity of rename() so that it still applies in the face of a successful recovery (such as a journal replay) after an unclean shutdown is perfectly logical.
Posted Oct 26, 2009 10:09 UTC (Mon)
by Nicolas.Boulay (guest, #59722)
KB is ok, MB is not.
That is typical of any database work. In that case, rename() is of no use.
Posted Oct 30, 2009 4:30 UTC (Fri)
by xoddam (guest, #2322)
Databases traditionally use very large files because their implementors have chosen to re-implement filesystem functionality at the low level for performance reasons.
Most often they use their own journalling implementations and fsync(). This is of course legitimate. But using filesystem-level rename to provide atomicity would also be perfectly reasonable.
The size of the renamed and replaced file is an implementation detail only. Rename doesn't impose a requirement to copy large hunks of data only to throw it away. The unit of replacement might be a btree node, for example.
Nothing forces an implementor to use large files for any particular purpose.
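To make the "unit of replacement" idea concrete, here is a minimal sketch in which each small node is kept in its own file and updated by rename, so an update rewrites a few KB rather than the whole data set. Everything here is an assumption made for illustration: the `NodeStore` class, the `node-<id>` file naming, and the one-file-per-node layout are hypothetical and do not describe how any real database works.

```python
import os
import tempfile


class NodeStore:
    """Hypothetical store that keeps every node in a separate small file."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, node_id):
        # Assumed naming scheme: one file per node id.
        return os.path.join(self.root, f"node-{node_id}")

    def write_node(self, node_id, payload):
        # Write the new version next to the old one, then swap it in.
        fd, tmp_path = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # Only this node's file is replaced; other nodes are untouched.
        os.replace(tmp_path, self._path(node_id))

    def read_node(self, node_id):
        with open(self._path(node_id), "rb") as f:
            return f.read()
```

The cost of an update is then proportional to the node size, not to the total amount of stored data.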