Atomicity

Posted Sep 25, 2009 3:39 UTC (Fri) by xoddam (guest, #2322)
In reply to: Atomicity by Nicolas.Boulay
Parent article: POSIX v. reality: A position on O_PONIES

We already have an API for atomicity in POSIX. It is called rename().

Rename is *not* a 'kind of trick'. By specification, it is guaranteed to be atomic in the face of concurrent readers on a working system. Unfortunately, the specification says nothing about how it behaves across an unclean shutdown.

Extending the atomicity of rename() so that it still applies in the face of a successful recovery (such as a journal replay) after an unclean shutdown is perfectly logical.
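For concreteness, the usual write-then-rename idiom looks roughly like this in Python. It is only a sketch: the name atomic_write is mine, and the fsync() calls reflect common practice on POSIX systems rather than anything the specification promises about rename() after a crash.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via write-then-rename.

    Hypothetical helper for illustration; POSIX systems assumed.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # The temporary file has to live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the new contents to disk
        # os.replace() maps to rename() on POSIX: concurrent readers see
        # either the old file or the new one, never a mixture.
        os.replace(tmp, path)
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory too, so the new directory entry is itself on disk.
    # Whether this suffices after a crash depends on the filesystem (assumption).
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The old contents stay in place until the very last step, which is the whole point: at no moment does a reader find a half-written file under the real name.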

Atomicity

Posted Oct 26, 2009 10:09 UTC (Mon) by Nicolas.Boulay (guest, #59722)

You completely forget the case where the file is too big to be copied.

A few KB is OK, a few MB is not.

That is typical of any database work. In that case, rename() is of no use.

Atomicity

Posted Oct 30, 2009 4:30 UTC (Fri) by xoddam (guest, #2322)

Database implementors have many choices for implementing data stores and any transactional semantics that they need.

Databases traditionally use very large files because their implementors have chosen to re-implement filesystem functionality at a low level for performance reasons.

Most often they use their own journalling implementations and fsync(). This is of course legitimate. But using filesystem-level rename to provide atomicity would also be perfectly reasonable.
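To illustrate the journal-plus-fsync() approach, here is a toy append-only journal in Python. It is a sketch under my own assumptions (the function names and the 4-byte length prefix are invented for the example) and does not describe how any particular database does it.

```python
import os


def journal_append(journal_path, record):
    """Append one record (bytes) and return only once fsync() has completed."""
    # A length prefix lets replay notice a record cut short by a crash.
    frame = len(record).to_bytes(4, "big") + record
    with open(journal_path, "ab") as f:
        f.write(frame)
        f.flush()              # move Python's buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to move it to stable storage


def journal_replay(journal_path):
    """Return every complete record, ignoring a torn record at the tail."""
    records = []
    with open(journal_path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break          # end of journal, or a torn length prefix
            size = int.from_bytes(header, "big")
            body = f.read(size)
            if len(body) < size:
                break          # torn record: the write never finished
            records.append(body)
    return records
```

A real journal would also checksum each record; this one only detects truncation.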

The size of the renamed and replaced file is an implementation detail only. Rename doesn't impose a requirement to copy large hunks of data only to throw them away. The unit of replacement might be a btree node, for example.
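As a sketch of what I mean, suppose each node lives in its own small file. The layout, the file naming and the function names below are all hypothetical, and a single writer is assumed (the fixed ".new" suffix would collide otherwise).

```python
import os


def node_path(store_dir, node_id):
    """Map a node id to its file name (hypothetical layout)."""
    return os.path.join(store_dir, "node-%08d" % node_id)


def replace_node(store_dir, node_id, payload):
    """Atomically replace one node; no other node is read or copied."""
    final = node_path(store_dir, node_id)
    tmp = final + ".new"       # same directory, hence same filesystem
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)     # rename(): readers see the old or the new node


def read_node(store_dir, node_id):
    """Return the current contents of one node."""
    with open(node_path(store_dir, node_id), "rb") as f:
        return f.read()
```

The cost of an update is proportional to the size of one node, not to the size of the whole store.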

Nothing forces an implementor to use large files for any particular purpose.