Why, instead of trying to work with other file systems to come up with a universal API, are btrfs developers insisting on reinventing the wheel for everything? Filesystem-level RAID, a snapshot API, and now a transaction API.
Posted Nov 20, 2009 9:23 UTC (Fri) by forthy (guest, #1525)
The interface reminds me of one of the proposed syslet asynchronous I/O
APIs, where you send a list of syscalls to the kernel to be processed
asynchronously in the background. I thought this was a cool idea (not
new: active message passing is decades old, but it is not well understood
by most developers ;-), and it was a pity that it didn't make it in
(because it wasn't well understood). For transactions, however, I don't
think it's the right approach.
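To make the "send a list of calls and let them run in the background" idea concrete, here is a minimal user-space sketch. It is only an analogy: the real syslet patches executed atoms inside the kernel, whereas this uses a Python thread pool, and the names `Call`, `run_batch` and `submit_batch` are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Tuple

# One "atom" of the batch (hypothetical): a callable plus its positional args.
Call = Tuple[Callable[..., Any], Tuple[Any, ...]]


def run_batch(calls: List[Call]) -> List[Any]:
    """Execute the calls in order; an exception stops the rest of the batch."""
    results: List[Any] = []
    for fn, args in calls:
        results.append(fn(*args))
    return results


def submit_batch(pool: ThreadPoolExecutor, calls: List[Call]) -> Future:
    """Hand the whole list over in one step and return immediately."""
    return pool.submit(run_batch, calls)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = submit_batch(pool, [(len, ("abc",)), (sum, ([1, 2, 3],))])
        # ... the caller is free to do other work here ...
        print(fut.result())  # [3, 6]
```

The point of the design is that the caller crosses the boundary once per batch rather than once per call, and collects all results (or the first error, stored in the `Future`) later.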
Chris Mason works at Oracle; he should ask some of his colleagues how to
implement transactions properly. Locking is not the proper way to
implement transactions. What you do is fork the file system (i.e. create a
writable snapshot and bind the calling process to that snapshot), and when
you are done, you try to merge it back. If the merge succeeds, continue;
if not, abort (and tell the caller, who can try again). If the system
crashes during the transaction, file system repair will purge that
snapshot. Unlike a git merge, a transaction should abort when any file it
wrote to during the transaction was also written by someone outside it.
The only atomic part is the merge window, and at that point the merge has
complete information available and, in particular, only has to update
metadata.
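The fork/merge scheme above is essentially optimistic concurrency control. A minimal in-memory sketch, under stated assumptions: `Volume`, `Snapshot`, `ConflictError` and `run_transaction` are hypothetical names, this is not btrfs code, and detecting "other writers" with a per-file version counter is an assumption of this model. A crash is modeled by simply dropping an unmerged `Snapshot`, since nothing becomes visible until `merge()` succeeds.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

# Toy model (assumption): each file is (version counter, contents).
Entry = Tuple[int, bytes]


class ConflictError(Exception):
    """Another writer modified a file this transaction also wrote."""


class Volume:
    """The live file system (hypothetical in-memory stand-in)."""

    def __init__(self) -> None:
        self.files: Dict[str, Entry] = {}
        # Taken briefly for single writes and forks, and for the merge
        # window; it is never held while a transaction body is running.
        self.lock = threading.Lock()

    def write(self, path: str, data: bytes) -> None:
        """An ordinary writer outside any transaction bumps the version."""
        with self.lock:
            version = self.files.get(path, (0, b""))[0]
            self.files[path] = (version + 1, data)

    def fork(self) -> "Snapshot":
        """Create a writable snapshot that a transaction can bind to."""
        with self.lock:
            return Snapshot(self, dict(self.files))


class Snapshot:
    """Private, writable view of the volume as of fork time."""

    def __init__(self, volume: Volume, base: Dict[str, Entry]) -> None:
        self.volume = volume
        self.base = base                      # versions seen at fork time
        self.changes: Dict[str, bytes] = {}   # this transaction's writes

    def read(self, path: str) -> Optional[bytes]:
        if path in self.changes:
            return self.changes[path]
        entry = self.base.get(path)
        return entry[1] if entry is not None else None

    def write(self, path: str, data: bytes) -> None:
        self.changes[path] = data

    def merge(self) -> None:
        """Publish all changes atomically, or raise ConflictError and publish none."""
        vol = self.volume
        with vol.lock:
            # Validate first: abort if anyone else wrote a file we wrote.
            for path in self.changes:
                seen = self.base.get(path, (0, b""))[0]
                live = vol.files.get(path, (0, b""))[0]
                if live != seen:
                    raise ConflictError(path)
            # Apply: in this model, just repoint each entry at the new data.
            for path, data in self.changes.items():
                live = vol.files.get(path, (0, b""))[0]
                vol.files[path] = (live + 1, data)


def run_transaction(volume: Volume, body: Callable[[Snapshot], None],
                    attempts: int = 3) -> bool:
    """Fork, run body, try to merge; on conflict, drop the snapshot and retry."""
    for _ in range(attempts):
        snap = volume.fork()
        body(snap)
        try:
            snap.merge()
            return True
        except ConflictError:
            continue  # the failed snapshot is simply discarded
    return False
```

Note that a concurrent write to a *different* file does not abort the transaction; only overlapping writes do, which matches the rule described in the comment.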
This doesn't need a special syslet-like ioctl; the only thing you need to
add is a btrfs_merge() call, since creating writable snapshots is already
supported. I'd still like to have syslets, but please with the complete
set of kernel calls, for asynchronous I/O and such.