JLS2009: A Btrfs update

Posted Oct 30, 2009 16:14 UTC (Fri) by giraffedata (subscriber, #1954)
Parent article: JLS2009: A Btrfs update

"but there's no way in POSIX to tell the kernel to flush multiple files at once. Fixing that is likely to involve a new system call."

Well, it doesn't have to be anything fancy like an fsync call that takes multiple file descriptors. It could be a new kind of fadvise() advice: "this file will be synchronized soon." Give that advice for every file in the set, then fsync them all one at a time.



JLS2009: A Btrfs update

Posted Nov 8, 2009 1:27 UTC (Sun) by butlerm (guest, #13312) [Link]

There would also need to be a synchronous fadvise call, or the equivalent,
that had the semantics of "wait on all the pseudo-synchronous fsync
operations that were just initiated". Otherwise the semantics wouldn't be
fsync-like at all.

For example, suppose you want to do a write-rename-replace for a set of
files. On many filesystems, the rename metadata operation will commit
before the data from the previous write commits, so the only safe way to do
this is to fsync the new version before calling rename. Otherwise, on a
crash you may get no version at all: not the old version, not the new
version, just a zero-length file.
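
A minimal C sketch of the single-file pattern described above, assuming a POSIX system. The function name replace_file and the ".tmp" suffix are illustrative choices, not anything from the thread. It does not retry on EINTR, and it does not fsync the containing directory, which some applications also do to make the rename itself durable.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace the file at `path` with `len` bytes from `data`.
 * Returns 0 on success, -1 on error (the old file is left untouched). */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];

    assert(path != NULL && data != NULL);

    /* Hypothetical naming scheme: write the new version next to the old one. */
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over. */
    size_t off = 0;
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* The important step: force the new data to stable storage BEFORE the
     * rename, so the rename can never commit ahead of the data. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Atomically switch the name over to the new version. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Calling replace_file("config", buf, len) once per file is the serial case; each call pays for its own fsync, which is the cost the rest of this thread is trying to avoid.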

If you are doing this with lots of files, a synchronous commit (or the
equivalent) of the data for the whole group, prior to the renames for the
whole group, is the only efficient way to go. Short of that, you would need
to spawn a large number of threads, issue the fsync and rename operations
in each one, and wait for them all to finish.

JLS2009: A Btrfs update

Posted Nov 8, 2009 1:35 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

"There would also need to be synchronous fadvise call or the equivalent that had the semantics of 'wait on all the pseudo-synchronous fsync operations that were just initiated'"

All you need is fsync. Do it on each file in turn, after having done the fadvise on every file. The last fsync will complete at the same time as a single hypothetical "wait on all these files" would.

JLS2009: A Btrfs update

Posted Nov 8, 2009 2:52 UTC (Sun) by butlerm (guest, #13312) [Link]

I understand what you mean now, and that would be a considerable improvement
over serial fsyncs alone. I think you can more or less do the same thing now
on Linux with sync_file_range(..., SYNC_FILE_RANGE_WRITE). Without additional
flags, that schedules asynchronous writeout of the specified part of the
file. Then, when you are all done, call fsync on every fd in the list, as you
say.
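
A rough C sketch of that two-pass approach, assuming Linux and glibc (sync_file_range is Linux-specific and needs _GNU_SOURCE). The helper name flush_group is hypothetical. The first pass is treated purely as a hint, so its errors are ignored; durability comes only from the fsync calls in the second pass.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes sync_file_range() in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Flush a group of already-written files.
 * Returns 0 if every fsync succeeded, -1 on the first fsync failure. */
int flush_group(const int *fds, size_t nfds)
{
    assert(fds != NULL || nfds == 0);

    /* Pass 1: ask the kernel to start writing out dirty data pages for
     * every file. Offset 0 with nbytes 0 means "through end of file".
     * This is only an optimization, so failures here are ignored. */
    for (size_t i = 0; i < nfds; i++)
        (void)sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);

    /* Pass 2: fsync each file in turn. Writeout was started for all of
     * them in pass 1, so later fsyncs should have less left to wait for. */
    for (size_t i = 0; i < nfds; i++)
        if (fsync(fds[i]) != 0)
            return -1;

    return 0;
}
```

For the group replace discussed earlier, the renames would be issued only after flush_group returns 0 for the set of new files.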

That is still somewhat problematic, though, since sync_file_range will not
initiate writeout of the metadata, which could be significant. Depending on
the way the filesystem handles metadata, you could have a very similar
problem, with a journal write and a synchronous wait for every fsync. So
something like fadvise options that schedule data and/or metadata for
immediate writeout would be helpful there.
