What about other filesystems?
Posted Jan 16, 2021 16:38 UTC (Sat) by Wol (subscriber, #4433)In reply to: What about other filesystems? by matthias
Parent article: Fast commits for ext4
AND YOU CAN'T EVEN RELY ON JOURNALLING because you don't know whether the file system has written the journal before, after, or in the middle of writing the data.
Really, all I want is something like fsbarrier(), which GUARANTEES that stuff written after it is written after stuff that was written before it. I don't give a monkey's whether the filesystem batches, parallelises, or whatever other O_PONIES writes, provided I can reason that this call makes sure my stuff hits the disk in the order I expect.
If I want to trash my application's performance with excessive use of fsbarrier(), that's my problem. If the OS expects me to trash EVERYONE ELSE'S performance with excessive use of fsync() or fsfsync(), then that's a BIG problem for the OS!
Oh - and wasn't advice about how to shut a system down always "# sync; sync; sync; halt"? So all of us old hands expect sync() to do a filesystem flush? And do you really expect me as a developer to do that after most writes when I expect something like that to bring the system to its knees?
Cheers,
Wol
Posted Jan 16, 2021 21:10 UTC (Sat)
by matthias (subscriber, #94967)
Journalling was primarily invented to ensure the integrity of the filesystem. I.e., to avoid a total loss of the filesystem in case of power loss/crash.
> Really, all I want is something like fsbarrier(), which GUARANTEES that stuff written after it is written after stuff that was written before it.
This would be quite nice. fsync() only guarantees ordering for data written to the given file descriptor. fsbarrier() would probably be easier to use for the app developer. No need to call it for every involved file descriptor. And yes, in many cases guaranteeing ordering would be enough. No need to actually force the data to the disk before the syscall can return.
> I don't give a monkey's whether the filesystem batches, parallelises, or whatever other O_PONIES writes, provided I can reason that this call makes sure my stuff hits the disk in the order I expect.
Why should fsbarrier() be any different in this regard than fsync()? Neither of the two requires the system to cripple performance, and both can be implemented by just forcing a global filesystem sync. The performance of fsync() is getting much better, as the developers actually use the freedom they have. But I am wondering why you expect filesystem developers to implement the (from a filesystem perspective) much harder fsbarrier() call more efficiently than the relatively straightforward fsync() call. fsbarrier() would probably require a major rewrite of the VFS layer to even be able to compute the list of files that are affected by such a call. Chances are good that developers would take the same shortcuts they have taken with fsync() for decades, and that such a call would cripple the performance of the whole system.
> Oh - and wasn't advice about how to shut a system down always "# sync; sync; sync; halt"? So all of us old hands expect sync() to do a filesystem flush? And do you really expect me as a developer to do that after most writes when I expect something like that to bring the system to its knees?
sync guarantees a full filesystem flush. No changes there. That is indeed a bit of overkill if you just require ordering. fsync used to be quite inefficient as well, but it is getting better in this regard. And I know nobody who suggests using sync in normal apps. fsync should be enough if used correctly.
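To make "fsync used correctly" concrete, here is a minimal Python sketch of ordering two writes with fsync(). The helper name `write_then_commit` and the idea of a separate marker file are assumptions made for illustration, not something from the thread. The point is that the application gets its ordering only by blocking until the first write is durable, which is exactly the cost a hypothetical fsbarrier() would avoid.

```python
import os


def write_then_commit(directory, data_name, marker_name, payload):
    """Write payload durably, then write a commit marker strictly after it.

    Hypothetical helper: ordering is obtained by waiting for fsync() on
    the data file before the marker file is even created.
    """
    data_path = os.path.join(directory, data_name)
    marker_path = os.path.join(directory, marker_name)

    # Step 1: write the data and block until it is on stable storage.
    fd = os.open(data_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)  # os.write may be partial
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    # Step 2: only now create the marker, so it can never be durable
    # before the data it vouches for.
    fd = os.open(marker_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"committed\n")
        os.fsync(fd)
    finally:
        os.close(fd)

    # Step 3: fsync the directory so both directory entries are durable
    # (assumes a Linux-like system where directories can be fsync'ed).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    return data_path, marker_path
```

After a crash, a reader that finds the marker can assume the data file is complete; a missing marker means the write should be treated as not having happened.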
Best,
Posted Jan 16, 2021 21:42 UTC (Sat)
by Wol (subscriber, #4433)
So let's say I want to guarantee - let's say ten or twenty - files have all flushed before I start writing the next file, can I do those fsync()s in parallel? Without having to spawn 20 threads and then wait on them all? Whatever, that's a lot of work.
And with an fsfsync, again does that provide the ordering guarantee? I've heard that yes it guarantees everything that's been written gets flushed, but does it put a hard barrier in (like my fsbarrier()), or does it just stall all new writes until all the old writes have been flushed, or does it just guarantee that everything written before the fsfsync is flushed but it doesn't stop newer writes being merged forwards and being caught up in the flush?
Because if fsfsync() puts that barrier in, I'm simply changing a synchronous fsfsync() to an asynchronous fsbarrier(); if it's the second option, it's causing a performance impact on the system; and if it's the third option, then my app has to do a synchronous call, with the performance impact that implies.
Cheers,
Posted Jan 17, 2021 4:41 UTC (Sun)
by NYKevin (subscriber, #129325)
- For fsync()'ing multiple files, the standard answer is "use a thread pool." This is also the standard answer to "I want asynchronous I/O like on Windows," so no surprise there.
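The thread-pool approach can be sketched in a few lines of Python. The function names `fsync_path` and `fsync_all` and the default of eight workers are assumptions for illustration. The sketch also assumes a Linux-like system where fsync() works on a descriptor opened read-only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fsync_path(path):
    """Open path, fsync it, and return the path once it is durable."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # releases the GIL, so calls overlap across threads
    finally:
        os.close(fd)
    return path


def fsync_all(paths, max_workers=8):
    """fsync every path in parallel; return only when all are flushed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every task and re-raises the first OSError.
        return list(pool.map(fsync_path, paths))
```

A caller that wants "these twenty files are flushed before I write the next one" calls `fsync_all(paths)` and starts the next write only after it returns.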
Posted Jan 18, 2021 5:34 UTC (Mon)
by joib (subscriber, #8541)
Posted Jan 18, 2021 7:16 UTC (Mon)
by NYKevin (subscriber, #129325)
IMHO this is a broader issue with aio(7) and not a problem with fsync in particular.
Posted Jan 18, 2021 10:45 UTC (Mon)
by farnz (subscriber, #17727)
io_uring does provide the primitives needed; there's IORING_OP_FSYNC (with IORING_FSYNC_DATASYNC to weaken from fsync to fdatasync) and IORING_OP_SYNC_FILE_RANGE for flushing caches asynchronously, and the IOSQE_IO_DRAIN and IOSQE_IO_LINK flags to order io_uring operations with respect to each other so that you can issue the fsync after all the related writes have been done.
Posted Jan 18, 2021 13:05 UTC (Mon)
by pbonzini (subscriber, #60935)
The filesystem is going to write data before metadata, so that you won't have a file that's full of zeros (or worse, full of stale data including another user's cleartext password). With "old Unix" you could get a file that's full of trash after a power failure; I sure did. So if anything journalling makes things better.
Posted Jan 21, 2021 19:42 UTC (Thu)
by mstone_ (subscriber, #66309)
Posted Jan 20, 2021 17:44 UTC (Wed)
by anton (subscriber, #25547)
> If I want to trash my application's performance with excessive use of fsbarrier(), that's my problem. If the OS expects me to trash EVERYONE ELSE'S performance with excessive use of fsync() or fsfsync(), then that's a BIG problem for the OS!
Matthias
Wol
- As the article mentions, they are discussing an "fsync multiple files" syscall, which will (probably) further alleviate this problem (if it actually happens).
- I'm not aware of any syscall called "fsfsync()," so I assume you meant syncfs(2). That function is not in POSIX, so all we have to go on is the note in that man page, which explicitly states that "sync() or syncfs() provide the same guarantees as fsync() called on every file in the system or filesystem respectively."
- POSIX says that sync(2) is not required to wait for the writes to complete before returning (unlike fsync()). As noted above, POSIX does not specify syncfs() at all.
- Arguably, a conforming implementation could implement sync() as a no-op, because POSIX says it causes outstanding data "to be scheduled for writing out" - but it was *already* scheduled for writing out.
- Therefore, if you want to be pedantically POSIX-correct, you should not use sync(2) at all, because it gets you exactly nothing according to the standard.
- Since syncfs() is already Linux-specific, you can rely on its Linux-specific guarantees, if you are in a position to call it in the first place.
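For anyone who is in a position to call syncfs(2), here is a hedged Python sketch. Python's standard library does not wrap syncfs(), so this reaches it through ctypes. It assumes Linux and a C library that exports the `syncfs` symbol, and the helper name `syncfs_path` is made up for this example.

```python
import ctypes
import os


def syncfs_path(path):
    """Flush the whole filesystem that contains path via syncfs(2).

    Assumes Linux and a libc exporting syncfs(); raises OSError if the
    call fails.
    """
    # CDLL(None) resolves symbols from the already-loaded C library.
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

Note that this flushes everything dirty on that filesystem, not just the caller's files, so it has the "everyone else's performance" cost discussed earlier in the thread.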
> Oh - and wasn't advice about how to shut a system down always "# sync; sync; sync; halt"?
No, the advice was to type
sync
sync
sync
halt
That's because sync did not block (unlike on Linux), so the time it took to type the additional syncs and the halt gave the first sync time to finish.