
bcachefs

Posted Feb 22, 2025 23:23 UTC (Sat) by koverstreet (✭ supporter ✭, #4296)
In reply to: bcachefs by NYKevin
Parent article: Filesystem support for block sizes larger than the page size

> Many applications which deal with "document-like" data tend to load the entire document into memory, operate on it, and then write it back out to save the changes. The needed atomicity here is that the changes either are completely applied or not applied at all, as an inconsistent state would render the file corrupt. A common approach is to write the document to a new file, then replace the original file with the new one. One method to do this is with the ReplaceFile API.

Yeah, I tend to agree with Microsoft :) I'm not aware of applications that would benefit, but if you do know of some please let me know.

I'm more interested in optimizations for fsync overhead.



bcachefs

Posted Feb 22, 2025 23:45 UTC (Sat) by intelfx (subscriber, #130118) [Link] (1 responses)

> but if you do know of some please let me know.

Package managers? Text editors? Basically anything that currently has to do the fsync+rename+fsync dance?

Now, I'm not saying that someone should get on coding userspace transactions yesterday™, but at a glance, there are definitely uses for that.
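For readers unfamiliar with the "fsync+rename+fsync dance" mentioned above, here is a minimal sketch of what it usually looks like on a POSIX system. The function name `atomic_replace` and the `.tmp-` prefix are made up for illustration; this is one common way to write the pattern, not code taken from any particular package manager or editor.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` so readers see old or new, never a mix.

    Hypothetical helper for illustration; assumes a POSIX system where a
    directory can be opened and fsync'ed.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Write the new contents to a temporary file in the same directory,
    # so the later rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # first fsync: new data reaches stable storage
        os.replace(tmp_path, path)  # rename over the original
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Second fsync: make the rename itself persistent by syncing the directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_replace("settings.json", b"{}")` leaves either the old file or the complete new one at that path; the cost is the two fsync calls.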

bcachefs

Posted Feb 27, 2025 10:30 UTC (Thu) by koverstreet (✭ supporter ✭, #4296) [Link]

That fsync already isn't needed on bcachefs (or on ext4, and maybe xfs as well), since we do an implicit fsync on an overwrite rename, where we flush the data but not the journal.

That is, you get ordering, not persistence, which is exactly what applications want in this situation.
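As a rough sketch of what relying on that behaviour would look like from userspace: the application writes a temporary file and renames it over the original, with no explicit fsync. The helper name `replace_ordered` is hypothetical, and the crash-safety of this version rests entirely on the filesystem behaviour described in this comment; POSIX does not promise it, so treat this as an assumption rather than a portable recipe.

```python
import os
import tempfile


def replace_ordered(path: str, data: bytes) -> None:
    """Replace `path` with `data` via an overwrite rename, without fsync.

    Hypothetical helper. Assumes the filesystem flushes the new file's data
    before the rename takes effect, as described for bcachefs and ext4 in
    the comment above. After a crash, the most recent update may be lost,
    but the file should be either the old or the new version.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)  # no os.fsync() here: ordering is left to the filesystem
        os.replace(tmp_path, path)  # the overwrite rename
    except BaseException:
        # Clean up the temp file if the write or rename failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The trade-off is the one stated above: ordering without persistence. An application that must know the update is on disk before continuing still needs an explicit fsync.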

bcachefs

Posted Feb 23, 2025 9:37 UTC (Sun) by Wol (subscriber, #4433) [Link]

Depends how much state, across how many files, but (if I understand correctly) I'm sure object based databases could benefit.

I would want to update part of a file (maybe two or three blocks, across a several-meg (or more) file) and the ability to rewrite just the blocks of interest, then flush a new inode or whatever, changing just those block pointers, would be wonderful.

Maybe we already have that. Maybe it's too complicated (as in multiple people trying to update the same file at the same time ...)

Cheers,
Wol

bcachefs

Posted Feb 24, 2025 18:52 UTC (Mon) by tim-day-387 (subscriber, #171751) [Link] (2 responses)

Lustre would benefit from a filesystem agnostic transaction API (at least, in kernel space). The OSD layer is essentially implementing that. We're making a push to get Lustre included upstream and the fate of OSD/ldiskfs/ext4 is one of the big open questions. Having a shared transaction API would make that much easier to answer.

bcachefs

Posted Feb 24, 2025 21:37 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

How does Lustre currently handle transactions? Especially rollbacks?

It looks like transactions in Lustre are more like an atomic group of operations, rather than something long-lived? I.e. you can't start a transaction, spend 2 hours doing something with it, and then commit it?

bcachefs

Posted Feb 25, 2025 16:49 UTC (Tue) by tim-day-387 (subscriber, #171751) [Link]

Currently, Lustre hooks into ext4 transactions in osd_trans_start() and osd_trans_stop() [1]. So the transactions aren't long-lived and are usually scoped to a single function. Lustre patches ext4 (to create ldiskfs) and interfaces with it directly. But it'd probably be better to have a generic way for filesystems to (optionally) expose these primitives. Infiniband has a concept of kverbs: drivers can optionally expose an interface to in-kernel users. We could do something similar for transaction handling.

[1] https://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;...

