LWN: Comments on "Filesystem support for block sizes larger than the page size" https://lwn.net/Articles/1009548/ This is a special feed containing comments posted to the individual LWN article titled "Filesystem support for block sizes larger than the page size". en-us Tue, 11 Nov 2025 00:02:07 +0000 Tue, 11 Nov 2025 00:02:07 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Can't actually use currently? https://lwn.net/Articles/1013470/ https://lwn.net/Articles/1013470/ bmenrigh <div class="FormattedComment"> On kernel 6.13.5 I was able to make an XFS filesystem (in a file) with an 8192-byte sector size, but when I try to mount it via loopback (either with losetup or mount -o loop) I run into:<br> <p> [ 3713.182037] XFS (loop0): Cannot set_blocksize to 8192 on device loop0<br> <p> Or trying to specify losetup -b 8192:<br> [ 3770.934627] Invalid logical block size (8192)<br> <p> Maybe I'm missing something? Otherwise, the support is there in XFS, but the block layer (at least the loopback part) lacks the support needed for it to work.<br> </div> Sun, 09 Mar 2025 10:42:37 +0000 bcachefs https://lwn.net/Articles/1012076/ https://lwn.net/Articles/1012076/ koverstreet <div class="FormattedComment"> That fsync already isn't needed on bcachefs (nor ext4, and maybe xfs as well) since we do an implicit fsync on an overwrite rename, where we flush the data but not the journal.<br> <p> That is, you get ordering, not persistence, which is exactly what applications want in this situation.<br> </div> Thu, 27 Feb 2025 10:30:55 +0000 bcachefs https://lwn.net/Articles/1011797/ https://lwn.net/Articles/1011797/ tim-day-387 <div class="FormattedComment"> Currently, Lustre hooks into ext4 transactions in osd_trans_start() and osd_trans_stop() [1]. So the transactions aren't long-lived and are usually scoped to a single function. Lustre patches ext4 (to create ldiskfs) and interfaces with it directly. 
But it'd probably be better to have a generic way for filesystems to (optionally) expose these primitives. Infiniband has a concept of kverbs - drivers can optionally expose an interface to in-kernel users. We could do something similar for transaction handling.<br> <p> [1] <a href="https://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;f=lustre/osd-ldiskfs/osd_handler.c">https://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;...</a><br> </div> Tue, 25 Feb 2025 16:49:30 +0000 bcachefs https://lwn.net/Articles/1011682/ https://lwn.net/Articles/1011682/ Cyberax <div class="FormattedComment"> How does Lustre currently handle transactions? Especially rollbacks?<br> <p> It looks like transactions in Lustre are more like an atomic group of operations, rather than something long-lived? I.e. you can't start a transaction, spend 2 hours doing something with it, and then commit it?<br> </div> Mon, 24 Feb 2025 21:37:00 +0000 bcachefs https://lwn.net/Articles/1011669/ https://lwn.net/Articles/1011669/ tim-day-387 <div class="FormattedComment"> Lustre would benefit from a filesystem agnostic transaction API (at least, in kernel space). The OSD layer is essentially implementing that. We're making a push to get Lustre included upstream and the fate of OSD/ldiskfs/ext4 is one of the big open questions. Having a shared transaction API would make that much easier to answer.<br> </div> Mon, 24 Feb 2025 18:52:28 +0000 bcachefs https://lwn.net/Articles/1011468/ https://lwn.net/Articles/1011468/ Wol <div class="FormattedComment"> Depends how much state, across how many files, but (if I understand correctly) I'm sure object based databases could benefit.<br> <p> I would want to update part of a file (maybe two or three blocks, across a several-meg (or more) file) and the ability to rewrite just the blocks of interest, then flush a new inode or whatever, changing just those block pointers, would be wonderful.<br> <p> Maybe we already have that. 
Maybe it's too complicated (as in multiple people trying to update the same file at the same time ...)<br> <p> Cheers,<br> Wol<br> </div> Sun, 23 Feb 2025 09:37:46 +0000 bcachefs https://lwn.net/Articles/1011445/ https://lwn.net/Articles/1011445/ Cyberax <div class="FormattedComment"> <span class="QuotedText">&gt; And then, once it exists and works correctly</span><br> <p> Except this never happened :) I wrote an application that actually used distributed transactions with NTFS and SQL Server, for video file management from CCTV cameras, some time around 2008.<br> <p> There were tons of corner cases that didn't work quite right. For example, if you created a folder and a file within that folder, then nobody else could create files in that folder until the transaction commits. Because the folder had to be deleted during the rollback.<br> <p> And this at least made some sense within Windows, as it's a very lock-heavy system. It will make much less sense in a Linux FS.<br> </div> Sun, 23 Feb 2025 01:45:50 +0000 bcachefs https://lwn.net/Articles/1011439/ https://lwn.net/Articles/1011439/ intelfx <div class="FormattedComment"> <span class="QuotedText">&gt; but if you do know of some please let me know.</span><br> <p> Package managers? Text editors? Basically anything that currently has to do the fsync+rename+fsync dance?<br> <p> Now, I'm not saying that someone should get on coding userspace transactions yesterday™, but at a glance, there are definitely uses for that.<br> </div> Sat, 22 Feb 2025 23:45:16 +0000 bcachefs https://lwn.net/Articles/1011438/ https://lwn.net/Articles/1011438/ koverstreet <div class="FormattedComment"> <span class="QuotedText">&gt; Many applications which deal with "document-like" data tend to load the entire document into memory, operate on it, and then write it back out to save the changes. The needed atomicity here is that the changes either are completely applied or not applied at all, as an inconsistent state would render the file corrupt. 
A common approach is to write the document to a new file, then replace the original file with the new one. One method to do this is with the ReplaceFile API.</span><br> <p> Yeah, I tend to agree with Microsoft :) I'm not aware of applications that would benefit, but if you do know of some please let me know.<br> <p> I'm more interested in optimizations for fsync overhead.<br> </div> Sat, 22 Feb 2025 23:23:45 +0000 bcachefs https://lwn.net/Articles/1011436/ https://lwn.net/Articles/1011436/ NYKevin <div class="FormattedComment"> To the best of my understanding, "NTFS style transactions" means, roughly speaking, "you expose full transaction semantics to userspace, so that userspace can construct transactions with arbitrary combinations of writes, including writes that span multiple files or directories." And then, once it exists and works correctly, you write documentation[1] telling userspace not to use it, supposedly because userspace never really wanted it in the first place (which I find hard to believe, personally).<br> <p> [1]: <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/deprecation-of-txf">https://learn.microsoft.com/en-us/windows/win32/fileio/de...</a><br> </div> Sat, 22 Feb 2025 22:41:24 +0000 bcachefs https://lwn.net/Articles/1011353/ https://lwn.net/Articles/1011353/ koverstreet <div class="FormattedComment"> Yes, it could support arbitrary atomic writes (not of infinite size, bcachefs transactions have practical limits). If someone wanted to fund it - it's not a particularly big interest of mine.<br> <p> Unfamiliar with NTFS style transactions.<br> </div> Fri, 21 Feb 2025 20:05:27 +0000 bcachefs https://lwn.net/Articles/1011327/ https://lwn.net/Articles/1011327/ Cyberax <div class="FormattedComment"> Unlikely. And even NTFS' successor removed the transactional support.<br> <p> The problem is not in the filesystem itself, where transactions are reasonably easy, but in the VFS layer and the page cache. 
There's no easy way to impose isolation between parts of the page cache, and rollbacks are even more tricky.<br> </div> Fri, 21 Feb 2025 18:10:29 +0000 bcachefs https://lwn.net/Articles/1011305/ https://lwn.net/Articles/1011305/ DemiMarie <div class="FormattedComment"> Given that it is CoW, could bcachefs support arbitrary atomic writes, or even (gulp) transactions in the style of NTFS?<br> </div> Fri, 21 Feb 2025 15:16:10 +0000 Performance hit? https://lwn.net/Articles/1011193/ https://lwn.net/Articles/1011193/ mcgrof <div class="FormattedComment"> We did measurements of the impact to existing 4k workloads and found the impact to be within the noise. The more interesting things were about how larger block sizes perform against 4k block size workloads, and while that was also found to be within the noise I figured I'd share the old results here [0]. The more interesting results were found once one started to use unbounded IO devices such as pmem [0], part of which led to last year's LSFMM topic "Measuring and improving buffered I/O" [1]. One of the key conclusions of that discussion was the prospect of parallelizing writeback, a topic which has been proposed for this year's LSFMM [2] and for which there are RFC patches out to help review.<br> <p> [0] <a href="https://docs.google.com/presentation/d/e/2PACX-1vS6jYbdGDBxlN5fyIjEFAsMxZ5iHkqaXIW-K3-mWDyA4jL_ZzBdZkBe7TkzmZ-IVoSwrQPGQeqeXlkn/pub?start=false&amp;loop=false&amp;delayms=3000">https://docs.google.com/presentation/d/e/2PACX-1vS6jYbdGD...</a><br> [1] <a href="https://lwn.net/Articles/976856/">https://lwn.net/Articles/976856/</a><br> [2] <a href="https://lore.kernel.org/all/Z6GAYFN3foyBlUxK@dread.disaster.area/T/">https://lore.kernel.org/all/Z6GAYFN3foyBlUxK@dread.disast...</a><br> </div> Fri, 21 Feb 2025 12:18:40 +0000 Performance hit? https://lwn.net/Articles/1011173/ https://lwn.net/Articles/1011173/ bmenrigh <div class="FormattedComment"> Does this come at a performance cost (CPU-wise, not disk-wise)? 
From the description of the details it sounds like it doesn’t, or the overhead is too tiny to matter.<br> </div> Fri, 21 Feb 2025 04:43:17 +0000 Shows how ahead of its time SGI really was https://lwn.net/Articles/1011172/ https://lwn.net/Articles/1011172/ bmenrigh <div class="FormattedComment"> XFS has reflinks. I’ve been de-duping files on it for a while.<br> </div> Fri, 21 Feb 2025 04:38:46 +0000 Shows how ahead of its time SGI really was https://lwn.net/Articles/1011165/ https://lwn.net/Articles/1011165/ gerdesj <div class="FormattedComment"> "but the kernel is only now able to offer all the features that XFS could rely on in IRIX almost 30 years ago."<br> <p> reflinks?<br> <p> XFS nowadays has holes in the knees of its jeans, which it mockingly derides as "ironic". For me XFS is a safe haven for data, especially for large files. <br> <p> Veeam has supported reflinks for some time now and it is a bit of a game changer when you are dealing with backups that have a full and incrementals. A "synthetic full" can be generated within seconds by shuffling reflinks instead of blocks. Add a flag to cp and you can create a sort of clone within seconds - a bit like a volume snapshot but for individual files.<br> <p> The real beauty of Linux and other promiscuous Unixes is that we have a lot of choice and sometimes we as sysadmins pick the right one for the job in hand. <br> <p> Windows has vFAT, NTFS and ReFS and that's your lot. To be fair ReFS is shaping up nicely these days (it doesn't kick your puppies quite so often) and NTFS is very, very stable - regardless of how much you abuse it. 
vFAT is FAT - keep it simples.<br> <p> Apples have files and I'm sure they are lovely.<br> </div> Fri, 21 Feb 2025 01:20:26 +0000 Thanks for the nudge https://lwn.net/Articles/1011146/ https://lwn.net/Articles/1011146/ koverstreet <div class="FormattedComment"> This was on my todo list, but I'd been too lazy to look up the relevant helper :)<br> <p> bcachefs now has it in for-next, so it should land in 6.15:<br> <a href="https://evilpiepirate.org/git/bcachefs.git/commit/?h=for-next&amp;id=77308424ba26e1b41a7db5d4eae121841a707c05">https://evilpiepirate.org/git/bcachefs.git/commit/?h=for-...</a><br> </div> Thu, 20 Feb 2025 21:44:11 +0000 Shows how ahead of its time SGI really was https://lwn.net/Articles/1011145/ https://lwn.net/Articles/1011145/ jmalcolm <div class="FormattedComment"> We tend to look back to the era of "proprietary" UNIX as a lesser time but the sentence "XFS will finally support all the features that it supported in IRIX before it was ported to Linux in 2001" really hits home. Not only do we have XFS itself only because it was gifted by SGI but the kernel is only now able to offer all the features that XFS could rely on in IRIX almost 30 years ago.<br> <p> My point is not that IRIX is better than Linux. Only that, if you consider what your home PC would have looked like in 2000, it is kind of mind blowing how advanced the software already was back then. Which makes sense of course because, though it would not hold a candle to today, the proprietary UNIX systems were designed for money-no-object hardware that pushed the technology of the time as far as it could go.<br> </div> Thu, 20 Feb 2025 21:16:49 +0000
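For readers unfamiliar with the "fsync+rename+fsync dance" mentioned in the thread, the following is a minimal sketch of that pattern, assuming a POSIX system where a directory can be opened and fsync'ed. The function name `atomic_replace` and the `.tmp-` prefix are hypothetical, chosen only for illustration. As noted in the discussion, some filesystems already flush data implicitly on an overwrite rename, so the first fsync may be redundant there; the sketch keeps it for portability.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or new contents, never a mix.

    Hypothetical helper for illustration; not taken from any library.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem (a requirement for atomic rename).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # First fsync: push the new contents to stable storage
            # before they become visible under the final name.
            os.fsync(tmp.fileno())
        # Atomically swap the new file in over the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the temporary file if it still exists.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Second fsync: persist the directory entry changed by the rename.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_replace("doc.txt", b"new contents")` either leaves the previous `doc.txt` intact or installs the complete new version. This gives single-file atomicity only, not the multi-file transactions discussed in the thread.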