Moving the kernel to large block sizes
Posted Sep 30, 2023 1:44 UTC (Sat) by willy (subscriber, #9762)
Parent article: Moving the kernel to large block sizes
I kicked off the conversion of buffer heads from b_page to b_folio in December 2022 with a 12-patch series (LSFMM was 6 months later in May). Pankaj from Samsung did some work in April. I did some more in June, then another round in July, and another this month.
Buffer heads do not represent 512-byte blocks. They represent filesystem-block-size blocks (which may be larger than or equal to the device block size). Where 512 bytes still appears is in describing disc locations to the block layer. But that's just a shift; we might as well use bytes.
The important reason large folios are needed to support large drive block sizes is that the block size is the minimum I/O size. That means that if we're going to write from the page cache, we need the entire block to be present. We can't evict one page and then try to write back the other pages -- we'd have to read the evicted page back in first. So we want to track dirtiness and presence on a per-folio basis, and we must restrict folio size to be no smaller than the block size.
> the folios in the page cache can be handed to the block layer, which will enumerate them in 512-byte blocks, hand the results to the driver that will reassemble them into larger units.
That's not how it works. The writeback code will enumerate each dirty folio. The filesystem ends up calling bio_add_folio() (most filesystems currently call bio_add_page()) and the bio will contain the entire contents of the folio.
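As a hedged sketch of what that looks like in a filesystem's writeback path (a kernel-style fragment, not runnable on its own; the helper name and the bio-full handling are mine, but bio_add_folio() and folio_size() are real kernel APIs):

```c
/* Illustrative fragment: attach an entire dirty folio to a bio in one
 * call, rather than one bio_add_page() call per constituent page. */
static void add_folio_to_writeback_bio(struct bio *bio, struct folio *folio)
{
	/* bio_add_folio() covers the whole folio, whatever its order. */
	if (!bio_add_folio(bio, folio, folio_size(folio), 0)) {
		/* bio is full: submit it and start a new one (not shown) */
	}
}
```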
SLUB already uses higher-order folios. There's a boot option (slub_min_order) to set the minimum order.
There's no immediate need to replace bio_vec; it handles multiple pages just fine. There are very good reasons to replace it, though -- see my struct phyr proposal, which I don't have time to work on.
