
Sunsetting buffer heads


Posted May 23, 2023 2:02 UTC (Tue) by dgc (subscriber, #6611)
In reply to: Sunsetting buffer heads by heatd
Parent article: Sunsetting buffer heads

> Silly question but: why do we hate buffer_heads?

On my current debian distro kernel, a buffer head is 104 bytes in size.

Filesystems require one buffer head per filesystem block cached in the page cache. In cases where fs block size = PAGE_SIZE, the only non-redundant information the buffer head carries is the sector address of the filesystem block the page maps to, i.e. 8 bytes of information. And, really, even this can be considered redundant, because the canonical source of the mapping information is the filesystem, not the buffer head.

Consider a typical modern server that has >1TB of memory in it. Say that, for a given workload, half of that memory is page cache and we have one buffer head per 4kB page. 500GB of page cache -> ~125 million pages = ~125 million buffer heads = ~13GB of RAM just for buffer heads. IOWs, a machine whose memory is 50% full of cached file data is going to be using at least 1% of the entire machine's RAM just to store buffer heads.

If you have block size < page size, then you have multiple buffer heads per page, and each of them typically carries only two extra bits of information - per-block dirty and uptodate state. If you have a 1kB block size on a 64kB page size, then there are 64 individual buffer heads attached to that page in a circular linked list. Iterating over the buffer heads (e.g. during writeback) can then cost 50-100 cache misses, depending on which fields in the buffer heads are being accessed....

iomap solves this problem (and others) by querying the filesystem only when mapping information is needed by the IO paths. This reduces the per-block state that needs to be carried in the page cache down to 2 bits - uptodate and dirty state. We currently only carry per-block uptodate state in a per-folio bitmap, but work is in progress to move dirty state from per-folio to per-block tracking using the same technique.

At this point, a single 2MB folio in the page cache with a 4kB block size will only need to carry ~140 bytes of state information to track all necessary per-block state. Using buffer heads, the same 2MB folio would have to carry a list of 512 individual buffer heads and hence would use ~50kB of memory to track the same state information. That's a pretty big difference in resource consumption, and it should also demonstrate why it was decided that buffer heads will only be used with PAGE_SIZE sized folios...

There are plenty of other, more complex and subtle reasons for not using buffer heads, but the compelling reason for modern systems is simply that per-block information is expensive to maintain. Filesystems have used extents for over three decades for this reason, and the iomap infrastructure leverages the efficiency of the extent-based mapping indexes already implemented in the filesystems themselves to minimise the memory footprint and CPU overhead of caching file data....

-Dave.




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds