
Postgres, FPW=off and DIO

Posted Apr 4, 2025 14:53 UTC (Fri) by mcgrof (subscriber, #25917)
In reply to: Postgres, FPW=off and DIO by andresfreund
Parent article: Supporting untorn buffered writes

We have no semantics defined today for RWF_ATOMIC with buffered I/O, so it can't be evaluated directly. At this stage the goal was to garner the kernel community's appreciation of its potential and to discuss possible filesystem and block-layer semantics. Now that there seems to be better appreciation of its potential in the kernel community, and the possible kernel semantics have been discussed, the next goal would be to tailor a use case for databases that could leverage it, such as PostgreSQL. For that, it's best to collaborate with the database community.

It's also correct that RWF_ATOMIC semantics today require single writes. That is not because of any requirement of direct I/O, but because, at least from an NVMe perspective, a write must not cross an atomic boundary: if that boundary is 16k, an atomic write cannot be larger than 16k. In other words, it's a hardware requirement, so software must also tailor its atomic writes to the hardware's needs, and the goal of RWF_ATOMIC is to help facilitate that.

NVMe MAM could in theory help with large RWF_ATOMIC I/O, but the wrinkle is that it only works if the large write succeeds. If a large NVMe MAM atomic write fails, filesystems on Linux today have no way of telling which blocks are valid and which are not, because the atomic block that failed is not communicated back. The entire range would therefore need to be invalidated, which defeats the purpose. Reflinks *may* help work around that limitation, but that would require some evaluation and development.
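To make the size/boundary constraint concrete, here is a minimal Python sketch that only checks whether a single write would be eligible to go out atomically; it issues no I/O. The function names and the parameters unit_min, unit_max, and boundary are hypothetical stand-ins for limits a device advertises (on Linux, statx() reports per-file atomic write unit limits, while the NVMe boundary comes from the device's namespace data). The power-of-two length and natural-alignment rules are modeled on my understanding of the Linux RWF_ATOMIC checks and should be treated as an assumption, not a specification.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for zero and negatives."""
    return n > 0 and (n & (n - 1)) == 0


def crosses_boundary(offset: int, length: int, boundary: int) -> bool:
    """True if [offset, offset + length) spans more than one boundary-sized region.

    A boundary of 0 is treated (hypothetically) as "the device reports no boundary".
    """
    if boundary == 0:
        return False
    first_region = offset // boundary
    last_region = (offset + length - 1) // boundary
    return first_region != last_region


def atomic_write_ok(offset: int, length: int, unit_min: int, unit_max: int,
                    boundary: int = 0) -> bool:
    """Hypothetical eligibility check for one untorn write (assumed rules)."""
    # Assumed rule: the length must be a power of two within the advertised unit range.
    if not is_power_of_two(length):
        return False
    if not (unit_min <= length <= unit_max):
        return False
    # Assumed rule: the offset must be naturally aligned to the write length.
    if offset % length != 0:
        return False
    # Hardware rule from the comment: the write must not cross an atomic boundary.
    return not crosses_boundary(offset, length, boundary)
```

With a 16k boundary, a 16k write at offset 0 passes, while a 32k write at offset 0 is rejected because it crosses into the next 16k region even though it fits within unit_max. Note that when the boundary is itself a power of two and the write is naturally aligned, the boundary check reduces to "length must not exceed the boundary", which matches the "cannot be larger than 16k" example above.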




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds