What about other filesystems?

Posted Jan 17, 2021 17:25 UTC (Sun) by Wol (subscriber, #4433)
In reply to: What about other filesystems? by NYKevin
Parent article: Fast commits for ext4

> 2. I did not claim that old code was "broken," merely that it was at risk of losing data. My point is that both the application developer and the sysadmin would have been aware of that problem, and would take appropriate steps to remediate it (such as making regular backups, building a RAID, or whatever else makes sense). Everyone should still be taking those steps today, because as you say, nothing is 100% reliable.

RAID is useless if it can't guarantee that stuff has been safely saved to disk ... which it can't if the Linux layers provide no guarantees ...

Backups are pretty much useless BY DEFINITION, because if the data is corrupted while saving to disk (which is what we're discussing here), then it's not been around long enough to be saved to backup.

Come on, all I'm asking for is the ABILITY TO REASON about what is happening, so I can provide my own guarantees. "The system may or may not have saved this data in the event of a crash" is merely the filesystem guys saying "not our problem". The references to the SQLite guys jumping through hoops to make certain are the perfect example of them having to do somebody else's job, because surely it's the filesystem guys' job to make sure that data entrusted to the filesystem is actually safely saved by the filesystem.

If I can have some guarantee that "this data is saved before that data starts to be written", then at least I can reason about it.
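The nearest thing userspace has to that today is to fsync the first write before starting the second. A minimal sketch of that, assuming the filesystem and the device actually honour fsync (which is exactly the guarantee in dispute here); the function names and paths are made up for illustration:

```python
import os

def write_durably(path, data):
    """Write data to path, then ask the kernel to flush it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returns only once the kernel reports the data as flushed.
        # Whether that report can be trusted is the open question.
        os.fsync(fd)
    finally:
        os.close(fd)

def write_in_order(first_path, first_data, second_path, second_data):
    """Start writing the second file only after the first has been fsynced."""
    write_durably(first_path, first_data)
    write_durably(second_path, second_data)
```

Note that this sketch does not fsync the containing directory, so on some filesystems a newly created file's name may still be lost in a crash even if its contents were flushed.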

And yes, I know making all filesystems provide this sort of guarantee may be fun. I'm on the raid mailing list, I read all the messages and glance at all the patches and all that stuff (and don't understand much of it :-). But when (I know, I know) I find the time to start really digging into it, I want the raid layer to provide exactly those guarantees.

And why can't we say "these are the guarantees we *intend* to provide", and make it a requirement that anything new *does* provide them? If I provide a "flush" in the raid code, I can pass it on to the next layer down, and when that layer says it's done I can pass success back up (or E_NOTSUPPORTED if I can't pass it down). But isn't this exactly another of those *new* things they're trying to get into the Linux block layer - the ability to pass error codes back to the caller other than the most basic "succeeded" or "failed"? If they can get that in, surely they can get my "flush" in, can't they?
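To make the idea concrete, here is a toy model of a flush being passed down a stack of layers with a three-way result passed back up. It is purely illustrative: the class names and the FlushResult values are hypothetical and have nothing to do with the real Linux block layer or md code.

```python
from enum import Enum

class FlushResult(Enum):
    OK = "ok"
    FAILED = "failed"
    NOT_SUPPORTED = "not supported"  # stands in for E_NOTSUPPORTED

class Disk:
    """Bottom layer: a device that may or may not implement flush."""
    def __init__(self, name, supports_flush=True, healthy=True):
        self.name = name
        self.supports_flush = supports_flush
        self.healthy = healthy

    def flush(self):
        if not self.supports_flush:
            return FlushResult.NOT_SUPPORTED
        return FlushResult.OK if self.healthy else FlushResult.FAILED

class RaidLayer:
    """Middle layer: passes flush down and reports the weakest result up."""
    def __init__(self, members):
        self.members = members

    def flush(self):
        results = [member.flush() for member in self.members]
        if FlushResult.FAILED in results:
            return FlushResult.FAILED
        if FlushResult.NOT_SUPPORTED in results:
            # Some layer below could not promise anything, so neither can we.
            return FlushResult.NOT_SUPPORTED
        return FlushResult.OK
```

Because RaidLayer only relies on its members having a flush() method, layers can be stacked, and the caller at the top learns whether the guarantee held all the way down.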

Cheers,
Wol

What about other filesystems?

Posted Jan 18, 2021 17:47 UTC (Mon) by hkario (subscriber, #94864)

Precisely. The issue is not that hardware can fail and that the file system can't promise anything in that case; the problem is that there is no specification common _to all file systems_ that says what is expected to happen in any given scenario.

Or to put it the other way round: every file system exhibits different behaviour on power failure, and every file system requires slightly different handling to get something you can reasonably expect (like, when the file system says it committed data to disk, the data is committed to disk).

That's no way to program when dealing with something as fundamental in computing as data storage.