What about other filesystems?

Posted Jan 16, 2021 19:15 UTC (Sat) by NYKevin (subscriber, #129325)
In reply to: What about other filesystems? by dvdeug
Parent article: Fast commits for ext4

Under old-school Unix, if power dropped at the wrong moment, you could lose the whole filesystem. In practice, you would often be able to recover some or most of it with fsck (or an fsck-like process), but there was never any guarantee that a system crash was a recoverable event. As a result, it has always been the case that "If you don't call fsync, then you might lose data." It's just that this used to be vacuously true (you couldn't call fsync, because it didn't exist).

On the other hand, an orderly shutdown has never lost data on any (reasonable, properly-engineered)Unix (that I'm aware of), whether you fsync or not. This is still true today.

What about other filesystems?

Posted Jan 17, 2021 8:16 UTC (Sun) by dvdeug (guest, #10998) [Link] (7 responses)

Linux can't save you if the computer fails due to any number of physical problems. It has always been the case that "you might lose data". The change is that older Unixes don't require you to do anything special to achieve maximal data safety offered, whereas modern systems require you to do something special for the OS to try its best. Arguably (as Wol does), going from ordered behavior to complex reordering is a downgrade in the promised level of support.

Taking code that didn't require fsync (because it doesn't exist) and, in the words of zlynx, saying that "it's broken" makes all ISO C code that needs data safety broken, which seems extreme. From my perspective, filesystem developers got the ability to increase safety by default or speed by default, and chose speed. That doesn't really upset me, so much as the fact that the word "pony" gets pulled out and one side gets painted as unreasonable, instead of it getting painted as a tradeoff and argued on that basis.

What about other filesystems?

Posted Jan 17, 2021 9:33 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (2 responses)

1. Modern filesystems are much safer than old filesystems, by default. When was the last time you had to run fsck on boot?
2. I did not claim that old code was "broken," merely that it was at risk of losing data. My point is that both the application developer and the sysadmin would have been aware of that problem, and would take appropriate steps to remediate it (such as making regular backups, building a RAID, or whatever else makes sense). Everyone should still be taking those steps today, because as you say, nothing is 100% reliable.
3. Safety and speed are a tradeoff. But since we can't get to 100% safety, the primary value of safety is extrinsic: a safer system causes us to spend less time and resources on recovery (e.g. sitting around waiting for fsck to complete so I can boot my machine). So safety is itself a form of speed, and we can directly compare the time spent on recovery to the time spent on disk I/O - and as it turns out, once you make fsck obsolete, the disk I/O is a lot bigger for most people under most circumstances.

What about other filesystems?

Posted Jan 17, 2021 17:25 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

> 2. I did not claim that old code was "broken," merely that it was at risk of losing data. My point is that both the application developer and the sysadmin would have been aware of that problem, and would take appropriate steps to remediate it (such as making regular backups, building a RAID, or whatever else makes sense). Everyone should still be taking those steps today, because as you say, nothing is 100% reliable.

RAID is useless if it can't guarantee that stuff has been safely saved to disk ... which it can't if the linux layers provide no guarantees ...

Backups are pretty much useless BY DEFINITION, because if the data is corrupted while saving to disk (which is what we're discussing here), then it's not been around long enough to be saved to backup.

Come on, all I'm asking for is the ABILITY TO REASON about what is happening, so I can provide my own guarantees. "The system may or may not have saved this data in the event of a crash" is merely the filesystem guys saying "not our problem", and the references to the SQLite guys jumping through hoops to make certain is the perfect example of them having to do somebody else's job, because surely it's the filesystem's guys' job to make sure that data entrusted to the filesystem is actually safely saved by the filesystem.

If I can have some guarantee that "this data is saved before that data starts to be written", then at least I can reason about it.

And yes, I know making all filesystems provide these sort of guarantees may be fun - I'm on the raid mailing list - I know - because I read all the messages and glance at all the patches and all that stuff (and don't understand much of it :-) - but when (I know, I know) I find the time to start really digging in to it, I want the raid layer to provide exactly those guarantees.

And why can't we say "these are the guarantees we *intend* to provide", and make it a requirement that anything new *does* provide them! If I provide a "flush" in the raid code, I can then pass it on to the next layer down, and then when it says it's done it I can then pass success back up (or E_NOTSUPPORTED if I can't pass it down). But this is exactly another of those *new* things they're trying to get into the linux block layer, isn't it - the ability to pass error codes back to the caller other than the most basic of "succeeded" or "failed", isn't it? If they can get that in, surely they can get my "flush" in, can't they?

Cheers,
Wol

What about other filesystems?

Posted Jan 18, 2021 17:47 UTC (Mon) by hkario (subscriber, #94864) [Link]

precisely, the issue is not that hardware can fail and that the file system can't promise anything in such case, the problem is that there is no specification common _to all file systems_ that says what is expected to happen under such and such scenarios

or to put it other way round: every file system will exhibit different behaviour on power failure and every file system requires slightly different handling to get something you can reasonably expect (like, when the file system says it committed data to disk, the data is committed to disk)

that's no way to program stuff when dealing with such fundamental thing in computing as data storage

What about other filesystems?

Posted Jan 17, 2021 17:28 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> Taking code that didn't require fsync (because it didn't (sic) exist) and, in the words of zlynx, saying that "it's broken" makes all ISO C code that needs data safety broken, which seems extreme.

Actually, I think that's called a regression, is it not? And one of Linus' absolute rules is "no regressions", isn't it?

Cheers,
Wol

What about other filesystems?

Posted Jan 17, 2021 20:31 UTC (Sun) by matthias (subscriber, #94967) [Link] (2 responses)

>> Taking code that didn't require fsync (because it didn't (sic) exist) and, in the words of zlynx, saying that "it's broken" makes all ISO C code that needs data safety broken, which seems extreme.
>Actually, I think that's called a regression, is it not? And one of Linus' absolute rules is "no regressions", isn't it?
There is no regression. The code works as good as back in the days. Back in the days it was clear, that the data is only safe is the system is working properly, including no power outages. If you make sure that your system never crashes, the old code will work fine. If the system crashes, the old code might loose data, but this was always the case with this code. If you want additional guarantees (like no data loss in case of power loss), you have to use fsync.

Best,
Matthias

What about other filesystems?

Posted Jan 17, 2021 21:23 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

mmmm

The risk of a corrupted filesystem hasn't changed.

But if the application writes a journal before doing an update, then provided there's no collateral damage it can recover from a crash mid transaction on an old unix system.

On a new system, it can't be sure whether the transaction log is okay and the update is damaged, or the transaction log is damaged and the transaction is lost, or even worse the transaction log is damaged and the transaction is partially complete!

Cheers,
Wol

What about other filesystems?

Posted Jan 18, 2021 10:29 UTC (Mon) by farnz (subscriber, #17727) [Link]

No, because even on ancient systems, you had elevator reordering for performance, and no guarantees about metadata writes; in the event of a crash, you simply did not know the state of the update or the transaction log, as even if you wrote them in a careful order, the elevator could reorder writes to disk, and the metadata writes might be reordered, too.

In other words, as soon as there's a kernel panic or a power failure, all bets are off on an old UNIX system. This wasn't an issue with reliable systems, but as reliability went down (no dedicated power supplies, no UPSes etc), it became an issue again.