Ensuring data reaches disk
Posted Sep 16, 2011 10:35 UTC (Fri) by andresfreund (subscriber, #69562)
In reply to: Ensuring data reaches disk by scheck
Parent article: Ensuring data reaches disk
For that you need to issue some special commands - which e.g. fsync() knows how to do.
Besides, an O_DIRECT write doesn't guarantee that metadata updates have reached stable storage.
Posted Nov 8, 2020 21:55 UTC (Sun) by yzou93 (guest, #142976)
My question about fsync() is how the OS can control, or know about, the device-internal caching behavior. When designing block device hardware - for example, if Samsung wants to design a new SSD - is cache-control support for the fsync() commands issued by the OS required?

Thank you.
Posted Nov 8, 2020 23:15 UTC (Sun) by Wol (subscriber, #4433)
Cheers,
Wol
Posted Nov 9, 2020 9:56 UTC (Mon) by farnz (subscriber, #17727)
Yes, such a command is needed, and the various interface specs (ATA, SCSI, NVMe) all have standardised commands for flushing the cache.
At a minimum, you get a FLUSH CACHE or SYNCHRONIZE CACHE type command, which is specified as not completing until all data in the cache is in persistent storage; this is enough to implement fsync() behaviour. Beyond that, you can also have forced unit access (FUA) writes, which do not complete until the data written is on the persistent media, and even partial flush commands that only affect some sections of the drive.
There's an added layer of complexity in that some standards have queued flushes which act as straight barriers (all commands before the flush complete, then the flush happens, then the rest of the queue is processed); others have queued flushes that only affect commands issued before the flush on the same queue (and can over-flush by also flushing data from later commands in the queue); and yet others only have unqueued flushes, which require you to idle the interface, wait for the flush to complete, and then resume issuing commands.
Posted Nov 9, 2020 10:17 UTC (Mon) by Wol (subscriber, #4433)
If you can't be sure what has or hasn't hit the disk - the nightmare scenario is "part of the log, and part of the data" - then you get the hoops that I believe SQLite and PostgreSQL go through :-(
Cheers,
Wol
Posted Nov 9, 2020 17:11 UTC (Mon) by zlynx (guest, #2285)
I had to rebuild a btrfs volume because my laptop battery ran down in the bag, and on reboot the drive contained blocks saying writes had completed, but those data blocks had old data in them. In other words, data that had been committed to physical storage (or so the drive claimed) was no longer present after power loss. The drive probably had to run the equivalent of fsck on its flash FTL and lost some bits.
btrfs gets very upset about that.
I guess this behavior is still better than some older SSDs, which had to be secure-erased and reformatted after losing their entire FTL.
Posted Nov 9, 2020 18:27 UTC (Mon) by farnz (subscriber, #17727)
To be fair to btrfs, that's its USP compared to ext4 - when hardware fails, it lets you know that your data has been eaten at the time of the issue, and not months down the line.
And knowing consumer hardware, chances are very high that it did commit everything properly, and then had a catastrophic failure when there was a surprise power-down. Unfortunately, unless you have an acceptance lab verifying that kit complies with the intent of the spec, it often complies with the letter of the spec (if you're lucky) and no more :-(