
Stable pages - is this "racy" ?

Posted May 12, 2011 18:35 UTC (Thu) by davecb (subscriber, #1574)
In reply to: Stable pages by djwong
Parent article: Stable pages

I had a look at the paper that the work I measured was based on, and I wonder whether we're really looking at a race condition: we take a checksum, queue the data for I/O, and then compare the data during or after the I/O to see whether an error has occurred.

Delaying, duplicating or COWing allows us to survive or avoid the data changing while the I/O is queued, which is a pretty long time compared to anything happening in main memory. The speed difference gives us a relatively large period in which a program can race ahead of the disk.
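As a rough illustration of the "duplicating" option, here is a minimal Python sketch. It assumes the queued write works on a private snapshot of the page, so later changes by the program cannot invalidate the checksum. The name `queue_with_copy` is hypothetical, and CRC-32 stands in for whatever checksum the real I/O path uses.

```python
import zlib


def queue_with_copy(page):
    """Snapshot the page and checksum the snapshot (hypothetical helper).

    The snapshot plays the role of a bounce buffer: the device would be
    handed the copy, never the live page.
    """
    snapshot = bytes(page)  # immutable private copy
    return snapshot, zlib.crc32(snapshot)


page = bytearray(b"old data")
snapshot, checksum = queue_with_copy(page)

# The program races ahead and changes the live page while the I/O is queued.
page[0:3] = b"NEW"

# The queued copy is unaffected, so its checksum still matches.
assert zlib.crc32(snapshot) == checksum
```

The cost is the extra copy for every write, which is the trade-off against simply making writers wait.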

If the purpose is to validate the disk write, one would want to compute the checksum as late as possible before the write, and verify it either as part of the hardware write or via a read-after-write step. That keeps the time period tiny.
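The read-after-write variant might look roughly like the sketch below. It is written at user level against an ordinary file, purely to show the ordering of the steps; `write_and_verify` is a hypothetical name, and a read-back through the page cache does not prove that the bits reached the medium.

```python
import os
import tempfile
import zlib


def write_and_verify(path, data):
    """Checksum just before writing, then read back and compare."""
    checksum = zlib.crc32(data)  # computed as late as possible
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    with open(path, "rb") as f:
        readback = f.read()  # may be served from cache, not the disk
    return zlib.crc32(readback) == checksum


with tempfile.TemporaryDirectory() as tmpdir:
    ok = write_and_verify(os.path.join(tmpdir, "block.bin"), b"x" * 4096)
```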

If the purpose is to validate it from end to end, I suspect you need more than one check. One check would need to be done as the data is queued, to be sure it made it to the queue intact, and that checksum would need to be amended if the queued page is coalesced with a later write. In that case you have a new, amended checksum to check during or after the write.
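A small Python sketch of that two-check idea follows, under the assumption that a queued write keeps its own checksum and recomputes it whenever a later write is coalesced into it. The `QueuedWrite` class and its methods are hypothetical, not any kernel interface.

```python
import zlib


class QueuedWrite:
    """A queued page with a checksum that is amended on coalescing."""

    def __init__(self, data):
        self.data = bytearray(data)
        # First check point: checksum taken as the data enters the queue.
        self.checksum = zlib.crc32(self.data)

    def coalesce(self, offset, new_bytes):
        """Merge a later write into the queued page and amend the checksum."""
        self.data[offset:offset + len(new_bytes)] = new_bytes
        self.checksum = zlib.crc32(self.data)

    def verify(self, written):
        """Second check point: compare what was written with the checksum."""
        return zlib.crc32(written) == self.checksum


w = QueuedWrite(b"hello world")
w.coalesce(0, b"HELLO")
```

After the coalesce, `w.verify(b"HELLO world")` succeeds and `w.verify(b"hello world")` fails, because only the amended checksum describes what should reach the disk.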

Alas, I'm not following the main list these days, so I'm unclear on the fine details of the requirements you face!

--dave


Stable pages - is this "racy" ?

Posted May 15, 2011 20:44 UTC (Sun) by giraffedata (subscriber, #1954)

The race is between Linux and the disk drive. No matter when Linux computes the checksum, if the data in the buffer changes while the disk drive is transferring the data from the buffer to itself, Linux cannot ensure that the checksum the disk drive gets is correct for the rest of the data that the disk drive gets.
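The race can be simulated deterministically in a few lines of Python. This is only a toy model: the "drive" pulls the buffer in 512-byte chunks, and a hypothetical hook (`scribble`) modifies the page after the first chunk has been transferred. CRC-32 again stands in for the real checksum.

```python
import zlib


def transfer_in_chunks(buf, chunk_size, mutate_after_chunk=None):
    """Simulate a device copying buf chunk by chunk, as DMA would."""
    received = bytearray()
    for i, offset in enumerate(range(0, len(buf), chunk_size)):
        received += buf[offset:offset + chunk_size]
        if mutate_after_chunk is not None:
            mutate_after_chunk(buf, i)  # the program touches the page
    return bytes(received)


def scribble(buf, i):
    """Modify the page once, right after the first chunk has gone out."""
    if i == 0:
        buf[0] = ord("X")   # already transferred: the drive keeps the old byte
        buf[-1] = ord("Y")  # not yet transferred: the drive gets the new byte


page = bytearray(b"A" * 4096)
checksum = zlib.crc32(page)  # computed before the transfer starts
on_disk = transfer_in_chunks(page, 512, scribble)

# The drive received a mixture of old and new data, which matches neither
# the checksum taken before the transfer nor the page as it is now.
assert zlib.crc32(on_disk) != checksum
assert on_disk != bytes(page)
```

Moving the checksum computation later does not help here: whichever version of the page it describes, the drive's copy is a third thing.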

It's always been pretty dicey to have the disk drive get a mixture of older and newer data for a single write, but we've always arranged it so that in the cases where that can happen, it doesn't matter that we end up with garbage. But it's a lot harder to ignore a checksum mismatch, which is designed to indicate lower-level corruption.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds