Posted May 12, 2011 6:07 UTC (Thu) by djwong (subscriber, #23506)
In reply to: Stable pages by jzbiciak
Parent article: Stable pages
Yes, I'm working towards a COW solution. However, I first need to quantify the impact of wait-on-writeback on a wider variety of workloads so that I have a better idea of what I'd be changing and what good that would do. :)
Posted May 12, 2011 13:00 UTC (Thu) by smurf (subscriber, #17840)
One kind of workload that's affected negatively would be any low-latency process which writes to disk.
When I do that, in order to guarantee that the main program responds immediately I lock the whole application in memory and use a separate writing thread.
But if you lock a couple of my process' pages when writing, that lock will affect unrelated data structures which simply happen to be on the same memory page. I can thus no longer guarantee that my main task will not block on random memory writes. That's not acceptable.
Stable pages
Posted May 12, 2011 13:28 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246)
Well, for one, you could allocate your write buffers in dedicated pages with "memalign". That might not be a bad idea anyway.
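For illustration, a minimal sketch of that idea in C, assuming a POSIX system and using posix_memalign rather than the older memalign. The helper name alloc_write_buffer is hypothetical. It rounds the request up to whole pages so that the buffer should not share a page with unrelated heap data:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate a write buffer that starts on a page
 * boundary and spans a whole number of pages, so no unrelated heap
 * object lives on the same page. Returns NULL on failure. On success,
 * *alloc_len (if non-NULL) receives the rounded-up size. */
static void *alloc_write_buffer(size_t len, size_t *alloc_len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t psz = (size_t)page;
    size_t rounded = (len + psz - 1) / psz * psz;  /* round up to pages */
    void *buf = NULL;

    if (rounded == 0 || posix_memalign(&buf, psz, rounded) != 0)
        return NULL;
    if (alloc_len)
        *alloc_len = rounded;
    return buf;  /* release with free() */
}
```

A request for 100 bytes would come back as one full page, aligned to the page size.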
Now, on a separate note: One thing that wasn't clear to me was why this blocking only applies to file backed pages. Wouldn't anonymous pages headed toward swap also be subject to this if swap was on an integrity-checked volume?
Stable pages
Posted May 12, 2011 20:01 UTC (Thu) by djwong (subscriber, #23506)
I _think_ the swap cache tries to erase all the mappings to a particular page just prior to swapping the page out to disk, and doesn't write the page if it can't. I'm not 100% sure, however, that there isn't a race between the page being mapped back in while the swapout is in progress, so I'll check.
Stable pages, possible corner cases
Posted May 12, 2011 18:07 UTC (Thu) by davecb (subscriber, #1574)
Perhaps I'm misunderstanding, but won't a series of small sequential writes trigger wait-on-writeback? Or does this not apply to appending to a file-backed page?
In a previous life I was involved in the performance measurement of coalescing disk writes, and we found a very large number of sequential writes could be coalesced into single writes, and then adjacent blocks coalesced into larger single writes. This paid off particularly well when a disk was being handed writes at or beyond its capacity, by removing unneeded writes. I think I still have the graphs somewhere (;-))
I'll comment on the non-sequential case in a sec, after I look at my archive.
--dave
Stable pages, possible corner cases
Posted May 12, 2011 18:49 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246)
It seems like it should, but only if the page starts getting flushed to disk during the series of writes. Dirty pages don't get flushed to disk immediately unless there's memory pressure, too many dirty pages, they've been sitting around awhile, or you've asked them to be flushed. All those thresholds are defined throughout here:
That's what makes it so hard (at least for me) to reason about what workloads would get hurt, since there's not a simple, immediate relationship between "application dirtied a page" and "page got scheduled for writeback." You need both of those things to happen *and* the application must subsequently try to dirty the page further before you hit the page-block.
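As a rough aid for experimenting, the writeback thresholds can be inspected from user space. The sketch below assumes a Linux system exposing the usual tunables under /proc/sys/vm (for example dirty_ratio, dirty_background_ratio, dirty_expire_centisecs and dirty_writeback_centisecs). The helper name read_vm_tunable is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer tunable from /proc/sys/vm.
 * Returns the value, or -1 if the file is missing or unreadable. */
static long read_vm_tunable(const char *name)
{
    char path[128];
    long value = -1;

    assert(name != NULL);
    int n = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_vm_tunable("dirty_expire_centisecs"), for instance, gives the age at which a dirty page becomes eligible for writeback, in hundredths of a second.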
I guess you could get some negative interactions more immediately if a 'write' call scheduled a writeback for part of a page, and then the app immediately resumed filling the rest of the page. Still, I don't think a write() syscall triggers an immediate writeback on most calls. Take a look at 'queue_io' around line 277:
Only the oldest dirtied pages get flushed, as I read that.
Stable pages, possible corner cases
Posted May 12, 2011 19:23 UTC (Thu) by davecb (subscriber, #1574)
Excellent, thanks! --dave
Stable pages - is this "racy" ?
Posted May 12, 2011 18:35 UTC (Thu) by davecb (subscriber, #1574)
I had a look at the paper the work I measured was based on, and wonder if we're really looking at a race condition: we take a checksum, queue the data for I/O and compare the data as part of or after the I/O to see if an error has occurred.
Delaying, duplicating or COWing allows us to survive or avoid the data changing while the I/O is queued, which is a pretty long time compared to anything happening in main memory. The speed difference gives us a relatively large period in which a program can race ahead of the disk.
If the purpose is to validate the disk write, one would want to do the checksum as late as possible before the write, and verify it either as part of the hardware write or via a read-after-write step. That keeps the time period tiny.
If the purpose is to validate it from end to end, I suspect you need more than one check. One check would need to be done as the data is queued, to be sure it made it to the queue ok, which would need to be amended if the page in queue is coalesced with a later write. In the latter case you have a new, amended checksum to check as-or-after the write.
Alas, I'm not following the main list these days, so I'm unclear of the fine details of the requirements you face!
--dave
Stable pages - is this "racy" ?
Posted May 15, 2011 20:44 UTC (Sun) by giraffedata (subscriber, #1954)
The race is between Linux and the disk drive. No matter when Linux computes the checksum, if the data in the buffer changes while the disk drive is transferring the data from the buffer to itself, Linux cannot ensure that the checksum the disk drive gets is correct for the rest of the data that the disk drive gets.
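A toy simulation of that race, as a sketch only: the additive checksum, the 512-byte block and the function names are all assumptions for illustration, and a real integrity format would use a CRC. The host computes a checksum, the application redirties the page, and then the "device" verifies what it actually transferred. With a private snapshot (a bounce buffer, one of the options discussed in this thread) the check passes. Against the live page it fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Toy additive checksum, for illustration only. */
static uint16_t toy_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return (uint16_t)(sum & 0xffff);
}

/* Simulate one write. Returns 1 if the checksum the host sent matches
 * the data the "device" transferred, 0 otherwise. If use_copy is
 * nonzero, the device reads a private snapshot instead of the live page. */
static int simulate_write(int use_copy)
{
    uint8_t page[BLOCK_SIZE];
    uint8_t bounce[BLOCK_SIZE];
    const uint8_t *dma_src = page;

    memset(page, 0xAA, sizeof page);
    if (use_copy) {
        memcpy(bounce, page, sizeof page);
        dma_src = bounce;
    }

    uint16_t sent = toy_checksum(dma_src, BLOCK_SIZE); /* host side */

    page[100] = 0x55;  /* application modifies the page mid-I/O */

    uint16_t seen = toy_checksum(dma_src, BLOCK_SIZE); /* device side */
    return sent == seen;
}
```

Moving the checksum computation later shrinks the window but does not close it, since the modification can still land while the transfer is in flight.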
It's always been pretty dicey to have the disk drive get a mixture of older and newer data for a single write, but we've always arranged it so that in the cases where that can happen, it doesn't matter that we end up with garbage. But it's a lot harder to ignore a checksum mismatch, which is designed to indicate lower level corruption.