Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 2:20 UTC (Tue) by calumapplepie (guest, #143655)
In reply to: Concurrent page-fault handling with per-VMA locks by developer122
Parent article: Concurrent page-fault handling with per-VMA locks

Perhaps it's handled by attempting to acquire mmap_lock in the numbers-are-equal case. If, on acquisition, we see that the numbers are still equal, we know a rollover must have happened: holding mmap_lock excludes writers, so the VMA is marked as locked for modification, and yet the thread that should be modifying it... isn't. So we can decrement the VMA sequence number, solving the issue until the next rollover without making that rollover come sooner (as incrementing the central sequence number would).
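A minimal Python simulation of this fallback idea, under loudly stated assumptions: the names `MM`, `VMA`, `write_locked_update()` and `fault()` are hypothetical, and the counter is only 4 bits wide so that a rollover happens after 16 writes. A plain `threading.Lock` stands in for mmap_lock (the real one is a reader/writer semaphore), and a writer is assumed to mark a VMA by copying the shared sequence number into it and to bump that number on release. It sketches the reasoning in this comment, not the actual patch set.

```python
import threading

SEQ_BITS = 4                  # hypothetical: tiny counter so rollover is quick
SEQ_MOD = 1 << SEQ_BITS


class MM:
    """Hypothetical address space: a shared sequence number plus mmap_lock."""

    def __init__(self):
        self.seq = 0
        # Stand-in for mmap_lock; the real lock is a reader/writer semaphore.
        self.mmap_lock = threading.Lock()

    def write_locked_update(self, vma):
        with self.mmap_lock:
            vma.seq = self.seq                   # mark the VMA as locked
            # ... modify the VMA here ...
            self.seq = (self.seq + 1) % SEQ_MOD  # release: every mark is now stale


class VMA:
    def __init__(self):
        self.seq = -1                            # never marked; matches no counter value


def fault(mm, vma):
    """Return which path handled the fault: 'fast' or 'fallback'."""
    if vma.seq != mm.seq:
        return "fast"                            # per-VMA path (details elided)
    # Equal: either a writer holds this VMA, or the counter has wrapped.
    with mm.mmap_lock:
        if vma.seq == mm.seq:
            # Holding mmap_lock excludes writers, so this must be a stale mark
            # left over from before a rollover. Step it back by one so the
            # counter needs another SEQ_MOD - 1 writes to reach it again.
            vma.seq = (vma.seq - 1) % SEQ_MOD
        # ... handle the fault under mmap_lock ...
    return "fallback"


if __name__ == "__main__":
    mm, target, other = MM(), VMA(), VMA()
    mm.write_locked_update(target)               # target.seq = 0, mm.seq = 1
    for _ in range(SEQ_MOD - 1):
        mm.write_locked_update(other)            # mm.seq wraps back to 0
    print(fault(mm, target), fault(mm, target))
```

After the wrap, the first fault on `target` takes the fallback and repairs the stale mark; the second one takes the fast path again.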

Or maybe the authors didn't think of this case, which would be concerning, since there is potential for a deadlock if they got it wrong. We probably can't assume that there even is a current write operation that can complete, since I think the number is only incremented when a write operation completes. If it were incremented both when a write operation starts and when it completes, you'd get a no-deadlock guarantee: the mmap sequence number would be odd while modifications are occurring and even otherwise, and VMA sequence numbers would only ever be set to odd values. So whenever no write operation is in progress, all VMAs would be unlocked, regardless of rollover.
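The odd/even variant can be sketched the same way. Again the names and the 4-bit counter are hypothetical; the one property the sketch relies on is that the counter's modulus is even (here a power of two), so wrapping from 15 back to 0 preserves parity. A VMA can then look locked only while some write is actually in progress, which means a faulting thread that falls back to mmap_lock always has a real writer to wait for.

```python
SEQ_BITS = 4                     # hypothetical; the modulus must be even
SEQ_MOD = 1 << SEQ_BITS


class MM:
    def __init__(self):
        self.seq = 0             # even: no write in progress

    def begin_write(self):
        self.seq = (self.seq + 1) % SEQ_MOD   # now odd
        assert self.seq % 2 == 1

    def end_write(self):
        self.seq = (self.seq + 1) % SEQ_MOD   # even again, even across a wrap
        assert self.seq % 2 == 0


class VMA:
    def __init__(self):
        self.seq = -1            # never marked; matches no counter value


def mark_vma_locked(mm, vma):
    # Only legal between begin_write() and end_write(), so the mark is odd.
    assert mm.seq % 2 == 1
    vma.seq = mm.seq


def vma_is_locked(mm, vma):
    # With no write in progress mm.seq is even and every mark is odd,
    # so this is False no matter how many times the counter has wrapped.
    return vma.seq == mm.seq


if __name__ == "__main__":
    mm, target, other = MM(), VMA(), VMA()
    mm.begin_write()
    mark_vma_locked(mm, target)
    mm.end_write()
    for _ in range(3 * SEQ_MOD):
        mm.begin_write()
        mark_vma_locked(mm, other)
        mm.end_write()
        assert not vma_is_locked(mm, target)
```

`target` can still look locked transiently, while a later write happens to hold the same odd value, but that write is by construction in progress and will finish.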

Both of these are fairly inexpensive solutions: an extra increment isn't much compared to handling a page fault, and the structure in question will already be loaded into the L1 cache for modification. Similarly, I believe the first solution boils down to an additional decrement, since the mmap_lock must be acquired anyway in the fallback case. They also aren't mutually exclusive; just make the fallback path decrement the VMA sequence number by 2 to keep the odd/even pattern. I'd encourage the devs to implement both, given the impact a bug here could have: at worst, an unpredictable deadlock that only occurs in long-running processes and is virtually impossible to reproduce; at best, degraded performance in long-running processes due to constantly falling back on the worst-case code path. I'm willing to bet that *some* process's architecture involves setting up a few large VMAs full of pages that get faulted in and out, but which themselves never get modified.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 6:28 UTC (Tue) by kilobyte (subscriber, #108024)

If you can't use a large enough counter, what about having reads check whether you're dangerously close to a rollover and, if so, do a fake write?
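One possible reading of this suggestion (an interpretation, not necessarily what was meant): a stale mark only causes trouble when the shared counter laps it, so a reader can compute how many increments remain before that happens. If that number is below some threshold, the reader performs a "fake write" that re-marks and immediately releases the VMA without changing it, which moves the mark to just behind the current counter. `THRESHOLD`, the 8-bit counter and all names are hypothetical, and the sketch ignores that a real fault handler would have to drop its per-VMA read lock before taking mmap_lock for writing.

```python
import threading

SEQ_BITS = 8                 # hypothetical counter width
SEQ_MOD = 1 << SEQ_BITS
THRESHOLD = 16               # hypothetical: "dangerously close" to being lapped


class MM:
    def __init__(self):
        self.seq = 0
        self.mmap_lock = threading.Lock()   # stand-in for mmap_lock held for write

    def write_vma(self, vma):
        with self.mmap_lock:
            vma.seq = self.seq                   # mark the VMA as locked
            # A real write modifies the VMA here; a fake write does nothing.
            self.seq = (self.seq + 1) % SEQ_MOD  # release


class VMA:
    def __init__(self):
        self.seq = -1                            # never marked


def steps_until_lapped(mm, vma):
    """Increments of mm.seq remaining before it equals vma.seq again."""
    return (vma.seq - mm.seq) % SEQ_MOD


def fault(mm, vma):
    if vma.seq == mm.seq:
        return "fallback"                        # looks locked; take mmap_lock (elided)
    if vma.seq >= 0 and steps_until_lapped(mm, vma) < THRESHOLD:
        mm.write_vma(vma)                        # fake write refreshes the mark
    return "fast"
```

This only protects VMAs that take at least one fault while inside the threshold window; a VMA that sees no faults for a full counter period can still be lapped, so some fallback like the ones discussed above would presumably still be needed.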

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 10:56 UTC (Tue) by developer122 (guest, #152928)

I guess if you want to catch potential bugs like this, you need to initialize the counter close to rollover. I remember some wisdom (dunno if it's still done) that the kernel's jiffies counter should by default start just short of a 32-bit wraparound, so that it wraps about five minutes after boot, to help catch issues early.
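The kernel precedent is `INITIAL_JIFFIES` in `include/linux/jiffies.h`, which starts jiffies 300 seconds' worth of ticks short of a 32-bit wrap so that 32-bit wraparound bugs surface about five minutes after boot rather than weeks later. The sketch below shows the idea on a generic, hypothetical ordering check rather than on the VMA code: a naive `a > b` comparison and a wrap-safe one in the spirit of the kernel's `time_after()` (signed difference) agree when the counter starts at 0, but starting it just short of the wrap point exposes the naive one within a few hundred steps. `BITS`, `INITIAL` and the helper names are assumptions for illustration.

```python
BITS = 32
MOD = 1 << BITS
INITIAL = MOD - 300          # hypothetical: start 300 increments before the wrap


def naive_after(a, b):
    """Buggy: assumes the counter never wraps."""
    return a > b


def seq_after(a, b):
    """Wrap-safe: True if a is later than b, treating (b - a) as a signed
    BITS-bit difference, as time_after() does for jiffies."""
    return (b - a) % MOD >= MOD // 2


def find_ordering_bug(after, start, steps):
    """Advance a counter from `start`; return the first step at which `after`
    fails to report the new value as later than the old one, or None."""
    counter = start
    for step in range(steps):
        new = (counter + 1) % MOD
        if not after(new, counter):
            return step
        counter = new
    return None


if __name__ == "__main__":
    for start in (0, INITIAL):
        print(start, find_ordering_bug(naive_after, start, 1000),
              find_ordering_bug(seq_after, start, 1000))
```

Starting from 0, neither check fails in 1000 steps; starting from `INITIAL`, `naive_after` fails at step 299, when the counter goes from `MOD - 1` to 0, while `seq_after` keeps working.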