
Concurrent page-fault handling with per-VMA locks

Posted Sep 5, 2022 16:36 UTC (Mon) by developer122 (guest, #152928)
Parent article: Concurrent page-fault handling with per-VMA locks

How does the kernel deal with sequence-number rollover? I.e., in this example, what happens when the global sequence number rolls around and becomes equal to the stale numbers on a few of the thousands of VMAs?

Or, in this implementation, are they just taken as collateral damage and unnecessarily locked until the current write operation ends (finishes its work on the VMAs of interest)?


Rollover

Posted Sep 5, 2022 18:49 UTC (Mon) by corbet (editor, #1) [Link] (1 responses)

This is just a guess but ... I would expect that, if a VMA has been entirely idle for long enough for the sequence number to roll over and come all the way back around, the chances of an access arriving at just the right time to cause a false positive are pretty tiny. But if that occurs, the worst that happens is that a fault is handled a bit more slowly than it otherwise would be.

Rollover

Posted Oct 12, 2022 16:46 UTC (Wed) by surenb (guest, #114881) [Link]

Mr Corbet is right as usual. The comment in https://lwn.net/ml/linux-kernel/c84136d3-703a-0e57-20ce-5... indicates "Overflow might produce false locked result but it's not critical." This is because we will simply fall back to taking mmap_lock in the case of a false positive. Per Laurent's comment, I'm going to expand that code comment to explain the details.

Concurrent page-fault handling with per-VMA locks

Posted Sep 5, 2022 21:05 UTC (Mon) by Homer512 (subscriber, #85295) [Link] (7 responses)

Do we even care about rollover? If the sequence number is 64 bit, a 5 GHz machine cannot roll it over in over 100 years, even if it could increment the number once per clock cycle.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 15:31 UTC (Tue) by calumapplepie (guest, #143655) [Link] (6 responses)

Hmm, good point... the downside would be memory consumption. 64-bit sequence numbers would increase the size of every vma_struct by about 5% (by my estimate from scanning the code). `sudo cat /proc/*/maps | wc` tells me I have 146,368 VMAs right now on my laptop; 8 extra bytes each would add up to around a megabyte, which is fairly small, especially considering what I have running right now (Firefox, Chrome, LibreOffice, Calibre, and Evolution, plus a few other apps). That being said, I remember seeing an article on here about an effort to share vma_struct instances between processes, so there might be some group that's seeing much larger numbers.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 16:20 UTC (Tue) by adobriyan (subscriber, #30858) [Link] (1 responses)

> 64-bit sequence numbers would increase the size of every vma_struct by about 5% (by my estimate of scanning the code)

VMAs are 200 bytes (on my F35 system) and on most systems they will be allocated from kmalloc-256,
so there is plenty of space.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 16:27 UTC (Tue) by adobriyan (subscriber, #30858) [Link]

Hmm... Fedora disabled SLAB merging.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 16:35 UTC (Tue) by Wol (subscriber, #4433) [Link] (2 responses)

> Hmm, good point... the downside would be memory consumption. 64-bit sequence numbers would increase the size of every vma_struct by about 5% (by my estimate of scanning the code).

Could you make the sequence number include the pid?

What are the chances of a 16-bit rolling number concatenated with the low order 16-bit pid as your sequence number colliding?

Cheers,
Wol

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 20:42 UTC (Tue) by WolfWings (subscriber, #56790) [Link] (1 responses)

...hasn't the PID been quite a bit larger than 16-bit for some time now? Just software limited to only use 16-bit PIDs by default for compatibility?

Concurrent page-fault handling with per-VMA locks

Posted Sep 7, 2022 6:24 UTC (Wed) by Wol (subscriber, #4433) [Link]

I know. Which is why I said use the low-order 16 bits ...

Cheers,
Wol

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 20:48 UTC (Tue) by Homer512 (subscriber, #85295) [Link]

True. All in all, corbet's answer is probably the better one: a false positive due to rollover is harmless and just results in semi-random slow paths. So we might even use a tiny 8-bit counter if it makes struct packing better.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 2:20 UTC (Tue) by calumapplepie (guest, #143655) [Link] (2 responses)

Perhaps it's handled by, in the numbers-are-equal case, attempting to acquire mmap_lock. If, on acquisition, we see that the numbers are still equal, we know a rollover must have happened (since the VMA is locked for modification and yet the thread that should be modifying it... isn't). So we can decrement the VMA sequence number, solving the issue until the next rollover without making it come sooner (as we would if we incremented the central sequence number).

Or maybe the authors didn't think of this case, which would be concerning, since there is a potential for a deadlock if they got it wrong. We probably can't assume that there even is a current write operation which can complete, since I think the number is only incremented when a write operation completes. If it were incremented on both initiation and completion, you'd get that no-deadlock property: all VMAs would be unlocked whenever there is no write operation, regardless of rollover, because the mmap sequence number would be odd while modifications are occurring and even otherwise, and VMA sequence numbers would only ever be set to odd values.

Both of these are fairly inexpensive solutions, as an extra increment isn't much compared to handling a page fault, and the structure in question will already be loaded into the L1 cache for modification. Similarly, I believe the first solution boils down to an additional decrement, since the mmap_lock must be acquired anyway in the fallback case. They also aren't mutually exclusive; just make the fallback path decrement the VMA sequence number by 2 to keep the odd/even pattern. I'd encourage the devs to implement both, given the impact that a bug here could have: at worst, an unpredictable deadlock that only occurs in long-running processes and is virtually impossible to reproduce; at best, degraded performance in long-running processes due to constantly falling back on the worst-case code. I'm willing to bet that *some* process's architecture involves setting up a few large VMAs full of pages being faulted in and out but which never themselves get modified.

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 6:28 UTC (Tue) by kilobyte (subscriber, #108024) [Link]

If you can't use a large enough counter, what about reads checking if you're dangerously close to a rollover, and if so doing a fake write?

Concurrent page-fault handling with per-VMA locks

Posted Sep 6, 2022 10:56 UTC (Tue) by developer122 (guest, #152928) [Link]

I guess if you want to catch potential bugs like this, you need to init the counter to be close to rollover. I remember some wisdom once (dunno if it's still done) that timeouts in the kernel by default should be set to 10 minutes after boot, to help catch issues early.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds