From: Daniel Phillips <firstname.lastname@example.org>
To: Andrew Morton <email@example.com>,
    Linus Torvalds <firstname.lastname@example.org>,
    Marcelo Tosatti <email@example.com>
Subject: [RFC] Alternative raceless page free
Date: Thu, 5 Sep 2002 06:42:12 +0200
Cc: firstname.lastname@example.org, Christian Ehrhardt <email@example.com>
For completeness, I implemented the atomic_dec_and_lock version of raceless
page freeing suggested by Manfred Spraul. The atomic_dec_and_lock approach
eliminates the free race by ensuring that, when a page's count drops to zero,
the lru list lock is taken atomically with the decrement. This leaves no
window in which the page can still be found and manipulated on the lru list.
Both this and the extra-lru-count version are supported in the linked patch:
The atomic_dec_and_lock version is slightly simpler, but was actually more
work to implement, because every call to page_cache_release made with the lru
lock already held had to be located and eliminated, since those calls would
deadlock. That had the side effect of eliminating a number of ifdefs compared
with the lru count version, and of rooting out some hidden redundancy.
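To make the atomic_dec_and_lock release path concrete, here is a minimal userspace sketch, assuming C11 atomics and a pthread mutex standing in for the kernel's lru spinlock. struct page, new_page, lru_add, lru_length and the recover_pages body are hypothetical stand-ins, not the patch's code; dec_and_lock only models the semantics of atomic_dec_and_lock (return true, with the lock held, only when the count reaches zero).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model: struct page, lru_lock, new_page and recover_pages are
 * illustrative stand-ins, not the kernel's definitions. */
struct page {
    atomic_int count;
    bool on_lru;
    struct page *prev, *next;
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static struct page lru_head = { .prev = &lru_head, .next = &lru_head };

/* Models atomic_dec_and_lock: returns true, with *lock held, only if the
 * decrement took *count to zero. This is the plain lock-first form. */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;
    pthread_mutex_unlock(lock);
    return false;
}

static struct page *new_page(void)
{
    struct page *page = calloc(1, sizeof *page);
    if (!page)
        abort();
    atomic_init(&page->count, 1);   /* the caller's reference */
    return page;
}

/* In this variant the lru list holds no reference of its own. */
static void lru_add(struct page *page)
{
    pthread_mutex_lock(&lru_lock);
    page->prev = &lru_head;
    page->next = lru_head.next;
    lru_head.next->prev = page;
    lru_head.next = page;
    page->on_lru = true;
    pthread_mutex_unlock(&lru_lock);
}

/* Hypothetical stand-in for recover_pages(): return the memory. */
static void recover_pages(struct page *page)
{
    assert(!page->on_lru && atomic_load(&page->count) == 0);
    free(page);
}

static void page_cache_release(struct page *page)
{
    if (!dec_and_lock(&page->count, &lru_lock))
        return;
    /* Count is zero and lru_lock is held: no lru scanner can find the page
     * between the decrement and its removal from the list. */
    if (page->on_lru) {
        page->prev->next = page->next;
        page->next->prev = page->prev;
        page->on_lru = false;
    }
    pthread_mutex_unlock(&lru_lock);
    recover_pages(page);
}

/* Scanner side, standing in for shrink_cache: a zero-count page on the
 * list would mean the free race is back, so treat it as a bug. */
static int lru_length(void)
{
    int n = 0;
    pthread_mutex_lock(&lru_lock);
    for (struct page *p = lru_head.next; p != &lru_head; p = p->next) {
        assert(atomic_load(&p->count) > 0);
        n++;
    }
    pthread_mutex_unlock(&lru_lock);
    return n;
}
```

Because dec_and_lock takes lru_lock itself, calling page_cache_release from code that already holds lru_lock would self-deadlock in this model, which is why such callers had to be eliminated.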
The patch exposes __free_pages_ok, which must be called directly by the
atomic_dec_and_lock variant. In the process it was renamed to the less
confusing recover_pages. (The old name is confusing because all the other
'free' variants also manipulate the page count.)
It's a close call which version is faster. I suspect the atomic_dec_and_lock
version will not scale quite as well, because of the bus-locked cmpxchg on the
page count (in the optimized atomic_dec_and_lock; the unoptimized version
always takes the spinlock), but neither version is really lacking in speed.
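The scaling difference comes down to how the decrement is done. This userspace sketch, again assuming C11 atomics and a pthread mutex in place of a spinlock, contrasts a lock-first form with one that uses a compare-and-swap fast path; on x86 the compare-and-swap typically compiles to a lock cmpxchg, the bus-locked operation mentioned above. The function names and the lock_acquisitions counter are hypothetical, for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int lock_acquisitions;   /* instrumentation for the demo only */

/* Unoptimized form: the lock is taken on every call, even when the count
 * stays above zero. Returns true, with *lock held, iff *count reached 0. */
static bool dec_and_lock_generic(atomic_int *count, pthread_mutex_t *lock)
{
    int old;

    pthread_mutex_lock(lock);
    lock_acquisitions++;
    old = atomic_fetch_sub(count, 1);
    assert(old > 0);            /* dropping a reference nobody holds */
    if (old == 1)
        return true;
    pthread_mutex_unlock(lock);
    return false;
}

/* Optimized form: a compare-and-swap loop performs any decrement that cannot
 * reach zero without touching the lock. Only when the count may be about to
 * drop from 1 to 0 does it fall back to the locked path. */
static bool dec_and_lock_fast(atomic_int *count, pthread_mutex_t *lock)
{
    int old = atomic_load(count);

    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
        /* Lost a race (or spurious failure): old now holds the current
         * value, so retry. */
    }
    return dec_and_lock_generic(count, lock);
}
```

Even the fast path performs an atomic read-modify-write on a shared cache line for every release, which is the cost behind the scaling concern; the lock itself is only touched on the final transition from 1 to 0.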
I have a slight preference for the extra-lru-count version, because of the
trylock in page_cache_release: nobody releasing a page will have to spin
while shrink_cache is active. Instead, freed pages that collide with the lru
lock can simply be left on the lru list, to be picked up efficiently later.
The trylock also allows the lru lock to be acquired speculatively from
interrupt context, without requiring lru lock holders to disable interrupts.
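One plausible shape for the extra-lru-count release path, sketched under assumptions: a userspace model with C11 atomics and a pthread mutex as the lru lock, where being on the lru list holds one extra page reference. The names (lru_del_and_put, shrink_lru, the freed counter) are hypothetical and the real patch may differ; the point is that page_cache_release only ever trylocks, and a page whose release loses the race is left on the list for the scanner.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model; every name here is a hypothetical stand-in. struct page
 * memory is never freed (like the kernel's mem_map), so recover_pages only
 * counts the page as returned. */
struct page {
    atomic_int count;   /* includes one extra reference while on the lru */
    bool on_lru;        /* changed only under lru_lock */
    struct page *prev, *next;
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static struct page lru_head = { .prev = &lru_head, .next = &lru_head };
static atomic_int freed;        /* pages handed back, for the demo */

static void recover_pages(struct page *page)
{
    assert(!page->on_lru && atomic_load(&page->count) == 0);
    freed++;                    /* atomic increment on an atomic_int */
}

/* Caller holds lru_lock: unlink the page and drop the list's reference. */
static void lru_del_and_put(struct page *page)
{
    page->prev->next = page->next;
    page->next->prev = page->prev;
    page->on_lru = false;
    if (atomic_fetch_sub(&page->count, 1) == 1)
        recover_pages(page);
}

static void lru_add(struct page *page)
{
    atomic_fetch_add(&page->count, 1);   /* the list's own reference */
    pthread_mutex_lock(&lru_lock);
    page->prev = &lru_head;
    page->next = lru_head.next;
    lru_head.next->prev = page;
    lru_head.next = page;
    page->on_lru = true;
    pthread_mutex_unlock(&lru_lock);
}

static void page_cache_release(struct page *page)
{
    int old = atomic_fetch_sub(&page->count, 1);

    if (old == 1) {             /* last reference, so not on the lru */
        recover_pages(page);
        return;
    }
    /* Possibly only the lru reference is left. Never spin for the lock:
     * if it is busy, shrink_lru() will pick the page up later. */
    if (old == 2 && pthread_mutex_trylock(&lru_lock) == 0) {
        if (page->on_lru && atomic_load(&page->count) == 1)
            lru_del_and_put(page);
        pthread_mutex_unlock(&lru_lock);
    }
}

/* Scanner, standing in for shrink_cache: pages held only by the list's own
 * reference are unlinked and freed here. */
static void shrink_lru(void)
{
    pthread_mutex_lock(&lru_lock);
    for (struct page *p = lru_head.next, *next; p != &lru_head; p = next) {
        next = p->next;
        if (atomic_load(&p->count) == 1)
            lru_del_and_put(p);
    }
    pthread_mutex_unlock(&lru_lock);
}
```

In this sketch the only blocking acquisitions of lru_lock are in lru_add and shrink_lru. Since pthread_mutex_trylock on a default mutex returns EBUSY whenever the mutex is held, even by the caller, holding lru_lock around a release is enough to simulate a collision with a busy shrink_cache.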
Both versions are provably correct, modulo implementation gaffes.
The linked patch defaults to the atomic_dec_and_lock version. To switch to
the extra-lru-count version, define LRU_PLUS_CACHE as 2 instead of 1.
Christian, can you please run this one through your race detector?
As a corollary, pages with zero count can never be found on the lru list,
so finding one there is treated as a bug.