
KS2012: memcg/mm: Dirty/writeback LRU

By Michael Kerrisk
September 17, 2012
2012 Kernel Summit

During the 2012 Kernel Summit memcg/mm minisummit, Fengguang Wu discussed an adjustment to the current LRU (least recently used) implementation, in which the LRU list is split into anonymous and file LRU lists. Fengguang proposes to split the file LRU list further, into clean and dirty lists. The proposal would not necessarily guarantee that the lists hold 100% clean or 100% dirty pages, but that is a minor detail that does not affect his goal. The general objective is that, in the presence of a process that is dirtying a large number of pages, other processes should not pay a high cost for scanning dirty pages on the list. The benefit would be most noticeable for cgroups, where it can happen that 100% of the inactive list consists of dirty pages.
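The scanning cost that motivates the proposal can be illustrated with a toy model. The sketch below is not kernel code: the list contents, the `scan_cost` helper, and the 90/10 dirty/clean mix are all assumptions made for illustration. It only counts how many list entries reclaim would have to examine before finding a given number of clean pages, with and without a separate clean list.

```python
def scan_cost(lru, want):
    """Walk `lru` from its reclaim end until `want` clean pages are found.

    `lru` is a list of booleans, True meaning "dirty" (hypothetical model).
    Returns (pages_scanned, pages_reclaimed).
    """
    scanned = reclaimed = 0
    for page_is_dirty in lru:
        if reclaimed == want:
            break
        scanned += 1
        if not page_is_dirty:
            reclaimed += 1
    return scanned, reclaimed


# Assumed workload: a heavy writer has left 90 dirty pages at the reclaim
# end of the file LRU, with 10 clean pages behind them.
single_file_lru = [True] * 90 + [False] * 10

# With a clean/dirty split, reclaim would look at the clean list only.
clean_lru = [dirty for dirty in single_file_lru if not dirty]

print(scan_cost(single_file_lru, 5))  # (95, 5)
print(scan_cost(clean_lru, 5))        # (5, 5)
```

In this model, reclaiming five pages from the single list means examining 95 entries, against five with a separate clean list. The model deliberately ignores page aging, which is the information the room was concerned about losing.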

There was not much support in the room for splitting the LRU in this manner, since doing so loses aging information. There are already problems selecting the ratio for scanning anonymous and file-backed pages, and some developers felt that adding yet another LRU list would compound the problem. Very compelling evidence would be required before such a feature could be merged.

However, there was support for adding a discard list to which pages marked PageReclaim get moved. (If the page-reclaim algorithm encounters a page at the end of the LRU that is currently being written to disk, then it marks it PageReclaim instead of waiting on the I/O to complete. When the I/O completes, the page is now considered clean and is immediately discarded. For the curious, the number of pages that are treated this way is recorded in the nr_vmscan_immediate_reclaim counter in /proc/vmstat.) The pages would be scanned only once by reclaim and the aging information would no longer be relevant, since the pages are going to be reclaimed immediately when writeback completes.
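For readers who want to look at that counter, here is a minimal sketch that reads it from /proc/vmstat. The helper names `parse_vmstat` and `immediate_reclaim_count` are made up for this example, and the sketch assumes the usual one "name value" pair per line layout of the file. On kernels that do not export the counter, or on systems without /proc, it returns None rather than failing.

```python
def parse_vmstat(text):
    """Parse "name value" lines, as found in /proc/vmstat, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a simple "name integer" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def immediate_reclaim_count(path="/proc/vmstat"):
    """Return nr_vmscan_immediate_reclaim, or None if it is unavailable."""
    try:
        with open(path) as f:
            stats = parse_vmstat(f.read())
    except OSError:
        return None
    return stats.get("nr_vmscan_immediate_reclaim")


if __name__ == "__main__":
    print(immediate_reclaim_count())
```

Sampling the value before and after a write-heavy workload gives a rough idea of how many pages reclaim found under writeback at the end of the LRU during that interval.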

A patch exists that implements this separate LRU list. Unfortunately, the patch is buggy and miscounts how many pages are on each LRU. Implementing it properly would require a page flag, but there are currently no free flags available. The feature is not considered important enough to justify increasing the size of the page-flags bit mask to 64 bits. Johannes Weiner is going to check whether he can free up one of the existing page flags, so as to allow this patch to be finished and merged.




Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds