By Michael Kerrisk
April 3, 2013
Minchan Kim's recent patch series to
provide user-space-triggered reclaim of a process's pages represents one
more point in a spectrum that increasingly sees memory management on Linux
as a task that is indirectly influenced or even directly controlled from
user space.
Approaches such as Android's low-memory killer represent one end of the
page-reclaim spectrum, where memory management is primarily under kernel
control. When memory is low, the low-memory killer picks a victim process
and kills it outright; applications that live in such an environment have
to work with the possibility that they may disappear from
one moment to the next. As Minchan pointed out via an amusing example, the
effects of the low-memory killer's approach to page reclaim can be extreme:
[Having a process killed to free memory] was really terrible
experience because I lost my best score of game I had ever after I
switch the phone call while I enjoyed the game
Jon Stultz's volatile ranges work and Minchan's own work on a similar
feature (both described in this article)
represent a middle point in the spectrum. The volatile ranges approach,
inspired by Android's ashmem, provides a process with a way to inform the
kernel that a certain range of its own virtual address space can be
preferentially reclaimed if memory pressure is high. Under this approach,
the kernel takes no responsibility for the contents of the reclaimed pages:
if the kernel needs the memory, the page contents are discarded, and it is
assumed that the application has sufficient information available to
re-create those pages with the right contents if they are needed. As with
the low-memory killer, the decision about if and when to reclaim the pages
remains with the kernel.
By contrast, Minchan's patch set places the decision about when to
reclaim pages directly under the control of user space. The interface provided by
these patches is simple. A /proc/PID/reclaim file is
provided for each process. A process with suitable permissions—that
is, a process owned by root or one with the same user ID as the
process PID—can write one of the following values to
the file, to cause some or all of the process's pages to be reclaimed:
-
Reclaim all process pages in file-backed mappings.
-
Reclaim all process pages in anonymous
(MAP_ANONYMOUS) mappings.
-
Reclaim all pages of the process (i.e., the combination of 1 and 2).
As currently implemented, all of the process's pages that match the
specified criterion are reclaimed. Your editor
wondered
whether there might be benefit in allowing some control over the
range of pages that are reclaimed from the target process, by allowing an
address range to be written to the /proc/PID/reclaim file.
By contrast with volatile ranges and the low-memory killer,
modifications in pages reclaimed via /proc/PID/reclaim are
not lost. Modified pages are written to the underlying file in the case of
shared (MAP_SHARED) file mappings or to swap in other cases. Thus,
if the process touches the reclaimed page later, it will be faulted into
memory with the contents at the time it was reclaimed. The patches also
include some logic to handle the case where multiple processes are sharing
the same pages; in that case, the pages are reclaimed only after all of the
processes have marked them for reclaim. Like the low-memory killer,
/proc/PID/reclaim can be used to reclaim all of the pages
in a process, but without needing to kill the process to do so.
The idea behind Minchan's proposal is that a user-space task manager
could take over some part of the job of memory management. In some cases,
this may be more effective than allowing the kernel to make
memory-management decisions, since the user-space task manager can bring
application-specific intelligence to decisions about whether to reclaim a
process's pages. For example, some application processes may be in the
foreground while others are in the background. It may desirable to
preferentially reclaim pages from one of the background processes, even if
it has some frequently accessed pages. Of course, the task manager would
somehow need to know when the system is under memory pressure. To that end,
a mechanism like Anton Vorontsov's proposed vmpressure_fd() API might be useful.
Minchan's patches apply against Michal Hocko's MMOTM
(memory management of the moment) tree. The patches came out on March
25, but have so far seen little review. Nevertheless, they present an idea
that will probably be of particular interest for the developers of mobile
and embedded devices and thus it seems likely that they will get some
attention at some point in the future.
(
Log in to post comments)