ELC: How much memory are applications really using?
[Posted April 18, 2007 by corbet]
Anybody who has tried to figure out why a Linux system is running short of
memory can attest that the memory usage information made available by the
kernel is, at best, difficult to use. Matt Mackall has recently been
working on a set of patches aimed at improving this situation. Given the
constraints imposed by embedded Linux systems, it is not surprising that
Matt chose the Embedded Linux Conference to present his work (which, incidentally, was funded by the
Consumer Electronics Linux Forum).
Matt pointed out that the currently-available information is confusing at
best. The page cache muddies the situation, and the sharing of pages
between applications complicates things even more. The result is that it
is hard to say where memory is being used; one can't even get a definitive
answer to the question of how big a specific application is. More detailed
questions - such as which parts of an application are using the most memory
- are even harder to answer. Trying to answer questions of interest to
embedded systems developers - how many applications can run on a specific
device without pushing it into thrashing, for example - is nearly
impossible without simply running a test.
The problem is that the numbers exported by current kernels are nearly
meaningless. The reported virtual size of an application is largely
irrelevant; it says nothing about how much of that virtual space is
actually being used. The resident set size (RSS) number is a little
better, but there is no information on sharing of pages there. The
/proc/pid/smaps file gives a bit of detail, but also lacks
sharing information. And the presence of memory pressure can change the
situation significantly.
The Linux virtual memory system, in other words, is a black box which
provides too little information on what is going on inside. Matt's project
is to open up that box and shine some light inside.
The first step is to add a new file (pagemap) in each process's
/proc directory. It is a binary file containing the page frame
number for each page in the process's address space. The file can be read
to see where a process's pages have been placed and, more interestingly, it
can be compared between processes to see which pages are being shared.
Matt has a little graphical tool which can display this file, showing the
patterns of which pages are present in memory and which are not.
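The details below assume the pagemap layout that eventually reached the
mainline: one little-endian 64-bit entry per virtual page, with the page
frame number in bits 0-54 and a "present" flag in the top bit. The early
patches may have encoded things differently, so treat this as a sketch
rather than a specification. Looking up the physical frame behind a single
virtual address then comes down to a seek and an eight-byte read:

    import os, struct

    PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
    PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number (mainline layout)
    PAGE_PRESENT = 1 << 63        # bit 63: the page is resident in RAM

    def vaddr_to_pfn(pid, vaddr):
        """Return the physical frame backing vaddr, or None if not present."""
        # Reading pagemap requires privilege on recent kernels.
        with open("/proc/%d/pagemap" % pid, "rb") as f:
            f.seek((vaddr // PAGE_SIZE) * 8)      # one 8-byte entry per virtual page
            entry, = struct.unpack("<Q", f.read(8))
        return (entry & PFN_MASK) if (entry & PAGE_PRESENT) else None

Comparing the frame numbers returned for the same library mapping in two
different processes shows directly which pages they are sharing.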
Then, there is a file (/proc/kpagemap) which provides information
about the kernel's memory map. For each physical page in the system,
kpagemap contains the mapping count and the page flags. This
information can be used to learn about sharing of pages and about how each
page is being used. There were a couple of graphical applications using
this file as well; one showed the degree to which each page is being
shared, while the other showed the use of each page as determined by its
flags.
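The interface which later landed in the mainline splits this information
across two files, /proc/kpagecount and /proc/kpageflags, each holding one
64-bit value per physical page frame; the single kpagemap file described in
the talk presumably packed the same data in its own way. A hedged sketch of
querying both for a given frame:

    import struct

    # A few of the flag bits documented for the mainline kpageflags interface.
    KPF_REFERENCED = 1 << 2
    KPF_DIRTY      = 1 << 4
    KPF_ACTIVE     = 1 << 6

    def frame_info(pfn):
        """Return (mapping count, flags) for one physical page frame.

        Assumes the mainline layout: one little-endian 64-bit value per
        frame, indexed by frame number.  Both files are root-only.
        """
        with open("/proc/kpagecount", "rb") as f:
            f.seek(pfn * 8)
            count, = struct.unpack("<Q", f.read(8))
        with open("/proc/kpageflags", "rb") as f:
            f.seek(pfn * 8)
            flags, = struct.unpack("<Q", f.read(8))
        return count, flags

The mapping count is what makes it possible to reason about sharing; the
flags say what each page is currently doing.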
Once this information is available, one can start to generate some useful
numbers on memory use. Matt is proposing two new metrics. The
"proportional set size" (PSS) of a process is the count of pages it has in
memory, where each page is divided by the number of processes sharing it.
So if a process has 1000 pages all to itself, and 1000 shared with one
other process, its PSS will be 1500. The unique set size (USS), instead,
is a simple count of unshared pages. It is, for all practical purposes,
the number of pages which will be returned to the system if the process is
killed.
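Putting pagemap and the per-frame mapping counts together yields these
numbers directly. The sketch below does that for one process; the helper
names (mapped_ranges(), pss_uss()) are mine rather than Matt's, and the
file formats assumed are again the mainline ones:

    import os, struct

    PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
    PFN_MASK = (1 << 55) - 1
    PAGE_PRESENT = 1 << 63

    def mapped_ranges(pid):
        """Yield (start, end) virtual address ranges from /proc/pid/maps."""
        with open("/proc/%d/maps" % pid) as f:
            for line in f:
                start, end = line.split()[0].split("-")
                yield int(start, 16), int(end, 16)

    def pss_uss(pid):
        """Proportional and unique set sizes, in pages (root required)."""
        pss, uss = 0.0, 0
        with open("/proc/%d/pagemap" % pid, "rb") as pagemap, \
             open("/proc/kpagecount", "rb") as kpagecount:
            for start, end in mapped_ranges(pid):
                pagemap.seek((start // PAGE_SIZE) * 8)
                data = pagemap.read(((end - start) // PAGE_SIZE) * 8)
                for (entry,) in struct.iter_unpack("<Q", data):
                    if not entry & PAGE_PRESENT:
                        continue
                    kpagecount.seek((entry & PFN_MASK) * 8)
                    count, = struct.unpack("<Q", kpagecount.read(8))
                    if count:
                        pss += 1.0 / count    # each page weighted by its sharing
                        uss += (count == 1)   # pages nobody else has mapped
        return pss, uss

Multiply by the page size and the result is the PSS and USS in bytes.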
These numbers are relatively expensive to calculate, since they require a
pass through the process's address space. So they will not be something
which is regularly exported from the kernel. They can be calculated in
user space using the pagemap files, though. Matt demonstrated a couple of
tools to do these calculations. Using "memstats" on a galeon
process, he supplemented the currently-available virtual size and resident
set size numbers (105MB and 41MB, respectively) with a PSS of 26MB and a
USS of 20MB. There is also a "memrank" tool which lists processes
in the system sorted by decreasing PSS. With a tool like that, finding the
memory hogs on the system becomes a trivial task.
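memrank's actual implementation is in Matt's scripts (linked below); purely
as an illustration, a memrank-style listing can be built on the pss_uss()
sketch above with little more than a sort:

    def memrank():
        """Print processes sorted by decreasing PSS (sketch; needs root)."""
        results = []
        for name in os.listdir("/proc"):
            if not name.isdigit():
                continue
            try:
                pss, uss = pss_uss(int(name))
            except OSError:
                continue                  # process exited or is off limits; skip it
            results.append((pss, uss, int(name)))
        for pss, uss, pid in sorted(results, reverse=True):
            print("%6d  pss %9.0f  uss %9d  (pages)" % (pid, pss, uss))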
Matt pointed out that these numbers, while useful, will change depending on
the amount of memory pressure being experienced by the system. It would be
nice to be able to figure out how much memory a given process truly needs
before it will begin to thrash. To this end, his patch creates a new
clear_refs file for each process; this file can be used to reset
the "referenced" flag on each page in the process's working set. After the
process runs for a bit, one can look at which pages have had their
referenced bits set again; those are the pages it actually needed to run
during that time.
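The talk (as summarized here) does not spell out how the referenced bits
are read back afterward; in the interface that ended up in the mainline, a
"Referenced:" line in /proc/pid/smaps reports how much of each mapping has
been touched since the last clearing. A minimal sketch of the measurement
loop, assuming that interface:

    import time

    def referenced_kb(pid, interval=5.0):
        """Clear the referenced bits, let the process run, then report how
        many kilobytes it actually touched in that time.  Assumes the
        mainline clear_refs/smaps interface; run as the process owner or root."""
        with open("/proc/%d/clear_refs" % pid, "w") as f:
            f.write("1\n")               # "1" clears the referenced bit on every page
        time.sleep(interval)             # let the workload do its thing
        total = 0
        with open("/proc/%d/smaps" % pid) as f:
            for line in f:
                if line.startswith("Referenced:"):
                    total += int(line.split()[1])    # the value is reported in kB
        return total

Run repeatedly with different intervals, this gives a rough picture of the
working set a process needs in order to avoid thrashing.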
The patches are in the -mm tree currently; it's possible that they could
find their way into the mainline once the 2.6.22 merge window opens up.
Those who would like to play with Matt's scripts can find them in this directory; the slides from
his talk are packaged there as well. With luck,
understanding system memory usage will require far less guesswork in the
near future.