Our bloat problem

Posted Aug 7, 2005 20:56 UTC (Sun) by oak (guest, #2786)
In reply to: Our bloat problem by hp
Parent article: Our bloat problem

I don't know how to get very useful numbers out of top myself - what one would want for a bloat metric is "malloc'd RAM unique to this process" or something,

Malloc => heap. If the program has written to the allocated heap page, it's private. I don't see why program would allocate memory without writing to it, so in practice all of heap can be considered private to the process.

You can already see heap usage from /proc and with Valgrind you can actually get a graph where it goes.

perhaps "plus the size of each in-use shared page divided by number of apps currently sharing it,"

During his Guadec 2005 speech, Robert Love mentioned a kernel patch which will produce the information about how much memory is private (dirty = allocated heap that has been written to, shared library relocation tables etc.) to a process. He promised to add a link to it on to the Gnome memory reduction page.

perhaps "plus resources allocated on the X server side on behalf of this app."

XresTop tells this. Some programs can push amazing amounts of memory to the Xserver (the huge number number shown in 'top' for Xserver comes from memory mapping the framebuffer though I think).

Our bloat problem

Posted Aug 7, 2005 21:59 UTC (Sun) by hp (guest, #5220) [Link] (1 responses)

Of course you can figure this out painstakingly. What you can't do though is get all that info aggregated for 1) each process and 2) a current snapshot of all processes, in an easy way.

Aggregating this stuff into a snapshot of the whole system at a point in time would let you really point fingers in terms of bloat and figure out where to concentrate efforts.

It's not easy enough now, which is why people just use "top" and its misleading numbers.

Even better of course would be to take multiple snapshots over time allowing observation of what happens during specific operations such as log in, load a web page, click on the panel menu, etc.

A tool like this would probably be pretty handy for keeping an eye on production servers, as well.

Our bloat problem

Posted Aug 18, 2005 20:10 UTC (Thu) by oak (guest, #2786) [Link]

> Aggregating this stuff into a snapshot of the whole system at a point in
> time would let you really point fingers in terms of bloat and figure out
> where to concentrate efforts.

If the patch Robert mentioned is the same which results I have seen
(post-processed), you can already get that information from Linux kernel
by patching it a bit. The results I saw included machine wide statistics
and process specific stats of how many pages reserved for the process were
ro/rw/dirty, how much each linked library accounts for the process memory
usage (relocation table sizes etc., not how much each library allocates
heap for the process[1]). This is quite useful for system overview,
whereas valgrind/Massif/XResTop/top tell enough of individual
applications.

[1] that you can get just with a simple malloc wrapper that gets stack
traces for each alloc and some data post-processing heuristics to decided
which item in the stack trace to assign the guilt. The hard part is
actually the post-processing, deciding who in e.g. the allocation chain of
App->GtkLabel->Pango->Xft2->FontConfig->Freetype should be blamed for the
total of the allocations done at any point in the chain as you don't know
whether the reason for the total allocations is valid or not without
looking at the code...

Best would be an interactive allocation browser similar to Kcachegrind
with which one could view also the source along with accumulated
allocation percentages.

Our bloat problem

Posted Aug 11, 2005 12:24 UTC (Thu) by anton (subscriber, #25547) [Link]

I don't see why program would allocate memory without writing to it

I do this when I need contiguous memory, but don't know how much. Then I allocate lots of space; unused memory is cheap. The result looks like this:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
anton    17100  0.0  0.2   6300  1196 pts/0    S+   14:16   0:00 gforth

The large VSZ is caused mainly by unused allocated memory.

So if you want to know the real memory usage, counting private anonymous mappings is not good enough.