
Inside memory management

Jonathan Bartlett discusses memory management on IBM developerWorks. "Get an overview of the memory management techniques that are available to Linux programmers, focusing on the C language but applicable to other languages as well. This article gives you the details of how memory management works, and then goes on to show how to manage memory manually, how to manage memory semi-manually using referencing counting or pooling, and how to manage memory automatically using garbage collection."

Inside memory management

Posted Nov 18, 2004 5:16 UTC (Thu) by kyle_hayes (guest, #7904)

Good points about the article:

- great bibliography. Many people seem to miss Jones & Lins.
- isn't too preachy about any one technique (mostly). Usually articles
like this have some axe to grind.
- actual experience shows through. Often these articles are written
by someone in an ivory tower who hasn't actually written production code
using any of the things s/he writes about.
- it introduces a number of different techniques and terms.
- the article mentions cache effects. Thank you! These can be very
important in some cases and are something that any new code should take
into account. However, I think that the article describes the cache
effects of some kinds of memory management incorrectly.

Unfortunately, this article conflates GC types (generational,
mark-and-sweep) with implementations (incremental etc.). It also
rehashes the tired old saws of:

1) GC is slower. Really? Hmm, that does not hold up against
measurements of real programs. See Boehm's papers on this; others can
be found on CiteSeer. GC generally does not carry much of a performance
penalty, and it beats naive manual memory handling practically all the
time. You can always write a program that will defeat a given GC method,
but usually you have to try fairly hard.

2) GC cannot be used for real-time. Since I did my Master's thesis with
a group that had developed a generational hard real-time collector for
writing hard real-time programs in Smalltalk, I have to say that there is
an existence proof that this statement is false. That said, GC is
certainly not yet widely used in the real-time field.

Perhaps these things were true twenty or more years ago, but they have
not been for some time. What is true (and oddly I don't see it stated as
often) is that several GC techniques take a lot of memory or can have
pathological cache behavior. Can != always do. Fortunately, the article
mentions this.

The article misses that pure two-space copying collectors are far from
common today and that generational collectors have extremely fast memory
allocation speed and are used widely. I consider two-space copying
collectors to be the lowest level of generational collectors.
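
To illustrate why allocation in a copying or generational collector can be
so fast, here is a minimal sketch of "bump pointer" allocation out of a
nursery. This is not taken from the article; the names nursery_t,
nursery_init, nursery_alloc and nursery_reset, and the 16-byte alignment,
are all hypothetical. A real collector would run a minor collection and
copy the survivors out when the nursery fills up; this sketch just returns
NULL at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment for every allocation (must be a power of two). */
#define NURSERY_ALIGN ((size_t)16)

/* A nursery is a contiguous buffer plus a count of bytes handed out. */
typedef struct {
    unsigned char *base;     /* start of the nursery buffer       */
    size_t         capacity; /* total bytes in the buffer         */
    size_t         used;     /* bytes handed out so far           */
} nursery_t;

/* Attach a nursery to a caller-supplied buffer. Returned pointers are
 * only as well aligned as the buffer itself. */
void nursery_init(nursery_t *n, void *buffer, size_t capacity)
{
    assert(n != NULL && buffer != NULL);
    n->base = buffer;
    n->capacity = capacity;
    n->used = 0;
}

/* Allocate by bumping the offset: one round-up, one compare, one add.
 * There is no free list to search. Returns NULL when the nursery is
 * full, which is where a real collector would start a minor GC. */
void *nursery_alloc(nursery_t *n, size_t size)
{
    if (size > SIZE_MAX - (NURSERY_ALIGN - 1))
        return NULL; /* rounding up would overflow */

    size_t rounded = (size + NURSERY_ALIGN - 1) & ~(NURSERY_ALIGN - 1);
    if (rounded > n->capacity - n->used)
        return NULL; /* not enough room left */

    void *p = n->base + n->used;
    n->used += rounded;
    return p;
}

/* After survivors have been copied elsewhere, the whole nursery is
 * reclaimed at once by resetting the offset. */
void nursery_reset(nursery_t *n)
{
    n->used = 0;
}
```

Compare this with a typical malloc, which has to search or index free
lists and split or coalesce blocks. The price is paid elsewhere: something
has to copy the live objects out before nursery_reset can be called.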

I do not mean to dump too harshly on the article. It is good that
someone is writing about this because so few people understand memory
management today. It seems that those being trained in programming have
received different messages over time (highly generalized for amusement
value):

1960's -- FORTRAN/COBOL is wonderful. Wait. What are you doing reading
this? You probably actually know what you are doing. Quick, leave the
industry so that the rest of us have a chance to look good!

1970's -- FORTRAN/COBOL is wonderful. Memory allocation? What's that?
If you are advanced, you've heard of C and malloc/free. (MIT LISP/Scheme
people do not count :-) Enlightened CS departments start teaching in
Pascal sometime in here.

Lesson=>there is no memory management. You just don't allocate at run
time.

1980's -- Pascal is wonderful. Allocate in advance and do not try to be
tricky. By the end of this decade, replace Pascal with C++. C is also
very popular, but most teaching seems to be in Pascal during this decade.
A few lucky people got Ada, the Newer, Better Pascal.

Lesson=>Memory management is a bit tricky to get right. Be careful and
do not try to show off. Keep it Simple, Stupid.

1990's -- C++ is wonderful. Teaching shifts to the wonders of operator
overloading and truly byzantine memory management tricks. Most student
programs do not run long enough to leak enough memory to be noticed.
Prozac use among CS students increases noticeably. Let a million
(incompatible and bad) implementations of reference counting bloom! Ever
wondered why there are a million really good C libraries out there and
remarkably few C++ libraries? (This isn't fair, the Boost and Qt
libraries are very good, but they are the exception to the rule.)

Lesson=>Memory management looks simple but requires deep black magic to
understand at all. Smart pointers and threads are better than hard
drugs! You only get to pick one library to use, just one.

2000's -- Java is wonderful. Memory management? We don' need no
stinkin' memory management. Ignore the allocator. Ignore the collector.
Java magically takes care of it all for you. Remember -mx1024 is needed
for most of your programs. Java, the kinder, gentler C++ (tm).

Lesson=>Memory management is so 90's. Real programmers using real
languages worry about more important things like whether to use CamelCase
or u_n_d_e_r_s_c_o_r_e_s in their names.

Memory management techniques that are in use today mostly date from years
ago. A few techniques (e.g. generational) really saw good
implementations in the late 1980's and early 1990's (see Urs Hoelzle's
papers for examples and references). Smalltalk/Self, LISP and functional
languages have polished and honed these techniques over a period of
decades.

Memory management is such an important part of program and system design
that I find it nearly criminal that it is ignored so often. This article
brings it back to the surface again with a good overview and clear
descriptions.

Best,
Kyle

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds