LWN.net Logo

Garbage collection and MM

Garbage collection and MM

Posted Mar 8, 2007 15:25 UTC (Thu) by pflugstad (subscriber, #224)
In reply to: Garbage collection and MM by aanno
Parent article: Short topics in memory management

But as far as I understand, in many GC algorithms, there is also a 'walker' that marks memory that is not reachable any more. It is a bad idea to walk onto a swapped-out page, though.

This has not been the case for several years now. Most GCs (certainly in Java) systems use a generational/compacting collector. As such, dead objects aren't touched at all. And this was 3+ years ago - it's gotten even better since then. When you do Java, you really do need to re-think how you program.

As someone else said, GC is everywhere these days, even embedded. The ease and clarity of developing in a GC language (Python, Perl, Java, etc), far outweigh the performance penalty you may see with GC. This is especially true for the vast majority of programs where performance is not seriously a concern, such as those with human interactions. I've done a lot C. I've done a lot of C++. I've done Python. I've done Java. I'll take Python/Java 6 days a week (but not twice on Sundays - sometimes you do need performance :-).


(Log in to post comments)

Garbage collection and MM

Posted Mar 8, 2007 16:32 UTC (Thu) by aanno (subscriber, #6082) [Link]

Right. I oversimplified this. Generational/compacting collectors have no walker but (a) have a second indirection (that slows down things) and (b) are not memory efficient (as they roughly use twice the space that is theoretically needed). Certainly Hertz compares his GC against this type. Refer to the paper to get all the details.

Garbage collection and MM

Posted Mar 8, 2007 20:35 UTC (Thu) by ncm (subscriber, #165) [Link]

Python is not, at present, a GC language. Neither is Perl.

However, people are working on GC implementations. Expect complaints about cache abuse by scripts, too, soon.

Garbage collection and MM

Posted Mar 9, 2007 0:04 UTC (Fri) by njs (subscriber, #40338) [Link]

Python is certainly a GC language (since version 2.0). It happens to optimize that GC by using reference counting to catch the easy cases, and only doing mark/sweep type stuff occasionally to catch reference cycles, but it definitely has a full GC.

I'm told that some of the hottest Java GC techniques actually involve reference counting these days, because you can massively optimize your actual walking -- the only time a cycle can be created is when a reference count is decremented, and thus achieves a number greater than 0. This is pretty rare, and it also tells you that any cycle that was just created must involve that object in particular, so you don't have to tromp through all memory either.

None of this affects your original point, though, because reference counting already trashes caches by itself -- especially in the multiprocessor case, where supposedly read-only access to variables is suddenly triggering cache flushes...

Depending on your cache hierarchy and the characteristics of your GC, you can minimize its impact, though. E.g., in gcc, I thought I remember some trick where you only run the collector between passes, since you know already that that's when everything becomes garbage, and also where it doesn't matter if you trash the cache? Similarly, Graydon was saying something about in initial implementations of firefox's GC, they would just run it after page load, because no-one notices if the browser pauses for 400 milliseconds then, they're just looking at the page.

Long run: build garbage collection into the RAM hardware! That'll work around those pesky cache issues ;-)

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds