
1978 called, wants its garbage collectors back

Posted Sep 4, 2008 9:51 UTC (Thu) by rwmj (subscriber, #5474)
Parent article: The Kernel Hacker's Bookshelf: UNIX Internals

Newsflash!

Properly written garbage collectors don't cause "unpredictable system-wide performance slowdowns during garbage collection" any more.

Since the 1980s people have understood how to limit the amount of work done by running the garbage collector in "slices". It is also now well-understood how to write efficient concurrent garbage collectors (well-suited to the multicore systems of today). We also now know how to trade off performance vs wasted memory in garbage collection. Some of the most advanced garbage collectors can make real time guarantees.

Given that the Linux kernel is doing a lot of reference counting, which is the worst performing, most wasteful form of garbage collection there is, perhaps it is time to look again at doing real GC.

Rich.



1978 called, wants its garbage collectors back

Posted Sep 4, 2008 13:04 UTC (Thu) by i3839 (guest, #31386) [Link]

I thought Python's GC also used reference counting. What better alternatives are there? Scanning memory for anything that looks like a pointer doesn't seem to be one. Perhaps build a reference graph of all allocated memory, and remove all disconnected pieces or something? But that seems to have more overhead than reference counting. I don't know much about GC...

1978 called, wants its garbage collectors back

Posted Sep 4, 2008 13:22 UTC (Thu) by knobunc (subscriber, #4678) [Link]

The "What the Heck Is?" series of posts by Dan Sugalski was good:
http://www.sidhe.org/~dan/blog/archives/cat_what_the_heck...

In particular he has one about GC:
http://www.sidhe.org/~dan/blog/archives/000200.html

He was posting good overviews of CS topics so that people could work on Parrot (the proposed Perl VM).

But a more thorough review of GC technology on LWN would be appreciated!

-ben

1978 called, wants its garbage collectors back

Posted Sep 4, 2008 14:06 UTC (Thu) by rwmj (subscriber, #5474) [Link]

I don't know much about GC...

So it seems. Have a look at Wikipedia; then, for the serious end of garbage collection, take a look at Richard Jones [no relation]'s garbage collection resources page, and in particular his book.

Rich.

1978 called, wants its garbage collectors back

Posted Sep 5, 2008 13:12 UTC (Fri) by i3839 (guest, #31386) [Link]

Umm, Wikipedia just mentions two basic types of GC, a reachability-based one and a reference-counting-based one, and I mentioned both methods without knowing a thing about GC. The same goes for the "What the heck is: Garbage Collection" article mentioned in the comment above, and the book doesn't seem to describe anything fundamentally different either; it just goes into details (same ideas, different algorithms, though interesting ones. Especially the copying one is smart, as it solves the fragmentation problem). Isn't there another paradigm for doing GC out there? Or is the little I know really all there is to know about GC at a high level?

I was totally wrong on the overhead/cost though. ;-)

1978 called, wants its garbage collectors back

Posted Oct 11, 2008 9:24 UTC (Sat) by anomalizer (guest, #53112) [Link]

If you are interested in GC options in general, try reading up about the various GC developments that have been happening in the Java world. Things have moved a long way from reference counting algorithms. The caveat of course is that what works for Java is not going to work for the kernel.

But still, no point in using GC if not needed

Posted Sep 4, 2008 15:45 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

There are indeed a number of high-performance garbage collectors, even real-time garbage collectors. However, even these garbage collectors impose significant overhead and slowdown -- the overhead/slowdown is not eliminated, but rather spread out in bite-sized chunks over time. Alternatively, high-priority tasks are permitted to preempt the garbage collector, which has its own set of advantages and disadvantages. And the key point is that KMAs can be constructed without requiring garbage collection in the common case.

Less-common cases where KMA garbage collection can be helpful include: (1) when flat out of memory, reclaiming blocks from the per-CPU caches lets the system creep along a bit farther, perhaps giving OOM a chance to do its job, (2) when shipping blocks from one NUMA node to another upon kfree(), one might use a timeout to avoid leaving blocks pending for KMAs that care about segregating memory on a NUMA basis (and DYNIX/ptx needed to care, given the huge remote latencies of the NUMA-Q hardware), (3) where real-time response is required, high-priority processes might just dump memory onto a per-CPU list and let low-priority processes clean up after them, and (4) if the Linux kernel ever supported real garbage collection. I don't expect this latter any time soon, especially given that a recent attempt to write a kernel in a garbage-collected language encountered severe GC-related overhead (http://web.cecs.pdx.edu/~mpj/pubs/plos07.html). :-)

1978 called, wants its garbage collectors back

Posted Sep 4, 2008 16:17 UTC (Thu) by BrucePerens (guest, #2510) [Link]

This is a matter for much religious argument, although in cases like the kernel it's probably not a significant use of time or resources. Reference counting can be bug-prone, if you aren't using a language like C++ where it can be entirely automated. It is deterministic, uses very small slices per iteration, and works well at interrupt level. Even with reference counting, GC may be necessary to take care of a circular reference.

I prefer a double-indirect system with reference counting and occasional GC, because it can resolve the fragmentation problem. CPUs could be built to optimize this similarly to the way they handle TLBs.

1978 called, wants its garbage collectors back

Posted Sep 4, 2008 16:36 UTC (Thu) by rwmj (subscriber, #5474) [Link]

It is deterministic, uses very small slices per iteration [...]

But is it? Is there not a case where you're freeing up blocks "deterministically" and then the block you just freed happens to complete a large hole and kfree/free suddenly goes off and does a lot of work to update its internal structures? (Or the corresponding case when allocating).

These costs are often minimized by proponents of manual allocation. There's a rather well-known paper about this which goes into the subject in a lot of detail, and comes to some mixed conclusions. (Mind you, the benefit of GC is it massively simplifies programming and removes a huge source of error, which they don't take into account).

Rich.

1978 called, wants its garbage collectors back

Posted Sep 4, 2008 17:10 UTC (Thu) by vaurora (guest, #38407) [Link]

Your description made me curious so I skimmed the paper. I wouldn't describe this as a "mixed conclusion":

When garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection's performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower.

I'm afraid five times the memory consumption is not an option for the kernel. Your point about removing a source of programming error is well taken, but more applicable to random user space widgets.

1978 called, wants its garbage collectors back

Posted Sep 4, 2008 17:12 UTC (Thu) by BrucePerens (guest, #2510) [Link]

Is there not a case where you're freeing up blocks "deterministically" and then the block you just freed happens to complete a large hole and kfree/free suddenly goes off and does a lot of work to update its internal structures?

I don't see how GC would improve that situation. It's reasonably trivial to code a kernel allocator that doesn't go into some loosely-bounded task at interrupt level.

GC has its own errors, especially if it's the kind that assumes any word might be a pointer.

1978 called, wants its garbage collectors back

Posted Oct 19, 2008 6:00 UTC (Sun) by rarecactus (guest, #51688) [Link]

Reference counting has a number of advantages. Namely:

1) Lowest space overhead of any garbage collection scheme.
Memory is released immediately after it is no longer used, not "whenever I get around to it."

I would rather not see the OOM killer kill my processes just because someone in the kernel was slow to release what he allocated.

2) The best cache locality of any garbage collection scheme.
In the kernel, we always have to explicitly say when we're done with an object. That will always involve making some change to the object in memory. While it's loaded into the CPU's cache, why not free it as well?

If you sweep the trash under the rug, you just have to spend more effort pulling it out later. Why not put it in the trash can the first time around?

Have you looked at a chart of gates per CPU versus DRAM clock lately? They are growing at different rates -- those curves have different big-Os with respect to time. Algorithms like copying garbage collection, which involve touching lots and lots of memory, look increasingly old-fashioned. They are guaranteed to ruin the cache. In userspace you also have the problem of the GC accessing memory paged out to disk, which leads to terrible performance.

3) It allows the programmer to specify a "destructor" type callback which can run when the resource is no longer needed.
This is a nice bonus.

This was one of the best features of C++. The fact that Java lacks real destructors is an endless source of pain for people working in that language, and motivated the inclusion of a similar feature in C#.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds