Garbage collection and MM

Posted Mar 7, 2007 20:11 UTC (Wed) by aanno (guest, #6082)
Parent article: Short topics in memory management

There are indeed indications that collaboration between the kernel MM and user-space tools could improve performance. Matthew Hertz did some academic research on the topic. His bookmarking collector, a garbage collection algorithm for the Java VM, is impressive.

It might be that garbage-collected languages will soon predominate in computer programming. It could then become annoying that the MM of many operating systems (including Linux) does not handle garbage collection as well as it could.

Garbage collection and MM

Posted Mar 8, 2007 8:26 UTC (Thu) by ncm (subscriber, #165)

It would be most unfortunate if GC-ridden languages came to predominate. It is practically impossible for an OS memory manager to "handle" GC well, just as it is practically impossible for the program itself to do so. It is in the nature of GC to interact badly with caches, and we are ever more dependent on an increasing variety of caches.

Every time the above is pointed out, somebody pops up and says that some new or old wrinkle has potential to mitigate the problems. Invariably there's a paper with lots of artificial benchmarks, running on a machine dedicated to nothing but running those benchmarks. Invariably such programs interact badly with real programs on real machines.

Besides its fundamental problems, GC never advances much because it can't be encapsulated. For every place that needs it, it must be re-done from scratch. The cunning tricks of the last implementation don't work in the next.

Academia can't see the limitations of GC because the ideal academic program only ever manages memory. Real-world programs must manage other much more limited resources -- network sockets, database connections, disks -- and any method sufficient to manage them suffices for memory as well. No current language threatens C++ for serious programming, however useful such a language would be. In large part this is because any such rival would first need to impress academics who, for the most part, have no clue about what makes a language actually useful.

Garbage collection and MM

Posted Mar 8, 2007 10:55 UTC (Thu) by aanno (guest, #6082)

I completely agree with you, ncm. Cache locality will probably never be fixed with GC-based languages. But as far as I understand, in many GC algorithms, there is also a 'walker' that marks memory that is not reachable any more. It is a bad idea to walk onto a swapped-out page, though.

And I could imagine that this problem could be diminished. The paper I linked to takes this direction.
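A toy model may help pin down what the 'walker' does. In a classic mark-and-sweep collector, the mark phase walks the object graph from the roots and marks what is still *reachable*; the sweep phase then visits every object and reclaims the unmarked ones, so dead objects get touched too. The sketch below is a hypothetical Python model (`Heap`, `Obj`, and the `touched` counter are illustrative names; real collectors work on raw memory, not Python lists). `touched` counts how many objects one collection visits, roughly the set of pages a real collector would have to fault in.

```python
from typing import List


class Obj:
    """A toy heap object: a name plus outgoing references."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refs: List["Obj"] = []
        self.marked = False


class Heap:
    def __init__(self) -> None:
        self.objects: List[Obj] = []  # every allocated object
        self.roots: List[Obj] = []    # stand-ins for stack/global references
        self.touched = 0              # objects visited by one collection

    def alloc(self, name: str) -> Obj:
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def mark(self) -> None:
        # Mark phase: depth-first walk from the roots, marking REACHABLE objects.
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if obj.marked:
                continue
            obj.marked = True
            self.touched += 1
            stack.extend(obj.refs)

    def sweep(self) -> List[str]:
        # Sweep phase: visit every object, live or dead; unmarked ones are freed.
        freed, survivors = [], []
        for obj in self.objects:
            self.touched += 1
            if obj.marked:
                obj.marked = False  # reset for the next collection
                survivors.append(obj)
            else:
                freed.append(obj.name)
        self.objects = survivors
        return freed

    def collect(self) -> List[str]:
        self.touched = 0
        self.mark()
        return self.sweep()
```

With three objects of which one is unreferenced, the mark phase touches two and the sweep touches all three, including the dead one.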

Even if GC-based languages do not predominate, they have to be taken into account more than, let's say, 20 years ago. The fact that many Linux users don't like Java or .NET will not make these languages disappear. And even if it did, what about Python, Ruby, Perl and the like?

GC is even mentioned in this discussion as being used within Firefox (see above). And I heard that the GCC suite also uses GC (beginning with version 3.0).

GC will always be problematic. But it is something that could be improved by combining kernel and user-land knowledge. In my view, Hertz's research points in the right direction. It was interesting to read that this sort of collaboration could also solve other MM-related problems.

Garbage collection and MM

Posted Mar 8, 2007 12:22 UTC (Thu) by nix (subscriber, #2304)

Yes, GCC 3.0 did start using GC (because the lifetime rules of objects in the compiler were unfathomably complex). It fixed an unknown but large number of bugs... and slowed down the compiler a *lot* due to cache locality issues.

Years ago now, someone (Mike Stump?) ran some benchmarks that showed GCC incurring cache stalls every *twenty instructions* or thereabouts. Small wonder that it slowed down!

Careful moves are now underway (and have been for a while) to migrate objects with simple lifetime rules back into obstacks. The obstacks are still garbage-collected, but each obstack is a *single* GCed object with good cache locality, whereas the myriad objects it replaces were not.

I doubt GCC will ever abandon GC, either: for objects with complex lifetime rules, there's really no maintainable alternative. But for the simple ones, using suballocators with better cache locality (like obstacks) is a good idea.
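The suballocator idea can be sketched as a bump allocator: many small records packed back to back into one contiguous buffer, released as a unit. This is a loose Python analogy under obvious assumptions: the `Arena` class and its record layout are hypothetical, not GNU obstack's API, and Python cannot show the cache effects themselves. `release_to` mimics the stack-like discipline of obstacks, where freeing an object also frees everything allocated after it.

```python
import struct
from typing import Tuple


class Arena:
    """Bump allocator: fixed-size records live back to back in one buffer."""

    RECORD = struct.Struct("<iid")  # hypothetical record: two ints and a double (16 bytes)

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity * self.RECORD.size)
        self.top = 0  # offset of the next free byte

    def alloc(self, a: int, b: int, x: float) -> int:
        """Store one record and return its offset (a 'pointer' into the arena)."""
        if self.top + self.RECORD.size > len(self.buf):
            raise MemoryError("arena full")
        offset = self.top
        self.RECORD.pack_into(self.buf, offset, a, b, x)
        self.top += self.RECORD.size
        return offset

    def get(self, offset: int) -> Tuple[int, int, float]:
        return self.RECORD.unpack_from(self.buf, offset)

    def release_to(self, offset: int) -> None:
        """Free the record at `offset` and everything allocated after it."""
        self.top = offset

    def free_all(self) -> None:
        """Release every record at once: one operation, not one per object."""
        self.top = 0
```

Neighbouring allocations end up adjacent in memory, which is what gives an obstack its locality compared with many independently allocated objects.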

Garbage collection and MM

Posted Mar 8, 2007 15:25 UTC (Thu) by pflugstad (subscriber, #224)

But as far as I understand, in many GC algorithms, there is also a 'walker' that marks memory that is not reachable any more. It is a bad idea to walk onto a swapped-out page, though.

This has not been the case for several years now. Most GC systems (certainly in Java) use a generational/compacting collector. As such, dead objects aren't touched at all. And that was already true 3+ years ago; it's gotten even better since then. When you do Java, you really do need to rethink how you program.

As someone else said, GC is everywhere these days, even in embedded systems. The ease and clarity of developing in a GC language (Python, Perl, Java, etc.) far outweigh the performance penalty you may see with GC. This is especially true for the vast majority of programs where performance is not seriously a concern, such as those with human interaction. I've done a lot of C. I've done a lot of C++. I've done Python. I've done Java. I'll take Python/Java 6 days a week (but not twice on Sundays - sometimes you do need performance :-).

Garbage collection and MM

Posted Mar 8, 2007 16:32 UTC (Thu) by aanno (guest, #6082)

Right, I oversimplified this. Generational/compacting collectors have no such walker, but (a) they add a second indirection (which slows things down) and (b) they are not memory-efficient (they use roughly twice the space that is theoretically needed). Hertz certainly compares his GC against this type. Refer to the paper for all the details.
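Both sides of this exchange show up in a textbook semispace copying collector in the style of Cheney's algorithm (a swapped-in classic example; Hertz's collector and the JVM's generational collectors are considerably more elaborate). Only objects reachable from the roots are ever visited and copied, so dead objects cost nothing; the price is a to-space as large as the from-space, and every survivor moves, so references must be rewritten. The list-of-dicts heap and the `copy_collect` name are hypothetical.

```python
from typing import Dict, List, Tuple


def copy_collect(from_space: List[dict], roots: List[int]) -> Tuple[List[dict], List[int], int]:
    """Copy everything reachable from `roots` into a fresh to-space.

    Objects are dicts {"name": str, "refs": [indices into the same space]}.
    Returns (to_space, new_roots, number_of_objects_visited).
    """
    to_space: List[dict] = []
    forward: Dict[int, int] = {}  # forwarding table: old index -> new index

    def evacuate(old: int) -> int:
        # Copy an object the first time it is reached; later visits reuse the copy.
        if old not in forward:
            obj = from_space[old]
            forward[old] = len(to_space)
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forward[old]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    # Cheney's trick: the to-space itself is the breadth-first work queue.
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [evacuate(r) for r in obj["refs"]]  # rewrite moved references
        scan += 1
    # Dead objects in from_space were never visited; the whole space is simply dropped.
    return to_space, new_roots, len(forward)
```

In a four-object heap with two dead objects, only the two live ones are visited, and their references are renumbered to their new positions.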

Garbage collection and MM

Posted Mar 8, 2007 20:35 UTC (Thu) by ncm (subscriber, #165)

Python is not, at present, a GC language. Neither is Perl.

However, people are working on GC implementations. Expect complaints about cache abuse by scripts, too, soon.

Garbage collection and MM

Posted Mar 9, 2007 0:04 UTC (Fri) by njs (guest, #40338)

Python is certainly a GC language (since version 2.0). It happens to optimize that GC by using reference counting to catch the easy cases, and only doing mark/sweep type stuff occasionally to catch reference cycles, but it definitely has a full GC.
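This description of CPython can be checked directly with the standard `gc` and `weakref` modules. In this CPython-specific sketch (the `Node` class and the helper name are just for illustration), a two-object reference cycle survives `del` because each object keeps the other's reference count above zero; only an explicit run of the cycle collector reclaims it.

```python
import gc
import weakref
from typing import Tuple


class Node:
    def __init__(self) -> None:
        self.other = None


def cycle_survives_refcounting() -> Tuple[bool, bool]:
    """Return (alive_after_del, alive_after_collect) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()                    # keep the cycle collector from running early
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a     # a <-> b: each holds a reference to the other
        probe = weakref.ref(a)      # a weak reference does not keep 'a' alive
        del a, b                    # refcounts drop to 1, not 0: nothing is freed
        alive_after_del = probe() is not None
        gc.collect()                # the tracing pass finds the unreachable cycle
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

On CPython this returns `(True, False)`: reference counting alone leaves the cycle alive, and the collector frees it.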

I'm told that some of the hottest Java GC techniques actually involve reference counting these days, because you can massively optimize your actual walking -- the only time a garbage cycle can be created is when a reference count is decremented to a value that is still greater than 0. This is pretty rare, and it also tells you that any cycle that was just created must involve that object in particular, so you don't have to tromp through all of memory either.
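The trick described here resembles the "trial deletion" cycle collection published by Bacon and Rajan: whenever a count is decremented but stays above zero, the object is buffered as a candidate root of a garbage cycle, and a later pass examines only the subgraphs hanging off those candidates. Below is a simplified, single-threaded toy version; the `RcObj`/`RcHeap` names are hypothetical, and real implementations add more coloring states and concurrency handling.

```python
from typing import List


class RcObj:
    def __init__(self, name: str) -> None:
        self.name = name
        self.rc = 0
        self.children: List["RcObj"] = []
        self.color = "black"   # black = in use, gray = under trial, white = garbage
        self.buffered = False  # already recorded as a candidate root?
        self.freed = False


class RcHeap:
    def __init__(self) -> None:
        self.candidates: List[RcObj] = []

    def incref(self, obj: RcObj) -> None:
        obj.rc += 1

    def add_ref(self, src: RcObj, dst: RcObj) -> None:
        src.children.append(dst)
        self.incref(dst)

    def decref(self, obj: RcObj) -> None:
        obj.rc -= 1
        if obj.rc == 0:
            # Plain reference counting: free now, then release what it pointed to.
            for child in obj.children:
                self.decref(child)
            obj.children = []
            obj.freed = True
        elif not obj.buffered:
            # Decremented but still > 0: the only way a garbage cycle can form.
            obj.buffered = True
            self.candidates.append(obj)

    def collect_cycles(self) -> None:
        roots = [o for o in self.candidates if not o.freed]
        self.candidates = []
        for o in roots:
            o.buffered = False
            self._mark_gray(o)
        for o in roots:
            self._scan(o)
        for o in roots:
            self._collect_white(o)

    def _mark_gray(self, s: RcObj) -> None:
        # Trial deletion: subtract the references internal to the subgraph.
        if s.color != "gray":
            s.color = "gray"
            for t in s.children:
                t.rc -= 1
                self._mark_gray(t)

    def _scan(self, s: RcObj) -> None:
        if s.color == "gray":
            if s.rc > 0:
                self._scan_black(s)  # still referenced from outside: undo the trial
            else:
                s.color = "white"    # provisionally garbage
                for t in s.children:
                    self._scan(t)

    def _scan_black(self, s: RcObj) -> None:
        # Restore counts for everything reachable from a live object.
        s.color = "black"
        for t in s.children:
            t.rc += 1
            if t.color != "black":
                self._scan_black(t)

    def _collect_white(self, s: RcObj) -> None:
        if s.color == "white":
            s.color = "black"
            for t in s.children:
                self._collect_white(t)
            s.children = []
            s.freed = True
```

Acyclic objects are still freed immediately by ordinary counting; only objects that were decremented to a nonzero count are ever traced.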

None of this affects your original point, though, because reference counting already trashes caches by itself -- especially in the multiprocessor case, where supposedly read-only access to variables is suddenly triggering cache flushes...

Depending on your cache hierarchy and the characteristics of your GC, you can minimize its impact, though. E.g., in gcc, I seem to remember some trick where you only run the collector between passes, since you already know that's when everything becomes garbage, and also when it doesn't matter if you trash the cache? Similarly, Graydon was saying that initial implementations of Firefox's GC would just run it after page load, because no one notices if the browser pauses for 400 milliseconds then; they're just looking at the page.
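CPython exposes this kind of control through the standard `gc` module: automatic collection can be switched off and a full collection triggered at a point the program knows is quiet. This is an illustration under stated assumptions, not how GCC or Firefox actually do it; `run_phases` and `make_cycles` are hypothetical helpers, and the "phases" stand in for compiler passes or page loads.

```python
import gc
from typing import Callable, Iterable, List


class CycleNode:
    def __init__(self) -> None:
        self.peer = None


def make_cycles(n: int) -> Callable[[], None]:
    """Hypothetical workload: a phase that leaves n two-object cycles behind."""
    def phase() -> None:
        for _ in range(n):
            a, b = CycleNode(), CycleNode()
            a.peer, b.peer = b, a
    return phase


def run_phases(phases: Iterable[Callable[[], None]]) -> List[int]:
    """Run phases with automatic GC off; collect only between phases."""
    was_enabled = gc.isenabled()
    gc.disable()
    collected = []
    try:
        for phase in phases:
            phase()
            # Quiet point: the phase's temporaries are garbage now, and a pause is cheap.
            collected.append(gc.collect())
    finally:
        if was_enabled:
            gc.enable()
    return collected
```

`gc.collect()` returns the number of unreachable objects it found, so each entry is at least the number of cycle members the preceding phase left behind.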

Long run: build garbage collection into the RAM hardware! That'll work around those pesky cache issues ;-)

GC languages and domination

Posted Mar 8, 2007 14:46 UTC (Thu) by kevinbsmith (guest, #4778)

For web servers, GC languages already dominate. Java, obviously, plus PHP, Perl, Python, and Ruby. Who writes CGI in C or C++ anymore? Sure, a few folks, but not many.

On the desktop, I can't think of a single GUI app that I would rather write (or see written) in C or C++ instead of one of the languages mentioned above. Heck, even command-line utilities are often (usually?) written in Perl or some other scripting language (not to mention bash). I guess it depends on your definition of "serious" programming.

Like it or not, GC is pervasive, and still increasing in popularity. It just makes sense to have the inexpensive computers do the extra work instead of the expensive programmers.

GC languages and domination

Posted Mar 8, 2007 19:41 UTC (Thu) by nevyn (guest, #33129)

For web servers, GC languages already dominate.

Err, no. Yaws is about the only major Web server that isn't written in C. And that's entirely because C is the only thing that performs well enough.

For the CGI-like "backend", yes, GC languages dominate, mainly because performance isn't as big a problem there and the real web server can take actions to limit the performance problems of the GC'd language code. Also, the person running the application is often closely tied to the person writing it, so they can throw money at their users' performance problems.

On the desktop, I can't think of a single GUI app that I would rather write (or see written) in C or C++ instead of one of the languages mentioned above.

There are still very few GUI applications that aren't written in C, and again it's mainly because of performance and memory usage. For instance, the "revelation GNOME applet" is currently ~125MB in size, with an RSS of 32MB; this is a Python application that provides a single text entry and an icon on my panel ... and it is far from unique. About the only major GC'd application I use is XEmacs, and all too often I have to restart it because its memory usage spirals out of control (and I wouldn't call it fast).
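The two figures quoted for the applet are presumably its virtual size and its resident set size (RSS), which measure different things: address space reserved versus memory actually resident in RAM. On Linux both can be read from `/proc/<pid>/status` as `VmSize` and `VmRSS`, in kB. A minimal sketch, assuming a Linux `/proc` filesystem; the helper name is hypothetical, and on systems without `/proc` it returns `None`.

```python
from typing import Dict, Optional, Union


def memory_usage_kb(pid: Union[str, int] = "self") -> Optional[Dict[str, int]]:
    """Return {"VmSize": kB, "VmRSS": kB} for a process, or None if unavailable."""
    wanted = {"VmSize", "VmRSS"}
    result: Dict[str, int] = {}
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                key, _, value = line.partition(":")
                if key in wanted:
                    # Lines look like "VmRSS:     32768 kB"
                    result[key] = int(value.split()[0])
    except OSError:
        return None
    return result if len(result) == len(wanted) else None
```

Passing another process's PID inspects that process instead of the current interpreter.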

It just makes sense to have the inexpensive computers do the extra work instead of the expensive programmers.

That is wrong in two ways: 1) The computers are now not doing real work for their users; instead they are doing busy work for the programmers (on the users' time). 2) Doing it properly is often not that expensive for a good programmer, who already has to manage other resources. But, yes, users are often still letting programmers charge them millions of units of work in exchange for not having to do a single unit themselves. I doubt any economy can make this sustainable in the long term, and you only have to look at people using dillo and/or lighttpd to see the choices being made.

GC languages and domination

Posted Mar 9, 2007 11:45 UTC (Fri) by aanno (guest, #6082)

This is a very biased opinion. Dynamic web content is often delivered by J(2)EE applications, especially in (big) enterprise environments. The infrastructure for this is also Java-based, like JBoss or Tomcat. There are also desktop applications written in GC-based languages: Eclipse, NetBeans, Beagle.

In enterprise environments, Eclipse RCP has become the platform for fat-client programming.

On the other hand, there are applications written in C/C++ that waste tons of memory, like Firefox or Gaim.

GC languages and domination

Posted Mar 12, 2007 11:04 UTC (Mon) by ekj (guest, #1524)

1) The computers are now not doing real work for their users; instead they are doing busy work for the programmers (on the users' time).

Users are free to choose. If programs written in non-GCed languages were so much faster that it mattered to users, they'd be perfectly free to use those programs instead. For some kinds of programs this *is* the case. The inner loop of an FPS game is probably better written with explicit memory handling.

For other uses, this doesn't seem to be the case. Most web apps are in fact written in GCed languages. There is absolutely *nothing* stopping you from developing competing programs in, say, C, and if you're right that users really would prefer this, you'd make billions. Somehow I doubt that will be the case, though.

The thing you're missing is that computing power really is cheap, often cheap enough to be almost completely ignorable. The company I work for, for example, spends on the order of $1 million/year on developing web applications. The hardware for running all of this costs something like 5% of that, and that is *including* backups, sysadmin work and the like.

Even if we could run the same stuff on a 486 if it were written in C, it wouldn't be worth it if that meant more than 3-4% extra development time.

2) Doing it properly is often not that expensive for a good programmer, who already has to manage other resources.

Doing software development "properly" is very expensive. So expensive that if you do custom development, it is going to completely *dwarf* the hardware requirements in 95% of cases.

Spending a year of work and $5000 of hardware to do the same thing that could be done with 6 months of work and $10,000 of hardware is the completely unsustainable choice: you save $5000 in hardware and spend ten times that in extra development costs.

If you can do in 12 months in C the same job that requires 11 months in Python/Ruby/PHP/whatever, then more power to you. Most people can't, though, not even smart people.
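The arithmetic behind the $5000/$10,000 comparison can be written out. The one extra assumption is the developer cost, which is only implied: "ten times that" ($50,000) for six extra months works out to about $100,000 per developer-year, and the hypothetical `total_cost` helper uses that figure as its default. `break_even_extra_dev` applies the same reasoning to the 5%-of-$1-million hardware figure.

```python
def total_cost(dev_months: float, hardware: float, dev_cost_per_year: float = 100_000) -> float:
    """Development cost for the given months plus hardware, in dollars.

    dev_cost_per_year is an assumption inferred from the 'ten times that' remark.
    """
    return dev_cost_per_year * dev_months / 12 + hardware


def break_even_extra_dev(hardware_per_year: float, dev_per_year: float) -> float:
    """Largest fraction of extra development time that free hardware could pay for."""
    return hardware_per_year / dev_per_year


# The two options from the comment above.
c_version = total_cost(dev_months=12, hardware=5_000)
python_version = total_cost(dev_months=6, hardware=10_000)
```

The C option comes to $105,000 against $60,000. With hardware at 5% of the development budget, even making the hardware free would pay for at most 5% more development time, and a rewrite never makes hardware entirely free, which is consistent with the 3-4% figure given earlier.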

Garbage collection and MM

Posted Mar 9, 2007 21:39 UTC (Fri) by cpeterso (guest, #305)

Perhaps this is an argument for putting GC in the kernel itself? Instead of userspace programs solving the same complicated memory management problems again and again, consolidate the solution in the kernel, close to the OS's MM code.

Garbage collection and MM

Posted Mar 12, 2007 10:50 UTC (Mon) by ekj (guest, #1524)

But the thing is, it appears in practice to be really, really hard for human brains to handle memory allocation well, too.

Sure, sure, "just do it correctly" would work, in principle. Except that in *practice* we've been using C for like forever in computer terms, and *still* the classical memory-management problems keep coming up, even in well-audited, clueful code. So, obviously, "just do it correctly" isn't going to solve the problem.

So, who is most likely to improve their ability to handle memory allocation? Computers (which grow in various ways by leaps and bounds) or human beings (who have been struggling with manual memory management in C for decades, and so far seem to be making very, very little, if any, progress)?

Also, the overwhelming majority of code written does not care much about performance. Its authors don't care *enough* to be willing to take the extra hit on development time needed to do manual memory management anyway.

It's not about laziness. I don't particularly care whether my employer wishes to hire me for a week to do something in Python, or prefers paying me for 2 weeks to solve the same problem in C (or, for that matter, a month to solve it in assembler).

My employer cares, though. He wants a problem solved. He'll probably opt for the Python version, even if it runs 3 times slower. Especially since he knows that it's a simple thing to rewrite any routines that *do* need performance in C if that should turn out to be necessary.
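Before rewriting routines in C, one usually measures which of them actually dominate the run time. A minimal sketch with the standard `cProfile` and `pstats` modules; `slow_inner_loop`, `cheap_setup`, `program`, and `profile_report` are hypothetical stand-ins for a real application.

```python
import cProfile
import io
import pstats
from pstats import SortKey
from typing import Any, Callable, Tuple


def slow_inner_loop(n: int) -> int:
    # Hypothetical hot spot: a pure-Python loop that a C extension could replace.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup() -> list:
    return list(range(10))


def program() -> int:
    cheap_setup()
    return slow_inner_loop(200_000)


def profile_report(func: Callable[[], Any], top: int = 5) -> Tuple[Any, str]:
    """Run func under cProfile; return its result and the top entries by own time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats(SortKey.TIME).print_stats(top)
    return result, out.getvalue()
```

Sorting by internal time puts `slow_inner_loop` at the top of the report, marking it as the candidate for a C rewrite while the rest stays in Python.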

Garbage collection and MM

Posted Mar 13, 2007 19:24 UTC (Tue) by pimlott (guest, #1535)

You make a good case, but ...
Real-world programs must manage other much more limited resources -- network sockets, database connections, disks -- and any method sufficient to manage them suffices for memory as well.
comparing the problem of managing memory with managing other resources is off-base. Other resources almost always have simple, obvious lifetime rules that make explicit management straightforward. Furthermore, there are many fewer of them (so fewer places to make mistakes, and easier to audit), and errors are usually detected quickly because the resource has externally visible behavior. And if explicit management is still too difficult, reference counting solves the problem neatly, because there are not enough objects to cause a performance impact, and there is no possibility of reference loops.

Memory management is fundamentally much harder.
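The "simple, obvious lifetime" of such resources is easy to see in code: a socket or database connection typically lives for one well-defined scope, so it can be released deterministically when that scope ends, even if an exception escapes. A minimal sketch using the standard `contextlib` module; the `Connection` class and `connect` helper are hypothetical and only record whether the resource was closed.

```python
from contextlib import contextmanager
from typing import Iterator


class Connection:
    """Hypothetical scarce resource standing in for a socket or DB connection."""

    open_count = 0

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False
        Connection.open_count += 1

    def close(self) -> None:
        if not self.closed:
            self.closed = True
            Connection.open_count -= 1


@contextmanager
def connect(name: str) -> Iterator[Connection]:
    conn = Connection(name)
    try:
        yield conn    # the resource's whole lifetime is the with-block
    finally:
        conn.close()  # released at scope exit, exception or not
```

Heap objects rarely have a single owning scope like this; they are shared, stored in containers, and outlive the code that created them, which is the asymmetry being argued here.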

Garbage collection and MM

Posted Mar 15, 2007 11:21 UTC (Thu) by renox (subscriber, #23785)

Another paper on VM-aware GCs: the first link when searching for 'vm aware garbage collector' on Google.

They got significant improvements on their benchmark by making the VM and the GC communicate.
Of course, whether this shows real-life improvement is anyone's guess.

One annoying thing about these papers is that they use copying GCs, which move objects and therefore don't interact well with C-based libraries holding raw pointers: they're useful only for Java, not for the other scripting languages, which tend to reuse C-based libraries.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds