Not logged in
Log in now
Create an account
Subscribe to LWN
LWN.net Weekly Edition for June 20, 2013
Pencil, Pencil, and Pencil
Dividing the Linux desktop
LWN.net Weekly Edition for June 13, 2013
A report from pgCon 2013
Posted Jun 17, 2010 4:24 UTC (Thu) by ncm (subscriber, #165)
GC subsystems promoted as solving all crippling problems have existed long enough that if they really were usable, they would be used everywhere. Apparently, either such GC solutions cannot be packaged in such a way that they can be used in each new project, or they impose constraints incompatible with other system-level requirements. Or both. Or, maybe, no GC system actually does solve all the crippling problems, in real programs that run on real machines
Posted Jun 17, 2010 5:22 UTC (Thu) by flewellyn (subscriber, #5047)
Posted Jun 17, 2010 6:49 UTC (Thu) by ncm (subscriber, #165)
Posted Jun 17, 2010 14:01 UTC (Thu) by nix (subscriber, #2304)
Part of the problem is that writing a GC that tracks stuff residing on the C stack requires either pervasive changes to how your program does *all* its memory management, or (for things like the Chicken collector) pervasive changes to how your functions call each other and how the stack is managed. (The Boehm GC, via a triumph of inspired hackery, manages to do the first without source code changes: but it's computationally expensive as GCs go, not type-accurate, and can sometimes leave extra garbage lying around. I don't really understand why more C programs don't use the Boehm GC, though, it's a hell of a lot better than no GC at all. Perhaps C programmers *like* manual memory management? :/ )
If you can guarantee that you'll never have pointers to GCed objects anywhere on the C stack when a GC happens, it gets much much simpler, but this is a really rather harsh requirement and most systems using a GC cannot satisfy it.
Posted Jun 17, 2010 14:18 UTC (Thu) by Tet (subscriber, #5433)
Yes, some of us do. Or rather, I don't like doing it in and of itself. But I love the determinism of it.
Posted Jun 17, 2010 15:16 UTC (Thu) by nix (subscriber, #2304)
But when you're dealing with complex enough data structures (lots of cycles, trees with pointers to each other and so on), explicit memory management moves from 'deterministic' to 'lethally complex and horrifically annoying'. That's when GC really starts to be useful (and that's why GCC has a special-purpose garbage collector of its very own.)
Posted Jun 18, 2010 13:47 UTC (Fri) by i3839 (guest, #31386)
Seriously, you have to keep track of all links and pointers and datastructures anyway, memory management is only very little additional work. If it's lethally complex and horrifically annoying then your other code is too.
Posted Jun 20, 2010 20:11 UTC (Sun) by nix (subscriber, #2304)
Posted Jun 22, 2010 13:48 UTC (Tue) by i3839 (guest, #31386)
For complication/translation/optimization stuff you generally can just
allocate everything you need and throw everything away afterwards, e.g.
at program exit. Usually the optimization complexity explodes, so you
want to limit the size of the problem anyway.
It's like with locking; You don't have to always use the same granularity.
Posted Jun 22, 2010 20:30 UTC (Tue) by nix (subscriber, #2304)
For complication/translation/optimization stuff you generally can just allocate everything you need and throw everything away afterwards, e.g. at program exit.
Compilation always creates many intermediate data structures, many of which are elaborate graphs or have pointers to each other. You really do want to throw them away when you can, and you really do not want to do this by hand.
Posted Jun 22, 2010 22:53 UTC (Tue) by i3839 (guest, #31386)
The thing is, you probably can't throw away much in each intermediate step,
because you do need all that info, but you can easily throw away data
generated by a whole intermediate step and keep only the intermediate
results. So the main way to limit memory usage is by limiting the working
set. GC doesn't help with that.
Anyway, gcc already uses GC it seems, so if it uses too much memory then
adding a GC won't help. (http://gcc.gnu.org/wiki/Memory_management)
That it would use much more without GC seems unlikely.
I guess my argument boils down to: Managing object lifetime is only a
small part of memory management, and GC only helps with that.
Posted Jun 23, 2010 13:10 UTC (Wed) by nix (subscriber, #2304)
In my experience most memory is used when linking, not when compiling
And your assumption about being able to throw away whole intermediate steps at once -- that's how GCC used to work when it kept everything in obstacks. It's often correct, but if it's wrong only in part you have to keep a lot around because you have no way to tell which bits you're using and which you're not.
(actually, to be maximally pedantic it's almost never correct because most passes annotate or shuffle some tree somewhere: if it did nothing all the time it would be a pretty useless pass.)
But it seems you don't get it. I know GCC already has a GC: I'm using it as a counterexample to your claim that GC is unnecessary because you can always simplify your datastructures. You cannot always do that (although you often can and it should be the first thing you try, if practicable), and doing it by hand (i.e. explicitly triggering GC cycles) is often impractical too.
Managing (or rather mismanaging) object lifetime is the part of memory management that is most likely to cause bugs in C programs (less so in C++); GC largely eliminates that as a cause of bugs.
Posted Jun 25, 2010 14:57 UTC (Fri) by i3839 (guest, #31386)
That's why I was talking about compilation stages, not optimisation
stages. I don't think you can free any significant amount of memory
between the latter ones, because you never know what may be needed
in the future. Even if you just replace multiple nodes with fewer ones
you can't save much memory because of fragmentation. At best it's a
factor 2, in which case you have to manage overall memory consumption
by reducing your working set anyway.
I'm also sure that GCC does not use hundreds of megabytes to keep
very complex graphs in memory.
I'd rather have GCC spend its time on using 10 times the memory for
compiling small files fast than waste time on reducing memory usage
by running slower and not really achieving it anyway.
> But it seems you don't get it. I know GCC already has a GC: I'm using it
> as a counterexample to your claim that GC is unnecessary because you can
> always simplify your datastructures. You cannot always do that (although
> you often can and it should be the first thing you try, if practicable),
> and doing it by hand (i.e. explicitly triggering GC cycles) is often
> impractical too.
No, I don't get it. You brought GCC as an example of something that people
complain about that uses too much memory, and of something that uses
complex datastructures that can't be easily handled with manual memory
handling. What I'm trying to say is that GC doesn't solve any real problems,
it's just convenience for the lazy, but when you do use too much memory
(bloat or leak) it won't help you at all. When memory handling is important
then you have to put it into your program design on a higher level. You seem
to assume that GCC would use a lot more memory if it didn't use GC. I think
it's rather that GC doesn't improve the situation.
> Managing (or rather mismanaging) object lifetime is the part of memory
> management that is most likely to cause bugs in C programs (less so in
> C++); GC largely eliminates that as a cause of bugs.
In my experience memory leaks almost never happen and when they do
happen they're easily fixed. At least in C: When it happens with Java it's
a bit harder because then you have to hunt down the dangling references
and you can't use Valgrind. Firefox is a different story, but even a perfect
GC wouldn't have saved it. (But the problem of C++ programs is that C++
adds its own additional complexity without making anything easier, with
as a result more bugs and memory leaks in general.)
Posted Jun 27, 2010 17:05 UTC (Sun) by nix (subscriber, #2304)
That's why I was talking about compilation stages, not optimisation stages.
I'm also sure that GCC does not use hundreds of megabytes to keep very complex graphs in memory.
I'd rather have GCC spend its time on using 10 times the memory for compiling small files fast than waste time on reducing memory usage by running slower and not really achieving it anyway.
You seem to assume that GCC would use a lot more memory if it didn't use GC. I think it's rather that GC doesn't improve the situation.
As for what would happen if it was designed without GC, well, we have actual evidence of that because pre-3.0 it did indeed have no GC, and used manual memory management for everything, with explicit lifetimes. Maintaining the object lifetimes was a nightmare and there were countless bugs raised due to SEGVs caused by something being freed when it shouldn't.
In my experience memory leaks almost never happen and when they do happen they're easily fixed.
GC may be more inefficient than manual memory management (theoretically this is not true, but I've never heard of a situation where that theory is turned to practice) and is surely not right for every situation, but it wipes out all these problems at a stroke.
And to be honest if asked if I'd rather have a program that ran more slowly but used GC or a program that didn't work I'd surely pick the former.
Posted Jul 4, 2010 17:14 UTC (Sun) by i3839 (guest, #31386)
Considering that optimising is a computationally expensive process which
grows much more than O(n) if the working set increases by n, I'm really
surprised that memory usage is a limiting factor at all, and that compile
time hasn't gone through the roof before this became a problem.
What the hell is gcc doing to achieve this? Making hundreds of copies of
all the nodes every time it modifies them? (I know you can trade memory
usage for time with certain algorithms, but is gcc actually doing that?)
In a dark past I compiled the whole system on a slow, crappy computer with
128 MB of RAM (bloody Gentoo, but Archlinux had no i586 binaries), and I'm
pretty sure it was totally limited by CPU speed. Took a whole day, but it
got to the end. What code makes gcc use hundreds of megabytes of memory
in your experience?
But yeah, if you have a huge body of code with crappy internal APIs then
GC is probably a very good idea. Using explicit memory allocation for short
lived objects plainly sucks, GC is better for that. In normal C the thing is
to avoid such objects in the first place. I only use malloc for long-lived
objects and mostly at program start-up or once per major event, like a new
client connecting. If for whatever reason (IMHO mostly because of bad design,
but not always) the code isn't orderly and clear, but messy, then having
no GC would be a bit of a hell, I admit. But you might want to consider
rewriting the whole thing in Python too, because just using a GC won't
help much. And that's my main point you seem to ignore, that if having GC
improves the situation, it generally means you got other, bigger, design
problems. The case where it's really because of "complex data structures"
is a rare case indeed. Mostly it's unnecessarily complex and then it's
plain bad design.
Posted Jul 7, 2010 15:56 UTC (Wed) by nix (subscriber, #2304)
Yes, but compilers don't grab the whole source and optimise everything at
once, at least not generally, though they can do that. So it's fine if they use a bit more memory when compiling one unit, the only moment when the whole program has to be in memory is when linking.
You might find the -fmem-report flag interesting.
(Note that reuse of tree nodes is a classic example of a time where GC is very useful, because all of a sudden that object's lifetime must be extended, and telling the entity that originally allocated it is likely to be unpleasant.)
But yeah, if you have a huge body of code with crappy internal APIs then GC is probably a very good idea. Using explicit memory allocation for short lived objects plainly sucks, GC is better for that.
The case where it's really because of "complex data structures" is a rare case indeed.
Posted Jul 8, 2010 14:36 UTC (Thu) by etienne (subscriber, #25256)
I am not a specialist of GC, but I wonder if the usual malloc/free interface is wrong, and if we should use something like malloc/memlock/free.
Basically, the function which create the data allocates it, all functions using the data memlock() it, and every function calls free() when it finishes using that data. Free() decrements a counter and frees the memory when the counter reaches zero.
Has that already been tried and what would be the problem of it?
Posted Jul 8, 2010 14:52 UTC (Thu) by nye (guest, #51576)
How would this be different from any other form of reference counting, with its attendant problems?
Posted Jul 9, 2010 16:22 UTC (Fri) by nix (subscriber, #2304)
Posted Jul 14, 2010 14:23 UTC (Wed) by i3839 (guest, #31386)
-fmem-report flag is indeed interesting. On my current project, which
is about 4K lines of C code, it reports 16MB allocated with combine +
whole-program, 12MB when using dietlibc, and max 3MB when compiling
files one at a time.
So assuming C++ is ten times worse, and the code ten times bigger, then
you're indeed easily using gigabytes of memory. I guess you don't want
to compile big C++ programs at once though, doing it per file should be
> That's exactly the class of allocations for which obstacks are good and
> GC can often be forgone. When you have long-lived allocations in a
> complex webwork in elaborate interconnected graphs, then is when GC
> becomes essential. And GCC has elaborate interconnected graphs up the
> wazoo. The quality of internal APIs is mostly irrelevant here: it's the
> nature of the data structures that matters.
With crappy APIs/design you can't allocate objects on the stack, but are
forced to allocate them dynamically even if they're short lived.
The problem of elaborate interconnected graphs is that it's hard to end
up with nodes with no references at all, so GC usually won't help much.
And even if it does it probably doesn't reduce peak memory usage. So yes,
in such cases you want something like GC, for your own sanity, but it
won't improve the total memory usage much if you don't limit the graph
size some other way.
Posted Jul 15, 2010 17:06 UTC (Thu) by nix (subscriber, #2304)
-combine -fwhole-program has a similar memory hit, for similar reasons.
C++ is a good bit worse memory-wise than C, mostly because the headers make C headers look like child's toys: they have a lot of definitions in them and a large number of very complex declarations. I've known compiling *single C++ files* to eat a couple of gigabytes before (although, admittedly, those were exceptional, things like large yacc parsers compiled with C++: mysql's SQL parser is a good example).
Posted Sep 6, 2010 20:22 UTC (Mon) by Blaisorblade (guest, #25465)
Well, if no pointer exists to an object, there's no way somebody can use it. It seems you don't know what you're talking about.
> Even if you just replace multiple nodes with fewer ones you can't save much memory because of fragmentation
Fragmentation only exists in languages which don't support GC well (i.e., C). GC have been able to fix that for ages. Whether we have decent heuristics for handling it without using too much CPUs is an open issue, when that's a problem choosing the GC algorithm among the implemented ones is the best solution.
Posted Sep 6, 2010 20:14 UTC (Mon) by Blaisorblade (guest, #25465)
First, refcounting is GC, and lots of C programs use it.
I used to program in C in kernel space (on UserModeLinux, I remember a nix, probably you, on our MLs), now I usually program in functional languages. If you ever tried, you'd realize that there, manual memory management would be much more code than getting the job done.
On the other hand, in C, and even in kernel space, reference counting is frequently used, especially for just _everything_ shared among multiple threads. It might be omitted for stuff which is only manipulated under locks in certain ways, but kernel code craves for using as few locks as possible, so refcounting is required by the coding style for such cases.
And reference counting is... just another way to implement GC. Which is probably fine for those usages - but if you implement GC with it, it's often the worst way (because you really use it differently).
Posted Sep 6, 2010 21:17 UTC (Mon) by nix (subscriber, #2304)
The biggest problem with refcounting GC (ignoring the cycle problem) is that its constant tweaking of refcounts leads to appalling cacheline bouncing with concurrent code. (Slightly different from the nasty cache effects of crude old mark-and-sweep collectors.)
Posted Sep 6, 2010 22:38 UTC (Mon) by Blaisorblade (guest, #25465)
Yep, that's what doesn't happen (not that much) in kernel (with explicit refcounting) and what smarter refcounting GCs try to avoid as much as possible. No refcounting for stack operations (which implies a stop-the-world phase to scan the stack) is an old optimization; the Recycler from IBM does something better, but there must be reasons why nobody actually uses it, and reading the paper with attention gives some suggestions. Most importantly, the most reputable comment I got about it was "it's crap", unfortunately I couldn't ask details.
Yes, I'm the same Nix :)
Nice to see you again - I've noticed your nick a number of times here. And now I speak as a researcher in programming languages and their implementation - what a huge change.
But back to UserModeLinux, I agree that it would definitely make sense to use these tricks, or at least something related (you mentioned that somewhere else). We were actually trying to push forward some code for our needs, i.e. Ingo's remap_file_pages patch, which I resumed. It allowed playing not only with offsets but also with protection bits of individual pages. It seems that here, the guys are doing much more stuff - including, probably, batched VM operations, since they need to use them to set up read barriers, i.e. to trap memory reads for their GC, which was originally developed for a RISC CPU which gives a small HW support for them!
But I would have needed a full-time position to handle that, rather than working on it on my free time, after UML maintenance and support, and UML as a whole was severely lacking manpower. But understanding and fixing some bugs in Ingo Molnar's code (on which he had played just for a small while) was one of my best achievements by that time.
Posted Sep 6, 2010 23:17 UTC (Mon) by nix (subscriber, #2304)
Nice to see you again - I've noticed your nick a number of times here. And now I speak as a researcher in programming languages and their implementation - what a huge change.
Just like to say thanks for the UML work over the years, it kept my firewall happy for many years before that machine croaked its last and I switched to a Soekris... UML seems to have been obsoleted by qemu now, which is in a way a shame: it did a lot of things in interesting and unusual ways, and I'm fairly sure nobody else had ever abused ptrace() quite as badly (given the number of ptrace() bugs UML teased out, I think we can be sure of it).
allocation in C
Posted Jun 17, 2010 16:02 UTC (Thu) by tialaramex (subscriber, #21167)
In some places the reason I don't use a GC is because I'm lazy and there wasn't one sat there ready to go. It is possible that I should learn to use the Boehm GC in these cases and that I should evangelise its use.
But in a lot of places it's because it's because the code's on the critical path. I don't see a GC (and particularly one you admit is "computationally expensive") buying us more than it costs in these codepaths. This is optimised code, where people have chosen the algorithms very carefully and read the latest papers, and run benchmarks to find good parameters and done all that good stuff. And we already have (by the nature of the system) an insatiable appetite for RAM so we certainly don't want to make that any worse.
Perhaps a better goal is to stop using C at all for the "lazy" code. I probably spend as little as 30% of my programming time writing C these days anyway, maybe it should be less than 10%. Java for example seems to work well enough, it's slow and it wastes a lot of RAM, and it's missing a bunch of features I like, but it works well enough. Probably a lot of my "lazy" code could be Java, more than already is.
Posted Jun 20, 2010 20:13 UTC (Sun) by nix (subscriber, #2304)
But yes, avoiding C for the boring stuff is probably best. Embed a better language (or embed the C in a better language) and use that, and get GC for free :)
Posted Jun 21, 2010 0:33 UTC (Mon) by khc (subscriber, #45209)
Posted Jun 21, 2010 20:54 UTC (Mon) by nix (subscriber, #2304)
Posted Jun 17, 2010 17:47 UTC (Thu) by jordanb (guest, #45668)
With a sales pitch like that, who wouldn't use it? :P
Anyway, I think RAII is a better approach to memory management than GC for high-performance applications. It's deterministic and has very little run-time cost. The only disadvantage is that it doesn't exist in C, but it's supported by all modern languages except the ones that have gone the builtin-GC route.
RAII, but no GC?
Posted Jun 17, 2010 19:05 UTC (Thu) by sbishop (guest, #33061)
I'm also curious to know where RAII is used outside of C++.
Posted Jun 17, 2010 19:29 UTC (Thu) by alextingle (guest, #20593)
Posted Jun 17, 2010 21:18 UTC (Thu) by tack (subscriber, #12542)
Posted Jun 19, 2010 13:39 UTC (Sat) by cortana (subscriber, #24596)
Posted Jun 20, 2010 20:18 UTC (Sun) by nix (subscriber, #2304)
Posted Jun 19, 2010 5:34 UTC (Sat) by jonabbey (subscriber, #2736)
Modern versions of the JDK really do have truly exceptional GC performance. Java 7 will feature a compacting, predictable delay time GC system that can take excellent advantage of many-core systems to provide great throughput.
With GC systems, there's a trade-off between speed of collection, CPU efficiency, and overall memory usage. Java allows these parameters to be tuned for a specific use case, but it is true that the default configuration tends to emphasize CPU efficiency over overall memory usage.
Posted Jun 17, 2010 14:37 UTC (Thu) by NAR (subscriber, #1313)
They might be used in a telephony exchange system near you? I'm not that familiar with the Erlang GC, but I haven't met the "everything stops for 15 seconds" problem yet. Of course, it's still possible to leak memory (i.e. breaking the code that removes the elements from a list, so it just grows and grows), and sometimes an explicit garbage collection frees up a surprising amount of memory, but it doesn't happen that often.
Posted Jun 19, 2010 5:36 UTC (Sat) by jonabbey (subscriber, #2736)
Real-time GC does not work, period.
Posted Jun 17, 2010 10:57 UTC (Thu) by khim (subscriber, #9252)
Problems with real-time GC is not the GC itself. Rather it's general problem: you can't manage memory efficiently unless the program does it. If you program keeps links to gigabyte of long-dead objects and then suddenly drop links to all these objects to create another gygabyte of objects you'll see UI freezes no matter what you do. And if your program is accurate enough and careful enough with allocation of objects then it's usually easy to add manual allocation or trivial refcounting GC.
Good memory management is hard, good memory management with GC is harder. Not only you must think about references to objects - you must keep the capabilities of GC in mind too!
GC is great for batch-processing, but real-time? It does not work in practice beyong toy samples.
Posted Jun 17, 2010 15:49 UTC (Thu) by marcH (subscriber, #57642)
> And if your program is accurate enough and careful enough with allocation of objects then it's usually easy to add manual allocation or trivial refcounting GC.
If it is trivial then why do I have to do it myself?
See also "GC allows for entirely different algorithms" somewhere here:
Posted Jun 17, 2010 23:28 UTC (Thu) by ncm (subscriber, #165)
Posted Jun 18, 2010 1:07 UTC (Fri) by pflugstad (subscriber, #224)
GC is great for batch-processing, but real-time? It does not work in practice beyond toy samples.
Hmmm... Been there, done that: embedded system running RTOS and RT Java, driving an avionics Mil-Std-1553 bus at 50Hz with <30ms latency under heavy load for many hours. Very non-trivial (several hundred thousand LOC).
Yes we had to pay attention to memory allocations in the time critical code paths, but that was actually pretty easy. And yes, we used the real-time GC that the RT Java gave us to essentially run in the background.
And we had to tune some other parts of the code where some programmers had made some dreadful data structure choices (which worked find under low utilization, but didn't scale). But mostly things worked and we didn't really pay much attention to memory allocations across the majority of the application. No stop-the-world pauses, no burps in latency (which there were before we tuned).
The key with the whole thing was to have enough memory available so the GC could come along behind and clean up without having to stop the rest of the system from running. Similar tests on a much more memory constrained device did not scale as well - but not: it is just a matter of scaling.
I'd done similar things on Windows with C/C++ and all the explicit memory management, so I'm very aware of the trade-offs. I'll take the GC thank-you-very-much, and RT GC does work just fine, IMO.
Posted Sep 6, 2010 20:00 UTC (Mon) by Blaisorblade (guest, #25465)
Posted Oct 10, 2010 1:18 UTC (Sun) by Blaisorblade (guest, #25465)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds