Yes, but compilers don't grab the whole source and optimise everything at
once, at least not generally, though they can do that. So it's fine if they
use a bit more memory when compiling one unit, the only moment when the
whole program has to be in memory is when linking.
Considering that optimising is a computationally expensive process which
grows much more than O(n) if the working set increases by n, I'm really
surprised that memory usage is a limiting factor at all, and that compile
time hasn't gone through the roof before this became a problem.
What the hell is gcc doing to achieve this? Making hundreds of copies of
all the nodes every time it modifies them? (I know you can trade memory
usage for time with certain algorithms, but is gcc actually doing that?)
In a dark past I compiled the whole system on a slow, crappy computer with
128 MB of RAM (bloody Gentoo, but Archlinux had no i586 binaries), and I'm
pretty sure it was totally limited by CPU speed. Took a whole day, but it
got to the end. What code makes gcc use hundreds of megabytes of memory
in your experience?
But yeah, if you have a huge body of code with crappy internal APIs then
GC is probably a very good idea. Using explicit memory allocation for short
lived objects plainly sucks, GC is better for that. In normal C the thing is
to avoid such objects in the first place. I only use malloc for long-lived
objects and mostly at program start-up or once per major event, like a new
client connecting. If for whatever reason (IMHO mostly because of bad design,
but not always) the code isn't orderly and clear, but messy, then having
no GC would be a bit of a hell, I admit. But you might want to consider
rewriting the whole thing in Python too, because just using a GC won't
help much. And that's my main point you seem to ignore, that if having GC
improves the situation, it generally means you got other, bigger, design
problems. The case where it's really because of "complex data structures"
is a rare case indeed. Mostly it's unnecessarily complex and then it's
plain bad design.
Posted Jul 7, 2010 15:56 UTC (Wed) by nix (subscriber, #2304)
[Link]
Yes, but compilers don't grab the whole source and optimise everything at
once, at least not generally, though they can do that. So it's fine if they use a bit more memory when compiling one unit, the only moment when the whole program has to be in memory is when linking.
Well, -flto does exactly that. In any case, grabbing one translation unit and optimizing it can easily use hundreds of megabytes of memory, and without doing garbage collections (i.e. never freeing storage) would easily use many gigabytes.
Considering that optimising is a computationally expensive process which
grows much more than O(n) if the working set increases by n, I'm really
surprised that memory usage is a limiting factor at all, and that compile
time hasn't gone through the roof before this became a problem.
Well, compile time having gone through the roof is a perennial complaint about GCC, and its L2 cache utilization is pretty awful.
What the hell is gcc doing to achieve this? Making hundreds of copies of
all the nodes every time it modifies them? (I know you can trade memory
usage for time with certain algorithms, but is gcc actually doing that?)
No, but it has a lot of optimization passes, several IRs, and it does a lot of work. Also, headers are large these days: decl nodes alone can use hundreds of megabytes, and recent work to reduce the size of decl nodes has reduced the memory consumption of the compiler considerably.
You might find the -fmem-report flag interesting.
(Note that reuse of tree nodes is a classic example of a time where GC is very useful, because all of a sudden that object's lifetime must be extended, and telling the entity that originally allocated it is likely to be unpleasant.)
In a dark past I compiled the whole system on a slow, crappy computer with
128 MB of RAM (bloody Gentoo, but Archlinux had no i586 binaries), and I'm
pretty sure it was totally limited by CPU speed. Took a whole day, but it
got to the end. What code makes gcc use hundreds of megabytes of memory
in your experience?
Virtually anything in C++. Anything more than a few thousand lines long in C. It's downright commonplace. It's pretty common to see GCC using >500Mb, and I've seen it using gigabytes before now. If you run several compilations in parallel, you'll need even more...
But yeah, if you have a huge body of code with crappy internal APIs then GC is probably a very good idea. Using explicit memory allocation for short lived objects plainly sucks, GC is better for that.
That's exactly the class of allocations for which obstacks are good and GC can often be forgone. When you have long-lived allocations in a complex webwork in elaborate interconnected graphs, then is when GC becomes essential. And GCC has elaborate interconnected graphs up the wazoo. The quality of internal APIs is mostly irrelevant here: it's the nature of the data structures that matters.
The case where it's really because of "complex data structures" is a rare case indeed.
I find that about 5% of the programs I write start to get complex enough data structures that memory management becomes a serious burden. I think it depends on the class of program, really. If you mostly write little scripting things or financial add-up-these-numbers programs, then, no, complex data structures will be uncommon. If you're hacking on the inside of a database server or a compiler, then they'll be common.
Pauses
Posted Jul 8, 2010 14:36 UTC (Thu) by etienne (subscriber, #25256)
[Link]
> Note that reuse of tree nodes is a classic example of a time where GC is
> very useful, because all of a sudden that object's lifetime must be
> extended, and telling the entity that originally allocated it is likely
> to be unpleasant.
I am not a specialist of GC, but I wonder if the usual malloc/free interface is wrong, and if we should use something like malloc/memlock/free.
Basically, the function which create the data allocates it, all functions using the data memlock() it, and every function calls free() when it finishes using that data. Free() decrements a counter and frees the memory when the counter reaches zero.
Has that already been tried and what would be the problem of it?
Pauses
Posted Jul 8, 2010 14:52 UTC (Thu) by nye (guest, #51576)
[Link]
>Basically, the function which create the data allocates it, all functions using the data memlock() it, and every function calls free() when it finishes using that data. Free() decrements a counter and frees the memory when the counter reaches zero.
How would this be different from any other form of reference counting, with its attendant problems?
Pauses
Posted Jul 9, 2010 16:22 UTC (Fri) by nix (subscriber, #2304)
[Link]
Straight reference counting works if you can tell who the 'users' are, and if the data structures are fairly simple. But we can't really ask everyone touching a graph to walk the entire graph and all its data structures and increment, then decrement, all its reference counts (that would be much more cache-unfriendly and unpleasant to maintain than what is done now); and because nodes are often shared between graphs we can't have one giant reference count across the whole graph... and if the reference-count-freer walks the graph and only frees if all connected entities are at refcount zero, you might as well drop the reference count, because what you've got there is a garbage collector. :)
Pauses
Posted Jul 14, 2010 14:23 UTC (Wed) by i3839 (guest, #31386)
[Link]
-flto enables link time optimisation, which is IMHO something slightly
different, and earlier (current?) implementation has a reputation of being
slow. I was thinking more about -combine -fwhole-program. As far as I know
link time optimisation is about doing further optimisations before the real
linking.
-fmem-report flag is indeed interesting. On my current project, which
is about 4K lines of C code, it reports 16MB allocated with combine +
whole-program, 12MB when using dietlibc, and max 3MB when compiling
files one at a time.
So assuming C++ is ten times worse, and the code ten times bigger, then
you're indeed easily using gigabytes of memory. I guess you don't want
to compile big C++ programs at once though, doing it per file should be
fine.
> That's exactly the class of allocations for which obstacks are good and
> GC can often be forgone. When you have long-lived allocations in a
> complex webwork in elaborate interconnected graphs, then is when GC
> becomes essential. And GCC has elaborate interconnected graphs up the
> wazoo. The quality of internal APIs is mostly irrelevant here: it's the
> nature of the data structures that matters.
With crappy APIs/design you can't allocate objects on the stack, but are
forced to allocate them dynamically even if they're short lived.
The problem of elaborate interconnected graphs is that it's hard to end
up with nodes with no references at all, so GC usually won't help much.
And even if it does it probably doesn't reduce peak memory usage. So yes,
in such cases you want something like GC, for your own sanity, but it
won't improve the total memory usage much if you don't limit the graph
size some other way.
Pauses
Posted Jul 15, 2010 17:06 UTC (Thu) by nix (subscriber, #2304)
[Link]
Link-time optimization works by having the compilation phases drive the compilation to the GIMPLE tree stage, then writing it out and having the link stage invoke the compiler again (via collect2, or via a linker plugin: the latter is preferable because then you can run it over the contents of .a archives as well) to read in all the GIMPLE trees and optimize them. At that stage you've got all the trees written out by all the compilation phases in memory at once. And a good few optimizations will, unless bounded explicitly, start using up crazy amounts of memory when hit with large programs all at once. (Of course they *are* bounded explicitly. At least all of them that anyone noticed are.)
-combine -fwhole-program has a similar memory hit, for similar reasons.
C++ is a good bit worse memory-wise than C, mostly because the headers make C headers look like child's toys: they have a lot of definitions in them and a large number of very complex declarations. I've known compiling *single C++ files* to eat a couple of gigabytes before (although, admittedly, those were exceptional, things like large yacc parsers compiled with C++: mysql's SQL parser is a good example).