Posted Jul 7, 2010 15:56 UTC (Wed) by nix
In reply to: Pauses
Parent article: The Managed Runtime Initiative
Yes, but compilers don't grab the whole source and optimise everything at once, at least not generally, though they can do that. So it's fine if they use a bit more memory when compiling one unit; the only moment when the whole program has to be in memory is when linking.
Well, -flto does exactly that. In any case, grabbing one translation unit and optimizing it can easily use hundreds of megabytes of memory, and without garbage collection (i.e. never freeing storage) it would easily use many gigabytes.
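(If you've not played with it: a link-time-optimized build looks roughly like

    gcc -O2 -flto -c a.c
    gcc -O2 -flto -c b.c
    gcc -O2 -flto -o prog a.o b.o

and it's that final link step that pulls the whole program into memory at once and optimizes across translation units.)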
Considering that optimising is a computationally expensive process whose cost grows much faster than O(n) as the working set grows by n, I'm really surprised that memory usage is a limiting factor at all, and that compile time hasn't gone through the roof before this became a problem.
Well, compile time having gone through the roof is
a perennial complaint about GCC, and its L2 cache utilization is pretty awful.
What the hell is gcc doing to achieve this? Making hundreds of copies of
all the nodes every time it modifies them? (I know you can trade memory
usage for time with certain algorithms, but is gcc actually doing that?)
No, but it has a lot
of optimization passes, several IRs, and it does a lot of work. Also, headers are large these days: decl nodes alone can use hundreds of megabytes, and recent work to reduce the size of decl nodes has reduced the memory consumption of the compiler considerably.
You might find the -fmem-report flag interesting.
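(Roughly speaking, you just add it to an ordinary compile, something like gcc -c -O2 -fmem-report foo.c, and GCC prints statistics about its own memory allocation when it finishes.)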
(Note that reuse of tree nodes is a classic case in which GC is very useful: all of a sudden that node's lifetime must be extended, and telling the entity that originally allocated it is likely to be unpleasant.)
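A contrived sketch of that situation, using made-up node types rather than GCC's real trees, just to show the shape of the problem:

    #include <stdlib.h>

    struct node {
        int code;
        struct node *op[2];
    };

    static struct node *new_node(int code)
    {
        struct node *n = calloc(1, sizeof *n);
        n->code = code;
        return n;
    }

    int main(void)
    {
        struct node *shared = new_node(1);  /* say, a type node */
        struct node *expr_a = new_node(2);
        struct node *expr_b = new_node(3);

        expr_a->op[0] = shared;   /* the original user of the node       */
        expr_b->op[0] = shared;   /* reuse: its lifetime is now extended */

        /* With explicit allocation, whoever tears down expr_a must somehow
           know that shared is still reachable from expr_b.  A collector
           just sees the live reference and keeps the node around. */
        free(expr_a);             /* must not free shared */
        free(expr_b);
        free(shared);
        return 0;
    }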
In a dark past I compiled the whole system on a slow, crappy computer with
128 MB of RAM (bloody Gentoo, but Archlinux had no i586 binaries), and I'm
pretty sure it was totally limited by CPU speed. Took a whole day, but it
got to the end. What code makes gcc use hundreds of megabytes of memory
in your experience?
Virtually anything in C++, and anything more than a few thousand lines long in C. It's downright commonplace to see GCC using more than 500MB, and I've seen it using gigabytes before now. If you run several compilations in parallel, you'll need even more...
But yeah, if you have a huge body of code with crappy internal APIs then GC is probably a very good idea. Using explicit memory allocation for short-lived objects plainly sucks; GC is better for that.
That's exactly the class of allocations for which obstacks are good and GC can often be forgone. It's when you have long-lived allocations woven into elaborate interconnected graphs that GC becomes essential, and GCC has elaborate interconnected graphs up the wazoo. The quality of internal APIs is mostly irrelevant here: it's the nature of the data structures that matters.
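For the short-lived case the obstack pattern is about as simple as it gets; a minimal sketch using glibc's <obstack.h> (not GCC's actual code):

    #include <obstack.h>
    #include <stdlib.h>

    /* obstack wants to be told how to get and release its underlying chunks */
    #define obstack_chunk_alloc malloc
    #define obstack_chunk_free  free

    int main(void)
    {
        struct obstack ob;
        obstack_init(&ob);

        /* allocate a pile of small temporaries... */
        for (int i = 0; i < 1000; i++)
            obstack_alloc(&ob, 64);

        /* ...and throw every one of them away with a single call */
        obstack_free(&ob, NULL);
        return 0;
    }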
The case where it's really because of "complex data structures" is a rare case indeed.
I find that about 5% of the programs I write develop data structures complex enough that memory management becomes a serious burden. I think it depends on the class of program, really. If you mostly write little scripting things or financial add-up-these-numbers programs, then, no, complex data structures will be uncommon. If you're hacking on the inside of a database server or a compiler, then they'll be common.