LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 2:16 UTC (Sat) by cmccabe (guest, #60281)
In reply to: LCA: Lessons from 30 years of Sendmail by daglwn
Parent article: LCA: Lessons from 30 years of Sendmail

> I answered the question you asked which, frankly, is a poor question.
> "Can" is easy to answer and "will" is a ridiculous question.

Every large C++ project I've ever worked on has had long compilation times. It's a consequence of the design of the language. Every file in a C++ project must do the same work over and over for #include directives. A single #define could change the meaning of everything. This means that compilation times for C++ projects tend to be O(n^n), where n = number of files.

It's not a ridiculous question to ask "will my compile times be longer if I port project X to C++." The answer is almost certainly yes. Does it matter? Depends on what project X is.

C's limitations are its virtue. It enforces a consistent low-level, performance-sensitive style. Above all, C tends to encourage simplicity, the programmer's best friend.

My biggest problem with C++ is a philosophical one. It blurs the lines between high-level components and low-level ones. You often see this confusion in the minds of C++ advocates. They ask why C isn't good for implementing user interfaces, or parsing text files. The answer is that it isn't supposed to be good for those things. Those are the things that you write another component for-- a cleanly separated component that uses a nice API to talk to the rest of the system.

This is the future. You can see it in all the newest software-- the web sites that have a Javascript front end, talking to a Java or Ruby backend, talking to a webserver and kernel that are written in C. Ask anyone writing an Android app or a website backend what a vtable or a buffer overflow is. Blank stare. It's like asking them what a PNP transistor is.

There is no room for C++ in this world because C++ itself is a layering violation, a hack. When performance was super-duper critical it was ok to have this layering violation. But in the coming years, performance is going to be less and less of an issue and people will move to more cleanly structured systems.


LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 5:43 UTC (Sat) by daglwn (subscriber, #65432)

> It's not a ridiculous question to ask "will my compile times be longer if
> I port project X to C++." The answer is almost certainly yes.

That's simply not credible. If it was a port from C, there is nothing a C++ compiler would do differently than a C compiler that would greatly increase compile time.

> Every large C++ project I've ever worked on has had long compilation
> times.

That's a valid observation but it doesn't indicate any general statements can be made.

> It's a consequence of the design of the language.

That does not follow.

> Every file in a C++ project must do the same work over and over for
> #include directives. A single #define could change the meaning of
> everything.

No, that would be a violation of the ODR. Structures and definitions cannot change after they've been used.

> There is no room for C++ in this world because C++ itself is a layering
> violation.

Your entire argument makes no sense. There is nothing in C or C++ that prevents or encourages any particular design. One can write well defined modules and interfaces in both languages. One can write poorly structured code in both languages.

But C++ provides safety mechanisms that are simply not available in C. RAII is an example.
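
As a minimal sketch of the RAII point (the Handle class and the open_handles counter are hypothetical stand-ins, not taken from any project discussed here): the destructor releases the resource on every exit path, including an exception, whereas C code has to do the same cleanup by hand.

```cpp
#include <cassert>    // only needed for the assert in the usage note
#include <stdexcept>

// Hypothetical stand-in for a real resource such as a file descriptor or lock.
static int open_handles = 0;

class Handle {
public:
    Handle() { ++open_handles; }     // acquire in the constructor
    ~Handle() { --open_handles; }    // release in the destructor
    Handle(const Handle&) = delete;  // copying would release twice
    Handle& operator=(const Handle&) = delete;
};

// Throws halfway through; note that there is no explicit cleanup code.
void risky(bool fail) {
    Handle h;
    if (fail) throw std::runtime_error("failed midway");
}

// Returns the number of handles still open after a failed call.
int handles_after_failure() {
    try {
        risky(true);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run ~Handle() by this point.
    }
    return open_handles;
}
```

Under these assumptions, `assert(handles_after_failure() == 0);` holds: the handle is released even though risky() never reached its closing brace.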

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 21:27 UTC (Sat) by cmccabe (guest, #60281)

> No, that would be a violation of the ODR. Structures and definitions
> cannot change after they've been used.

It's only a violation of the ODR if the objects that are defined differently have global linkage and all of them are not weak symbols.

There are actually a lot of macros that change the behavior of standard headers. _GNU_SOURCE, _BSD_SOURCE, and _SVID_SOURCE are three popular ones.
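
A portable illustration of a macro changing what a header means, offered as a sketch rather than as anything from the projects above: the standard <cassert> header redefines assert every time it is included, depending on whether NDEBUG is defined at that point. The two function names are made up for the example.

```cpp
// First inclusion: NDEBUG is defined, so assert(expr) expands to a no-op
// and expr is never evaluated.
#define NDEBUG
#include <cassert>

int calls_with_ndebug() {
    int calls = 0;
    assert(++calls);   // discarded by the preprocessor
    return calls;
}

// Second inclusion of the very same header, now without NDEBUG:
// assert is redefined as a real check that evaluates its argument.
#undef NDEBUG
#include <cassert>

int calls_without_ndebug() {
    int calls = 0;
    assert(++calls);   // evaluated; ++calls is nonzero, so the check passes
    return calls;
}
```

The same source line, `assert(++calls);`, means two different things in the two functions, purely because of a #define that precedes the #include.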

You seem to be confused about how include files work in C and C++. The way they work is that each translation unit (that's .cpp file to you) has to scan through all the files included by that unit, recursively. There are no shortcuts and the compiler cannot cache this work.

The reason I said it was O(n^n) is that n^n is the upper bound on the time complexity. Remember that you can include .c or .cpp files. In reality, most projects' compilation times will grow more slowly than this. However, growth is still exponential in the number of files, and the compile times seen by real-world projects like WebKit reflect this.

> One can write well defined
> modules and interfaces in both languages. One can write poorly structured
> code in both languages.

I agree. A good programmer can write good code in any language. A bad one can write Vogon poetry in any language.

There are a lot of projects I like and respect that use C++: LLVM, OpenCV, Ceph, WebKit, and many others. C++ will be around for a long time. For new projects, however, I would encourage people to look at newer languages like Google Go. Progress hasn't stood still and we have learned some things since the early nineties. I swear!

LCA: Lessons from 30 years of Sendmail

Posted Feb 7, 2011 19:13 UTC (Mon) by nix (subscriber, #2304)

> You seem to be confused about how include files work in C and C++. The way
> they work is that each translation unit (that's .cpp file to you) has to
> scan through all the files included by that unit, recursively. There are
> no shortcuts and the compiler cannot cache this work.

Except that there are shortcuts and GCC does cache this work, and has for more than fifteen years. (e.g. you can skip even opening files more than once if they are entirely contained in include guards and the guards are not #undefed.)
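
For readers who have not seen it spelled out, the guard pattern in question looks roughly like this. The header body is pasted twice to simulate a double #include within one self-contained file; Widget, WIDGET_H and widget_id() are hypothetical names. Once the preprocessor has seen that a file is wholly wrapped in such a guard, it can skip reopening that file for as long as the guard macro stays defined.

```cpp
#include <cassert>  // only needed for the assert in the usage note

// ---- first inclusion of a hypothetical widget.h ----
#ifndef WIDGET_H
#define WIDGET_H
struct Widget {
    int id;
};
#endif // WIDGET_H

// ---- second inclusion of the same text ----
// WIDGET_H is already defined, so everything up to #endif is skipped.
// Without the guard this would be a redefinition error.
#ifndef WIDGET_H
#define WIDGET_H
struct Widget {
    int id;
};
#endif // WIDGET_H

// Uses the single surviving definition of Widget.
int widget_id() {
    Widget w{7};
    return w.id;
}
```

The file compiles because the second copy is never seen by the compiler proper; `assert(widget_id() == 7);` then passes.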

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 0:26 UTC (Tue) by cmccabe (guest, #60281)

Sigh. I knew I was going to get some grief when I said "there are no shortcuts." :)

It depends on what you call a shortcut I guess. The header guard optimization is good, but the process as a whole is still O(n^n). Doing slightly more efficient things with file descriptors can't change that.

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 18:27 UTC (Tue) by nix (subscriber, #2304)

Um, for 'slightly more efficient things with file descriptors' substitute 'almost always avoid parsing the vast majority of headers more than once'.

The exponential explosion you refer to simply does not happen with real code. And if header parsing is slow, GCC supports precompiled headers on common platforms to speed things up. (Yes, you may have to restructure your headers a bit to use them, but if you're compiling slow enough that you need this feature, that's a small cost.)

LCA: Lessons from 30 years of Sendmail

Posted Feb 11, 2011 9:33 UTC (Fri) by cmccabe (guest, #60281)

See my comment below. Basically, the headers that define the types of a class's private data members also need to be #included in that class's header file. So you *cannot* "avoid parsing the vast majority of headers more than once."

Even under ideal conditions, C++ compilation is slow. If you add even a few non-ideal conditions, like programmers who love to define functions in header files "for performance", extensive use of templates, auto-generated anything, or unnecessary cross-module dependencies, it becomes positively glacial.

Unfortunately real-world projects tend to have some or all of these conditions. I'm too lazy to find the reference now, but Google's C++ compile times are said to be measured in hours. And those guys read Effective C++ and know their stuff.

Precompiled headers sound helpful, but only for headers you are including from external libraries. Maybe they would be useful for something like Qt? I haven't used precompiled headers.

LCA: Lessons from 30 years of Sendmail

Posted Feb 19, 2011 0:24 UTC (Sat) by nix (subscriber, #2304)

Er, when I said 'more than once' I meant 'more than once per translation unit', and this is almost universal. This reduces your claimed O(n^2) to, uh, O(nm) where n is the number of translation units and m is the number of headers.

Precompiled headers are useful in any project where you have one great big header that #includes a lot of stuff. This is extremely common.

LCA: Lessons from 30 years of Sendmail

Posted Feb 5, 2011 13:55 UTC (Sat) by mpr22 (subscriber, #60784)

> This means that compilation times for C++ projects tend to be O(n^n),
> where n = number of files.

Nice piece of hilariously ridiculous hyperbole there. A 1000-file C++ project - even one composed by the Stupidest Imaginable Programmer - does not take 1000^1000 times as long to compile as a one-file project.

O(m * n), where m = number of source files and n = number of ubiquitously included header files, is rather closer to the mark.

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 0:37 UTC (Tue) by cmccabe (guest, #60281)

First of all, big-O is the worst-case upper bound, not the average case.

> O(m * n), where m = number of source files and n = number of ubiquitously
> included header files, is rather closer to the mark.

Header files tend to include other header files. This is a consequence of one of the other design features of C++, the fact that the definition of a class cannot be spread across multiple files.

So you often see stuff like this:
> #include "private_helper.h"
>
> class MyClass {
>     ...
> private:
>     PrivateHelper myPrivateHelper;
> };

PrivateHelper is not part of the public API of the class (hopefully), but even so, every file that uses MyClass has to include private_helper.h. That class itself might have its own private helpers... and so it goes, on and on.

What's that, you say? I can use the pImpl idiom? Sure, if I can live with reduced performance and more boilerplate code to initialize and destroy the pImpl. Sigh.
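
For reference, a minimal sketch of the pImpl idiom mentioned above, collapsed into one file. The split into my_class.h and my_class.cpp is marked in comments, and Impl, impl_, value() and answer() are hypothetical names. The header part no longer needs private_helper.h, at the cost of a heap allocation and an extra indirection per object, which is the trade-off being complained about.

```cpp
#include <cassert>  // only needed for the assert in the usage note
#include <memory>

// ---- my_class.h (hypothetical): no #include "private_helper.h" here ----
class MyClass {
public:
    MyClass();
    ~MyClass();          // defined in the .cpp part, where Impl is complete
    int value() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // the extra indirection
};

// ---- my_class.cpp (hypothetical): the only place that sees the helper ----
struct PrivateHelper {
    int answer() const { return 42; }
};

struct MyClass::Impl {
    PrivateHelper helper;
};

MyClass::MyClass() : impl_(new Impl) {}  // boilerplate: allocate the Impl
MyClass::~MyClass() = default;           // boilerplate: destroy the Impl
int MyClass::value() const { return impl_->helper.answer(); }
```

Client code only ever sees the first part, so a change to PrivateHelper forces a recompile of my_class.cpp alone. With the definitions above, `MyClass m; assert(m.value() == 42);` holds.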

It would be interesting to graph, say, WebKit compilation times as a function of the number of files. I really doubt it's anywhere close to linear.

LCA: Lessons from 30 years of Sendmail

Posted Feb 8, 2011 9:13 UTC (Tue) by mpr22 (subscriber, #60784)

A worst-case upper bound that will never be hit even by insane usage is of purely academic interest - particularly in this discussion, since the feature that permits the O(n^n) worst case in C++ is also present in C.

Now, I'll happily admit that real-world incremental recompilation times are worse for C++ than C - but the worst-case upper bound is a red herring there.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds