LWN.net Logo

Sometimes C++ is faster

Sometimes C++ is faster

Posted Jun 18, 2008 15:31 UTC (Wed) by johnkarp (guest, #39285)
In reply to: Sometimes C++ is faster by seanyoung
Parent article: Converting GCC to C++

IIRC, if you need inheritance and virtual functions, its much faster to 
use C++'s builtins rather than to kludge your own in C. Also, the C 
standard library string functions can be pretty inefficient.


(Log in to post comments)

Virtual functions have the same speed

Posted Jun 18, 2008 20:54 UTC (Wed) by khim (subscriber, #9252) [Link]

Remember - for a long, long time C++ compiler was C++-to-C compiler. Only one thing can not be efficiently implemented in C++-to-C compiler: exceptions. Everything else is just as efficient in C and in C++... except templates: where C++ compiler can generate series of functions automatically in C you either need horrible preprocessor cludges and/or code duplication.

Of course the same capability can be easily abused and lead to slower code!

Virtual functions have the same speed

Posted Jun 19, 2008 9:48 UTC (Thu) by nix (subscriber, #2304) [Link]

For a long, long time? I think the gap was something like three years between Cfront and G++.
It might have felt like a long time then, but it really wasn't. :)

10 years is long time in computing

Posted Jun 19, 2008 14:04 UTC (Thu) by khim (subscriber, #9252) [Link]

Cfront defined C++ for almost 10 years. Between 1983 and 1993 Cfront was the C++ compiler - and a lot of limitations of C++ back then were justified by the need to compile to C. It was reference implementation, it was used to define "standard C++" (people even used versions of Cfront to explain what type of language their compiler supported), etc. Eventually Cfront 4.0 was abandoned and after long period of instability (~5years) we've got consolidated standard and language was set in stone. But yes, for a long time needs of C++-to-C compiler defined the language.

Sometimes C++ is faster

Posted Jun 19, 2008 9:34 UTC (Thu) by cate (subscriber, #1359) [Link]

virtual function are a lot slower than the direct C equivalent. Because of class expandability
and multiple heritage, the virtual table is not put together data, but accessed with an
additional pointer. So to call a virtual function CPU will use two pointers, which is cache
and pre-fetch inefficient.

Ideally a C++ optimized (program level, not unit level) should know if a class is expanded and
it should allocates some part of virtual table together to the data. (maybe there is also a
pragma directive). So in this trade-off, c++ chooses the slower but more expandable method.

Anyway I think he said faster, because the STL are optimized (probably ugly code, but well
tested and anyway not mixed with program), which is not the case of normal c++ programming.
But we speak about compiler developers, which should know very well C++ and ugly optimization,
so for GCC I don't think STL will give significant improvements.

C++ has also linker problem: dynamic linking is a lot slower (number of function, but also
because long names, which differentiate only at the end)

IMHO the readability, extendibility of code is better that fast code (but for gentoo users
;-)).

Sometimes C++ is faster

Posted Jun 19, 2008 10:01 UTC (Thu) by nix (subscriber, #2304) [Link]

My understanding is that what you say is true in modern compilers only for virtual functions
defined in virtual base classes. Non-virtual-base class virtual function lookup requires only
a single pointer dereference, from a fixed offset in the VMT, as does lookup of a virtual
function defined in a class that has virtual bases, but not itself defined in that virtual
base class or one of its ancestors.

Virtual base classes are rare. This is one reason why. (Another reason is that, well, the
designs that require virtual bases are generally either involuted or icky.)

Sometimes C++ is faster

Posted Jun 19, 2008 11:53 UTC (Thu) by cate (subscriber, #1359) [Link]

I'm not so sure that I understand your comment. My concern is about where compilers put the
VMT table. If it is split from the data (as my (old) understanding and as I see in wikipedia),
every class requires a pointer to the VMT, so there is two dereferences.

The solution was to put some part (i.e. the first 10 virtual function pointer) into the class
data, but because it cause more memory demand (i.e. event driven programming), I don't think
it is default.

Thus I like the C++ idea, if it bring to more development, especially on the optimization at
program level (and not only at unit level).

Sometimes C++ is faster

Posted Jun 19, 2008 18:06 UTC (Thu) by nix (subscriber, #2304) [Link]

Oh, yeah, you have to jump to the VMT first, so two dereferences it is. 
Sometimes there are *more* than two, though (and sometimes there are none, 
as when the compiler can determine the static type of the instance: this 
happens more often than you might expect).

Usually C++ is faster

Posted Jun 19, 2008 18:39 UTC (Thu) by ncm (subscriber, #165) [Link]

Lots of falsehoods, misconceptions, and old myths to fix here.

First, the STL containers and algorithms don't use virtual functions at all.  Virtual
functions aren't used much in modern C++ code, particularly in places where it matters how
fast they are or aren't.  This isn't so much because people worry much about how fast they
are, but just because they are not often especially useful in low-level code where speed
matters.

Second, whatever overhead is associated with a virtual function call has very little to do
with how many memory accesses occur, and everything to do with pipeline stalls.  If the speed
of a virtual function call matters, it's in a loop, and all the relevant memory is in cache.
You might lose a cycle or two loading a memory word, then, but you lose a dozen or two
branching through a pointer because the execution pipeline can't look ahead past that branch.
If you used a function pointer in C, you would suffer precisely the same stall.   (There are
further complications I won't get into here.)

So, as rules of thumb: (1) if it matters how fast a virtual function call is, compared to a
regular function call, you're probably doing it wrong; and (2) a virtual function call is
architecturally equivalent to calling through a C function pointer.  How much do you use
function pointers in C?  That's about how much you should use virtual functions in C++.  (If
you're more used to Java... may heaven grant mercy on your soul.)

In practice, the time spent doing virtual function calls has no measurable effect on the speed
of a well-conceived C++ program.

Sometimes C++ is faster

Posted Jun 19, 2008 15:20 UTC (Thu) by endecotp (guest, #36428) [Link]

What you say is true, but it's mitigated in a few ways:

- None of this matters if the actual class of the object is known.  It only matters if you're
accessing a derived class via a base class pointer.

- If you're accessing several virtual members, the dereference to get the virtual function
table only needs to be done once.

- Virtual methods are rare in C++ (compared to e.g. Java).  I don't believe there are any at
all in the standard library, for example.

- Using less memory per object will improve your cache hit rate when you have more than a few
objects.


Here's a C++ benchmark.  "----" indicates a new file, to prevent too much being optimised
away:

struct base {
  virtual void foo() = 0;
};

struct derived: public base {
  void foo() {
    f1();
  }
};

----

int main()
{
  for (int i=0; i<100000000; ++i) {
    base* p = f2();
    p->foo();
  }
}

----

void f1() {
}

base* f2() {
  static derived d;
  return &d;
}


And here's a C version (well, actually it's still C++ but it's using a function pointer in the
struct rather than a virtual function):

struct base {
  typedef void(*foo_t)();
  const foo_t foo;

  base(foo_t f_): foo(f_) {}
};

----

int main()  // same as C++ version
{
  for (int i=0; i<100000000; ++i) {
    base* p = f2();
    p->foo();
  }
}

----

void f1() {
}

struct derived: public base {
  derived(): base(&f1) {}
};

base* f2() {
  static derived d;
  return &d;
}


Comparing these programs (x86, gcc 4.3.1, -O3) I find that the "C" version is about 20%
faster, or around 6 nanoseconds per call, on this 1 GHz VIA C3 machine.  That fraction would
clearly drop if you actually did something inside the virtual function.

Sometimes C++ is faster

Posted Jun 26, 2008 12:45 UTC (Thu) by tbrownaw (guest, #45457) [Link]

- Virtual methods are rare in C++ (compared to e.g. Java). I don't believe there are any at
all in the standard library, for example.
std::streambuf::xs{get,put}n and several other protected streambuf functions. Not something you'll usually work with directly, but used by all the i/o streams.

Sometimes C++ is faster

Posted Jun 19, 2008 21:03 UTC (Thu) by pynm0001 (guest, #18379) [Link]

> virtual function are a lot slower than the direct C equivalent.

How would you do in C then?  Because however it is, it can be done that way in C++.

More seriously though, you are saying that C++ does something like:

this->v_btl->virtual_call(params); right?

I would agree, but how would you do this any faster in C?  Recognize that you may not cast the
object pointer to a derived class, the compiler can't (necessarily) know that the conversion
is valid, and you could cast it in C++ if the programmer happens to know more than the
compiler.

i.e. if you were to do:

derived_virtual_impl(klass, params) in C, you could do:

obj->Derived::impl(params) in C++ and call the appropriate implementation directly.

Is there some neat-o technique I'm missing?

Sometimes C++ is faster

Posted Jun 19, 2008 21:10 UTC (Thu) by pynm0001 (guest, #18379) [Link]

Sorry about replying twice but I missed this one too:

> C++ has also linker problem: dynamic linking is a lot slower (number of 
> function, but also because long names, which differentiate only at the
> end)

This is annoyingly true.  However, this has pretty much been rectified.

With recent binutils (I think 2.17.50 or later?) a new hash table format is used for ELF
dynamic binaries which significantly reduces the amount of string comparisons that must be
done in order to properly load a symbol.  Of course long symbol names still take a while to
load so it is important to reduce the amount of symbols used to the minimum necessary.  If you
have recent binutils you may already be able to see the fruits of this, you can use readelf -d
to see if you have a GNU_HASH section of a binary or a .so.

g++ since 4.0 has supported visibility for symbols, including symbols defined as part of C++
class and template generation, which takes care of that problem.  The problem is that C++
libraries must specifically support it, but support for that is growing.

prelinking provides benefits on top of that.  There are patches from Michael Meeks floating
around to allow a flag called -Bsymbolic which apparently also helps, but I don't really know
what it does beyond that, and it does not look like it will make it into binutils anyways.

But there's lots of work being done to rectify these problems, so it lookslike we're *almost*
getting to the point to where we can have our cake and eat it too. :)

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds