The coding standards are the least of your worries!

Posted Jun 1, 2010 11:50 UTC (Tue) by rev (guest, #15082)
Parent article: GCC begins move to C++

For programmers going from C to C++ the most important issue is dealing with OO.

It is very easy to abuse C++'s OO features. The most common mistake is using inheritance where it not appropriate. Particularly when the relationship between two concepts is 1:1, programmers coming from C tend to use inheritance where composition is in order.

For instance, a HTTP connection has a TCP connection. For C programmers it is tempting to make a class representing a HTTP connection inherit from a class representing a TCP connection.

How can you see that this is wrong?

Well, suppose one day you need a HTTP connection over some other transport mechanism. Unfortunately your HTTP connection is intimately tied to a TCP connection (being a subclass it is a special kind of TCP connection after all). You cannot have a HTTP connection over some other transport mechanism without rewriting your HTTP connection class. Suppose you want to make your classes suitable for IPv6. You not only need to modify your TCP connection class but also your HTTP connection class.

Rather, you should supply the HTTP connection class (through the ctor) with an object representing a transport layer connection. (I.e. use composition). It has methods to read and write data, but whether it is a connection over TCP over IPv4, IPv6, or over some other transport layer, you don't care about. (Well, you also are likely to need to find a common interface for setting connection parameters dynamically, BTW. Finding commonalities amongst simmilar concepts is a common meme in OO.)

I did not make the above example up. It comes actually from a C++ library from the GNU project. The name escapes me. Sorry to say it was terribly designed. To get at the headers of a HTTP request, one had to subclass a HTTP connection class. There was no class representing a HTTP request to which one could simply say `getHeaders()'. It had knowledge about IPv6 in several layers of the class hierarchy.

Another common mistake is inherit to data rather than behavior. Having a class 'ellipse' inherit from a class 'circle' is a classic example. Rather `circle' and `ellipse' should both inherit from a class `shape' having a pure virtual method `draw()' that is implemented differently in its subclasses.

Needless to say the above abuse gets worse with every layer of subclassing one uses.

Some guiding principles:

Use inheritance only when you are dealing with similar but different behaviors. I.e. only when one inherits to implement pure virtual functions.
Related: favor composition over inheritance.
Don't use setters, unless one deals with a naturally dynamically alterable property of an object. The hostname of a TCP connection should not be set using a setter (once connected, it is immutable). It should be ctor supplied.
And related, all properties that are required to bring an object into a meaningful state should be supplied through the ctor.
Isolate responsibilities into a class.
Write down every piece of behavior only once.
Variables are private and never public. They may be protected when subclasses need access to them.
Avoid using `friend'. Don't use private and protected inheritance, these are meaningless.
Don't use multiple inheritance. It is a can of worms. When you think you need it, you likely have violated the first two principles.

Sorry for the sermon. It's just that it gets to my nerves that when programmers talk about writing maintainable code they jump to the relatively minor issues showing to have a blind spot for the issues that really matter.

The coding standards are the least of your worries!

Posted Jun 1, 2010 16:34 UTC (Tue) by Ed_L. (guest, #24287) [Link] (5 responses)

For the most part I agree, which is probably why Mark suggested limiting or forbidding inheritance in the first GCCC++ Guideline draft. (But I doubt it will fly). Composition vs. Inheritance is a basic question to all OO design, frequently aliased as "ISA vs. HASA". Prsumably, GCC already has some decent OO hacks to guide any OO "newbies" which in this particular context takes on a whole 'nother meaning as I 'spect the hardline GCC C hacks have eschewed OO because they are already aware of most of its pitfalls and simply don't want to deal with them. I expect some lively debate.

And related, all properties that are required to bring an object into a meaningful state should be supplied through the ctor.

Yabut here you run up against the "do you allow exceptions and where?" question,as well as your local definition of "meaningful state." Personally, I use exceptions. Except I try to avoid throwing from a constructor, because that requires a try block around the entire scope of the object. Rather, if an object needs to perform some exceptional operation in order to be initialized (e.g. allocating memory) I usually write a separate throwable Init() method that sets a private is_init flag to let all other methods know whether the object is ready to use. But (a) this isn't always possible, (b) I haven't investigated whether auto_ptr might be used to limit a constructor try block, and (c) YMMV anyway.

The Guide *will* be interesting.

The coding standards are the least of your worries!

Posted Jun 1, 2010 17:04 UTC (Tue) by Ed_L. (guest, #24287) [Link] (2 responses)

[Edit]
Yes, I meant the try block must surround the scope of any auto object whose constructor might throw. Certainly

C_Bar *pfoo;
try{
pfoo = new(C_Bar);
}catch(...){};

will work just fine. But I *like* auto objects....

And yes, if used an Init() method may return a status flag rather than throw an exception.

The coding standards are the least of your worries!

Posted Jun 1, 2010 17:13 UTC (Tue) by Ed_L. (guest, #24287) [Link]

Or even

pfoo = new C_Bar();

(Its what I get for writing code fragments without a compiler...)

The coding standards are the least of your worries!

Posted Jun 2, 2010 18:27 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

This is exceedingly bad design. Any good C++ programmer will write:

std::auto_ptr<C_BAR> bar(new C_Bar());

...or something like it. It's exception- and leak-proof.

In one of my projects we had a 'no naked new' rule. So the result of EACH 'new' must be wrapped in a smart pointer (there were only a few carefully audited places where this policy was relaxed). Worked wonders.

The coding standards are the least of your worries!

Posted Jun 1, 2010 17:49 UTC (Tue) by jwakely (subscriber, #60262) [Link]

> Personally, I use exceptions. Except I try to avoid throwing from a constructor,

This (not uncommon) approach amuses me. Handling errors in constructors was one of the motivations for adding exceptions to C++

> because that requires a try block around the entire scope of the object.

Only if you need to catch the exception in the same scope as the object. Usually I don't need to do that and let the exception exit the current scope (with destructors doing whatever cleanup is needed)

As has been suggested, adding exceptions to a large codebase that wasn't written with them in mind can cause problems. New code can be made exception-safe from the start and so there is less reason to not use exceptions. Writing exception-safe code is *not* difficult, it's just different.

The coding standards are the least of your worries!

Posted Jun 2, 2010 19:55 UTC (Wed) by rev (guest, #15082) [Link]

Well I may have jumped to conclusions about the OO awareness of the GCC team. If that is the case, sorry.

I've become a little touchy on the issue having seen so many C programmers struggle and failing with OO and seen so many programmers when asked about code quality jump right into detail issues like naming conventions and where to place the opening brace. Significant issues but the overall issues are way more fundamental.

See my post elsewhere for a comment on the guidelines.

The coding standards are the least of your worries!

Posted Jun 1, 2010 22:45 UTC (Tue) by Tobu (subscriber, #24111) [Link] (1 responses)

Good guidelines. I especially agree with the ones about keeping variables private not making things mutable lightly.

I'd like to amend the rule about multiple inheritance though; besides an optional base class, it is perfectly fine to inherit from a number of mixins. For example, any class that deals with a non-copyable system resource (such as a socket) should add boost::noncopyable (or the local equivalent) to its parents. And I recall friends can help making data more private to the world at large, and should be allowed to help write things like output operators that are closely related to the class.

The coding standards are the least of your worries!

Posted Jun 2, 2010 20:18 UTC (Wed) by rev (guest, #15082) [Link]

@Ed_L & Tobu

Surely. These are guidelines. And my list is not complete I might add. I like to formulate them a bit strict. Firstly to make them clear. Secondly to steer programmers with little OO experience away from the abuse their instinct seems to drives them to. You are allowed to break the rules when you understand them a kind of being the motto.

There will always be situations where we have to break the rules. Rules cannot deal 100% with all the details a quirks of reality.

And, of course, there are always situations where the nature of the underlying problem is such that deviation from the rules is the required solution, such as building a class hierarchy, or using friends.

The coding standards are the least of your worries!

Posted Jun 2, 2010 18:50 UTC (Wed) by ikm (guest, #493) [Link] (7 responses)

> Don't use private and protected inheritance, these are meaningless.

And why is that, exactly?

The coding standards are the least of your worries!

Posted Jun 2, 2010 19:48 UTC (Wed) by rev (guest, #15082) [Link] (6 responses)

Because they amount to delegation to a private/protected member. What you are doing when using private/protected inheritance is writing a HAS-A relationship as if it where an IS-A relationship.

The coding standards are the least of your worries!

Posted Jun 2, 2010 19:54 UTC (Wed) by ikm (guest, #493) [Link] (5 responses)

> writing a HAS-A relationship as if it where an IS-A relationship.

Yes, that's exactly what I'm doing. What's wrong with it, exactly?

The coding standards are the least of your worries!

Posted Jun 2, 2010 20:09 UTC (Wed) by ikm (guest, #493) [Link] (2 responses)

And by the way, it is still an IS-A relationship, since I can access inherited protected members and override virtual functions. This relation isn't exposed to the outside world, though. Imagine a Class which is a Werewolf, but it doesn't actually want anybody to know about that fact :)

What I want to say is, protected/private inheritance makes sense. A classic example is when you want to make an active object:

class Foo: QThread
{
...
private:
virtual void run();
}

You don't want start() and stop() to be exposed to the outside world, and yet you want it to be a thread (by being able to override the actual thread function).

Lastly, stating that something in the language is plain "meaningless" is implying that the people who agreed to add it to the standard are morons. This rarely makes sense.

The coding standards are the least of your worries!

Posted Jun 2, 2010 21:21 UTC (Wed) by rev (guest, #15082) [Link] (1 responses)

Well if you want your class to be a Werewolf but don't want the world to know about it your class really isn't a werewolf but has-a werewolf.

Your Foo class in your example is now a QThread except that it isn't: the outside world cannot call any of the methods that make it a QThread.

That you find yourself forced to use protected inheritance in your example is due to a modelling flaw in the QThread class. Not unlike the situation in which I found myself forced to subclass a HTTPConnection only to be able to get to the headers of a HTTP request. The QThread class should have something like a Worker member having the run() method. You implement a Worker (ie subclass it and implement the virtual run() function). You have a Qthread instance var, feed it your worker implementation, and call start() on your QThread.

A useful way, BTW, to find out whether you have woven different concepts into a single class, that thus should be decomposed into different classes is this:

Picture all the instance variables of your class and it superclass on one side. Likewise, picture all functions on the other side. Now draw a line between a function and a variable when said function uses that variable, If the resulting class now tends to consists of several unconnected graphs you have reason to consider decomposing your classes: the belong to the same hierarchy but appear to be unrelated.

Applied to the QThread example: the designers of the QThread should have asked themselves the question: is it desirable that an implementation of the run() function have access to the protected instance variables? The answer is no. Because QThread deals with the machinery of threading while the run() functions deals with matters of the user's problem domain, like reading messages from a queue. The stuff in the run() function is therefore another beast that needs to sit in its own class.

I find your 'moron' remark rather odd. Firstly, no language is perfect, no language designer is. Secondly, C++ was conceived at a time when understanding of OO was, by today's standards, rather limited.

The coding standards are the least of your worries!

Posted Jun 2, 2010 21:34 UTC (Wed) by ikm (guest, #493) [Link]

> Well if you want your class to be a Werewolf but don't want the world to know about it your class really isn't a werewolf but has-a werewolf.

Maybe to the world, but not to him. To him, he is a werewolf. At night, when there is a full moon, he transforms into an unholy beast and hunts basic_strings. The world doesn't have to know.

The coding standards are the least of your worries!

Posted Jun 2, 2010 20:38 UTC (Wed) by rev (guest, #15082) [Link]

Well.. a HAS-A relationship being modelled as an ISA-A?

Please see my HTTP connection example in my post http://lwn.net/Articles/390256/ to see why this is, in general bad.

The coding standards are the least of your worries!

Posted Jun 2, 2010 20:57 UTC (Wed) by rev (guest, #15082) [Link]

I forgot.

To recapitulate:

When you write an HAS-A as an IS-A you really tied two concepts into one which are not.

Suppose you have class D: C

Suppose later you need another flavor of C, say B, or you have to rewrite D You will have to rewrite D and its relationship.

Now write:

class D {
private:
C c;
}

You one day need a B. Introduce an abstract base class A for both B and C:

class D {
private:
A a;
}

Now you have the advantage that you can easily change 'a' to be a B or a C.

BTW, there's people that take this advantage to the next level. They recognized that A has become a dependency of D. And rather than hard-wiring this dependency in your code created tools that allow you to specify in a configuration file whether 'a' should be an B or a C. Some day another kind of A is needed? Write a class E, deploy it, modify the config file and there you go.

The coding standards are the least of your worries!

Posted Jun 2, 2010 23:41 UTC (Wed) by daglwn (guest, #65432) [Link]

Some guiding principles:

Most of these are good. A few comments and additional suggestions follow.

Use inheritance only when you are dealing with similar but different behaviors. I.e. only when one inherits to implement pure virtual functions.

This seems too strict to me. There are reasons to inherit beyond overriding virtual functions. The Curiously Recurring Template Pattern is a prime example (replacing dynamic polymorphism with static polymorphism).

Isolate responsibilities into a class.

I think I know what you mean, but we have to be careful with how we define "class" or "object." A class does not necessarily have all of its functionality in member functions (see below).

Write down every piece of behavior only once.

If I understand your meaning, this is good in every language. Factoring out common behavior is always important. If I've missed your point, please follow up. It's good to learn from others.

Variables are private and never public. They may be protected when subclasses need access to them.

I would go even stronger than that. Rather than make the variables protected, make the accessors (possibly getters/setters) protected. That way you minimize impacts of data member changes.

Avoid using `friend'. Don't use private and protected inheritance, these are meaningless.

This also seems overly restrictive. In general, yes, one doesn't need private or protected inheritance, but they have their uses, especially when composing behaviors from several classes (aha! see the next point :) ).

Don't use multiple inheritance. It is a can of worms. When you think you need it, you likely have violated the first two principles.

This is by far my biggest pet peeve of most "good C++ practice" lists. MI is incredibly useful. What one usually doesn't want to do is create a diamond hierarchy where base classes have data members. The problem with data members in diamond hierarchies is the ambiguity of initialization. The most-derived class needs to take care of initializing base class data members, which breaks encapsulation. But MI is very, very useful for imparting properties and/or varying behaviors on a common class (template). Mixins are wonderful things. MI gets used all over the place when template metaprogramming is involved (c.f. Modern C++ Design, boost, etc.).

I would add these guidelines:

Make all virtual functions private with a public API when necessary. This allows one to insert various behaviors (debug messages, etc.) in the public API once and have it apply to all invocations of the (private) virtual function.

Example:

class Base {
private:
  virtual void doItImpl(void) = 0;

public:
  void doIt(void) {
    debug("Calling doIt!");
    doItImpl();
  }
};

class Foo : Base {
private:
  void doItImpl(void) {
    doSomething();
  }
};

Enforce const correctness (applies to C as well).
Use RAII liberally.
Use exceptions.
Use type generators to specify data structure types. Do not hard-code std::map<>, etc. When C++0x is ready, use template aliases. This allows one to easily swap out data structure implementations and specialize them for various types. This is quite useful when working to optimize runtime.

Example:

template<typename Key, typename Value>
struct MapGenerator {
  typedef std::map<Key, Value> type;
};

template<typename Value>
struct MapGenerator<int, Value> {
  typedef MyFancyEfficientIntMapper<Value> type;
};

typedef MapGenerator<double, std::string>::type DoubleStringMap;
typedef MapGenerator<int, std::string>::type IntStringMap;

Prefer free functions. This may seem counterintuitive, but actually makes generic programming easier. Compare boost.Range(Ex) and the standard algorithms. Having freestanding begin() and end() functions is super useful. The same pattern applies to many intrinsic operations. Making them free functions instead of members means that one can adapt existing classes to the interface without having to muck around with changing the class (adding member functions, etc.).

And my number one guideline:

Do not write code unless you have to. This may seem obvious but there are countless examples of projects that refuse to use existing libraries like boost and end up reimplementing them. Badly. Reimplement the stuff if necessary, by all means (to solve proven performance bottlenecks, for example), but make a darn good case for doing so. This is right up there with "Don't optimize prematurely."