LWN.net Logo

Zeuthen: Writing a C library, part 1

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 11:51 UTC (Tue) by HelloWorld (guest, #56129)
In reply to: Zeuthen: Writing a C library, part 1 by dgm
Parent article: Zeuthen: Writing a C library, part 1

> And yes, C doesn't support you, but also doesn't tie your hands.
Not tying my hands isn't enough nowadays. C++ provides destructors and exceptions to help me deal with resource management and error handling. Other languages have linear types. C has literally nothing, and that just sucks.


(Log in to post comments)

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 13:56 UTC (Tue) by cmccabe (guest, #60281) [Link]

It's funny that you mention "destructors and exceptions" as a way of dealing with error handling. If you're any good at C++ at all, you already know that exceptions can't help you inside destructors.

In practice, when you write code in higher-level languages, you do not handle out-of-memory errors. The one exception is when you're allocating some really huge chunk of memory. The whole point of writing the code in the higher-level language is so that you don't have to deal with that stuff.

The whole point of classes like std::vector and std::string is that they do the memory allocations for you, and also handle cleanup. But that makes it a lot harder to do something sensible if the allocation fails.

Just consider trying to handle every possible std::bad_alloc. If small allocations are failing, that means I can't create a std::string, because std::string will try to allocation memory. If I can't create a std::string without getting an exception, that means I can't use std::string in a destructor. If I can't use std::string in a destructor, then I can't construct any class that uses std::string in a destructor. And around and around it goes, until finally the programmer just decides enough! and declares that small allocations must never fail.

C++ isn't unique in this regard. Java also just aborts when it runs out of memory, and so does Ruby. Handling memory automatically is incompatible with having a detailed programmer-specified plan about exactly how to handle it-- what a surprise.

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 14:04 UTC (Tue) by jwakely (subscriber, #60262) [Link]

How often do you need to create classes that perform allocations in a destructor? I certainly don't do that often.

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 17:33 UTC (Tue) by MisterIO (guest, #36192) [Link]

Let's say that you have created a class that creates a rb-tree(dinamically allocating the nodes). How would you clean it up in the destructor? Erase one node at a time? that would mean that destroying the tree is an O(nlog(n)) operation. Another way would be to do a visit in order of the tree and put pointers to the elements on a vector and then free them. This way the destruction of the tree would be an O(n) operation. There are probably other ways to achieve this, but it was just an example.

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 17:48 UTC (Tue) by MisterIO (guest, #36192) [Link]

actually doing an inorder visit, checking if a node has both children as NULL, you could free it once you've reached the father, so that shouldn't be needed.

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 21:06 UTC (Tue) by elanthis (guest, #6227) [Link]

Or, if you're any good at your job and have any understanding of modern CPUs and system architectures, you're not allocating each node as an independent block but rather as a single pool which can be released with a single call to free().

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 11:51 UTC (Wed) by MisterIO (guest, #36192) [Link]

If you have a templated class which can take classes of variable dimensions( i.e. whose dimensions are dinamic at runtime), how would you do that?
You can do that for the nodes structs, but that alone would still be a minimal part of the overall allocations.

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 17:18 UTC (Wed) by cmccabe (guest, #60281) [Link]

Perhaps you could use a memory pool that was freed later.

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 19:25 UTC (Wed) by MisterIO (guest, #36192) [Link]

Yeah, I'm not saying that you can't use it in general, but that's a good optimization if you've programmed everything. If the classes that are inserted are someone else's code and they don't use the pool, then suddenly you're gonna have various different memory allocation/deallocation patterns. It could be done, is it always better? I don't know, I consider it an optimization. The suggestion would be to code it in a simple way(which doesn't mean dumb), then profile it and see if/where the bottleneck is and try to solve that and so on. Starting with a complex and optimized design that could be not needed, I'm not sure that's always a good choice.

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 15:02 UTC (Wed) by jwakely (subscriber, #60262) [Link]

Why O(nlog(n))? Are you rebalancing your tree as you destroy it?

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 16:04 UTC (Wed) by MisterIO (guest, #36192) [Link]

Actually, I'm not doing anything, this was a purely hypothetical case constructed to create an example of where it could make sense to allocate in a destructor(though in the end, with the other post, I noticed that it probably wasn't needed).
Anyway, no, it's not about rebalancing. Even on a normal tree removal of an object is an O(h) operation(with h equal to the height of the tree). In an rb-tree, being it a balanced tree, the height of the tree is (a*log(n))(with some "a" number which now I don't remember). Thus the removal of all the n elements becomes an O(nlog(n)) operation. If instead you do a find_minimum() followed by an inorder visit from there, you're gonna have a delete_tree() operation which is O(n).

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 17:53 UTC (Wed) by jwakely (subscriber, #60262) [Link]

Just start at the root node and start destroying nodes, you don't need to find the minimum and you don't care what order the nodes get destroyed in.
If you allocate a std::vector to do it you have to walk the entire map, then iterate across the vector too, for no good reason.
A RB tree's dtor can be O(n) without allocating memory.

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 19:16 UTC (Wed) by MisterIO (guest, #36192) [Link]

I've already said myself that you likely don't need to do that, in my 2nd post. About the order though, how are you going to free nodes starting from the root without allocating temporary variables for children and then from the children of the children, etc?

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 23:51 UTC (Wed) by njs (guest, #40338) [Link]

The slightly inefficient way is to just do something like
while (there are nodes left) {
  current_node = root;
  while (has_children(current_node))
    current_node = first_undeleted_child(current_node);
  delete(current_node);
}
That deletes everything, needs no heap storage, and it's O(n (log n)^2), not so bad.

The more efficient way is to just preallocate a vector whose length is the same as your maximum tree height (resizing when necessary), and then in the destructor use it as the stack for a single-pass depth-first traversal. The extra memory required is pretty trivial. Not sure if it's worth the bother, though.

Well, honestly I'm not sure that any of this is worth the bother. It's a nice puzzle to handle this sort of thing properly in a classic data structure with perfect encapsulation from the rest of the program, but in real life it doesn't help until you've solved a whole pile of harder and idiosyncratic problems too.

Zeuthen: Writing a C library, part 1

Posted Jun 30, 2011 10:33 UTC (Thu) by MisterIO (guest, #36192) [Link]

>That deletes everything, needs no heap storage, and it's O(n (log n)^2), not so bad.
There's no point to this, a simple remove(value_at_root)(value at root simply because it's the simplest to find) for a balanced binary tree already takes O(log(n)), so why would I create a different algorithm to achieve the same performance?

Zeuthen: Writing a C library, part 1

Posted Jun 30, 2011 18:01 UTC (Thu) by njs (guest, #40338) [Link]

If your rebalancing algorithm doesn't require any heap allocations then right, there is no point.

Zeuthen: Writing a C library, part 1

Posted Jul 3, 2011 8:14 UTC (Sun) by vlovich (guest, #63271) [Link]

I'm not understanding the claims that it takes anything other than O(n) in the worst case for a tree de-allocation.
function destroy_tree(tree) {
   foreach child in tree
       delete_tree(child)
   free(tree)
}

Zeuthen: Writing a C library, part 1

Posted Jul 3, 2011 11:35 UTC (Sun) by cladisch (✭ supporter ✭, #50193) [Link]

If a generic deletion function is used, it might rebalance the tree after each step.

Zeuthen: Writing a C library, part 1

Posted Jul 3, 2011 18:35 UTC (Sun) by nix (subscriber, #2304) [Link]

When you're writing a tree deallocator, I'd expect it to have privileged access to the interior of the tree. There's no reason not to use a specialized high-speed node deletor in that case: rebalancing the tree between every deletion is silly, because it will never be accessed again. (Even a multithread-safe tree should probably block and warn on accesses once tree deallocation begins: any such accesses are racy and could easily access rubbish if they happened just an instant later.)

Zeuthen: Writing a C library, part 1

Posted Jul 3, 2011 19:39 UTC (Sun) by vlovich (guest, #63271) [Link]

Rebalancing the tree would be enormously redundant - the extra 4-6 lines of really simple code is worth it (hell, if you're lazy, provide a boolean to the internal delete function that says whether or not to rebalance).

Thread-safety is irrelevant here - if you can access the tree after the destructor is already called, you've failed at thread safety (actually you've failed at maintaining the lifetime of the object properly).

Zeuthen: Writing a C library, part 1

Posted Jul 3, 2011 22:04 UTC (Sun) by nix (subscriber, #2304) [Link]

I think we are in violent agreement.

Zeuthen: Writing a C library, part 1

Posted Jul 6, 2011 15:56 UTC (Wed) by jwakely (subscriber, #60262) [Link]

I'm glad someone sensible agrees with the point I made way back in http://lwn.net/Articles/449645/ ... I was starting to think I was going mad and everyone else knew something about RB trees that I didn't

Zeuthen: Writing a C library, part 1

Posted Jul 6, 2011 16:56 UTC (Wed) by vlovich (guest, #63271) [Link]

Same here. I mean it's been a while since I took data structures. Maybe things have changed ;).

Zeuthen: Writing a C library, part 1

Posted Jun 30, 2011 6:07 UTC (Thu) by dark (subscriber, #8483) [Link]

You can do it in O(n) by abusing the child pointers of nodes that you've visited but not yet destroyed. For example if you always descend to the left node first, then you can put the current node on a linked list that's threaded through the left pointers. When you hit the bottom of the tree, you take a node out of the linked list, destroy it, and proceed to its right child.

This way you visit each node a constant number of times (once to put it on the list, once to take it off) and you only need two temporary pointers: one for the start of the list and one for the child to visit next. You can optimize a bit for leaf nodes but that doesn't impact the time complexity.

Managing the linked list is easy because you're just using it as a stack.

Zeuthen: Writing a C library, part 1

Posted Jun 30, 2011 10:28 UTC (Thu) by MisterIO (guest, #36192) [Link]

>then you can put the current node on a linked list that's threaded through the left pointers
We were discussing about the problems caused by allocations in the destructor. And what exacly would I gain by doing this instead of, like I said in the first post, find_minimum(), then next() till the end and save pointers to elements on a vector, then free them all through the vector of pointers? Also, if my assumption in my second post is correct, you don't even need to save temporaries if you do an inorder visit, you just need to free the node once you've reached the parent, checking that the children have all its own children NULL(which means that you've already visited all the grandchildren).

Zeuthen: Writing a C library, part 1

Posted Jul 3, 2011 20:48 UTC (Sun) by dark (subscriber, #8483) [Link]

What you gain is a destructor that works in O(n) time and does no allocations. By reusing the child pointers as a linked list, you avoid having to make any allocations for it.

I think your approach can get there too, with some refinement. How are you implementing the inorder visit?

Zeuthen: Writing a C library, part 1

Posted Jul 6, 2011 16:01 UTC (Wed) by jwakely (subscriber, #60262) [Link]

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 15:05 UTC (Tue) by gowen (guest, #23914) [Link]

If I can't use std::string in a destructor, then I can't construct any class that uses std::string in a destructor
Well ... no. You can't.

But I can't think of a sane use case where I would need to, so it's not really a big deal. Can you give a concrete example of where this is a problem, and where a string literal cannot be sensibly used instead?

Follow up question: how would you handle this clean-up in C?
Why would this approach not work in C++ (suitably wrapped as a destructor)?

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 18:24 UTC (Tue) by cmccabe (guest, #60281) [Link]

> But I can't think of a sane use case where I would need to, so it's
> not really a big deal

class MyFile {
  MyFile(const char *filename) { fd = open(filename); ... etc ... }
  ~MyFile() {
    if (close(fd)) {
      std::ostringstream oss;
      oss << "close failed, but I can't throw an exception from "
             "a destructor!";
      log(oss.str().c_str());
    }
  }
}
However, streams can allocate memory. So if we run out of memory, get a std::bad_alloc, and try to unwind our stack, we'll probably suffer the insta-kill that happens when you throw an exception while unwinding another exception.

This is only the most simple, obvious example of why you might need to allocate memory in a constructor. Any time you put together strings with concatenation, create a stringstream object, or even call syslog, you are allocating memory. And those are all obvious things to do when handling an error.

So if you can't throw an exception, and you can't log the error, things look pretty grim for RAII. So in practice, everyone just gives up on the whole handling std::bad_alloc thing.

N.B.: In this situation, having a close method with saner error handling, in addition to the destructor, is a good idea. In that case, the caller would be expected to call close before the object is destroyed, and the destructor would just be a fallback. This is a common pattern in real-world code.

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 7:25 UTC (Wed) by gowen (guest, #23914) [Link]

You've created a ostringstream in order to write a string literal to it, in order to obtain the address of the string literal to pass to some logging function? You've created an unnecessary object and performed an unnecessary copy - and THAT is your sane-use-case counter example?

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 14:52 UTC (Wed) by jwakely (subscriber, #60262) [Link]

So make sure your logging API supports printf-style API as well (or instead of) only supporting writing a single string, then you don't need to concatenate std::strings.

Go back and do it again. This is a common pattern in real-world code reviews.

Zeuthen: Writing a C library, part 1

Posted Jun 30, 2011 19:07 UTC (Thu) by cmccabe (guest, #60281) [Link]

> So make sure your logging API supports printf-style API as
> well (or instead of) only supporting writing a single string,
> then you don't need to concatenate std::strings.

printf can also allocate memory. That is why it is not signal-safe.

Zeuthen: Writing a C library, part 1

Posted Jun 30, 2011 23:44 UTC (Thu) by jwakely (subscriber, #60262) [Link]

but it won't throw a std::bad_alloc, or any other exception, and you were talking about the dangers of exceptions being thrown from destructors, right? and how you can't do anything useful in a destructor because it might throw, and therefore "look pretty grim for RAII"

(although I'm not sure why you were talking about that, the comment you replied to wasn't talking about exceptions *in* destructors, just exceptions *and* destructors as being helpful for error handling)

Zeuthen: Writing a C library, part 1

Posted Jul 2, 2011 0:20 UTC (Sat) by cmccabe (guest, #60281) [Link]

I never said that you "couldn't do anything useful in a destructor". Destructors are helpful in general for error handling in C++. I just said that std::bad_alloc is not a great way to handle the failure of small memory allocations.

It is true that printf would get you out of a jam in this case, because it doesn't throw the silly bad_alloc exception when it fails to allocate memory. You could also catch bad_alloc in your destructors whenever it pops up, and try to do something reasonable. In some cases, you can force the nothrow variant of new to be used (although not when std::vector and friends are involved.) But that is going to be very difficult in any non-trivial program.

I get annoyed when people talk about exceptions as a panacea for error handling problems. You don't have to think, just use exceptions, and your problems will magically melt away. The reality is, writing low-level code that correctly handles errors is very difficult in either C or C++. It's simple enough to avoid new and free, but once you start needing to implement transactional behavior-- either a function call succeeds or it fails, but it doesn't leave a mess-- things get difficult.

If you find yourself in a scenario where you have to undo changes that you made earlier to some data structure, your error handling may itself encounter an error. And how do you automatically handle that? Well, you have to engage your brain and structure the operation into a prepare operation followed by a commit operation. And that is not going to be trivial in either C or C++.

Zeuthen: Writing a C library, part 1

Posted Jul 4, 2011 0:11 UTC (Mon) by dgm (subscriber, #49227) [Link]

I'm afraid your simple example got much more flak that it deserved. People, it was only an example! The truth is, if I had a dime for every time I have seen something like what you pointed out, I would be rich by now.

The problem with encapsulation (and information hiding in general) is that sometimes you _need_ to take the implementation into account. It doesn't mean that it's not a great engineering principle, it is. It's just that it sometimes makes things complicated. In this case, before you can use something in a destructor you have to be pretty sure that the implementation will not throw an exception. That's not only difficult to do, it's subject to constant change in current development paradigms.

> I get annoyed when people talk about exceptions as a panacea for error handling problems.

Me too. No amount of exceptions, garbage collectors or type inference can replace your brain. They are there you help you with the grunt work, but it doesn't mean you can forget how to properly engineer stuff, because from time to time, you need to.

> It's simple enough to avoid new and free, but once you start needing to implement transactional behavior-- either a function call succeeds or it fails, but it doesn't leave a mess-- things get difficult.

So true. I have come to the conclusion that error handling is maybe 30-50% of the effort needed to get properly done code. And regretfully is often a neglected part. For instance, what you call transactional behavior is called "class invariants" in OO programming. It's critical to well behaved long living programs, but many people haven't head about them (for those: no action should let an object in an inconsistent state).

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 16:08 UTC (Tue) by HelloWorld (guest, #56129) [Link]

> The whole point of classes like std::vector and std::string is that they do the memory allocations for you, and also handle cleanup. But that makes it a lot harder to do something sensible if the allocation fails.
No, it actually makes it easier. You don't have to check every operation that needs to allocate memory, and you don't have to manually clean up what has been allocated already when an allocation fails.

> If I can't create a std::string without getting an exception, that means I can't use std::string in a destructor.
If you need to allocate memory dynamically in a destructor, that's tough, but it doesn't make destructors less useful in the 95% of all cases where you don't.

Zeuthen: Writing a C library, part 1

Posted Jul 1, 2011 10:50 UTC (Fri) by jwakely (subscriber, #60262) [Link]

Exactly! (on both points)

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 13:02 UTC (Wed) by michaeljt (subscriber, #39183) [Link]

> C++ provides destructors and exceptions to help me deal with resource management and error handling.

Can you be sure though without testing e.g. like what the OP mentioned that your destructors and exceptions will handle the OOP case correctly?

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 13:04 UTC (Wed) by michaeljt (subscriber, #39183) [Link]

Sorry, OOM of course.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds