Posted Jun 28, 2011 7:39 UTC (Tue) by gowen (guest, #23914)
[Link]
While I agree that its undesirable in a library, I'd argue that it is by no means the simplest. Firstly, your API must now have an error-return mechanism for any function that could possibly fail due to resource starvation. Which means, basically, one of three things:
i) a function that returns a value now needs to reserve a special value to signal failure. Trivial for pointers, plausible for built-in numeric types, a pain for user-defined structures.
ii) A global (or thread-local) errno-type error code. Don't worry about the thread-safety issues, because the caller will forget to check it anyway.
iii) Pass a pointer and have the function write the return value to that address, and return true/false. This will work, but its uglifies your API, and its not simpler than calling abort().
Secondly, without an RAII-like stack unwinding mechanism, that does complicates resource management in functions. Why? because on failure you have to unwind any prior successfully allocated resources before returning. That's good engineering and to be encouraged, but its not more simple than calling abort().
And, of course, for most desktop linux installs, overcommit means that even if you get all that right, it still may prove to be fruitless.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 9:09 UTC (Tue) by dgm (subscriber, #49227)
[Link]
The old adage says make things as simple as possible, but no simpler. Calling abort() is often _too_ simple. Returning error codes, and per thread error flags are nothing new. It has been standard practice for ages. It works, and is not that complicated.
Yes, overcommit is a problem. When (if) you get a out-of-memory error, it means the system is pretty ill, possibly dying. The proper solution in that case varies from program to program, and even within a program. Sometimes you should free up resources, sleep a while, and try again. Sometimes aborting is the right thing to do. And yet sometimes you should ask the user how to proceed. That's the reason why fixing this policy in a library is bad design.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 9:20 UTC (Tue) by gowen (guest, #23914)
[Link]
Returning error codes, and per thread error flags are nothing new. It has been standard practice for ages. It works, and is not that complicated.
I never suggested it was overly complicated, I was disagreeing with the assertion that it was the *most* simple. Almost everything else you say I am in agreement with.
Good engineering says "don't call abort()"
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 10:16 UTC (Tue) by gevaerts (subscriber, #21521)
[Link]
Did the original post say that though?
I didn't read "Proper memory management is most expected, and easiest implemented, in libraries" as "memory management is easier than anything else" but rather as "memory management in libraries is easier than anywhere else"
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 12:34 UTC (Tue) by gowen (guest, #23914)
[Link]
That's a good point. I think its possible I mis-parsed the original posters intent, in which case I withdraw my argument about "easiest". But I do think that thread-local-errno is a horrible hack around C's limitations.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 13:31 UTC (Tue) by cmccabe (guest, #60281)
[Link]
> But I do think that thread-local-errno is a horrible hack
> around C's limitations.
Then you'll be happy to hear that thread-local errno has been deprecated for decades now. New functions that are added to POSIX generally return an error code indicating the error instead.
errno has nothing to do with "C's limitations" and everything to do with preserving compatibility with an older interface that isn't worth the effort to change.
Returning error codes is a great convention because you can flag them with __attribute__((warn_unused)). Then the programmer will get a warning from the compiler unless he checks the return code.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 7:16 UTC (Wed) by gowen (guest, #23914)
[Link]
Returning error codes is a great convention because you can flag them with __attribute__((warn_unused)). Then the programmer will get a warning from the compiler unless he checks the return code.
Returning error codes is fine in certain circumstances (particularly for functions where the side-effects are the point). Sometimes, though, you want your functions to be functions in the lambda-calculus sense - you want to return *results*.
In general, you can't return both results *and* error codes. (As I said, for pointers you can return NULL, and for functions whose domain of valid results is limited in some sense [abs()], you can, but if you're returning anything other than a pointer, int-type or floating-point-type, you're basically hosed.
Zeuthen: Writing a C library, part 1
Posted Jul 7, 2011 0:22 UTC (Thu) by cmccabe (guest, #60281)
[Link]
> In general, you can't return both results *and* error codes. (As I said,
> for pointers you can return NULL, and for functions whose domain of valid
> results is limited in some sense [abs()], you can, but if you're returning
> anything other than a pointer, int-type or floating-point-type, you're
> basically hosed.
>
If we're still talking about C/C++, then you can only ever return:
* a primitive (int, float, etc.)
* a pointer
* a struct
All of those have a natural 'none of the above' value. Integer types have 0 or a negative, floats and doubles have NaN, and pointers have NULL.
If you're returning a struct by value, then you're probably using C++, since C programmers rarely return an entire structure by value. The obvious thing to do is to either return an integer error code and take a reference to the thing to be modified, or use C++ exceptions. Either way, problem solved.
Being able to return multiple values at once is nice, but it's hardly the biggest challenge when using C or C++.
Zeuthen: Writing a C library, part 1
Posted Jul 7, 2011 12:08 UTC (Thu) by jwakely (subscriber, #60262)
[Link]
> In general, you can't return both results *and* error codes.
std::future<double> squerrt(double x)
{
std::promise<double> p;
if (x < 0)
p.set_exception(copy_exception(std::domain_error("negative")));
else
p.set_value(sqrt(x));
return p.get_future();
}
int main()
{
double i = squerrt(-1).get(); // boom
}
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 9:58 UTC (Tue) by alexl (subscriber, #19068)
[Link]
When you get an out-of-memory error in an overcommited state there is no way to free up resources, nor ask the user how to proceed (and anyway that would generally require allocations). What happens is that on out-of-memory the oom killer wakes up and kills a semi-random process, with you having no say in this at all.
For the case where you have resources that can be safely freed in an out of memory situation the right thing to do is not OOM from allocation at all, but rather have some kind of signal for memory pressure when memory is tight (but not full). Then apps could handle this by cleaning up caches and other resources. That way you will not run into the OOM killer problem.
There is one kind of allocation failure that is not oom-killer related though, and thats where a single allocation is larger than the physical memory or the mappable region. This can happen for instance if you're reading in some random user file (say an image) and it happens to decode to a 8 gigabyte array (maybe because its an exploit, or just large). In these kinds of situation I think it makes sense to check for allocation failures, and glib does in fact have a call for that (g_try_malloc).
However, in most cases (like allocating internal know sized objects) I'm purely in the abort-on-oom school, since adding all the complexity (both to your code and to users of your library) means more bugs, and doesn't help anyway (since oom doesn't get reported, the kernel just kills some process instead). Of course, as david said in the article, there are of course exceptional situations, like core system software (init, dbus, etc) where we can't just have it die and where the complexity is worth it.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 17:48 UTC (Tue) by xtifr (subscriber, #143)
[Link]
> the oom killer wakes up
Assuming A) you have an OOM killer, and B) it hasn't been thoroughly disabled. If you're writing a _general purpose_ library, neither is really a valid assumption, though both remain possibilities you should remain aware of. Aside from that quibble, I basically agree with you, but I'll note that writing libraries for embedded systems comes with a whole additional set of complications of its own. (Basically, my advice would be to not try unless you or someone on your team has some expertise with embedded systems.)
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 15:23 UTC (Thu) by nix (subscriber, #2304)
[Link]
I'm not sure you *can* completely disable overcommit. Robust Unix programs theoretically have to assume that they might get killed at any instant, either due to OOM in something like the stack (which obviously cannot be trapped), or due to a user sending it a kill signal.
Alas the latter is rare (and misbehaviour might be expected if you kill something maintaining persistent state while it is updating that state), and the former is so rare and so hard to cater to that simply nobody ever bothers. Sufficiently Paranoid Programs could avoid the stack-OOM by doing a massive deep recursion early on, to balloon their stack out to the maximum they might need. A few programs do this. You can avoid being user-killed by being installed setuid or setgid, but this has other disadvantages and is basically never done (at least not solely for this reason).
This is probably a fault of some kind in POSIX, but I have not even the faintest glimmerings of a clue as to how to fix it.
Zeuthen: Writing a C library, part 1
Posted Jul 1, 2011 9:46 UTC (Fri) by dgm (subscriber, #49227)
[Link]
I believe that what really paranoid programs have to do is keep critical state in non-volatile memory (a disk, a remote machine, etc), and do everything possible to ensure that it's always consistent. That way it doesn't matter if the program goes away because of a programming error, a kill signal or the power going down in the middle of a system call.
Zeuthen: Writing a C library, part 1
Posted Jul 1, 2011 13:40 UTC (Fri) by nix (subscriber, #2304)
[Link]
Yes, probably. And then we can get into fsync() flamewars instead! Isn't POSIX fun?
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 23:09 UTC (Sun) by dgm (subscriber, #49227)
[Link]
We can probably shortcircuit the flamewar by using a relational database. Oh, noes! PostgreSQL vs. MySQL anyone? ;-)
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 23:40 UTC (Sun) by nix (subscriber, #2304)
[Link]
And then we can write a nice high-performance FUSE filesystem on top of the relational database! And then we can run MySQL on top of that! (And then we can run the FUSE filesystem atop that, and run a virtual machine inside that filesystem. And then we have a nice room heater.)
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 9:59 UTC (Tue) by dlang (✭ supporter ✭, #313)
[Link]
with overcommit, you don't get an error when you allocate memory, you get an OOM error later when you change the value of a variable
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 10:58 UTC (Tue) by dgm (subscriber, #49227)
[Link]
In my machine, malloc() does fail with /proc/sys/vm/overcommit_memory changed to 2 (after allocating just 7 MiB). Also, not all systems are Linux.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 11:11 UTC (Tue) by alexl (subscriber, #19068)
[Link]
Its true that not all systems do overcommit, and even on linux you can disable it. However, in practice, for most "desktop" apps running on most OSes memory will be overcommited. Adding large amounts of complexity that will only run in rarely used configurations is largely a waste of time.
(Of course, as said before, there are special cases where its needed, but not in general).
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 12:54 UTC (Wed) by michaeljt (subscriber, #39183)
[Link]
> Its true that not all systems do overcommit, and even on linux you can disable it. However, in practice, for most "desktop" apps running on most OSes memory will be overcommited.
At least on Linux overcommitting is a choice the user makes (or at least they can choose not too). And by overcommitting they are saying in a certain sense that they don't care too much about OOM. So I do see a certain sense in targeting the non-overcommitted situation and ignoring overcommit.
Slightly off-topic, but what is overcommit good for apart from forking (or more general copy-on-write)?
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 18:29 UTC (Wed) by dlang (✭ supporter ✭, #313)
[Link]
overcommit is primarily good for copy-on-write situations, but can also help if an application declares a large structure and then doesn't use it (or at least doesn't use it before other apps exit and free up the real memory before it's needed)
it also allows you to deal with cases where a library/binary gets used, but not all of it is ever used. Linux will only read the pages from disk into memory that are actually needed. without overcommit space for the entire binary needs to be allocated, with overcommit it doesn't matter.y
the thing is that the COW situation is extremely common, so in practice overcommit works very well.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 21:59 UTC (Wed) by michaeljt (subscriber, #39183)
[Link]
> it also allows you to deal with cases where a library/binary gets used, but not all of it is ever used.
Is this quite the same thing? Those pages are all backed up by disk storage - assuming you meant the binary text - so they can be ejected from physical RAM again whenever needed. Thrashing instead of OOM-ing...
I suppose what I am wondering is that, given that there are such heavy handed mechanisms for dealing with OOM (the OOM monster) whether it might make sense to have a setting to only allow overcommitting for processes which have just forked, which are probably the main users of really overcommitted memory which they will probably never need. Then they could be the only ones liably to be killed on OOM and other processes could live more predictably.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 23:16 UTC (Wed) by dlang (✭ supporter ✭, #313)
[Link]
I am not sure if (with overcommit disabled) you have to allocate memory space for the entire binary or not.
the problem with your suggestion (only allow overcommit for processes that just forked), is that I don't see that working. you have no way of knowing if the process is going to exec something (relatively) soon, or if it's apache that forked a child that is going to stay running for the next year.
And I don't think it helps anyway.
the problem scenario is
large process A forks (creating A`), almost all it's memory is COW
process B allocates some memory, but doesn't touch it yet
process A` changes some memory (breaking COW), requiring real memory to hold the result.
process B then tries to use the memory it had previously allocated and finds that it is not available.
if you could somehow define 'forked recently' in a way that could be cheap enough, then you could possibly do it.
All this said, I really don't see many cases in practice where disabling overcommit will really help.
yes, you avoid the OOM killer kicking in and instead the process that tried to allocate memory dies instead.
but the idea that (in the general case), this will make your system more predictable is not something I believe. you have no way of knowing _which_ process (including system daemons) will need to allocate more memory at the instant that you are out, so you really don't know which process will die anyway. (and no, in general processes and libraries don't do anything except die when they run out of memory).
in some ways, it would make it easier to DOS a system, just have your memory hog _not_ die if a malloc fails, instead sleep and try again. eventually something else in the system will need memory and die, then you can repeat the process. you won't even be able to ssh in to the box to fix it, as you won't be able to spawn/fork a new process (as that will require memory allocation)
there's also the problem that without overcommit you need to have significantly more swap enabled in the system (since you have to have enough ram+swap to handle the peak theoretical memory use from large processes doing a fork+exec), and with the increasing gap between memory speed and disk speed, your system will dive into swap to the point of being useless (including the inability to login to it) before you start getting memory failures. With overcommit you can have a small amount of swap (including none) and instead count on the OOM killer + watchdog timers to bring the box down (and possibly even reboot it to recover) rather than having the box 'up' but unable to provide service.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 7:19 UTC (Thu) by michaeljt (subscriber, #39183)
[Link]
> large process A forks (creating A`), almost all it's memory is COW
>
> process B allocates some memory, but doesn't touch it yet
>
> process A` changes some memory (breaking COW), requiring real memory to hold the result.
>
>process B then tries to use the memory it had previously allocated and finds that it is not available.
That I do not see as a problem - when process B allocates the memory it is really allocated, and if A tries to use its COW memory later it will just not be there.
> if you could somehow define 'forked recently' in a way that could be cheap enough, then you could possibly do it.
That I do see as more of a problem. One could have some background thread gradually actually allocating the process's memory, but that is replacing one piece of complexity by another.
> but the idea that (in the general case), this will make your system more predictable is not something I believe. you have no way of knowing _which_ process (including system daemons) will need to allocate more memory at the instant that you are out, so you really don't know which process will die anyway. (and no, in general processes and libraries don't do anything except die when they run out of memory).
True, it doesn't change the fundamental problem that you need enough memory for whatever you want to do.
> in some ways, it would make it easier to DOS a system, just have your memory hog _not_ die if a malloc fails, instead sleep and try again. eventually something else in the system will need memory and die
I thought that ulimits were supposed to solve that. Do they work as intended these days?
> there's also the problem that without overcommit you need to have significantly more swap enabled in the system (since you have to have enough ram+swap to handle the peak theoretical memory use from large processes doing a fork+exec)
The idea was to disable overcommit except for forking, so that shouldn't be such an issue. Thinking about it one could also freeze the overcommitted process if it tries to actually use its memory and it isn't there (making sure there is a bit of memory left over for doing emergency process surgery).
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 12:55 UTC (Thu) by nix (subscriber, #2304)
[Link]
I am not sure if (with overcommit disabled) you have to allocate memory space for the entire binary or not.
You don't. Overcommit does not apply to read-only file-backed regions, because they can be dropped at any time without harm.
Zeuthen: Writing a C library, part 1
Posted Jul 8, 2011 1:40 UTC (Fri) by kabloom (guest, #59417)
[Link]
Does Windows overcommit? Windows programs don't fork the way Linux programs do, so I would guess that overcommitting memory is much less of a problem on Windows.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 20:00 UTC (Tue) by dlang (✭ supporter ✭, #313)
[Link]
yes, you can get errors with malloc when you have overcommit enabled.
however the normal type of OOM problem you have when overcommit is turned on doesn't happen when you do a malloc, but instead when you attempt to write to a page that was shared and now cannot be (requiring additional memory pages)
Zeuthen: Writing a C library, part 1
Posted Jul 8, 2011 1:41 UTC (Fri) by kabloom (guest, #59417)
[Link]
Yes. ulimit is one way to get allocation errors even with overcommit enabled.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 10:09 UTC (Tue) by HelloWorld (guest, #56129)
[Link]
There are (at least) two problems with that. One is that the out-of-memory code paths are hard to test, the other one is that the C language doesn't support you in any way. Which is yet another reason not to use C.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 11:14 UTC (Tue) by dgm (subscriber, #49227)
[Link]
The traditional way to test out of memory code paths is to use a custom malloc() that fails when you want it to.
And yes, C doesn't support you, but also doesn't tie your hands. This can be a curse or a blessing, depending on what are you trying to accomplish. But you're right, it's another reason not to use C when you don't care about this stuff. Bu for when you do... you cannot replace it.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 11:51 UTC (Tue) by HelloWorld (guest, #56129)
[Link]
> And yes, C doesn't support you, but also doesn't tie your hands.
Not tying my hands isn't enough nowadays. C++ provides destructors and exceptions to help me deal with resource management and error handling. Other languages have linear types. C has literally nothing, and that just sucks.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 13:56 UTC (Tue) by cmccabe (guest, #60281)
[Link]
It's funny that you mention "destructors and exceptions" as a way of dealing with error handling. If you're any good at C++ at all, you already know that exceptions can't help you inside destructors.
In practice, when you write code in higher-level languages, you do not handle out-of-memory errors. The one exception is when you're allocating some really huge chunk of memory. The whole point of writing the code in the higher-level language is so that you don't have to deal with that stuff.
The whole point of classes like std::vector and std::string is that they do the memory allocations for you, and also handle cleanup. But that makes it a lot harder to do something sensible if the allocation fails.
Just consider trying to handle every possible std::bad_alloc. If small allocations are failing, that means I can't create a std::string, because std::string will try to allocation memory. If I can't create a std::string without getting an exception, that means I can't use std::string in a destructor. If I can't use std::string in a destructor, then I can't construct any class that uses std::string in a destructor. And around and around it goes, until finally the programmer just decides enough! and declares that small allocations must never fail.
C++ isn't unique in this regard. Java also just aborts when it runs out of memory, and so does Ruby. Handling memory automatically is incompatible with having a detailed programmer-specified plan about exactly how to handle it-- what a surprise.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 14:04 UTC (Tue) by jwakely (subscriber, #60262)
[Link]
How often do you need to create classes that perform allocations in a destructor? I certainly don't do that often.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 17:33 UTC (Tue) by MisterIO (guest, #36192)
[Link]
Let's say that you have created a class that creates a rb-tree(dinamically allocating the nodes). How would you clean it up in the destructor? Erase one node at a time? that would mean that destroying the tree is an O(nlog(n)) operation. Another way would be to do a visit in order of the tree and put pointers to the elements on a vector and then free them. This way the destruction of the tree would be an O(n) operation. There are probably other ways to achieve this, but it was just an example.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 17:48 UTC (Tue) by MisterIO (guest, #36192)
[Link]
actually doing an inorder visit, checking if a node has both children as NULL, you could free it once you've reached the father, so that shouldn't be needed.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 21:06 UTC (Tue) by elanthis (guest, #6227)
[Link]
Or, if you're any good at your job and have any understanding of modern CPUs and system architectures, you're not allocating each node as an independent block but rather as a single pool which can be released with a single call to free().
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 11:51 UTC (Wed) by MisterIO (guest, #36192)
[Link]
If you have a templated class which can take classes of variable dimensions( i.e. whose dimensions are dinamic at runtime), how would you do that?
You can do that for the nodes structs, but that alone would still be a minimal part of the overall allocations.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 17:18 UTC (Wed) by cmccabe (guest, #60281)
[Link]
Perhaps you could use a memory pool that was freed later.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 19:25 UTC (Wed) by MisterIO (guest, #36192)
[Link]
Yeah, I'm not saying that you can't use it in general, but that's a good optimization if you've programmed everything. If the classes that are inserted are someone else's code and they don't use the pool, then suddenly you're gonna have various different memory allocation/deallocation patterns. It could be done, is it always better? I don't know, I consider it an optimization. The suggestion would be to code it in a simple way(which doesn't mean dumb), then profile it and see if/where the bottleneck is and try to solve that and so on. Starting with a complex and optimized design that could be not needed, I'm not sure that's always a good choice.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 15:02 UTC (Wed) by jwakely (subscriber, #60262)
[Link]
Why O(nlog(n))? Are you rebalancing your tree as you destroy it?
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 16:04 UTC (Wed) by MisterIO (guest, #36192)
[Link]
Actually, I'm not doing anything, this was a purely hypothetical case constructed to create an example of where it could make sense to allocate in a destructor(though in the end, with the other post, I noticed that it probably wasn't needed).
Anyway, no, it's not about rebalancing. Even on a normal tree removal of an object is an O(h) operation(with h equal to the height of the tree). In an rb-tree, being it a balanced tree, the height of the tree is (a*log(n))(with some "a" number which now I don't remember). Thus the removal of all the n elements becomes an O(nlog(n)) operation. If instead you do a find_minimum() followed by an inorder visit from there, you're gonna have a delete_tree() operation which is O(n).
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 17:53 UTC (Wed) by jwakely (subscriber, #60262)
[Link]
Just start at the root node and start destroying nodes, you don't need to find the minimum and you don't care what order the nodes get destroyed in.
If you allocate a std::vector to do it you have to walk the entire map, then iterate across the vector too, for no good reason.
A RB tree's dtor can be O(n) without allocating memory.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 19:16 UTC (Wed) by MisterIO (guest, #36192)
[Link]
I've already said myself that you likely don't need to do that, in my 2nd post. About the order though, how are you going to free nodes starting from the root without allocating temporary variables for children and then from the children of the children, etc?
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 23:51 UTC (Wed) by njs (guest, #40338)
[Link]
The slightly inefficient way is to just do something like
while (there are nodes left) {
current_node = root;
while (has_children(current_node))
current_node = first_undeleted_child(current_node);
delete(current_node);
}
That deletes everything, needs no heap storage, and it's O(n (log n)^2), not so bad.
The more efficient way is to just preallocate a vector whose length is the same as your maximum tree height (resizing when necessary), and then in the destructor use it as the stack for a single-pass depth-first traversal. The extra memory required is pretty trivial. Not sure if it's worth the bother, though.
Well, honestly I'm not sure that any of this is worth the bother. It's a nice puzzle to handle this sort of thing properly in a classic data structure with perfect encapsulation from the rest of the program, but in real life it doesn't help until you've solved a whole pile of harder and idiosyncratic problems too.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 10:33 UTC (Thu) by MisterIO (guest, #36192)
[Link]
>That deletes everything, needs no heap storage, and it's O(n (log n)^2), not so bad.
There's no point to this, a simple remove(value_at_root)(value at root simply because it's the simplest to find) for a balanced binary tree already takes O(log(n)), so why would I create a different algorithm to achieve the same performance?
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 18:01 UTC (Thu) by njs (guest, #40338)
[Link]
If your rebalancing algorithm doesn't require any heap allocations then right, there is no point.
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 8:14 UTC (Sun) by vlovich (guest, #63271)
[Link]
I'm not understanding the claims that it takes anything other than O(n) in the worst case for a tree de-allocation.
function destroy_tree(tree) {
foreach child in tree
delete_tree(child)
free(tree)
}
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 11:35 UTC (Sun) by cladisch (✭ supporter ✭, #50193)
[Link]
If a generic deletion function is used, it might rebalance the tree after each step.
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 18:35 UTC (Sun) by nix (subscriber, #2304)
[Link]
When you're writing a tree deallocator, I'd expect it to have privileged access to the interior of the tree. There's no reason not to use a specialized high-speed node deletor in that case: rebalancing the tree between every deletion is silly, because it will never be accessed again. (Even a multithread-safe tree should probably block and warn on accesses once tree deallocation begins: any such accesses are racy and could easily access rubbish if they happened just an instant later.)
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 19:39 UTC (Sun) by vlovich (guest, #63271)
[Link]
Rebalancing the tree would be enormously redundant - the extra 4-6 lines of really simple code is worth it (hell, if you're lazy, provide a boolean to the internal delete function that says whether or not to rebalance).
Thread-safety is irrelevant here - if you can access the tree after the destructor is already called, you've failed at thread safety (actually you've failed at maintaining the lifetime of the object properly).
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 22:04 UTC (Sun) by nix (subscriber, #2304)
[Link]
I think we are in violent agreement.
Zeuthen: Writing a C library, part 1
Posted Jul 6, 2011 15:56 UTC (Wed) by jwakely (subscriber, #60262)
[Link]
I'm glad someone sensible agrees with the point I made way back in http://lwn.net/Articles/449645/ ... I was starting to think I was going mad and everyone else knew something about RB trees that I didn't
Zeuthen: Writing a C library, part 1
Posted Jul 6, 2011 16:56 UTC (Wed) by vlovich (guest, #63271)
[Link]
Same here. I mean it's been a while since I took data structures. Maybe things have changed ;).
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 6:07 UTC (Thu) by dark (subscriber, #8483)
[Link]
You can do it in O(n) by abusing the child pointers of nodes that you've visited but not yet destroyed. For example if you always descend to the left node first, then you can put the current node on a linked list that's threaded through the left pointers. When you hit the bottom of the tree, you take a node out of the linked list, destroy it, and proceed to its right child.
This way you visit each node a constant number of times (once to put it on the list, once to take it off) and you only need two temporary pointers: one for the start of the list and one for the child to visit next. You can optimize a bit for leaf nodes but that doesn't impact the time complexity.
Managing the linked list is easy because you're just using it as a stack.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 10:28 UTC (Thu) by MisterIO (guest, #36192)
[Link]
>then you can put the current node on a linked list that's threaded through the left pointers
We were discussing about the problems caused by allocations in the destructor. And what exacly would I gain by doing this instead of, like I said in the first post, find_minimum(), then next() till the end and save pointers to elements on a vector, then free them all through the vector of pointers? Also, if my assumption in my second post is correct, you don't even need to save temporaries if you do an inorder visit, you just need to free the node once you've reached the parent, checking that the children have all its own children NULL(which means that you've already visited all the grandchildren).
Zeuthen: Writing a C library, part 1
Posted Jul 3, 2011 20:48 UTC (Sun) by dark (subscriber, #8483)
[Link]
What you gain is a destructor that works in O(n) time and does no allocations. By reusing the child pointers as a linked list, you avoid having to make any allocations for it.
I think your approach can get there too, with some refinement. How are you implementing the inorder visit?
Zeuthen: Writing a C library, part 1
Posted Jul 6, 2011 16:01 UTC (Wed) by jwakely (subscriber, #60262)
[Link]
Posted Jun 28, 2011 15:05 UTC (Tue) by gowen (guest, #23914)
[Link]
If I can't use std::string in a destructor, then I can't construct any class that uses std::string in a destructor
Well ... no. You can't.
But I can't think of a sane use case where I would need to, so it's not really a big deal. Can you give a concrete example of where this is a problem, and where a string literal cannot be sensibly used instead?
Follow up question: how would you handle this clean-up in C?
Why would this approach not work in C++ (suitably wrapped as a destructor)?
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 18:24 UTC (Tue) by cmccabe (guest, #60281)
[Link]
> But I can't think of a sane use case where I would need to, so it's
> not really a big deal
class MyFile {
MyFile(const char *filename) { fd = open(filename); ... etc ... }
~MyFile() {
if (close(fd)) {
std::ostringstream oss;
oss << "close failed, but I can't throw an exception from "
"a destructor!";
log(oss.str().c_str());
}
}
}
However, streams can allocate memory. So if we run out of memory, get a std::bad_alloc, and try to unwind our stack, we'll probably suffer the insta-kill that happens when you throw an exception while unwinding another exception.
This is only the most simple, obvious example of why you might need to allocate memory in a constructor. Any time you put together strings with concatenation, create a stringstream object, or even call syslog, you are allocating memory. And those are all obvious things to do when handling an error.
So if you can't throw an exception, and you can't log the error, things look pretty grim for RAII. So in practice, everyone just gives up on the whole handling std::bad_alloc thing.
N.B.: In this situation, having a close method with saner error handling, in addition to the destructor, is a good idea. In that case, the caller would be expected to call close before the object is destroyed, and the destructor would just be a fallback. This is a common pattern in real-world code.
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 7:25 UTC (Wed) by gowen (guest, #23914)
[Link]
You've created a ostringstream in order to write a string literal to it, in order to obtain the address of the string literal to pass to some logging function? You've created an unnecessary object and performed an unnecessary copy - and THAT is your sane-use-case counter example?
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 14:52 UTC (Wed) by jwakely (subscriber, #60262)
[Link]
So make sure your logging API supports printf-style API as well (or instead of) only supporting writing a single string, then you don't need to concatenate std::strings.
Go back and do it again. This is a common pattern in real-world code reviews.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 19:07 UTC (Thu) by cmccabe (guest, #60281)
[Link]
> So make sure your logging API supports printf-style API as
> well (or instead of) only supporting writing a single string,
> then you don't need to concatenate std::strings.
printf can also allocate memory. That is why it is not signal-safe.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 23:44 UTC (Thu) by jwakely (subscriber, #60262)
[Link]
but it won't throw a std::bad_alloc, or any other exception, and you were talking about the dangers of exceptions being thrown from destructors, right? and how you can't do anything useful in a destructor because it might throw, and therefore "look pretty grim for RAII"
(although I'm not sure why you were talking about that, the comment you replied to wasn't talking about exceptions *in* destructors, just exceptions *and* destructors as being helpful for error handling)
Zeuthen: Writing a C library, part 1
Posted Jul 2, 2011 0:20 UTC (Sat) by cmccabe (guest, #60281)
[Link]
I never said that you "couldn't do anything useful in a destructor". Destructors are helpful in general for error handling in C++. I just said that std::bad_alloc is not a great way to handle the failure of small memory allocations.
It is true that printf would get you out of a jam in this case, because it doesn't throw the silly bad_alloc exception when it fails to allocate memory. You could also catch bad_alloc in your destructors whenever it pops up, and try to do something reasonable. In some cases, you can force the nothrow variant of new to be used (although not when std::vector and friends are involved.) But that is going to be very difficult in any non-trivial program.
I get annoyed when people talk about exceptions as a panacea for error handling problems. You don't have to think, just use exceptions, and your problems will magically melt away. The reality is, writing low-level code that correctly handles errors is very difficult in either C or C++. It's simple enough to avoid new and free, but once you start needing to implement transactional behavior-- either a function call succeeds or it fails, but it doesn't leave a mess-- things get difficult.
If you find yourself in a scenario where you have to undo changes that you made earlier to some data structure, your error handling may itself encounter an error. And how do you automatically handle that? Well, you have to engage your brain and structure the operation into a prepare operation followed by a commit operation. And that is not going to be trivial in either C or C++.
Zeuthen: Writing a C library, part 1
Posted Jul 4, 2011 0:11 UTC (Mon) by dgm (subscriber, #49227)
[Link]
I'm afraid your simple example got much more flak that it deserved. People, it was only an example! The truth is, if I had a dime for every time I have seen something like what you pointed out, I would be rich by now.
The problem with encapsulation (and information hiding in general) is that sometimes you _need_ to take the implementation into account. It doesn't mean that it's not a great engineering principle, it is. It's just that it sometimes makes things complicated. In this case, before you can use something in a destructor you have to be pretty sure that the implementation will not throw an exception. That's not only difficult to do, it's subject to constant change in current development paradigms.
> I get annoyed when people talk about exceptions as a panacea for error handling problems.
Me too. No amount of exceptions, garbage collectors or type inference can replace your brain. They are there you help you with the grunt work, but it doesn't mean you can forget how to properly engineer stuff, because from time to time, you need to.
> It's simple enough to avoid new and free, but once you start needing to implement transactional behavior-- either a function call succeeds or it fails, but it doesn't leave a mess-- things get difficult.
So true. I have come to the conclusion that error handling is maybe 30-50% of the effort needed to get properly done code. And regretfully is often a neglected part. For instance, what you call transactional behavior is called "class invariants" in OO programming. It's critical to well behaved long living programs, but many people haven't head about them (for those: no action should let an object in an inconsistent state).
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 16:08 UTC (Tue) by HelloWorld (guest, #56129)
[Link]
> The whole point of classes like std::vector and std::string is that they do the memory allocations for you, and also handle cleanup. But that makes it a lot harder to do something sensible if the allocation fails.
No, it actually makes it easier. You don't have to check every operation that needs to allocate memory, and you don't have to manually clean up what has been allocated already when an allocation fails.
> If I can't create a std::string without getting an exception, that means I can't use std::string in a destructor.
If you need to allocate memory dynamically in a destructor, that's tough, but it doesn't make destructors less useful in the 95% of all cases where you don't.
Zeuthen: Writing a C library, part 1
Posted Jul 1, 2011 10:50 UTC (Fri) by jwakely (subscriber, #60262)
[Link]
Exactly! (on both points)
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 13:02 UTC (Wed) by michaeljt (subscriber, #39183)
[Link]
> C++ provides destructors and exceptions to help me deal with resource management and error handling.
Can you be sure though without testing e.g. like what the OP mentioned that your destructors and exceptions will handle the OOP case correctly?
Zeuthen: Writing a C library, part 1
Posted Jun 29, 2011 13:04 UTC (Wed) by michaeljt (subscriber, #39183)
[Link]
Sorry, OOM of course.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 13:32 UTC (Tue) by nix (subscriber, #2304)
[Link]
Not least because you can't tell where that library might be used. X.org had to discourage use of its (IIRC) dbus support because that library's behaviour on so many errors is to abort, and when your X server aborts that might leave you with a dead display or a locked-up PCI bus.
The rule I follow is simply that libraries should never ever abort, period, except perhaps in a limited number of functions whose documented purpose is to terminate the process. Yes, it means you have to actually bother with a bit of error handling. That's the least you can expect from a half-competent C programmer, I'd say.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 14:08 UTC (Thu) by stevem (subscriber, #1512)
[Link]
Absolutely. 100%.
A library should *never* call abort in an attempt at error-handling. You throw away any chance for the calling code to do sensible stuff such as clean up properly.
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 15:19 UTC (Thu) by nix (subscriber, #2304)
[Link]
And I see David has linked to this comment with the link text 'people sometimes fail to realize that the caller has a responsibility'. I might respond that people sometimes fail to realize that libraries have a responsibility not to shoot the whole process in the head when the caller could perfectly well recover.
(Ironically, elsewhere in the same post David comments that libraries should not manipulate signal dispositions because these are process-wide state. I'd call the continued existence of a process process-wide state, as well, but apparently David disagrees. I think his attitude leads to appalling and unnecessary instability: he thinks mine leads to unfixed bugs in callers. We are probably both right, but ne'er the twain shall meet.)
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 17:49 UTC (Thu) by davidz2525 (subscriber, #58065)
[Link]
I think I even prefaced the word 'people' with 'smart' :-). But, yeah, I completely agree that we're both right which is why the guidance is to establish a policy on what to do when the programmer screws up rather than doing either A or B. I brought this up mostly to address the popular misconception that libdbus-1 is an evil library that eats babies - it's not.
(Btw, another misconception is that libdbus-1 is meant for applications to use; it's really not, it's more intended for a base that you can build language/toolkit bindings on top + the implementation used to implement dbus-daemon(1) which MUST be able to handle OOM.)
Zeuthen: Writing a C library, part 1
Posted Jun 30, 2011 21:12 UTC (Thu) by nix (subscriber, #2304)
[Link]
If it's a base that you can build language bindings on top of, then the libdbus-1 authors have even *less* idea than they might have as to what the process is doing, and it's even more critical that it should never die. (But it is true that a lot of Great Big Languages have no hope of handling OOM errors: dynamic allocation is too fundamental to them. However, for others this same property is a benefit, because they often have a lot of dynamically-allocated storage that they can get rid of, and even their own allocators so they can be sure the storage finds its way back to the OS again.)
Damn good articles btw.
Zeuthen: Writing a C library, part 1
Posted Jul 1, 2011 5:28 UTC (Fri) by whot (subscriber, #50317)
[Link]
FTR, we discouraged dbus support mostly because the communication API was less than ideal and just talking directly to hal (now udev) was more sensible.
The abort() was extremely annoying while programming but it also lead to the code being very stable. Every small error got punished badly so we had to make sure we fixed up the code properly.
Zeuthen: Writing a C library, part 1
Posted Jul 1, 2011 9:37 UTC (Fri) by nix (subscriber, #2304)
[Link]
OK, so my memory was faulty :) not much of a surprise there, some days it seems it consists mostly of faults.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 13:38 UTC (Tue) by cmccabe (guest, #60281)
[Link]
> Aborting in an OOM scenario isn't very good advice for a library
I agree. If you write your library to abort when a memory allocation fails, you are forcing that policy on the library user. He may not want it.
Even if you choose not to handle out-of-memory errors, calling abort() doesn't seem like the right thing to do. Just because your library can't get its job done doesn't mean that the developer necessarily wants to bring down the whole application. Maybe that job was something extremely minor and we just want to keep going.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 15:48 UTC (Tue) by nix (subscriber, #2304)
[Link]
I wonder if David was a Perl programmer before he was a C programmer? 'die on error' actually makes sense in Perl, because library users can trap it. But they can't trap abort(). You should abort() about as often as you BUG_ON() in the kernel: when you can't continue, your internal state is blown and can't be fixed, and the world is ending. And nothing you can predict (e.g. OOM) should lead you into such a state.
Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 16:18 UTC (Tue) by HelloWorld (guest, #56129)
[Link]
> But they can't trap abort().
They can use longjump in the SIGABRT signal handler. But that probably doesn't make much of a difference, as a library that calls abort probably won't free allocated resources before it does, resulting in resource leaks.