
My advice on implementing stuff in C:

Posted Oct 15, 2010 16:03 UTC (Fri) by Ed_L. (guest, #24287)
In reply to: My advice on implementing stuff in C: by mjthayer
Parent article: Russell: On C Library Implementation

It's not just you, but for the sake of argument it may as well be :) If you feel you personally are a better, more productive programmer using C rather than C++, by all means use C. Aside from corner cases, it's a subset :) :)

Me, I've been productive with C++ for over twenty years, and really like it. I'll grant there are more modern languages, but for HPC purposes I haven't found any more powerful, until I recently stumbled across D. (You know, the language C++ always wanted to be but was too rushed.) And that trip is too recent for me to draw a firm conclusion.

Some will argue that Java is just as good at HPC, and for them they are probably right. (Insert obligatory Fortran dereference here.) I also dabble in system programming, and just personally prefer one language that does it all. Others prefer to mix and match. And surely there must be places for Perl and its ilk -- provided they are kept brief and to the point.

"Although programmers dream of a small, simple languages, it seems when they wake up what they really want is more modelling power." -- Andrei Alexandrescu



My advice on implementing stuff in C:

Posted Oct 15, 2010 16:21 UTC (Fri) by mjthayer (guest, #39183) [Link] (34 responses)

> It's not just you, but for the sake of argument it may as well be :) If you feel you personally are a better, more productive programmer using C rather than C++, by all means use C.

I do now prefer to use C for that reason. But I still find C++ tantalisingly tempting, as it can do so many things that are just painful in C. I do know from experience though that it will come back to haunt me if I give in to the temptation. And I am experimenting to find ways to do those things more easily in C. The two that I miss most are automatic destruction of local objects (which is actually just a poor man's garbage collection) and STL containers.

Oh yes, add binary compatibility with other things to my list of complaints above; dvdeug's comment below is one example of the problem. That is something that has hurt me more often than I expected.

My advice on implementing stuff in C:

Posted Oct 15, 2010 20:25 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

I gave up on C for recreational programming for one very simple reason: It is impossible to write vector arithmetic in a civilized syntax in C.

My advice on implementing stuff in C:

Posted Oct 16, 2010 10:18 UTC (Sat) by paulj (subscriber, #341) [Link] (32 responses)

Have you looked at Vala? Modern OOP language that builds on GLib and spits out C. Seems reasonably sane, certainly compared to C++...

My advice on implementing stuff in C:

Posted Oct 18, 2010 8:41 UTC (Mon) by marcH (subscriber, #57642) [Link] (4 responses)

> Have you looked at Vala? Modern OOP language that builds on GLib and spits out C.

Compiling to a lower-level yet still "human-writable" language is an interesting approach that can be successful to some extent. However, it always has this major drawback: debugging & profiling become much more difficult. It also makes life hard for fancy IDEs. All of these need tight integration, and the additional layer of indirection breaks that. So handing maintenance of average/poor-quality code over to other developers becomes nearly impossible.

My advice on implementing stuff in C:

Posted Oct 18, 2010 9:09 UTC (Mon) by mjthayer (guest, #39183) [Link] (3 responses)

> Compiling to a lower-level yet still "human-writable" language is an interesting approach that can be successful to some extent. However, it always has this major drawback: debugging & profiling become much more difficult.

Without having looked at Vala, I don't see why this has to be the case. C itself is implemented as a pipeline, this would just add one stage onto the end. The main problem to solve that I can see is how to pass information down to the lower levels about what C code corresponds to which Vala code.

My advice on implementing stuff in C:

Posted Oct 18, 2010 9:28 UTC (Mon) by cladisch (✭ supporter ✭, #50193) [Link] (2 responses)

> The main problem to solve that I can see is how to pass information down to the lower levels about what C code corresponds to which Vala code.

C has the #line directive for that (GCC doc); AFAIK Vala generates it when in debug mode.
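
For illustration, a minimal sketch of what the generated C might look like; the file name and line numbers are hypothetical:

    /* example.c -- generated C, roughly as a Vala-style compiler might
     * emit it.  The #line directives make compiler errors and debugger
     * positions refer back to the (hypothetical) original hello.vala
     * rather than to this generated file. */
    #include <stdio.h>

    int main (void)
    {
    #line 3 "hello.vala"
        printf ("hello, world\n");
    #line 4 "hello.vala"
        return 0;
    }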

My advice on implementing stuff in C:

Posted Oct 18, 2010 10:58 UTC (Mon) by mjthayer (guest, #39183) [Link]

> C has the #line directive for that (GCC doc); AFAIK Vala generates it when in debug mode.
Sounds reasonable as long as they skip the pre-processor stage, otherwise things might get rather confused. I assume that their variables map one-to-one to C variables to simplify debugging.

My advice on implementing stuff in C:

Posted Oct 19, 2010 11:23 UTC (Tue) by nix (subscriber, #2304) [Link]

I don't entirely understand why they don't generate it always. GCC's own preprocessor does. If you don't, even compiler error messages will be wrong, and you want them to be right even if you're not in debug mode.

My advice on implementing stuff in C:

Posted Oct 18, 2010 9:14 UTC (Mon) by mjthayer (guest, #39183) [Link] (26 responses)

> Have you looked at Vala? Modern OOP language that builds on GLib and spits out C.
I haven't looked at GLib that closely though. Is it used anywhere other than user space/desktop programming? If you are careful about what language features you use - and disable exceptions! - C++ can be used very close to the bone (or iron or whatever).

My advice on implementing stuff in C:

Posted Oct 18, 2010 18:01 UTC (Mon) by paulj (subscriber, #341) [Link]

You can make Vala not use GLib if you wish, on a class-by-class basis, by marking them "Compact". You lose some things, like automatically refcounted classes and inheritance.

My advice on implementing stuff in C:

Posted Oct 19, 2010 11:24 UTC (Tue) by nix (subscriber, #2304) [Link] (24 responses)

Yes, glib is used all over the place these days. syslog-ng and dbus aren't desktop programs by any means.

My advice on implementing stuff in C:

Posted Oct 19, 2010 15:19 UTC (Tue) by mjthayer (guest, #39183) [Link]

> Yes, glib is used all over the place these days. syslog-ng and dbus aren't desktop programs by any means.

They are still definitely user space though. If you are careful, C++ can be used for driver or even kernel code (e.g. the TU-Dresden implementation of the L4 micro-kernel with its unfortunate name was implemented in C++). Perhaps GLib would be too with a bit of work on it, I haven't used it enough to know.

My advice on implementing stuff in C:

Posted Oct 21, 2010 2:35 UTC (Thu) by wahern (subscriber, #37304) [Link] (22 responses)

Perfect. Core system daemons using a library that aborts on malloc() failure.

Geez.

This is why I never use Linux on multi-user systems.

My advice on implementing stuff in C:

Posted Oct 21, 2010 3:00 UTC (Thu) by foom (subscriber, #14868) [Link] (21 responses)

With the default settings on many distros, you're much more likely to just get a random process on your box forcibly killed when you run out of memory than for malloc to fail. So, there's really not much point in being able to gracefully handle malloc failure...

Just so long as pid 1 can deal with malloc failure, that's pretty much good enough: it can just respawn any other daemon that gets forcibly killed or aborts due to malloc failure.

My advice on implementing stuff in C:

Posted Oct 21, 2010 19:55 UTC (Thu) by nix (subscriber, #2304) [Link] (20 responses)

Quite so. Note that things like bash also abort on malloc() failure.

My advice on implementing stuff in C:

Posted Oct 21, 2010 20:15 UTC (Thu) by mjthayer (guest, #39183) [Link] (19 responses)

> Quite so. Note that things like bash also abort on malloc() failure.

Isn't that the FSF's standard recommendation (/requirement)? I find the thought amusing that if you subdivide your application well into different processes, and make sure that you set atexit() functions for those resources that won't be freed by the system, then that isn't so far away from throwing an exception in C++.
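
A minimal sketch of that idea (the lock-file path is hypothetical); note that atexit() handlers run on exit(), including an exit taken from an allocation-failure path, but not on abort() or _exit():

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical cleanup for a resource the kernel won't reclaim for us,
     * e.g. a lock file left behind by this process. */
    static void remove_lockfile(void) {
        remove("/var/run/example.lock");
    }

    int main(void) {
        if (atexit(remove_lockfile) != 0)
            return EXIT_FAILURE;

        void *p = malloc(1 << 20);
        if (!p)
            exit(EXIT_FAILURE);   /* bail out; the atexit handler still runs */

        /* ... real work ... */
        free(p);
        return 0;
    }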

My advice on implementing stuff in C:

Posted Oct 21, 2010 22:01 UTC (Thu) by nix (subscriber, #2304) [Link] (18 responses)

Yes, it is. It's really the only sensible thing to do. If you want to do something more complex than die on malloc() failure, do it in a parent monitor process: anything else is too likely to be unable to do whatever the recovery process is, because, well, you're still out of memory. (Bonus: overcommit and the OOM killer work fine with this model, as long as your monitor process is much smaller than the OOMing one, which is very likely. It's even more certain to work if the monitor oom_adj/oom_score's itself away from being OOM-killed.)
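
A minimal sketch of the monitor shielding itself from the OOM killer, assuming a reasonably recent kernel (older kernels use /proc/self/oom_adj with a -17..15 range, and negative values generally need root):

    #include <stdio.h>

    /* Make the monitor process very unattractive to the OOM killer by
     * writing the minimum score adjustment for itself. */
    static int shield_from_oom_killer(void) {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (!f)
            return -1;
        fputs("-1000", f);        /* usually requires CAP_SYS_RESOURCE */
        return fclose(f);
    }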

My advice on implementing stuff in C:

Posted Oct 22, 2010 21:42 UTC (Fri) by wahern (subscriber, #37304) [Link] (17 responses)

That's horrible design for a system, especially a server system. All of my daemons handle malloc failure. If I'm streaming a video feed to 2,000 clients and get a failure on the 2,001st (descriptor failure, malloc failure, any other failure), why would I destroy all 2,000 contexts when I can just fail one!? The practice is called graceful failure for a reason.

The first thing I do on any of my server systems is to disable overcommit. Even w/ it disabled I believe the kernel will still overcommit in some places (fork, perhaps), but at least I don't need to worry about some broken application causing some other critical service to be terminated.

If an engineer can't handle malloc failure how can he be expected to handle any other myriad possible failure modes? Handling malloc failure is hardly any more difficult, if at all, than handling other types of failures (disk full, descriptor limit, shared memory segment limit, thread limit, invalid input, etc, etc, etc). With proper design all those errors should share the same failure path; if you can't handle one you probably aren't handling any of them properly.
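
A minimal sketch of that single shared failure path in C -- struct client and register_client() are hypothetical stand-ins for per-connection state and bookkeeping:

    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/socket.h>

    struct client { int fd; /* ... per-connection state ... */ };
    int register_client(struct client *c);   /* hypothetical bookkeeping hook */

    /* Whatever goes wrong -- malloc, accept, registration -- we take the
     * same exit: release what this one client owns and report failure,
     * leaving every other client untouched. */
    int serve_new_client(int listen_fd) {
        struct client *c = calloc(1, sizeof *c);
        int fd = -1;

        if (!c)
            goto fail;
        fd = accept(listen_fd, NULL, NULL);
        if (fd < 0)
            goto fail;
        c->fd = fd;
        if (register_client(c) < 0)
            goto fail;
        return 0;

    fail:
        if (fd >= 0)
            close(fd);
        free(c);
        return -1;
    }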

Plus, it's a security nightmare. If the 2,001st client can cause adverse results to the other 2,000 clients... that's a fundamentally broken design. Yes, there are other issues (bandwidth, etc), but those are problems to be addressed, not justifications for shirking responsibility.

And of course, on embedded systems memory (RAM and swap) isn't the virtually limitless resource it is on desktops or servers.

Bailing on malloc is categorically wrong for any daemon, and most user-interactive applications. Bailing on malloc failure really only makes sense for batch jobs, where a process is doing one thing, and so exiting the process is equivalent to signaling inability to complete that particular job. Once you start juggling multiple jobs internally, bailing on malloc failure is a bug, plain and simple.

My advice on implementing stuff in C:

Posted Oct 22, 2010 22:18 UTC (Fri) by nix (subscriber, #2304) [Link] (14 responses)

Well, you still do need to worry about that. Not because of fork(): because of the stack. Unless your programs carefully start with a huge deep recursion to blow the stack out, you're risking an OOM kill every single time you make a function call. So you do need to deal with it anyway.

I don't know of any programs (other than certain network servers doing simple highly decoupled jobs, and sqlite, whose testing framework is astonishingly good) where malloc() failure is usefully handled. Even when they try, a memory allocation easily slips in there, and how often are those code paths tested? Oops, you die. From a brief inspection glibc has a number of places where it kills you on malloc() failure too (mostly due to trying to handle errors and failing), and a number of places where the error handling is there but is obviously leaky or leads to the internal state of things getting messed up. And if glibc can't get it right, who can? In practice this is not a problem because glibc also calls functions so can OOM-kill you just by doing that.

(And having one process doing only one job? That's called good design for the vast majority of Unix programs. Massive internal multithreading is a model you move to because you are *forced* to, and one consequence of it is indeed much worse consequences on malloc() failure.)

Even Apache calls malloc() here and there instead of using memory pools. Most of these handle errors by aborting (such as some MPM worker calls) or don't even check (pretty much all of the calls in the NT service-specific worker, but maybe NT malloc() never returns NULL, I dunno).

In an ideal world I would agree with you... but in practice handling all memory errors as gracefully as you suggest would result in our programs disappearing under a mass of almost-untestable massively bitrotten error-handling code. Better to isolate things into independently-failable units. (Not that anyone does that anyway, and with memory as cheap as it is now, I can't see anyone's handling of OOM improving in any non-safety-critical system for some time. Hell, I was at the local hospital a while back and their *MRI scanner* sprayed out-of-memory errors on the screen and needed restarting. Now *that* scared me...)

My advice on implementing stuff in C:

Posted Oct 23, 2010 1:14 UTC (Sat) by wahern (subscriber, #37304) [Link] (13 responses)

glib or glibc? Those are completely different libraries. If glibc is aborting on allocation error then it's non-conforming and it should be reported as a bug. There's a reason C and POSIX define ENOMEM.

As for the stack, the solution there is easy: don't recurse. Any recursive algorithm can be re-written as an iterative algorithm. Of course, if you use a language that optimizes tail-calls then you're already set. C doesn't, and therefore writing recursive algorithms is a bad idea, which is why it's quite uncommon in C code.
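
For example, a minimal sketch of a recursion rewritten with an explicit heap-allocated stack, where running out of room becomes a reportable allocation failure rather than an unrecoverable stack overflow (the node type is hypothetical):

    #include <stdlib.h>

    struct node { struct node *left, *right; };

    /* Count nodes iteratively: the "stack" lives on the heap, so exhausting
     * it shows up as a malloc/realloc failure we can report (-1) instead of
     * blowing the process stack. */
    long count_nodes(struct node *root) {
        size_t cap = 64, top = 0;
        struct node **stack = malloc(cap * sizeof *stack);
        long n = 0;

        if (!stack)
            return -1;
        if (root)
            stack[top++] = root;
        while (top > 0) {
            struct node *cur = stack[--top];
            n++;
            if (top + 2 > cap) {            /* make room for two children */
                struct node **tmp = realloc(stack, 2 * cap * sizeof *stack);
                if (!tmp) { free(stack); return -1; }
                stack = tmp;
                cap *= 2;
            }
            if (cur->left)  stack[top++] = cur->left;
            if (cur->right) stack[top++] = cur->right;
        }
        free(stack);
        return n;
    }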

As for testing error paths: if somebody isn't testing error paths then they're not testing error paths. What difference does it make whether they're not testing malloc failure or they're not testing invalid input? It's poor design; it creates buggy code. And if you use good design habits, like RAII (not just a C++ pattern), then the places for malloc failure to occur are well isolated. It's not a very good argument to point out that most engineers write crappy code. We all know this; we all do it ourselves; but it's ridiculous to make excuses for it. If you can't handle the responsibility, then don't write applications in C or for its typical domain. If I'm writing non-critical or throw-away code, I'll use Perl or something else. Why invest the effort in using a language with features--explicit memory management--that I'm not going to use?

Using a per-process context design is in many circumstances a solid choice (not for me because I write HP embedded network server software, though I do prefer processes instead of threads for concurrency, so I might have 2 processes per cpu each handling hundreds of connections). But here's another problem w/ default Linux--because of overcommit, it's not always--perhaps not even often--that the offending process gets killed; it's the next guy paging in a small amount of memory that gets killed. It's retarded. It's a security problem. Can you imagine your SSH session getting OOMd because someone was fuzzing your website? It happens.

Why make excuses for poor design?

My advice on implementing stuff in C:

Posted Oct 23, 2010 3:18 UTC (Sat) by foom (subscriber, #14868) [Link]

> it's not always--perhaps not even often--that the offending process gets killed; it's the next guy paging in a small amount of memory that gets killed.

Actually, the OOM-killer tries *very* hard to not simply kill the next guy paging in a small amount of memory, but to determine what the real problem process is and kill that instead. It doesn't always find the correct culprit, but it often does, and at least it tends not to kill your ssh session.

My advice on implementing stuff in C:

Posted Oct 23, 2010 18:29 UTC (Sat) by paulj (subscriber, #341) [Link] (6 responses)

Why make excuses for poor design?

Nix isn't making excuses, he's pointing out reality. Which, sadly, is always far from perfect. A programme which is designed to cope with failure *despite* the suckiness of reality should do better than one that depends on perfection underneath it...

My advice on implementing stuff in C:

Posted Oct 23, 2010 19:39 UTC (Sat) by wahern (subscriber, #37304) [Link]

Robustness, like security, should be applied in-depth. Of course I use monitor processes and dead man switches to restart processes. But I don't rely on one to the exclusion of another.

My advice on implementing stuff in C:

Posted Oct 24, 2010 15:17 UTC (Sun) by nix (subscriber, #2304) [Link] (4 responses)

Indeed. It is simply reality that nobody ever tests malloc() failure paths -- at least, they do not and cannot test every combination of malloc-fails-and-then-it-doesn't, because there is an exponential explosion of them. People do not armour most programs, even important ones, to survive malloc() failure, because it would make the code unreadable and because available memory continues to shoot upwards so most people prefer to assume that reasonably sized allocations will not fail unless something is seriously wrong with the machine. And, guess what? They're right nearly all the time.

The suggestion to avoid stack-OOM by converting recursive algorithms to iterative ones is just another example of this, because while deep recursion is more likely to stack-OOM than the function calls involved in an iterative algorithm, the latter will still happen now and then. The only way to avoid *that* is to do a deep recursion first, and then ensure that you never call functions further down in the call stack than you have already allocated, neither in your code nor in any library you may call. I know of no tools to make this painful maintenance burden less painful. So nobody at all armours against this case, either.

I think it *is* important to trap malloc() failure so that you can *log which malloc() failed* before you die (and that means your logging functions *do* have to be malloc()-failure-proof: I normally do this by having them take their allocations out of a separate, pre-mmap()ed emergency pool). Obviously this doesn't work if you are stack-OOMed, nor if the OOM-killer zaps you. Note that this *is* an argument against memory overcommit: that overcommit makes it harder to detect which of many allocations in a program is buggy and running away allocating unlimited storage. But 'we want to recover from malloc() failure' is not a good reason to not use overcommitment, because very few programs even try, and of those that try, most are surely lethally buggy in this area in any case: and fixing this is completely impractical.
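
A minimal sketch of such an emergency pool, assuming a single pre-faulted page is enough to format the message:

    #include <sys/mman.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define EMERGENCY_SIZE 4096
    static char *emergency_buf;

    /* Reserve (and touch) a page up front so the OOM log message itself
     * never needs a fresh allocation. */
    int log_pool_init(void) {
        emergency_buf = mmap(NULL, EMERGENCY_SIZE, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (emergency_buf == MAP_FAILED)
            return -1;
        memset(emergency_buf, 0, EMERGENCY_SIZE);   /* fault the pages in now */
        return 0;
    }

    void log_oom(const char *where) {
        int n = snprintf(emergency_buf, EMERGENCY_SIZE,
                         "out of memory in %s\n", where);
        if (n > 0)
            write(STDERR_FILENO, emergency_buf, (size_t)n);
    }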

Regarding my examples above: glib always aborts on malloc() failure, so all programs that use it do too. glibc does not, but its attempts to handle malloc() failure are buggy and leaky at best, and of course it (like everything else) remains vulnerable to stack- or CoW-OOM.

My advice on implementing stuff in C:

Posted Oct 25, 2010 10:05 UTC (Mon) by hppnq (guest, #14462) [Link] (3 responses)

The only way to avoid *that* [stack-OOM] is to do a deep recursion first, and then ensure that you never call functions further down in the call stack than you have already allocated, neither in your code nor in any library you may call.

You would have to know in advance how deep you can recurse, or you should be able to handle SIGSEGV. The maximum stack size can be tuned through rlimits, and that should solve wahern's problem of some other process draining out all available memory. This problem is not the result of bad programming, but of bad systems management.

(That said, rlimits are horribly broken. Just add more memory. ;-)

My advice on implementing stuff in C:

Posted Oct 25, 2010 22:28 UTC (Mon) by paulj (subscriber, #341) [Link] (2 responses)

FWIW, it's not defined what happens if you overflow the stack. You can't rely on getting a SEGV (isn't that a very recent addition to Linux, thanks to that Xorg security hole?)

My advice on implementing stuff in C:

Posted Oct 25, 2010 22:36 UTC (Mon) by nix (subscriber, #2304) [Link] (1 responses)

Even if you do get SIGSEGV from a stack-OOM, well, you'd better hope the system supports sigaltstack() as well, or you'll not be able to call the signal handler... oh, and, btw, it is (even now) easier to make a list of the systems on which sigaltstack() works properly than the systems on which it does not :(
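
For the record, a minimal sketch of the sigaltstack() dance on a system where it does work properly; only async-signal-safe calls are used in the handler:

    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static void segv_handler(int sig) {
        (void)sig;
        /* Only async-signal-safe calls here: write a fixed message and die. */
        static const char msg[] = "fatal: SIGSEGV (possible stack overflow)\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);
        _exit(1);
    }

    int install_altstack(void) {
        stack_t ss;
        struct sigaction sa;

        ss.ss_sp = malloc(SIGSTKSZ);
        if (!ss.ss_sp)
            return -1;
        ss.ss_size = SIGSTKSZ;
        ss.ss_flags = 0;
        if (sigaltstack(&ss, NULL) < 0)
            return -1;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = segv_handler;
        sa.sa_flags = SA_ONSTACK;   /* run the handler on the alternate stack */
        sigemptyset(&sa.sa_mask);
        return sigaction(SIGSEGV, &sa, NULL);
    }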

My advice on implementing stuff in C:

Posted Oct 26, 2010 7:55 UTC (Tue) by hppnq (guest, #14462) [Link]

The point is, you can't safely expand the stack by recursing deeply in order to prevent running out of stack.

My advice on implementing stuff in C:

Posted Oct 25, 2010 11:04 UTC (Mon) by mjthayer (guest, #39183) [Link] (4 responses)

> As for the stack, the solution there is easy, don't recurse.

Just out of interest, are there really no simple ways (as nix suggested) to allocate a fixed-size stack at programme start in Linux userland? I can't see any theoretical reasons why it should be a problem.

> And if you use good design habits, like RAII (not just a C++ pattern), then the places for malloc failure to occur are well isolated.

Again, I am interested in how you do RAII in C. I know the (in my opinion ugly and error-prone) goto way, and I could think of ways to do at run time what C++ does at compile time (doesn't have to be a bad thing, although more manual steps would be needed). Do you have any other insights?

My advice on implementing stuff in C:

Posted Oct 25, 2010 11:52 UTC (Mon) by hppnq (guest, #14462) [Link]

Just out of interest, are there really no simple ways (as nix suggested) to allocate a fixed-size stack at programme start in Linux userland?

ld --stack or something similar?
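
On Linux the usual knobs seem to be setrlimit(RLIMIT_STACK) for the main stack and pthread_attr_setstacksize() for threads; a minimal sketch, with arbitrary example sizes:

    #include <pthread.h>
    #include <sys/resource.h>

    /* Cap how far the main stack may grow, and give one worker thread a
     * fixed 1 MiB stack of its own. */
    int fix_stacks(void *(*worker)(void *), pthread_t *tid) {
        struct rlimit rl = { 8UL << 20, 8UL << 20 };
        pthread_attr_t attr;

        if (setrlimit(RLIMIT_STACK, &rl) < 0)
            return -1;
        if (pthread_attr_init(&attr) != 0)
            return -1;
        if (pthread_attr_setstacksize(&attr, 1UL << 20) != 0)
            return -1;
        return pthread_create(tid, &attr, worker, NULL);
    }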

My advice on implementing stuff in C:

Posted Oct 25, 2010 22:41 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

You do RAII in C by wrapping everything up in opaque structures allocated by dedicated allocators and freed either by dedicated freers or by APR-style pool destructors. If you're using mempools, you can even get close to the automagic destructor calls of C++ (you still have to free a mempool, but when you free the pool, the free cascades down to all contained pools and runs all their destructors.)
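
A minimal sketch of the opaque-structure style, with hypothetical conn_new()/conn_free() as the dedicated allocator and freer:

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* conn is opaque to callers; only conn_new() and conn_free() touch it. */
    typedef struct conn conn;

    struct conn {
        char *host;
        int   fd;
    };

    void conn_free(conn *c) {
        if (!c)
            return;
        if (c->fd >= 0)
            close(c->fd);
        free(c->host);
        free(c);
    }

    conn *conn_new(const char *host) {
        conn *c = calloc(1, sizeof *c);
        if (!c)
            return NULL;
        c->fd = -1;
        c->host = strdup(host);
        if (!c->host) {
            conn_free(c);   /* the one freer also cleans up partial construction */
            return NULL;
        }
        return c;
    }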

My advice on implementing stuff in C:

Posted Oct 26, 2010 8:06 UTC (Tue) by mjthayer (guest, #39183) [Link] (1 responses)

> You do RAII in C by wrapping everything up in opaque structures allocated by dedicated allocators and freed either by dedicated freers or by APR-style pool destructors.

Right, roughly what I was thinking of. Thanks for the concrete pointers!

My advice on implementing stuff in C:

Posted Oct 26, 2010 8:18 UTC (Tue) by mjthayer (guest, #39183) [Link]

> Right, roughly what I was thinking of.

Except of course that there is no overriding need to use memory pools. You can also keep track of multiple allocations (possibly also with destructors) in some structure and free them all at one go when you are done. Freeing many allocations in one go rather than freeing each as soon as it is no longer needed might also be more efficient cache-wise.
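
A minimal sketch of that tracking idea, without a full memory pool (fixed capacity just to keep it short):

    #include <stdlib.h>

    /* Record every allocation made for one unit of work, then release the
     * whole lot in a single pass when the work is done or has failed. */
    struct tracker {
        void  *ptrs[64];
        size_t n;
    };

    void *t_alloc(struct tracker *t, size_t size) {
        if (t->n >= sizeof t->ptrs / sizeof t->ptrs[0])
            return NULL;
        void *p = malloc(size);
        if (p)
            t->ptrs[t->n++] = p;
        return p;
    }

    void t_free_all(struct tracker *t) {
        for (size_t i = 0; i < t->n; i++)
            free(t->ptrs[i]);
        t->n = 0;
    }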

My advice on implementing stuff in C:

Posted Oct 25, 2010 1:56 UTC (Mon) by vonbrand (subscriber, #4458) [Link]

No overcommit makes OOM kills much more likely (even in cases which would work fine otherwise). You've got your logic seriously backwards...

My advice on implementing stuff in C:

Posted Oct 25, 2010 16:10 UTC (Mon) by bronson (subscriber, #4806) [Link]

> Bailing on malloc is categorically wrong for any daemon, and most user-interactive applications. Bailing on malloc failure really only makes sense for batch jobs

OK, let's say your interactive application has just received a malloc failure. What should it do? Display an error dialog? Bzzt, that takes memory. Free up some buffers? There's a good chance that any memory you free will just get sucked up by a rogue process and your next malloc attempt will fail too. And the next one. And the next one. And be careful with your error-handling code paths because, if you cause more data to get paged in from disk (say, a page of string constants that are only accessed in OOM conditions), you're now in even deeper trouble.

Bailing out is about the only thing ANY process can reliably do. If you try to do anything more imaginative, you are almost guaranteed to get it wrong and make things worse.

The days of cooperative multitasking and deterministic memory behavior are long gone (or, more accurately, restricted to a tiny sliver of embedded environments that no general purpose toolchain would ever consider a primary target). And good riddance! Programming is so much nicer these days that, even though this seems heinous, I'd never want to go back.

I can virtually guarantee you've never actually tested your apps in OOM situations or you would have discovered this for yourself. Try it! Once you fix all the bugs in your untested code, I think you'll be surprised at how few options you actually have.
