My advice on implementing stuff in C:
Posted Oct 15, 2010 16:03 UTC (Fri) by Ed_L. (guest, #24287)
In reply to: My advice on implementing stuff in C: by mjthayer
Parent article: Russell: On C Library Implementation
It's not just you, but for the sake of argument it may as well be :) If you feel you personally are a better, more productive programmer using C rather than C++, by all means use C. Aside from corner cases, it's a subset :) :)
Me, I've been productive with C++ for over twenty years, and really like it. I'll grant there are more modern languages, but for HPC purposes I haven't found any more powerful, until I recently stumbled across D. (You know, the language C++ always wanted to be but was too rushed.) And that trip is too recent for me to draw a firm conclusion.
Some will argue that Java is just as good at HPC, and for them they are probably right. (Insert obligatory Fortran dereference here.) I also dabble in system programming, and just personally prefer one language that does it all. Others prefer to mix and match. And surely there must be places for Perl and its ilk -- provided they are kept brief and to the point.
"Although programmers dream of a small, simple languages, it seems when they wake up what they really want is more modelling power." -- Andrei Alexandrescu
Posted Oct 15, 2010 16:21 UTC (Fri)
by mjthayer (guest, #39183)
[Link] (34 responses)
I do now prefer to use C for that reason. But I still find C++ tantalisingly tempting, as it can do so many things that are just painful in C. I do know from experience though that it will come back to haunt me if I give in to the temptation. And I am experimenting to find ways to do those things more easily in C. The two that I miss most are automatic destruction of local objects (which is actually just a poor man's garbage collection) and STL containers.
Oh yes, add binary compatibility with other things to my list of complaints above; dvdeug's comment below is one example of the problem. That is something that has hurt me more often than I expected.
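Coming back to the automatic destruction point, a minimal sketch of the sort of thing I mean, assuming the non-standard __attribute__((cleanup)) extension that GCC and Clang provide (so not portable C):

    #include <stdio.h>
    #include <stdlib.h>

    /* Called automatically when the annotated variable goes out of scope. */
    static void free_charp(char **p)
    {
        free(*p);
    }

    int main(void)
    {
        /* Non-standard GCC/Clang extension: buf is released on every exit
         * path from this block, much like a C++ destructor would do. */
        __attribute__((cleanup(free_charp))) char *buf = malloc(64);

        if (buf == NULL)
            return EXIT_FAILURE;
        snprintf(buf, 64, "hello from a scope-cleaned buffer");
        puts(buf);
        return EXIT_SUCCESS;    /* no explicit free() needed */
    }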
Posted Oct 15, 2010 20:25 UTC (Fri)
by mpr22 (subscriber, #60784)
[Link]
I gave up on C for recreational programming for one very simple reason:
It is impossible to write vector arithmetic in a civilized syntax in C.
Posted Oct 16, 2010 10:18 UTC (Sat)
by paulj (subscriber, #341)
[Link] (32 responses)
Posted Oct 18, 2010 8:41 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (4 responses)
Compiling to a lower-level yet still "human-writable" language is an interesting approach that can be successful to some extent. However, it always has one major drawback: debugging and profiling become considerably more difficult. It also makes life hard for fancy IDEs. All of these need tight integration, and the additional layer of indirection breaks that. So handing maintenance of average or poor quality code over to other developers becomes nearly impossible.
Posted Oct 18, 2010 9:09 UTC (Mon)
by mjthayer (guest, #39183)
[Link] (3 responses)
Without having looked at Vala, I don't see why this has to be the case. C itself is implemented as a pipeline, this would just add one stage onto the end. The main problem to solve that I can see is how to pass information down to the lower levels about what C code corresponds to which Vala code.
Posted Oct 18, 2010 9:28 UTC (Mon)
by cladisch (✭ supporter ✭, #50193)
[Link] (2 responses)
> The main problem to solve that I can see is how to pass information down to the lower levels about what C code corresponds to which Vala code.

C has the #line directive for that (GCC doc); AFAIK Vala generates it when in debug mode.
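To illustrate (the file name and line numbers are made up), the generated C might look roughly like this, so that compiler errors and debuggers point back at the Vala source:

    #include <stdio.h>

    int main(void)
    {
    #line 12 "foo.vala"
        int answer = 6 * 7;                /* reported as foo.vala line 12 */
    #line 13 "foo.vala"
        printf("answer = %d\n", answer);   /* reported as foo.vala line 13 */
        return 0;
    }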
Posted Oct 18, 2010 10:58 UTC (Mon)
by mjthayer (guest, #39183)
[Link]
Sounds reasonable as long as they skip the pre-processor stage, otherwise things might get rather confused. I assume that their variables map one-to-one to C variables to simplify debugging.
Posted Oct 19, 2010 11:23 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Oct 18, 2010 9:14 UTC (Mon)
by mjthayer (guest, #39183)
[Link] (26 responses)
I haven't looked at GLib that closely though. Is it used anywhere other than user space/desktop programming? If you are careful about what language features you use - and to disable exceptions! - C++ can be used very close to the bone (or iron or whatever).
Posted Oct 18, 2010 18:01 UTC (Mon)
by paulj (subscriber, #341)
[Link]
Posted Oct 19, 2010 11:24 UTC (Tue)
by nix (subscriber, #2304)
[Link] (24 responses)
Posted Oct 19, 2010 15:19 UTC (Tue)
by mjthayer (guest, #39183)
[Link]
They are still definitely user space though. If you are careful, C++ can be used for driver or even kernel code (e.g. the TU-Dresden implementation of the L4 micro-kernel with its unfortunate name was implemented in C++). Perhaps GLib would be too with a bit of work on it, I haven't used it enough to know.
Posted Oct 21, 2010 2:35 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (22 responses)
Geez.
This is why I never use Linux on multi-user systems.
Posted Oct 21, 2010 3:00 UTC (Thu)
by foom (subscriber, #14868)
[Link] (21 responses)
Just so long as pid 1 can deal with malloc failure, that's pretty much good enough: it can just respawn any other daemon that gets forcibly killed or aborts due to malloc failure.
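A toy sketch of that approach (run_daemon() here is just a stand-in for whatever the real service does):

    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int run_daemon(void)
    {
        sleep(5);                    /* stand-in for the real daemon's work */
        return EXIT_FAILURE;         /* pretend it aborted on malloc() failure */
    }

    int main(void)
    {
        for (;;) {
            pid_t child = fork();
            if (child == 0)
                _exit(run_daemon());
            if (child < 0) {         /* even fork() can fail under memory pressure */
                sleep(1);
                continue;
            }
            int status;
            while (waitpid(child, &status, 0) < 0)
                continue;            /* retry on EINTR */
            sleep(1);                /* back off a little before respawning */
        }
    }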
Posted Oct 21, 2010 19:55 UTC (Thu)
by nix (subscriber, #2304)
[Link] (20 responses)
Posted Oct 21, 2010 20:15 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (19 responses)
Isn't that the FSF's standard recommendation (/requirement)? I find the thought amusing that if you subdivide your application well into different processes and make sure that you set atexit() functions for those resources that won't be freed by the system, that isn't so far away from throwing an exception in C++.
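For instance, a minimal sketch, with a lock file standing in for a resource the kernel won't reclaim for you:

    #include <stdio.h>
    #include <stdlib.h>

    static void remove_lock_file(void)
    {
        remove("/tmp/example.lock");     /* runs on exit(), however we got there */
    }

    int main(void)
    {
        FILE *lock = fopen("/tmp/example.lock", "w");
        if (lock == NULL)
            return EXIT_FAILURE;
        fclose(lock);
        atexit(remove_lock_file);

        char *big = malloc(1u << 30);
        if (big == NULL)
            exit(EXIT_FAILURE);          /* bail out; the handler above still runs */

        free(big);
        return EXIT_SUCCESS;             /* returning from main() also runs it */
    }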
Posted Oct 21, 2010 22:01 UTC (Thu)
by nix (subscriber, #2304)
[Link] (18 responses)
Posted Oct 22, 2010 21:42 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (17 responses)
The first thing I do on any of my server systems is to disable overcommit. Even w/ it disabled I believe the kernel will still overcommit in some places (fork, perhaps), but at least I don't need to worry about some broken application causing some other critical service to be terminated.
If an engineer can't handle malloc failure how can he be expected to handle any other myriad possible failure modes? Handling malloc failure is hardly any more difficult, if at all, than handling other types of failures (disk full, descriptor limit, shared memory segment limit, thread limit, invalid input, etc, etc, etc). With proper design all those errors should share the same failure path; if you can't handle one you probably aren't handling any of them properly.
Plus, it's a security nightmare. If the 2,001st client can cause adverse results to the other 2,000 clients... that's a fundamentally broken design. Yes, there are other issues (bandwidth, etc), but those are problems to be addressed, not justifications for shirking responsibility.
And of course, on embedded systems, memory (RAM and swap) isn't the virtually limitless resource it is on desktops or servers.
Bailing on malloc is categorically wrong for any daemon, and most user-interactive applications. Bailing on malloc failure really only makes sense for batch jobs, where a process is doing one thing, and so exiting the process is equivalent to signaling inability to complete that particular job. Once you start juggling multiple jobs internally, bailing on malloc failure is a bug, plain and simple.
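Roughly what I mean by a shared failure path, as a sketch (the struct and names here are made up):

    #include <fcntl.h>
    #include <stdlib.h>

    #define BUFSIZE 4096

    struct session {
        char *buf;
        int fd;
    };

    /* malloc() failure, open() failure: every error takes the same exit path. */
    struct session *session_open(const char *path)
    {
        struct session *s = malloc(sizeof *s);
        if (s == NULL)
            goto fail;

        s->buf = malloc(BUFSIZE);
        if (s->buf == NULL)
            goto fail_free_s;

        s->fd = open(path, O_RDONLY);
        if (s->fd < 0)
            goto fail_free_buf;

        return s;                        /* success */

    fail_free_buf:
        free(s->buf);
    fail_free_s:
        free(s);
    fail:
        return NULL;                     /* the caller sees one kind of failure */
    }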
Posted Oct 22, 2010 22:18 UTC (Fri)
by nix (subscriber, #2304)
[Link] (14 responses)
I don't know of any programs (other than certain network servers doing simple highly decoupled jobs, and sqlite, whose testing framework is astonishingly good) where malloc() failure is usefully handled. Even when they try, a memory allocation easily slips in there, and how often are those code paths tested? Oops, you die. From a brief inspection glibc has a number of places where it kills you on malloc() failure too (mostly due to trying to handle errors and failing), and a number of places where the error handling is there but is obviously leaky or leads to the internal state of things getting messed up. And if glibc can't get it right, who can? In practice this is not a problem because glibc also calls functions so can OOM-kill you just by doing that.
(And having one process doing only one job? That's called good design for the vast majority of Unix programs. Massive internal multithreading is a model you move to because you are *forced* to, and one consequence of it is indeed much worse consequences on malloc() failure.)
Even Apache calls malloc() here and there instead of using memory pools. Most of these handle errors by aborting (such as some MPM worker calls) or don't even check (pretty much all of the calls in the NT service-specific worker, but maybe NT malloc() never returns NULL, I dunno).
In an ideal world I would agree with you... but in practice handling all memory errors as gracefully as you suggest would result in our programs disappearing under a mass of almost-untestable massively bitrotten error-handling code. Better to isolate things into independently-failable units. (Not that anyone does that anyway, and with memory as cheap as it is now, I can't see anyone's handling of OOM improving in any non-safety-critical system for some time. Hell, I was at the local hospital a while back and their *MRI scanner* sprayed out-of-memory errors on the screen and needed restarting. Now *that* scared me...)
Posted Oct 23, 2010 1:14 UTC (Sat)
by wahern (subscriber, #37304)
[Link] (13 responses)
As for the stack, the solution there is easy: don't recurse. Any recursive algorithm can be re-written as an iterative algorithm. Of course, if you use a language that optimizes tail calls then you're already set. C doesn't, and therefore writing recursive algorithms is a bad idea, which is why it's quite uncommon in C code.
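To illustrate, a rough sketch of a recursive tree walk rewritten with an explicit, heap-allocated stack; exhaustion then shows up as a checkable malloc()/realloc() failure rather than an unrecoverable stack overflow:

    #include <stdlib.h>

    struct node { struct node *left, *right; };

    /* Count nodes without recursing.  Returns the count, or (size_t)-1 on
     * allocation failure of the explicit stack. */
    size_t tree_size(const struct node *root)
    {
        size_t count = 0, top = 0, cap = 64;
        const struct node **stack = malloc(cap * sizeof *stack);
        if (stack == NULL)
            return (size_t)-1;

        if (root != NULL)
            stack[top++] = root;

        while (top > 0) {
            const struct node *n = stack[--top];
            count++;

            if (top + 2 > cap) {                    /* grow the explicit stack */
                const struct node **bigger =
                    realloc(stack, 2 * cap * sizeof *stack);
                if (bigger == NULL) {
                    free(stack);
                    return (size_t)-1;
                }
                stack = bigger;
                cap *= 2;
            }
            if (n->left != NULL)
                stack[top++] = n->left;
            if (n->right != NULL)
                stack[top++] = n->right;
        }
        free(stack);
        return count;
    }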
As for testing error paths: if somebody isn't testing error paths then they're not testing error paths. What difference does it make whether they're not testing malloc failure or they're not testing invalid input? It's poor design; it creates buggy code. And if you use good design habits, like RAII (not just a C++ pattern), then the places for malloc failure to occur are well isolated. It's not a very good argument to point out that most engineers write crappy code. We all know this; we all do it ourselves; but it's ridiculous to make excuses for it. If you can't handle the responsibility, then don't write applications in C or for its typical domain. If I'm writing non-critical or throw-away code, I'll use Perl or something else. Why invest the effort in using a language with features--explicit memory management--that I'm not going to use?
Using a per-process context design is in many circumstances a solid choice (not for me because I write HP embedded network server software, though I do prefer processes instead of threads for concurrency, so I might have 2 processes per cpu each handling hundreds of connections). But here's another problem w/ default Linux--because of overcommit, it's not always--perhaps not even often--that the offending process gets killed; it's the next guy paging in a small amount of memory that gets killed. It's retarded. It's a security problem. Can you imagine your SSH session getting OOMd because someone was fuzzing your website? It happens.
Why make excuses for poor design?
Posted Oct 23, 2010 3:18 UTC (Sat)
by foom (subscriber, #14868)
[Link]
Actually, the OOM-killer tries *very* hard to not simply kill the next guy paging in a small amount of memory, but to determine what the real problem process is and kill that instead. It doesn't always find the correct culprit, but it often does, and at least it tends not to kill your ssh session.
Posted Oct 23, 2010 18:29 UTC (Sat)
by paulj (subscriber, #341)
[Link] (6 responses)
Nix isn't making excuses, he's pointing out reality. Which, sadly, is always far from perfect. A programme which is designed to cope with failure *despite* the suckiness of reality should do better than one that depends on perfection underneath it...
Posted Oct 23, 2010 19:39 UTC (Sat)
by wahern (subscriber, #37304)
[Link]
Posted Oct 24, 2010 15:17 UTC (Sun)
by nix (subscriber, #2304)
[Link] (4 responses)
The suggestion to avoid stack-OOM by converting recursive algorithms to iterative ones is just another example of this, because while deep recursion is more likely to stack-OOM than the function calls involved in an iterative algorithm, the latter will still happen now and then. The only way to avoid *that* is to do a deep recursion first, and then ensure that you never call functions further down in the call stack than you have already allocated, neither in your code nor in any library you may call. I know of no tools to make this painful maintenance burden less painful. So nobody at all armours against this case, either.
I think it *is* important to trap malloc() failure so that you can *log which malloc() failed* before you die (and that means your logging functions *do* have to be malloc()-failure-proof: I normally do this by having them take their allocations out of a separate, pre-mmap()ed emergency pool). Obviously this doesn't work if you are stack-OOMed, nor if the OOM-killer zaps you. Note that this *is* an argument against memory overcommit: that overcommit makes it harder to detect which of many allocations in a program is buggy and running away allocating unlimited storage. But 'we want to recover from malloc() failure' is not a good reason to not use overcommitment, because very few programs even try, and of those that try, most are surely lethally buggy in this area in any case: and fixing this is completely impractical.
Regarding my examples above: glib always aborts on malloc() failure, so so do all programs that use it. glibc does not, but its attempts to handle malloc() failure are buggy and leaky at best, and of course it (like everything else) remains vulnerable to stack- or CoW-OOM.
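For what it's worth, a rough sketch of that kind of pre-allocated emergency pool (the sizes and names are arbitrary):

    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* A small arena reserved (and touched) up front, so the OOM logging
     * path never calls malloc() and never has to fault in new pages. */
    #define POOL_SIZE (64 * 1024)
    static char *pool;
    static size_t pool_used;

    int emergency_pool_init(void)
    {
        pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pool == MAP_FAILED)
            return -1;
        memset(pool, 0, POOL_SIZE);    /* fault every page in now */
        return 0;
    }

    static void *emergency_alloc(size_t n)
    {
        if (pool == NULL || n > POOL_SIZE - pool_used)
            return NULL;
        void *p = pool + pool_used;
        pool_used += n;
        return p;
    }

    void log_oom(const char *where)
    {
        static const char prefix[] = "malloc failed in: ";
        char *buf = emergency_alloc(sizeof prefix + 256);
        size_t len;

        if (buf == NULL)
            return;
        len = strlen(where);
        if (len > 256)
            len = 256;
        memcpy(buf, prefix, sizeof prefix - 1);
        memcpy(buf + sizeof prefix - 1, where, len);
        buf[sizeof prefix - 1 + len] = '\n';
        (void)write(STDERR_FILENO, buf, sizeof prefix - 1 + len + 1);
    }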
Posted Oct 25, 2010 10:05 UTC (Mon)
by hppnq (guest, #14462)
[Link] (3 responses)
You would have to know in advance how deep you can recurse, or you should be able to handle SIGSEGV. The maximum stack size can be tuned through rlimits, and that should solve wahern's problem of some other process draining out all available memory. This problem is not the result of bad programming, but of bad systems management.
(That said, rlimits are horribly broken. Just add more memory. ;-)
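For the stack limit itself, I mean something along these lines (the 64 MiB figure is arbitrary):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        /* Raise the soft limit to 64 MiB, capped at the hard limit. */
        rl.rlim_cur = 64UL * 1024 * 1024;
        if (rl.rlim_cur > rl.rlim_max)
            rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_STACK, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        printf("soft stack limit is now %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
        return 0;
    }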
Posted Oct 25, 2010 22:28 UTC (Mon)
by paulj (subscriber, #341)
[Link] (2 responses)
> The only way to avoid *that* [stack-OOM] is to do a deep recursion first, and then ensure that you never call functions further down in the call stack than you have already allocated, neither in your code nor in any library you may call.

The point is, you can't safely expand the stack by recursing deeply in order to prevent running out of stack.
Posted Oct 25, 2010 22:36 UTC (Mon)
by nix (subscriber, #2304)
[Link] (1 response)
Posted Oct 26, 2010 7:55 UTC (Tue)
by hppnq (guest, #14462)
[Link]
Posted Oct 25, 2010 11:04 UTC (Mon)
by mjthayer (guest, #39183)
[Link] (4 responses)
Just out of interest, are there really no simple ways (as nix suggested) to allocate a fixed-size stack at programme begin in Linux userland? I can't see any theoretical reasons why it should be a problem.
> And if you use good design habits, like RAII (not just a C++ pattern), then the places for malloc failure to occur are well isolated.
Again, I am interested in how you do RAII in C. I know the (in my opinion ugly and error-prone) goto way, and I could think of ways to do at run time what C++ does at compile time (doesn't have to be a bad thing, although more manual steps would be needed). Do you have any other insights?
Posted Oct 25, 2010 11:52 UTC (Mon)
by hppnq (guest, #14462)
[Link]
> Just out of interest, are there really no simple ways (as nix suggested) to allocate a fixed-size stack at programme begin in Linux userland?

ld --stack or something similar?
Posted Oct 25, 2010 22:41 UTC (Mon)
by nix (subscriber, #2304)
[Link] (2 responses)
Posted Oct 26, 2010 8:06 UTC (Tue)
by mjthayer (guest, #39183)
[Link] (1 response)
Right, roughly what I was thinking of. Thanks for the concrete pointers!
Posted Oct 26, 2010 8:18 UTC (Tue)
by mjthayer (guest, #39183)
[Link]
Except of course that there is no overriding need to use memory pools. You can also keep track of multiple allocations (possibly also with destructors) in some structure and free them all at one go when you are done. Freeing many allocations in one go rather than freeing each as soon as it is no longer needed might also be more efficient cache-wise.
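A rough sketch of the sort of bookkeeping I mean (names made up; the destructors are optional):

    #include <stdlib.h>

    /* Track allocations (each with an optional destructor) and release
     * them all in one pass when the work is done. */
    struct cleanup {
        void *ptr;
        void (*dtor)(void *);            /* NULL means plain free() */
        struct cleanup *next;
    };

    struct pool {
        struct cleanup *head;
    };

    void *pool_track(struct pool *p, void *ptr, void (*dtor)(void *))
    {
        struct cleanup *c;

        if (ptr == NULL)
            return NULL;
        c = malloc(sizeof *c);
        if (c == NULL) {                 /* can't track it, so release it now */
            if (dtor)
                dtor(ptr);
            else
                free(ptr);
            return NULL;
        }
        c->ptr = ptr;
        c->dtor = dtor;
        c->next = p->head;
        p->head = c;
        return ptr;
    }

    void pool_release(struct pool *p)
    {
        while (p->head != NULL) {
            struct cleanup *c = p->head;
            p->head = c->next;
            if (c->dtor)
                c->dtor(c->ptr);
            else
                free(c->ptr);
            free(c);
        }
    }

Then each allocation goes through pool_track(&p, malloc(n), NULL) and a single pool_release(&p) at the end frees the lot.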
Posted Oct 25, 2010 1:56 UTC (Mon)
by vonbrand (subscriber, #4458)
[Link]
No overcommit makes OOM kills much more likely (even in cases which would work fine otherwise). You've got your logic seriously backwards...
Posted Oct 25, 2010 16:10 UTC (Mon)
by bronson (subscriber, #4806)
[Link]
OK, let's say your interactive application has just received a malloc failure. What should it do? Display an error dialog? Bzzt, that takes memory. Free up some buffers? There's a good chance that any memory you free will just get sucked up by a rogue process and your next malloc attempt will fail too. And the next one. And the next one. And be careful with your error-handling code paths because, if you cause more data to get paged in from disk (say, a page of string constants that are only accessed in OOM conditions), you're now in even deeper trouble.
Bailing out is about the only thing ANY process can reliably do. If you try to do anything more imaginative, you are almost guaranteed to get it wrong and make things worse.
The days of cooperative multitasking and deterministic memory behavior are long gone (or, more accurately, restricted to a tiny sliver of embedded environments that no general purpose toolchain would ever consider a primary target). And good riddance! Programming is so much nicer these days that, even though this seems heinous, I'd never want to go back.
I can virtually guarantee you've never actually tested your apps in OOM situations or you would have discovered this for yourself. Try it! Once you fix all the bugs in your untested code, I think you'll be surprised at how few options you actually have.