Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 9:09 UTC (Tue) by dgm (subscriber, #49227)
In reply to: Zeuthen: Writing a C library, part 1 by gowen
Parent article: Zeuthen: Writing a C library, part 1
Yes, overcommit is a problem. When (if) you get an out-of-memory error, it means the system is pretty ill, possibly dying. The proper response varies from program to program, and even within a program. Sometimes you should free up resources, sleep a while, and try again. Sometimes aborting is the right thing to do. And sometimes you should ask the user how to proceed. That's why fixing this policy in a library is bad design.
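For illustration, a minimal sketch in plain C of the alternative being argued for here (the function name and error constant are invented for this example, not taken from any particular library): the allocation failure is reported through the return value, and the policy is left entirely to the caller.

#include <stdlib.h>
#include <string.h>

#define MYLIB_ERROR_NO_MEMORY 1    /* invented error code for the sketch */

/* Report the failure instead of aborting; the caller decides whether to
   retry, shed caches, ask the user, or give up. */
int mylib_buffer_init(char **out, size_t size)
{
    char *buf = malloc(size);
    if (buf == NULL)
        return MYLIB_ERROR_NO_MEMORY;
    memset(buf, 0, size);
    *out = buf;
    return 0;
}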
Posted Jun 28, 2011 9:20 UTC (Tue) by gowen (guest, #23914)
Good engineering says "don't call abort()"
Posted Jun 28, 2011 10:16 UTC (Tue) by gevaerts (subscriber, #21521)
Posted Jun 28, 2011 12:34 UTC (Tue) by gowen (guest, #23914)
Posted Jun 28, 2011 13:31 UTC (Tue) by cmccabe (guest, #60281)
Then you'll be happy to hear that thread-local errno has been deprecated for decades now. New functions that are added to POSIX generally return an error code indicating the error instead.
errno has nothing to do with "C's limitations" and everything to do with preserving compatibility with an older interface that isn't worth the effort to change.
Returning error codes is a great convention because you can flag them with __attribute__((warn_unused_result)). Then the programmer gets a warning from the compiler unless they check the return code.
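A minimal sketch of that convention, assuming GCC or Clang (frob_widget() is a made-up example):

#include <stdio.h>

/* Returns 0 on success, nonzero on failure; callers must look at the result. */
__attribute__((warn_unused_result))
static int frob_widget(int id)
{
    return (id < 0) ? -1 : 0;
}

int main(void)
{
    frob_widget(7);                 /* compiler warns: return value ignored */
    if (frob_widget(7) != 0)        /* checked, so no warning */
        fprintf(stderr, "frob_widget failed\n");
    return 0;
}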
Posted Jun 29, 2011 7:16 UTC (Wed) by gowen (guest, #23914)
In general, you can't return both results *and* error codes. (As I said, for pointers you can return NULL, and for functions whose domain of valid results is limited in some sense [abs()], you can; but if you're returning anything other than a pointer, an integer type, or a floating-point type, you're basically hosed.)
Posted Jul 7, 2011 0:22 UTC (Thu) by cmccabe (guest, #60281)
If we're still talking about C/C++, then you can only ever return:
* a primitive (int, float, etc.)
* a pointer
* a struct
All of those have a natural 'none of the above' value. Integer types have 0 or a negative, floats and doubles have NaN, and pointers have NULL.
If you're returning a struct by value, then you're probably using C++, since C programmers rarely return an entire structure by value. The obvious thing to do is to either return an integer error code and take a reference to the thing to be modified, or use C++ exceptions. Either way, problem solved.
Being able to return multiple values at once is nice, but it's hardly the biggest challenge when using C or C++.
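As a rough sketch of the first option in plain C (parse_port() is an invented example, not an API from the article): the return value carries the error code, and the result comes back through an out-parameter.

#include <errno.h>
#include <stdlib.h>

/* Returns 0 on success and -1 on failure; the parsed value is delivered
   through the out-parameter. */
int parse_port(const char *text, unsigned *port_out)
{
    char *end;
    unsigned long val;

    errno = 0;
    val = strtoul(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || val > 65535)
        return -1;              /* error: *port_out is left untouched */
    *port_out = (unsigned)val;
    return 0;                   /* success: result stored via the pointer */
}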
Posted Jul 7, 2011 12:08 UTC (Thu) by jwakely (subscriber, #60262)
Posted Jun 28, 2011 9:58 UTC (Tue) by alexl (subscriber, #19068)
For the case where you have resources that can safely be freed in an out-of-memory situation, the right thing to do is not to fail the allocation at all, but rather to have some kind of signal for memory pressure when memory is tight (but not exhausted). Apps could then handle this by cleaning up caches and other resources. That way you will not run into the OOM-killer problem.
There is one kind of allocation failure that is not OOM-killer related though, and that's where a single allocation is larger than physical memory or the mappable region. This can happen, for instance, if you're reading in some random user file (say an image) and it happens to decode to an 8 gigabyte array (maybe because it's an exploit, or just large). In these kinds of situations I think it makes sense to check for allocation failures, and glib does in fact have a call for that (g_try_malloc).
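Roughly, the pattern looks like this with GLib's g_try_malloc(); the function and the 4-bytes-per-pixel assumption are illustrative only:

#include <glib.h>

/* width and height come from an untrusted file header, so the allocation
   may be huge and is allowed to fail. */
guchar *alloc_pixels(gsize width, gsize height)
{
    guchar *pixels;

    /* Reject sizes that would overflow before even asking the allocator. */
    if (height != 0 && width > G_MAXSIZE / height / 4)
        return NULL;

    pixels = g_try_malloc(width * height * 4);  /* returns NULL instead of aborting */
    if (pixels == NULL)
        return NULL;    /* caller reports "image too large" to the user */
    return pixels;
}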
However, in most cases (like allocating internal, known-size objects) I'm purely in the abort-on-OOM school, since adding all the complexity (both to your code and to users of your library) means more bugs, and doesn't help anyway (since OOM doesn't get reported, the kernel just kills some process instead). Of course, as David said in the article, there are exceptional situations, like core system software (init, dbus, etc.) where we can't just have it die and where the complexity is worth it.
Posted Jun 28, 2011 17:48 UTC (Tue) by xtifr (guest, #143)
Assuming A) you have an OOM killer, and B) it hasn't been thoroughly disabled. If you're writing a _general purpose_ library, neither is really a valid assumption, though both are possibilities you should remain aware of. Aside from that quibble, I basically agree with you, but I'll note that writing libraries for embedded systems comes with a whole additional set of complications of its own. (Basically, my advice would be to not try unless you or someone on your team has some expertise with embedded systems.)
Posted Jun 30, 2011 15:23 UTC (Thu) by nix (subscriber, #2304)
Alas the latter is rare (and misbehaviour might be expected if you kill something maintaining persistent state while it is updating that state), and the former is so rare and so hard to cater to that simply nobody ever bothers. Sufficiently Paranoid Programs could avoid the stack-OOM by doing a massive deep recursion early on, to balloon their stack out to the maximum they might need. A few programs do this. You can avoid being user-killed by being installed setuid or setgid, but this has other disadvantages and is basically never done (at least not solely for this reason).
This is probably a fault of some kind in POSIX, but I have not even the faintest glimmerings of a clue as to how to fix it.
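A rough sketch of the stack-ballooning trick mentioned above, with an illustrative 4 MiB target (not something any particular program is claimed to do):

#include <stddef.h>

/* Touch the stack down to the expected maximum depth at startup, so the
   pages are committed before the program starts its real work. */
static void balloon_stack(size_t remaining)
{
    volatile char pad[4096];            /* roughly one page per frame */
    pad[0] = 0;
    if (remaining > sizeof pad)
        balloon_stack(remaining - sizeof pad);
    pad[sizeof pad - 1] = 0;            /* touch after the call so the frame
                                           cannot be tail-call optimized away */
}

int main(void)
{
    balloon_stack(4 * 1024 * 1024);     /* commit ~4 MiB of stack up front */
    /* ... the program's real work ... */
    return 0;
}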
Posted Jul 1, 2011 9:46 UTC (Fri) by dgm (subscriber, #49227)
Posted Jul 1, 2011 13:40 UTC (Fri) by nix (subscriber, #2304)
Posted Jul 3, 2011 23:09 UTC (Sun) by dgm (subscriber, #49227)
Posted Jul 3, 2011 23:40 UTC (Sun) by nix (subscriber, #2304)
Posted Jun 28, 2011 9:59 UTC (Tue) by dlang (guest, #313)
Posted Jun 28, 2011 10:58 UTC (Tue) by dgm (subscriber, #49227)
Posted Jun 28, 2011 11:11 UTC (Tue) by alexl (subscriber, #19068)
(Of course, as said before, there are special cases where it's needed, but not in general.)
Posted Jun 29, 2011 12:54 UTC (Wed) by mjthayer (guest, #39183)
At least on Linux, overcommitting is a choice the user makes (or at least they can choose not to). And by overcommitting they are saying, in a certain sense, that they don't care too much about OOM. So I do see a certain sense in targeting the non-overcommitted situation and ignoring overcommit.
Slightly off-topic, but what is overcommit good for apart from forking (or, more generally, copy-on-write)?
Posted Jun 29, 2011 18:29 UTC (Wed) by dlang (guest, #313)
it also allows you to deal with cases where a library/binary gets used, but not all of it is ever used. Linux will only read the pages from disk into memory that are actually needed. without overcommit, space for the entire binary needs to be allocated; with overcommit it doesn't matter.
the thing is that the COW situation is extremely common, so in practice overcommit works very well.
Posted Jun 29, 2011 21:59 UTC (Wed) by mjthayer (guest, #39183)
Is this quite the same thing? Those pages are all backed by disk storage - assuming you meant the binary text - so they can be ejected from physical RAM again whenever needed. Thrashing instead of OOM-ing...
I suppose what I am wondering is, given that there are such heavy-handed mechanisms for dealing with OOM (the OOM monster), whether it might make sense to have a setting to only allow overcommitting for processes which have just forked, which are probably the main users of really overcommitted memory which they will probably never need. Then they would be the only ones liable to be killed on OOM, and other processes could live more predictably.
Posted Jun 29, 2011 23:16 UTC (Wed) by dlang (guest, #313)
the problem with your suggestion (only allow overcommit for processes that just forked) is that I don't see it working. you have no way of knowing if the process is going to exec something (relatively) soon, or if it's apache that forked a child that is going to stay running for the next year.
And I don't think it helps anyway.
the problem scenario is:
large process A forks (creating A`), almost all its memory is COW
process B allocates some memory, but doesn't touch it yet
process A` changes some memory (breaking COW), requiring real memory to hold the result.
process B then tries to use the memory it had previously allocated and finds that it is not available.
if you could somehow define 'forked recently' in a way that could be cheap enough, then you could possibly do it.
All this said, I really don't see many cases in practice where disabling overcommit will really help.
yes, you avoid the OOM killer kicking in, and instead the process that tried to allocate memory dies.
but the idea that (in the general case), this will make your system more predictable is not something I believe. you have no way of knowing _which_ process (including system daemons) will need to allocate more memory at the instant that you are out, so you really don't know which process will die anyway. (and no, in general processes and libraries don't do anything except die when they run out of memory).
in some ways, it would make it easier to DOS a system, just have your memory hog _not_ die if a malloc fails, instead sleep and try again. eventually something else in the system will need memory and die, then you can repeat the process. you won't even be able to ssh in to the box to fix it, as you won't be able to spawn/fork a new process (as that will require memory allocation)
there's also the problem that without overcommit you need to have significantly more swap enabled in the system (since you have to have enough ram+swap to handle the peak theoretical memory use from large processes doing a fork+exec), and with the increasing gap between memory speed and disk speed, your system will dive into swap to the point of being useless (including the inability to login to it) before you start getting memory failures. With overcommit you can have a small amount of swap (including none) and instead count on the OOM killer + watchdog timers to bring the box down (and possibly even reboot it to recover) rather than having the box 'up' but unable to provide service.
Posted Jun 30, 2011 7:19 UTC (Thu) by mjthayer (guest, #39183)
> if you could somehow define 'forked recently' in a way that could be cheap enough, then you could possibly do it.
That I do see as more of a problem. One could have some background thread gradually allocating the process's memory for real, but that is replacing one piece of complexity with another.
> but the idea that (in the general case), this will make your system more predictable is not something I believe. you have no way of knowing _which_ process (including system daemons) will need to allocate more memory at the instant that you are out, so you really don't know which process will die anyway. (and no, in general processes and libraries don't do anything except die when they run out of memory).
True, it doesn't change the fundamental problem that you need enough memory for whatever you want to do.
> in some ways, it would make it easier to DOS a system, just have your memory hog _not_ die if a malloc fails, instead sleep and try again. eventually something else in the system will need memory and die
I thought that ulimits were supposed to solve that. Do they work as intended these days?
> there's also the problem that without overcommit you need to have significantly more swap enabled in the system (since you have to have enough ram+swap to handle the peak theoretical memory use from large processes doing a fork+exec)
The idea was to disable overcommit except for forking, so that shouldn't be such an issue. Thinking about it, one could also freeze the overcommitted process if it tries to actually use its memory and it isn't there (making sure there is a bit of memory left over for doing emergency process surgery).
Posted Jun 30, 2011 12:55 UTC (Thu) by nix (subscriber, #2304)
Posted Jul 8, 2011 1:40 UTC (Fri) by kabloom (guest, #59417)
Posted Jun 28, 2011 20:00 UTC (Tue) by dlang (guest, #313)
however the normal type of OOM problem you have when overcommit is turned on doesn't happen when you do a malloc, but instead when you attempt to write to a page that was shared and now cannot be (requiring additional memory pages).
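A rough illustration of that timing, assuming Linux and a made-up 256 MiB buffer: every allocation call succeeds, and the demand for real memory only appears when the child writes to the shared pages and breaks the copy-on-write mapping.

#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t size = 256u * 1024 * 1024;   /* illustrative figure */
    char *buf = malloc(size);
    if (buf == NULL)
        return 1;
    memset(buf, 1, size);               /* parent really owns these pages */

    pid_t pid = fork();                 /* child shares them copy-on-write */
    if (pid < 0)
        return 1;
    if (pid == 0) {
        memset(buf, 2, size);           /* the COW break: real pages are needed
                                           now, long after malloc() succeeded */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}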
Posted Jul 8, 2011 1:41 UTC (Fri) by kabloom (guest, #59417)
Zeuthen: Writing a C library, part 1
Returning error codes and per-thread error flags is nothing new. It has been standard practice for ages. It works, and it is not that complicated.
I never suggested it was overly complicated, I was disagreeing with the assertion that it was the *most* simple. Almost everything else you say I am in agreement with.
Did the original post say that though?
I didn't read "Proper memory management is most expected, and easiest implemented, in libraries" as "memory management is easier than anything else" but rather as "memory management in libraries is easier than anywhere else"
Zeuthen: Writing a C library, part 1
> Returning error codes is a great convention because you can flag them with __attribute__((warn_unused_result)). Then the programmer gets a warning from the compiler unless they check the return code.
Returning error codes is fine in certain circumstances (particularly for functions where the side-effects are the point). Sometimes, though, you want your functions to be functions in the lambda-calculus sense - you want to return *results*.
Zeuthen: Writing a C library, part 1
#include <future>
#include <cmath>
#include <stdexcept>

std::future<double> squerrt(double x)
{
    std::promise<double> p;
    if (x < 0)
        p.set_exception(std::make_exception_ptr(std::domain_error("negative")));
    else
        p.set_value(std::sqrt(x));
    return p.get_future();
}

int main()
{
    double i = squerrt(-1).get(); // boom
}
Zeuthen: Writing a C library, part 1
>
> process B allocates some memory, but doesn't touch it yet
>
> process A` changes some memory (breaking COW), requiring real memory to hold the result.
>
>process B then tries to use the memory it had previously allocated and finds that it is not available.
That I do not see as a problem - when process B allocates the memory, it is really allocated, and if A tries to use its COW memory later it will just not be there.
Zeuthen: Writing a C library, part 1
> I am not sure if (with overcommit disabled) you have to allocate memory space for the entire binary or not.
You don't. Overcommit does not apply to read-only file-backed regions, because they can be dropped at any time without harm.