Zeuthen: Writing a C library, part 1
Posted Jun 28, 2011 9:09 UTC (Tue) by dgm (subscriber, #49227)
In reply to: Zeuthen: Writing a C library, part 1 by gowen
Parent article: Zeuthen: Writing a C library, part 1
Yes, overcommit is a problem. When (if) you get an out-of-memory error, it means the system is pretty ill, possibly dying. The proper response varies from program to program, and even within a program. Sometimes you should free up resources, sleep a while, and try again. Sometimes aborting is the right thing to do. And sometimes you should ask the user how to proceed. That's why fixing this policy in a library is bad design.
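For illustration, a minimal sketch in plain C of the alternative being argued for here (the function name and error constant are invented for this example, not taken from any particular library): the allocation failure is reported through the return value, and the policy is left entirely to the caller.

#include <stdlib.h>
#include <string.h>

#define MYLIB_ERROR_NO_MEMORY 1    /* invented error code for the sketch */

/* Report the failure instead of aborting; the caller decides whether to
   retry, shed caches, ask the user, or give up. */
int mylib_buffer_init(char **out, size_t size)
{
    char *buf = malloc(size);
    if (buf == NULL)
        return MYLIB_ERROR_NO_MEMORY;
    memset(buf, 0, size);
    *out = buf;
    return 0;
}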
Posted Jun 28, 2011 9:20 UTC (Tue) by gowen (guest, #23914)
Good engineering says "don't call abort()"
Posted Jun 28, 2011 10:16 UTC (Tue) by gevaerts (subscriber, #21521)
Posted Jun 28, 2011 12:34 UTC (Tue) by gowen (guest, #23914)
Posted Jun 28, 2011 13:31 UTC (Tue) by cmccabe (guest, #60281)
Then you'll be happy to hear that thread-local errno has been deprecated for decades now. New functions that are added to POSIX generally return an error code indicating the error instead.
errno has nothing to do with "C's limitations" and everything to do with preserving compatibility with an older interface that isn't worth the effort to change.
Returning error codes is a great convention because you can flag them with __attribute__((warn_unused_result)). Then the programmer gets a warning from the compiler unless they check the return code.
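A minimal sketch of that convention, assuming GCC or Clang (frob_widget() is a made-up example):

#include <stdio.h>

/* Returns 0 on success, nonzero on failure; callers must look at the result. */
__attribute__((warn_unused_result))
static int frob_widget(int id)
{
    return (id < 0) ? -1 : 0;
}

int main(void)
{
    frob_widget(7);                 /* compiler warns: return value ignored */
    if (frob_widget(7) != 0)        /* checked, so no warning */
        fprintf(stderr, "frob_widget failed\n");
    return 0;
}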
Posted Jun 29, 2011 7:16 UTC (Wed) by gowen (guest, #23914)
In general, you can't return both results *and* error codes. (As I said, for pointers you can return NULL, and for functions whose domain of valid results is limited in some sense [abs()], you can; but if you're returning anything other than a pointer, an integer type, or a floating-point type, you're basically hosed.)
Posted Jul 7, 2011 0:22 UTC (Thu) by cmccabe (guest, #60281)
If we're still talking about C/C++, then you can only ever return:
* a primitive (int, float, etc.)
* a pointer
* a struct
All of those have a natural 'none of the above' value. Integer types have 0 or a negative, floats and doubles have NaN, and pointers have NULL.
If you're returning a struct by value, then you're probably using C++, since C programmers rarely return an entire structure by value. The obvious thing to do is to either return an integer error code and take a reference to the thing to be modified, or use C++ exceptions. Either way, problem solved.
Being able to return multiple values at once is nice, but it's hardly the biggest challenge when using C or C++.
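As a rough sketch of the first option in plain C (parse_port() is an invented example, not an API from the article): the return value carries the error code, and the result comes back through an out-parameter.

#include <errno.h>
#include <stdlib.h>

/* Returns 0 on success and -1 on failure; the parsed value is delivered
   through the out-parameter. */
int parse_port(const char *text, unsigned *port_out)
{
    char *end;
    unsigned long val;

    errno = 0;
    val = strtoul(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || val > 65535)
        return -1;              /* error: *port_out is left untouched */
    *port_out = (unsigned)val;
    return 0;                   /* success: result stored via the pointer */
}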
Posted Jul 7, 2011 12:08 UTC (Thu) by jwakely (subscriber, #60262)
Posted Jun 28, 2011 9:58 UTC (Tue) by alexl (subscriber, #19068)
For the case where you have resources that can safely be freed in an out-of-memory situation, the right thing to do is not to fail the allocation at all, but rather to have some kind of signal for memory pressure when memory is tight (but not exhausted). Apps could then handle this by cleaning up caches and other resources. That way you will not run into the OOM-killer problem.
There is one kind of allocation failure that is not OOM-killer related though, and that's where a single allocation is larger than physical memory or the mappable region. This can happen, for instance, if you're reading in some random user file (say an image) and it happens to decode to an 8 gigabyte array (maybe because it's an exploit, or just large). In these kinds of situations I think it makes sense to check for allocation failures, and glib does in fact have a call for that (g_try_malloc).
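Roughly, the pattern looks like this with GLib's g_try_malloc(); the function and the 4-bytes-per-pixel assumption are illustrative only:

#include <glib.h>

/* width and height come from an untrusted file header, so the allocation
   may be huge and is allowed to fail. */
guchar *alloc_pixels(gsize width, gsize height)
{
    guchar *pixels;

    /* Reject sizes that would overflow before even asking the allocator. */
    if (height != 0 && width > G_MAXSIZE / height / 4)
        return NULL;

    pixels = g_try_malloc(width * height * 4);  /* returns NULL instead of aborting */
    if (pixels == NULL)
        return NULL;    /* caller reports "image too large" to the user */
    return pixels;
}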
However, in most cases (like allocating internal, known-size objects) I'm purely in the abort-on-OOM school, since adding all the complexity (both to your code and to users of your library) means more bugs, and doesn't help anyway (since OOM doesn't get reported, the kernel just kills some process instead). Of course, as David said in the article, there are exceptional situations, like core system software (init, dbus, etc.) where we can't just have it die and where the complexity is worth it.
Posted Jun 28, 2011 17:48 UTC (Tue) by xtifr (guest, #143)
Assuming A) you have an OOM killer, and B) it hasn't been thoroughly disabled. If you're writing a _general purpose_ library, neither is really a valid assumption, though both are possibilities you should remain aware of. Aside from that quibble, I basically agree with you, but I'll note that writing libraries for embedded systems comes with a whole additional set of complications of its own. (Basically, my advice would be to not try unless you or someone on your team has some expertise with embedded systems.)
Posted Jun 30, 2011 15:23 UTC (Thu) by nix (subscriber, #2304)
Alas the latter is rare (and misbehaviour might be expected if you kill something maintaining persistent state while it is updating that state), and the former is so rare and so hard to cater to that simply nobody ever bothers. Sufficiently Paranoid Programs could avoid the stack-OOM by doing a massive deep recursion early on, to balloon their stack out to the maximum they might need. A few programs do this. You can avoid being user-killed by being installed setuid or setgid, but this has other disadvantages and is basically never done (at least not solely for this reason).
This is probably a fault of some kind in POSIX, but I have not even the faintest glimmerings of a clue as to how to fix it.
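A rough sketch of the stack-ballooning trick mentioned above, with an illustrative 4 MiB target (not something any particular program is claimed to do):

#include <stddef.h>

/* Touch the stack down to the expected maximum depth at startup, so the
   pages are committed before the program starts its real work. */
static void balloon_stack(size_t remaining)
{
    volatile char pad[4096];            /* roughly one page per frame */
    pad[0] = 0;
    if (remaining > sizeof pad)
        balloon_stack(remaining - sizeof pad);
    pad[sizeof pad - 1] = 0;            /* touch after the call so the frame
                                           cannot be tail-call optimized away */
}

int main(void)
{
    balloon_stack(4 * 1024 * 1024);     /* commit ~4 MiB of stack up front */
    /* ... the program's real work ... */
    return 0;
}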
Posted Jul 1, 2011 9:46 UTC (Fri) by dgm (subscriber, #49227)
Posted Jul 1, 2011 13:40 UTC (Fri) by nix (subscriber, #2304)
Posted Jul 3, 2011 23:09 UTC (Sun) by dgm (subscriber, #49227)
Posted Jul 3, 2011 23:40 UTC (Sun) by nix (subscriber, #2304)
Posted Jun 28, 2011 9:59 UTC (Tue) by dlang (guest, #313)
Posted Jun 28, 2011 10:58 UTC (Tue) by dgm (subscriber, #49227)
Posted Jun 28, 2011 11:11 UTC (Tue) by alexl (subscriber, #19068)
(Of course, as said before, there are special cases where it's needed, but not in general.)
Posted Jun 29, 2011 12:54 UTC (Wed) by mjthayer (guest, #39183)
At least on Linux, overcommitting is a choice the user makes (or at least they can choose not to). And by overcommitting they are saying, in a certain sense, that they don't care too much about OOM. So I do see a certain sense in targeting the non-overcommitted situation and ignoring overcommit.
Slightly off-topic, but what is overcommit good for apart from forking (or, more generally, copy-on-write)?
Posted Jun 29, 2011 18:29 UTC (Wed) by dlang (guest, #313)
it also allows you to deal with cases where a library/binary gets used, but not all of it is ever used. Linux will only read the pages from disk into memory that are actually needed. without overcommit, space for the entire binary needs to be allocated; with overcommit it doesn't matter.
the thing is that the COW situation is extremely common, so in practice overcommit works very well.
Posted Jun 29, 2011 21:59 UTC (Wed) by mjthayer (guest, #39183)
Is this quite the same thing? Those pages are all backed by disk storage - assuming you meant the binary text - so they can be ejected from physical RAM again whenever needed. Thrashing instead of OOM-ing...
I suppose what I am wondering is, given that there are such heavy-handed mechanisms for dealing with OOM (the OOM monster), whether it might make sense to have a setting to only allow overcommitting for processes which have just forked, which are probably the main users of really overcommitted memory which they will probably never need. Then they would be the only ones liable to be killed on OOM, and other processes could live more predictably.
Posted Jun 29, 2011 23:16 UTC (Wed) by dlang (guest, #313)
the problem with your suggestion (only allow overcommit for processes that just forked) is that I don't see it working. you have no way of knowing if the process is going to exec something (relatively) soon, or if it's apache that forked a child that is going to stay running for the next year.
And I don't think it helps anyway.
the problem scenario is:
large process A forks (creating A`), almost all its memory is COW
process B allocates some memory, but doesn't touch it yet
process A` changes some memory (breaking COW), requiring real memory to hold the result.
process B then tries to use the memory it had previously allocated and finds that it is not available.
if you could somehow define 'forked recently' in a way that could be cheap enough, then you could possibly do it.
All this said, I really don't see many cases in practice where disabling overcommit will really help.
yes, you avoid the OOM killer kicking in, and instead the process that tried to allocate memory dies.
but the idea that (in the general case), this will make your system more predictable is not something I believe. you have no way of knowing _which_ process (including system daemons) will need to allocate more memory at the instant that you are out, so you really don't know which process will die anyway. (and no, in general processes and libraries don't do anything except die when they run out of memory).
in some ways, it would make it easier to DOS a system, just have your memory hog _not_ die if a malloc fails, instead sleep and try again. eventually something else in the system will need memory and die, then you can repeat the process. you won't even be able to ssh in to the box to fix it, as you won't be able to spawn/fork a new process (as that will require memory allocation)
there's also the problem that without overcommit you need to have significantly more swap enabled in the system (since you have to have enough ram+swap to handle the peak theoretical memory use from large processes doing a fork+exec), and with the increasing gap between memory speed and disk speed, your system will dive into swap to the point of being useless (including the inability to login to it) before you start getting memory failures. With overcommit you can have a small amount of swap (including none) and instead count on the OOM killer + watchdog timers to bring the box down (and possibly even reboot it to recover) rather than having the box 'up' but unable to provide service.
Posted Jun 30, 2011 7:19 UTC (Thu) by mjthayer (guest, #39183)
> if you could somehow define 'forked recently' in a way that could be cheap enough, then you could possibly do it.
That I do see as more of a problem. One could have some background thread gradually allocating the process's memory for real, but that is replacing one piece of complexity with another.
> but the idea that (in the general case), this will make your system more predictable is not something I believe. you have no way of knowing _which_ process (including system daemons) will need to allocate more memory at the instant that you are out, so you really don't know which process will die anyway. (and no, in general processes and libraries don't do anything except die when they run out of memory).
True, it doesn't change the fundamental problem that you need enough memory for whatever you want to do.
> in some ways, it would make it easier to DOS a system, just have your memory hog _not_ die if a malloc fails, instead sleep and try again. eventually something else in the system will need memory and die
I thought that ulimits were supposed to solve that. Do they work as intended these days?
> there's also the problem that without overcommit you need to have significantly more swap enabled in the system (since you have to have enough ram+swap to handle the peak theoretical memory use from large processes doing a fork+exec)
The idea was to disable overcommit except for forking, so that shouldn't be such an issue. Thinking about it, one could also freeze the overcommitted process if it tries to actually use its memory and it isn't there (making sure there is a bit of memory left over for doing emergency process surgery).
Posted Jun 30, 2011 12:55 UTC (Thu) by nix (subscriber, #2304)
Posted Jul 8, 2011 1:40 UTC (Fri) by kabloom (guest, #59417)
Posted Jun 28, 2011 20:00 UTC (Tue) by dlang (guest, #313)
however the normal type of OOM problem you have when overcommit is turned on doesn't happen when you do a malloc, but instead when you attempt to write to a page that was shared and now cannot be (requiring additional memory pages).
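A rough illustration of that timing, assuming Linux and a made-up 256 MiB buffer: every allocation call succeeds, and the demand for real memory only appears when the child writes to the shared pages and breaks the copy-on-write mapping.

#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t size = 256u * 1024 * 1024;   /* illustrative figure */
    char *buf = malloc(size);
    if (buf == NULL)
        return 1;
    memset(buf, 1, size);               /* parent really owns these pages */

    pid_t pid = fork();                 /* child shares them copy-on-write */
    if (pid < 0)
        return 1;
    if (pid == 0) {
        memset(buf, 2, size);           /* the COW break: real pages are needed
                                           now, long after malloc() succeeded */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}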
Posted Jul 8, 2011 1:41 UTC (Fri) by kabloom (guest, #59417)
Zeuthen: Writing a C library, part 1
Returning error codes and per-thread error flags is nothing new. It has been standard practice for ages. It works, and it is not that complicated.
I never suggested it was overly complicated, I was disagreeing with the assertion that it was the *most* simple. Almost everything else you say I am in agreement with.
Did the original post say that though?
I didn't read "Proper memory management is most expected, and easiest implemented, in libraries" as "memory management is easier than anything else" but rather as "memory management in libraries is easier than anywhere else"
Zeuthen: Writing a C library, part 1
> Returning error codes is a great convention because you can flag them with __attribute__((warn_unused_result)). Then the programmer gets a warning from the compiler unless they check the return code.
Returning error codes is fine in certain circumstances (particularly for functions where the side-effects are the point). Sometimes, though, you want your functions to be functions in the lambda-calculus sense - you want to return *results*.
Zeuthen: Writing a C library, part 1
#include <future>
#include <cmath>
#include <stdexcept>

std::future<double> squerrt(double x)
{
    std::promise<double> p;
    if (x < 0)
        p.set_exception(std::make_exception_ptr(std::domain_error("negative")));
    else
        p.set_value(std::sqrt(x));
    return p.get_future();
}

int main()
{
    double i = squerrt(-1).get(); // boom
}
Zeuthen: Writing a C library, part 1
>
> process B allocates some memory, but doesn't touch it yet
>
> process A` changes some memory (breaking COW), requiring real memory to hold the result.
>
>process B then tries to use the memory it had previously allocated and finds that it is not available.
That I do not see as a problem - when process B allocates the memory, it is really allocated, and if A tries to use its COW memory later it will just not be there.
Zeuthen: Writing a C library, part 1
> I am not sure if (with overcommit disabled) you have to allocate memory space for the entire binary or not.
You don't. Overcommit does not apply to read-only file-backed regions, because they can be dropped at any time without harm.