I think it's pretty clear, actually...

Posted Nov 2, 2011 21:04 UTC (Wed) by khim (subscriber, #9252)
In reply to: libabc: a demonstration library for kernel developers by quotemstr
Parent article: libabc: a demonstration library for kernel developers

> This piece of advice is confusing. I think the author meant to make libraries thread-agnostic: don't bend over backwards to accommodate access to the same data from multiple threads, but don't unnecessarily couple different pieces of data either.

The advice is to leave threading issues to the higher-level libraries or to the program itself, but to be ready to be called from different threads simultaneously for different datasets. It's explained in quite some detail there.

> Great advice. We can just use posix_spawn instead: err, wait. How do I call this function under Linux?

Easy: you don't.

> The author also should be more specific about pthread_atfork's alleged brokenness.

It's hard to be more specific in a README file. No, really. It's a README file, not a PhD thesis. If you actually follow the suggested advice and read the POSIX man page, you'll see that half of it is dedicated to a detailed explanation of why pthread_atfork should never be used.

> Actually, there is a better way: #pragma once (http://en.wikipedia.org/wiki/Pragma_once). It's shorter than traditional header guards (one line versus three), and it eliminates the risk of name collisions.

Sadly it guarantees collisions when your library is embedded in another, bigger library. #pragma once can be a good choice for an application, but it's not acceptable for a library.

> Doing work out-of-process gives robustness (and sometimes security) guarantees that are just not possible with calls into libraries.

You know, I think the author of PulseAudio will agree with you here - it uses a daemon for a reason.

But since there is no way to create an "ephemeral context" (no fork/exec, remember), the only way to do that is to create a separate daemon which is accessible over some RPC mechanism (dbus, for example).

> separate 'mechanism' from 'policy'

> I don't think this phrase means what the author thinks it means.

I think it means exactly what it means. The "price" of a new process may vary wildly in different contexts. If the library creates such processes itself, then it embeds the process creation policy - and this is just wrong. Think Android: its process manager keeps processes around if there are enough resources and kills them if it's under memory or CPU pressure. Library-created processes just mess everything up in such a situation.


I think it's pretty clear, actually...

Posted Nov 2, 2011 21:08 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

> Sadly it guarantees collisions when your library is embedded in another, bigger library.

How so?

Simple

Posted Nov 2, 2011 21:31 UTC (Wed) by khim (subscriber, #9252) [Link]

When include files are copied around, they are sometimes treated as the same file from the "#pragma once" POV (if the copying process keeps the timestamps) and sometimes as different files (if you put them in git and pull them back).

Thus "#pragma once" is a great way to create unreproducible build failures. With an explicit include guard you sometimes trigger the GCC optimization (GCC does not reread a file with an include guard if it can tell that it's the same file) and sometimes the optimization fails and GCC actually loads and parses the file again - but that only affects compilation speed, never correctness.
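To illustrate the include-guard side of this argument, here is a minimal single-file sketch. The two guarded regions stand in for two byte-identical copies of a header reached through different paths (like test1/test.h and test2/test.h); the names EXAMPLE_TEST_H, example_point and example_sum are made up for the illustration. The guard works on the macro name alone, so the second copy is skipped no matter what the file's path or timestamp is.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the first copy of a header (hypothetical names throughout). */
#ifndef EXAMPLE_TEST_H
#define EXAMPLE_TEST_H
struct example_point {
    int x;
    int y;
};
#endif /* EXAMPLE_TEST_H */

/*
 * Stand-in for a second, byte-identical copy pulled in via another path.
 * EXAMPLE_TEST_H is already defined, so the preprocessor drops this whole
 * region and the struct is not defined twice.
 */
#ifndef EXAMPLE_TEST_H
#define EXAMPLE_TEST_H
struct example_point {
    int x;
    int y;
};
#endif /* EXAMPLE_TEST_H */

/* Uses the type to show that exactly one definition is visible. */
int example_sum(const struct example_point *p)
{
    assert(p != NULL);
    return p->x + p->y;
}
```

Removing the guards from both regions makes the file fail to compile with a struct redefinition error, which is the kind of failure a "#pragma once" mismatch between copies would produce.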

Simple

Posted Nov 2, 2011 21:36 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

> When include files are copied around, they are sometimes treated as the same file from the "#pragma once" POV (if the copying process keeps the timestamps) and sometimes as different files (if you put them in git and pull them back).

I've never seen this behavior. Any decent implementation of #pragma once should not flag two files with different contents as identical. Very old compilers were sometimes confused by various kinds of link, but this issue hasn't cropped up in a very long time.

Please provide a pointer to a bug report or a set of steps to demonstrate the behavior you describe.

Hmm... Very simple test...

Posted Nov 3, 2011 13:20 UTC (Thu) by khim (subscriber, #9252) [Link]

$ mkdir test1
$ mkdir test2
$ echo '#pragma once' > test1/test.h
$ echo 'abc' >> test1/test.h
$ cp -ai test1/test.h test2/test.h
$ echo '#include "test1/test.h"' > test.c
$ echo '#include "test2/test.h"' >> test.c
$ gcc -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "test1/test.h" 1

abc
# 2 "test.c" 2
$ touch test2/test.h
$ gcc -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "test1/test.h" 1

abc
# 2 "test.c" 2
# 1 "test2/test.h" 1

abc
# 2 "test.c" 2
$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This is just a preprocessor test, but the compiler does the same thing.

Note that usually we do want this behavior (these files are identical - they come from the same source, after all) - and usually it works, but sometimes when you do complex manipulations (git in my case, but of course it's not the only possibility) everything blows up.

There is always a well-known solution to every human problem--neat, plausible, and wrong.

Well, "#pragma once" is such a solution - don't use it.

Simple

Posted Nov 3, 2011 0:56 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

We have a project with 20 identically named .h files (don't ask). #pragma once works just fine.

I think it's pretty clear, actually...

Posted Nov 2, 2011 21:45 UTC (Wed) by gus3 (guest, #61103) [Link]

> No, really. It's a README file, not a PhD thesis.
As a demonstration of "do this, not that," the why's and therefore's are entirely appropriate. Sure, the authors state that "even the POSIX standard admits it's broken," but a URL would be nice.

I think it's pretty clear, actually...

Posted Nov 2, 2011 21:49 UTC (Wed) by quotemstr (subscriber, #45331) [Link]

> half of it is dedicated to the detailed explanation for why pthread_atfork should never be used.

No it isn't. Did you read the linked manpage? Its lengthy rationale section explains why one *would* want to use pthread_atfork, and why a bare fork (i.e. *not* using pthread_atfork) is discouraged.

Also, most of these criticisms don't apply for fork-exec: who cares whether a mutex is held in a child when all that child will ever do is exec?
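For reference, the fork-exec pattern being discussed can be sketched roughly as follows. This is only an illustration: run_shell is a hypothetical helper, and the use of /bin/sh is an assumption about the system. The child does nothing but call exec, so it never touches any lock that another thread might have held at the moment of the fork.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `cmd` through /bin/sh in a child process and
 * return its exit status, or -1 on failure or abnormal termination.
 */
int run_shell(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: go straight to exec, touching no library state. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* reached only if exec itself failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_shell("exit 7") should return 7, the exit status of the shell in the child.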

I think it's pretty clear, actually...

Posted Nov 2, 2011 22:33 UTC (Wed) by gus3 (guest, #61103) [Link]

The pthread_atfork() page (check the link I provided in my earlier comment) does list in the "Rationale" section several reasons why using the function is a dicey proposition.

I think it's pretty clear, actually...

Posted Nov 3, 2011 3:54 UTC (Thu) by RCL (guest, #63264) [Link]

> does list in the "Rationale" section several reasons why using the function is a dicey proposition

Where exactly?

It describes problems with fork() and "solution" of using fork() then exec().

Then it proceeds to describe pthread_atfork() as a means to resolve the problem.

Then it describes an example usage of pthread_atfork().

In the last two lines it describes the order of registering atfork() handlers.

*Nowhere* in the document does it warn against using pthread_atfork() or acknowledge its 'brokenness'.

I think it's pretty clear, actually...

Posted Nov 6, 2011 22:18 UTC (Sun) by foom (subscriber, #14868) [Link]

Well, in my experience (writing private software), every time someone has wanted to use pthread_atfork, I've recommended that they not. For one very simple reason: it does not distinguish between the common activity of spawning a process (fork/exec) and the relatively rare activity of forking and keeping both halves.

So, for example, you might shut down some auxiliary threads in a pthread_atfork prefork handler, to ensure that the threads aren't in the middle of corrupting your library's state when the child process wants to call into your library. But that's entirely unnecessary work if the next action was going to be exec! It just causes fork/exec to be slower, for no good reason.

Instead, we use an explicit teardown function that you can call if you like, before non-exec forks.

Actually fork does THAT for you...

Posted Nov 7, 2011 17:32 UTC (Mon) by khim (subscriber, #9252) [Link]

> So, for example, you might shut down some auxiliary threads in a pthread_atfork prefork handler, to ensure that the threads aren't in the middle of corrupting your library's state when the child process wants to call into your library.

You don't need to do that. After fork just one thread survives. But you do need to either restart the "zombie" threads or free the resources assigned to them. And to do that correctly, different libraries must cooperate - and it's not clear how. If the whole machinery described in pthread_atfork does not look like something designed to give you countless problems, then I'm not sure you should write libraries at all.

Actually fork does THAT for you...

Posted Nov 15, 2011 3:39 UTC (Tue) by foom (subscriber, #14868) [Link]

> You don't need to do that. After fork just one thread survives.

Sure, but cleaning up after whatever the other thread was doing at the instant of the fork is generally impossible. You either need to grab a lock (or similar) in your atfork prefork handler to force the other thread into a known quiescent state, or shut it down. But you probably don't really want to do either one before a fork-exec; it's just a waste of time.

> If the whole machinery described in pthread_atfork does not look like something designed to give you countless problems, then I'm not sure you should write libraries at all.

I think the only thing pthread_atfork is *really* useful for is to ensure that libc's malloc() will keep working after fork().
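The "grab a lock in the prefork handler" approach mentioned above might look roughly like the sketch below. It follows the general prepare/parent/child pattern, but everything named lib_* is hypothetical, and it deliberately ignores the cost discussed in this thread: the handlers run on every fork, including one that is immediately followed by exec.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state protected by a single lock. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static int lib_counter;

/* Runs in the forking thread before fork(): wait until the state is quiescent. */
static void lib_prepare(void)
{
    int rc = pthread_mutex_lock(&lib_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs after fork() in both parent and child: release the lock taken above. */
static void lib_release(void)
{
    int rc = pthread_mutex_unlock(&lib_lock);
    assert(rc == 0);
    (void)rc;
}

/* Registers the handlers; returns 0 on success, like pthread_atfork() itself. */
int lib_init(void)
{
    return pthread_atfork(lib_prepare, lib_release, lib_release);
}

/* An ordinary library call that needs the lock. */
int lib_bump(void)
{
    pthread_mutex_lock(&lib_lock);
    int value = ++lib_counter;
    pthread_mutex_unlock(&lib_lock);
    return value;
}

/* Forks, calls into the library in the child, and returns the child's result. */
int lib_fork_and_bump(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(lib_bump()); /* works because the child handler released the lock */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child gets its own copy of lib_counter, so a bump in the child does not change the parent's value. With several libraries each registering handlers like these, lock ordering between them becomes the cooperation problem raised earlier in the thread.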

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds