glibc thread races
Posted Aug 17, 2015 19:57 UTC (Mon) by wahern (subscriber, #37304)
Parent article: Glibc 2.22 released
One of the biggest ongoing problems with glibc is its threading support. If you dlopen a library that links in pthread for the first time, there are races all over the place. glibc _tries_ to make it work, but there are a ridiculous number of open bugs about race conditions. I just opened one this year: dlerror (and by association dlopen and dlclose) aren't thread-safe in this scenario. Most of the NSS code is broken for similar reasons.
They either need to fix it or simply stop trying to make it work. Instead, these very serious threading bugs are left to languish because the fixes require a significant overhaul or a significant change to the documented semantics. Everybody will complain loudly if they officially stop supporting this. But they don't have the time to make it work properly. So instead you end up with applications that are fundamentally broken, with most developers and users ignorant of how broken their applications are.
OS X, Solaris, musl-libc, and others solved the issue by simply incorporating libpthread into libc. On Linux, interpreters like Perl and Python are usually linked with libpthread unconditionally, with most people none the wiser, despite the fact that doubtless most people, were they given a veto, would have violently opposed such a move out of exaggerated concern for single-threaded performance. I ran into the above bug because no distribution links its Lua interpreter with libpthread, which means loading a Lua module that uses threading is broken, although it will appear to work in many cases (especially if you don't unload modules before exiting). I ignored the Valgrind bug reports as false positives until I was forced to track down persistent segfaults on a production server. The bug report is _still_ marked as NEW *sigh*.
Given the pervasive use of multi-threading, and the sheer difficulty of trying to optimize for the non-threaded case without introducing a metric ton of difficult-to-detect races in threaded code, glibc should just assume it's always running threaded and optimize accordingly to minimize the cost for single-threaded apps. Most of the current code which attempts to detect threading is manifestly broken, and has been for years.
They don't have to literally merge libpthread, just unconditionally make use of the mutexes it already has in place.
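To illustrate what "unconditionally make use of the mutexes" could look like, here is a minimal sketch. It is not glibc code: `counter`, `counter_lock`, `counter_add` and `run_workers` are hypothetical names made up for the example. The point is only that the lock is always taken, with no "is the process threaded yet?" test that could go stale when libpthread shows up later.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical library-internal state; not taken from glibc. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Always take the lock. There is deliberately no check for whether
 * the process is threaded, so there is nothing to get wrong if
 * threads appear after this code has already run once. */
static void counter_add(long n)
{
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);
    counter += n;
    rc = pthread_mutex_unlock(&counter_lock);
    assert(rc == 0);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        counter_add(1);
    return NULL;
}

/* Runs four workers that each add 1000 and returns the final count. */
long run_workers(void)
{
    pthread_t t[4];
    counter = 0;
    for (size_t i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (size_t i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

In a single-threaded program the same `counter_add` just pays for an uncontended lock and unlock, which is the cost the comment argues glibc should accept and then optimize.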
Posted Aug 18, 2015 21:52 UTC (Tue)
by jwakely (subscriber, #60262)
Why is that a problem? In the glibc bugzilla that just means confirmed-but-not-fixed yet, so since it isn't fixed that seems to be the right status.
Posted Aug 19, 2015 14:54 UTC (Wed)
by wahern (subscriber, #37304)
Well when you put it that way ;)
I guess I assumed that because it's still marked as NEW--instead of, say, CONFIRMED--and hasn't been assigned to anybody or garnered any comments, that it's being neglected. I'll admit that might be a poor assumption. I'm definitely not a student of issue tracking workflow.
Still, as I mentioned in the bug report:
"strsignal.c has this same problem. where_is_shmfs in shm_open.c also has an initialization race. gaiconf_init in getaddrinfo.c might be subject to a race, but I couldn't quickly determine which globals it accesses."
For something as critical as the C runtime, correctness should be paramount over features. I just can't imagine sitting on this kind of pervasive issue when managing any of my own projects. At the very least I'd like to assess the situation and strategize, even if I just throw out a caveat in the comments as an interim measure. But I suppose glibc is awash in bug reports and short of hands.
I've really taken a liking to musl-libc. Worse things could happen than musl-libc becoming the predominant libc on Linux. It's almost featureful enough to substitute for glibc, yet small enough to displace uclibc and bionic.
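For readers unfamiliar with the kind of initialization race mentioned in the bug report excerpt, here is a rough sketch of one-time initialization done with `pthread_once`. It is an illustration only, not how glibc's shm_open.c is written: `cached_dir`, `init_cached_dir`, `get_cached_dir` and `run_readers` are hypothetical names, and the "/dev/shm/" string is just a placeholder value.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical lazily initialized global, standing in for a cached
 * value that several threads may ask for at once. Not from glibc. */
static char cached_dir[32];
static int init_calls;
static pthread_once_t cached_once = PTHREAD_ONCE_INIT;

static void init_cached_dir(void)
{
    init_calls++;                     /* counts how often init runs */
    strcpy(cached_dir, "/dev/shm/");  /* placeholder value */
}

/* pthread_once guarantees init_cached_dir runs exactly once, and that
 * every caller returns only after that run has completed. */
const char *get_cached_dir(void)
{
    pthread_once(&cached_once, init_cached_dir);
    return cached_dir;
}

static void *reader(void *arg)
{
    (void)arg;
    (void)get_cached_dir();
    return NULL;
}

/* Starts eight readers and reports how many times init actually ran. */
int run_readers(void)
{
    pthread_t t[8];
    for (int i = 0; i < 8; i++) {
        int rc = pthread_create(&t[i], NULL, reader, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 8; i++)
        pthread_join(t[i], NULL);
    return init_calls;
}
```

The racy version would be a plain `if (!initialized) { ...; initialized = 1; }` check, which is fine in a single thread and silently wrong once a second thread can reach it.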