
Who's afraid of a big bad optimizing compiler?

Posted Jul 17, 2019 7:34 UTC (Wed) by wtarreau (subscriber, #51152)
In reply to: Who's afraid of a big bad optimizing compiler? by Paf
Parent article: Who's afraid of a big bad optimizing compiler?

When you're working outside the kernel, you no longer have the collection of tools that kernel developers built, so you figure you have to start the "standard" way, with the pthread API. Then you see that your code is excessively slow and that you're taking mutexes every now and then for almost nothing, so you switch to the good old __sync operations, which are compatible across many generations of compilers. You still find them slow, because most of the time you manipulate multiple independent variables that don't need barriers in between; you only need groups of changes to be atomic with respect to other groups. Then you start to place barriers by hand and resort to the more modern __atomic builtins, which drops compatibility with older compilers and/or some less common architectures. At this point you are navigating a gray area: most of the performance-sensitive stuff is done by hand using complex macros involving ifdefs and fallbacks for unsupported archs and compilers, the less sensitive stuff is done using the more portable, safer, but slower __sync operations (or spinlocks when several variables are involved), and the rare slow operations are dealt with using mutexes.
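To make the __sync versus __atomic difference concrete, here is a minimal sketch in C using the GCC/Clang builtins. The names `struct stats`, `seq`, `account_sync`, `account_atomic` and `snapshot` are made up for illustration and come from no real code base. The __sync version pays a full barrier on every update, while the __atomic version updates the two independent counters with relaxed ordering and puts a single release at the end of the group.

```c
/* Hypothetical shared state: two independent counters plus a
 * sequence number that marks the end of each group of updates. */
struct stats {
    unsigned long bytes;
    unsigned long pkts;
};

static struct stats shared;
static unsigned long seq;

/* __sync style: each builtin is a full barrier, including between the
 * two counters, where no ordering is actually required. */
static void account_sync(unsigned long bytes, unsigned long pkts)
{
    __sync_fetch_and_add(&shared.bytes, bytes);
    __sync_fetch_and_add(&shared.pkts, pkts);
    __sync_fetch_and_add(&seq, 1);
}

/* __atomic style: the counter updates are relaxed, and only the final
 * update of seq carries release ordering for the whole group. */
static void account_atomic(unsigned long bytes, unsigned long pkts)
{
    __atomic_fetch_add(&shared.bytes, bytes, __ATOMIC_RELAXED);
    __atomic_fetch_add(&shared.pkts, pkts, __ATOMIC_RELAXED);
    __atomic_fetch_add(&seq, 1, __ATOMIC_RELEASE);
}

/* Reader: the acquire load of seq pairs with the release above, so the
 * counter updates made before that seq value are visible afterwards. */
static unsigned long snapshot(struct stats *out)
{
    unsigned long s = __atomic_load_n(&seq, __ATOMIC_ACQUIRE);

    out->bytes = __atomic_load_n(&shared.bytes, __ATOMIC_RELAXED);
    out->pkts  = __atomic_load_n(&shared.pkts, __ATOMIC_RELAXED);
    return s;
}
```

Note that this only guarantees that a reader sees at least the updates belonging to the seq value it loaded; it may also see part of a later group. Making whole groups atomic with respect to each other takes more than this (a lock or a seqlock-like retry loop, for instance), which is exactly where the hand-written macros start to pile up.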

So no, it's not irrelevant to performance, nor does it only concern those who need to implement their own concurrency primitives. In fact it matters to any developer of concurrent code who notices that their code either gets 20% slower in single-thread performance just from using mutexes in the wrong place, or doesn't scale at all because of excessive cache-line bouncing between cores caused by too many atomic ops. And sadly there are many people concerned by this, who often discover it for the first time from a user report of very bad performance in a corner case.
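As an illustration of the cache-line bouncing problem, here is one common way to sidestep it, sketched in C: give each thread its own counter, padded to a full cache line, and only sum them when a total is needed. This is just an example of the general idea, not something from the article. The 64-byte line size, `MAX_THREADS`, and the function names are assumptions, and each slot is assumed to be written by its owning thread only.

```c
#define CACHE_LINE  64  /* assumption: typical on x86-64, not universal */
#define MAX_THREADS 4   /* hypothetical fixed number of worker threads */

/* One counter per thread, aligned so that two threads never write to
 * the same cache line (GCC/Clang attribute syntax). */
struct padded_counter {
    unsigned long value;
} __attribute__((aligned(CACHE_LINE)));

static struct padded_counter counters[MAX_THREADS];

/* Called only by thread 'tid' on its own slot: since there is a single
 * writer, a relaxed load and store are enough, no locked instruction. */
static void count_local(unsigned int tid)
{
    unsigned long v = __atomic_load_n(&counters[tid].value, __ATOMIC_RELAXED);

    __atomic_store_n(&counters[tid].value, v + 1, __ATOMIC_RELAXED);
}

/* Slow path: any thread may sum the slots; the result is approximate
 * while the writers are still running. */
static unsigned long count_total(void)
{
    unsigned long sum = 0;
    unsigned int i;

    for (i = 0; i < MAX_THREADS; i++)
        sum += __atomic_load_n(&counters[i].value, __ATOMIC_RELAXED);
    return sum;
}
```

The trade-off is memory (one cache line per thread per counter) and a total that is only exact once the writers are quiescent, which is often acceptable for statistics.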

Please also note that the points made there are also valid with signals. Whoever uses signals to perform various actions (state dump, config reload, etc.) should be well aware of this before manipulating half-written variables.
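For the signal case, the usual conservative pattern looks roughly like the sketch below: the handler does nothing but set a `volatile sig_atomic_t` flag, and the real work happens later in the main loop, where no variable is half-written. The names `reload_requested`, `on_sighup`, `install_reload_handler` and `poll_reload` are hypothetical, as is the choice of SIGHUP for a config reload.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only object shared with the handler. volatile sig_atomic_t is
 * the type ISO C allows a signal handler to write safely. */
static volatile sig_atomic_t reload_requested;

/* Handler: just record that the signal arrived. */
static void on_sighup(int sig)
{
    (void)sig;
    reload_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_reload_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called from the main loop at a point where the program's state is
 * consistent. Returns 1 if a reload was requested since the last call. */
static int poll_reload(void)
{
    if (!reload_requested)
        return 0;
    reload_requested = 0;
    /* ... re-read the configuration here, outside the handler ... */
    return 1;
}
```

If a second signal arrives between the test and the reset of the flag, it is simply merged with the first one, which is fine for something idempotent like a reload but would not be for a counter.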



Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds