A perf ABI fix

A perf ABI fix

Posted Sep 25, 2013 19:55 UTC (Wed) by pr1268 (subscriber, #24648)
Parent article: A perf ABI fix

I'm reminded of this gem from some GNU humor Web page I read a few years ago:

#define struct union

I'm not sure that would fix the perf ABI mess, though. ;-)

Seriously, though, why use the bit fields at all? Why not:

__uu64 capabilities;
#define CAP_USR_TIME            (1ULL<<63)
#define CAP_USR_RDPMC           (1ULL<<62)
#define HAS_CAP_USR_TIME(x)     ((x)&CAP_USR_TIME)
#define HAS_CAP_USR_RDPMC(x)    ((x)&CAP_USR_RDPMC)
#define SET_CAP_USR_TIME(x)     ((x)|=CAP_USR_TIME)
#define SET_CAP_USR_RDPMC(x)    ((x)|=CAP_USR_RDPMC)
#define UNSET_CAP_USR_TIME(x)   ((x)&=~CAP_USR_TIME)
#define UNSET_CAP_USR_RDPMC(x)  ((x)&=~CAP_USR_RDPMC)

Now you have the full complement of query, set, and unset operations in beautiful preprocessor code.


A perf ABI fix

Posted Sep 25, 2013 20:56 UTC (Wed) by geofft (subscriber, #59789) [Link]

Isn't it wonderful that we're writing our kernel in a language where preprocessor macros can defensibly be called "beautiful" by comparison to the alternative?

A perf ABI fix

Posted Sep 26, 2013 0:19 UTC (Thu) by ncm (subscriber, #165) [Link]

Gcc and Clang both support C99 inline functions. All the operations defined above can be expressed directly in type-checked C, without preprocessor macros, with identical runtime performance.

There are still places for CPP macros, but this isn't one of them.
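A minimal sketch of what ncm is suggesting: the macro operations from the comment at the top of the thread, rewritten as type-checked static inline functions. The bit values mirror the (hypothetical) capability macros above; the function names are invented here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_USR_TIME   (1ULL << 63)
#define CAP_USR_RDPMC  (1ULL << 62)

/* Query: arguments are type-checked, unlike the macro versions. */
static inline bool has_cap_usr_time(uint64_t caps)  { return (caps & CAP_USR_TIME) != 0; }
static inline bool has_cap_usr_rdpmc(uint64_t caps) { return (caps & CAP_USR_RDPMC) != 0; }

/* Set/unset: take a pointer instead of relying on macro text substitution. */
static inline void set_cap_usr_time(uint64_t *caps)    { *caps |= CAP_USR_TIME; }
static inline void unset_cap_usr_time(uint64_t *caps)  { *caps &= ~CAP_USR_TIME; }
static inline void set_cap_usr_rdpmc(uint64_t *caps)   { *caps |= CAP_USR_RDPMC; }
static inline void unset_cap_usr_rdpmc(uint64_t *caps) { *caps &= ~CAP_USR_RDPMC; }
```

With inlining, any reasonable compiler emits the same code as the macro version, but the arguments are checked and the names obey normal scoping rules.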

A perf ABI fix

Posted Sep 26, 2013 9:38 UTC (Thu) by etienne (subscriber, #25256) [Link]

> All the operations defined above can be expressed directly in type-checked C

C cannot have functions with bitfield parameters (i.e. a parameter of 3 bits), so the simple bitfield version:
struct { unsigned dummy : 3; } a_var;
void fct (void) { a_var.dummy = 9; }
generates a warning (gcc-4.6.3):
large integer implicitly truncated to unsigned type [-Woverflow]

The equivalent in C is:
unsigned avar;
extern inline void WARN_set_dummy_too_high(void) {
        //#warning set_dummy value too high
        char overflow __attribute__((unused)) = 1024; // to get a warning
}
inline void set_dummy(unsigned val) {
        if (__builtin_constant_p(val) && (val & ~0x7))
                WARN_set_dummy_too_high();
        avar = (avar & ~0x7) | (val & 0x7);
}
void fct (void) { set_dummy(9); }

If the bitfield is signed, the C function gets even more complex, and more prone to off-by-one bugs.

I have seen so much crap with #define (files with 10000+ #define lines, with bugs) that I would say bitfields are the future... let the compiler manage bits and bytes and let the linker manage addresses.

A perf ABI fix

Posted Sep 26, 2013 10:40 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

Any time I see a :1 bitfield, I wonder why the author didn't just set up an unsigned char/short/int/long/long long and define compile-time constants for the bit(s).

Any time I see a :n (n > 1) bitfield, I wonder what makes the author simultaneously believe that (a) it's important to squash that value into a bitfield instead of just using an int*_t or uint*_t (b) it's not important for people to be able to look at the code and predict what it will do.

(And any time I see a bitfield without an explicit signedness specifier, I wonder if I can revoke the author's coding privileges.)

A perf ABI fix

Posted Sep 26, 2013 11:51 UTC (Thu) by etienne (subscriber, #25256) [Link]

> bitfield, I wonder why the author didn't just set up an unsigned char

Well, I was talking about describing the hardware, for instance a PCIe memory-mapped window which controls complex behaviour.
I do not like to see stuff like:
fpga.output_video.channel[3].sound.dolby.volume = 45;
expressed with #defines:
#define FPGA ((volatile void *)0xFD000000)
#define OUTPUT_VIDEO (FPGA + 0x10000)
#define CHANNEL (OUTPUT_VIDEO + 0x100)
#define SIZEOF_CHANNEL 0x20
#define OUTPUT_VIDEO_CHANNEL(n) (CHANNEL + (n * SIZEOF_CHANNEL))
#define SET_SOUND_DOLBY_VOLUME(channel, v) ((stuff1 & stuff2) << 12) ... etc...

For code unrelated to hardware, and not mapped to a fixed format (like for instance the structure of an Ethernet frame), then using bitfields is a lot less important.
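A sketch of the struct-of-bitfields style being argued for here. Every name and field width below is invented for illustration (a real map would come from the hardware documentation), and a static instance stands in for the mapped BAR so the sketch stays self-contained.

```c
#include <stdint.h>

/* Hypothetical register layout, mirroring the fpga.output_video... example. */
struct dolby_regs {
    uint32_t volume   : 7;   /* 0..127 */
    uint32_t enabled  : 1;
    uint32_t reserved : 24;
};

struct sound_regs   { struct dolby_regs dolby; };
struct channel_regs { uint32_t gain; struct sound_regs sound; };

struct fpga_regs {
    struct {
        struct channel_regs channel[8];
    } output_video;
};

/* On real hardware this would point at the mapped window, e.g.
 *   #define FPGA ((volatile struct fpga_regs *)0xFD000000)
 * Here a static instance stands in so the sketch is runnable. */
static struct fpga_regs fpga_backing;
static volatile struct fpga_regs *const fpga = &fpga_backing;

void set_channel3_volume(unsigned v)
{
    /* The compiler does the masking and shifting; no hand-written macros. */
    fpga->output_video.channel[3].sound.dolby.volume = v & 0x7f;
}
```

The address arithmetic and bit manipulation that the #define version spells out by hand is all derived by the compiler from the struct layout.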

A perf ABI fix

Posted Sep 26, 2013 12:40 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

Yes, you're describing exactly the situation I'm implying with my comment.

I've worked with hardware a lot. I've worked with hardware that has default settings useful to exactly no-one. I've worked with hardware that sometimes fails to assert its interrupt output and then won't attempt to assert an interrupt again until the interrupt it didn't assert has been serviced. I've worked with hardware with complex functional blocks that were pulled in their entirety from a previous device, but only half-documented in the new device's manual. I've worked with hardware with read-to-clear status bits, hardware with write-zero-to-clear status bits, hardware with write-one-to-clear status bits, and hardware with combinations of those.

Thanks to that, I've spent enough time staring at bus analyser traces that I have come to appreciate code of the form "read register at offset X from BAR Y of PCI device Z; compose new value; write register at offset X from BAR Y of PCI device Z", because I can directly correlate what I see on the analyser to what I see in the code - and, even better, I can quickly tell when what I see on the analyser doesn't correlate to what I see in the code.

Most hardware isn't bit-addressable. Bitfields in device drivers look an awful lot like a misguided attempt to make it look like it is.
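The read/compose/write style described above can be sketched as follows. The register name, bit masks, and `reg_read()`/`reg_write()` accessors are hypothetical stand-ins for real MMIO helpers (such as the kernel's readl()/writel()); a plain variable stands in for the device register so the sketch is self-contained.

```c
#include <stdint.h>

static uint32_t device_reg;   /* stands in for the register at BAR+offset */

static uint32_t reg_read(uint32_t off)            { (void)off; return device_reg; }
static void reg_write(uint32_t off, uint32_t val) { (void)off; device_reg = val; }

#define CTRL_REG       0x10u
#define CTRL_ENABLE    (1u << 0)
#define CTRL_IRQ_MASK  (1u << 3)

void enable_device(void)
{
    uint32_t v = reg_read(CTRL_REG);   /* one read, visible on the analyser */
    v |= CTRL_ENABLE;                  /* compose the new value */
    v &= ~CTRL_IRQ_MASK;
    reg_write(CTRL_REG, v);            /* one write, visible on the analyser */
}
```

Each bus transaction corresponds to exactly one call in the source, which is what makes the analyser trace easy to line up with the code.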

A perf ABI fix

Posted Sep 27, 2013 9:25 UTC (Fri) by etienne (subscriber, #25256) [Link]

I also work with hardware, but mine may be working better.
Maybe FPGAs work better; at least read/write issues are dealt with by the VHDL teams.
What I am saying is that ten lines of #define to write one memory-mapped register do not scale; once the single block works, the FPGA teams just put 2048 of them in one corner of the FPGA.
Then, most of the errors you find are that the wrong "ENABLE_xx" mask has been used with a memory-mapped register, or someone defined
#define FROBNICATE_1 xxx
#define FROBNICATE_2 xxx+2
...
#define FROBNICATE_256 xxx+512
but failed to increment for (only) FROBNICATE_42.

When using C described memory mapped registers (with a volatile struct of bitfields), you can read a single bit directly (knowing that the compiler will read the struct once and extract the bit), but when you want to access multiple bits you read the complete volatile struct into a locally declared (non volatile) struct (of the same type).
If you want to modify and write you do it on your locally declared struct and write the complete struct back.
The reading and writing of volatiles appear clearly in the source, and you can follow on your analyser, but the compiler is still free to optimize any treatment of non-volatile structs.
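The copy-modify-write pattern described above can be sketched as follows. The register layout is invented for illustration, and a static instance stands in for the device memory so the sketch stays self-contained.

```c
#include <stdint.h>

struct status_regs {          /* hypothetical memory-mapped register */
    uint32_t ready   : 1;
    uint32_t error   : 1;
    uint32_t channel : 6;
    uint32_t count   : 24;
};

static struct status_regs backing;   /* stands in for the mapped device memory */
static volatile struct status_regs *const hw = &backing;

void next_channel(void)
{
    struct status_regs local = *hw;              /* one volatile read of the whole register */
    local.channel = (local.channel + 1) & 0x3f;  /* non-volatile work: freely optimizable */
    local.error = 0;
    *hw = local;                                 /* one volatile write back */
}
```

Only the two volatile accesses touch the bus, so they show up as exactly one read and one write on the analyser, while everything in between is ordinary code the compiler may rearrange.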

A perf ABI fix

Posted Sep 27, 2013 9:54 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

> What I am saying is that ten lines of #define to write a memory map register do not scale; once the single block works, FPGA teams just put 2048 of them on one corner of the FPGA.

It seems to me that dealing with an FPGA containing 2048 instances of the same functional block should only require defining two or three more macros than dealing with an FPGA containing one instance of that block. If it doesn't... you need to have a quiet word or six with your FPGA teams about little things like "address space layout".

A perf ABI fix

Posted Sep 27, 2013 11:33 UTC (Fri) by etienne (subscriber, #25256) [Link]

> require defining two or three more macros

In that case the 10000+ lines of #define are automatically generated, while "compiling" the VHDL, by some TCL script nobody is really interested in reading.
As a software engineer you have the choice of using that file or not; if you do not use it, what do you replace it with?
For me, having an array of 2048 structures, each of them containing a hundred different control/status bits and a few read and write buffers, fully memory-mapped and with most of the area not even declared volatile, leads to source code ten times smaller with a lot fewer bugs.
Obviously my knowledge of the preprocessor is sufficient to use the 10000-line file and "concat" names to counters in macros to access all the defines, if my employer wants me to. I can do so for the 20 different parts of the VHDL chip, on each of the chips.
Note that there is always an exception to every rule, and someone will modify the automatically TCL-generated file in the future.

A perf ABI fix

Posted Sep 26, 2013 20:13 UTC (Thu) by ncm (subscriber, #165) [Link]

What mpr said. Further, any use of bitfields to control hardware makes the driver non-portable to any other architecture. Further further, there is no way to know, ABI notwithstanding, how any particular compiler version will implement a series of bitfield operations, so use of bitfields makes your driver code non-portable even to the next release of the same compiler.

Categorically, there is never any excuse to use bitfields to operate hardware registers. Use of bitfields in a driver is a marker of crippling incompetence. Publishing code written that way will blight your career more reliably than publishing designs for amateur road mines.

A perf ABI fix

Posted Sep 26, 2013 20:06 UTC (Thu) by pr1268 (subscriber, #24648) [Link]

Perhaps I shouldn't have said "Seriously"... My facetiousness extended to the second part of my original comment. Not to mention a typo: s/__uu64/__u64/. Of course, I could simply do a typedef __u64 __uu64; and voilà! Typo gone. :-D

I'm actually intrigued by the fact that some commenters above suggest bitfields are perhaps preferable to preprocessor macros. I was under the impression (based on my 2003-2005 undergraduate CS education) that they're frowned upon. As are unions. (Personally, I'm not bothered by either; I have used bitfields and unions, even very recently, in code I've written for demonstrating IEEE-754 floating point representation in binary. A quick look at /usr/include/ieee754.h will show lots of bitfields.)

P.S.1: Even COBOL has a union programming structure (the REDEFINES keyword).

P.S.2: I do think the Perf developers' solution is quite elegant. Well done, folks!

A perf ABI fix

Posted Sep 26, 2013 20:25 UTC (Thu) by ncm (subscriber, #165) [Link]

I despair.

Might such travesties have led Brian Kernighan to say that Linux kernel code was even worse than Microsoft NT kernel code he had seen?

At X.org, they take long-time brokenness of a feature to demonstrate that the feature is unused and may be eliminated. That would not be inappropriate in this case. If the feature is expected to be useful in the future, the sensible approach is to design another interface with another name, and leave the busted one the hell alone.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds