Posted Sep 27, 2013 9:25 UTC (Fri) by etienne (subscriber, #25256)
In reply to: A perf ABI fix by mpr22
Parent article: A perf ABI fix
I also work with hardware, but mine may be working better.
Maybe FPGAs work better; at least read/write issues are dealt with by the VHDL teams.
What I am saying is that ten lines of #define to write a memory map register do not scale; once the single block works, FPGA teams just put 2048 of them on one corner of the FPGA.
Then, most of the errors you find are that the wrong "ENABLE_xx" mask has been used with a memory-mapped register, or that someone defined
#define FROBNICATE_1 xxx
#define FROBNICATE_2 xxx+2
...
#define FROBNICATE_256 xxx+512
but failed to increment for (only) FROBNICATE_42.
When memory-mapped registers are described in C (with a volatile struct of bitfields), you can read a single bit directly (knowing that the compiler will read the struct once and extract the bit), but when you want to access multiple bits you read the complete volatile struct into a locally declared (non-volatile) struct of the same type.
If you want to modify and write, you do it on your locally declared struct and write the complete struct back.
The reads and writes of volatiles appear clearly in the source, and you can follow them on your analyser, but the compiler is still free to optimize any processing of the non-volatile structs.
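The read-modify-write pattern described above could be sketched roughly as follows. The register layout, field names and function names are invented for illustration. Bit-field layout is implementation-defined in C, and whether the whole-struct copy is emitted as a single 32-bit bus access depends on the compiler and target, so both are worth checking on the analyser.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit control register: field names and widths are invented.
 * Bit-field layout is implementation-defined, so a real driver must check it
 * against the compiler and ABI in use. */
struct ctrl_reg {
    uint32_t enable   : 1;
    uint32_t mode     : 3;
    uint32_t divider  : 8;
    uint32_t reserved : 20;
};

_Static_assert(sizeof(struct ctrl_reg) == sizeof(uint32_t),
               "register struct must be exactly one 32-bit word");

/* Single-bit read, done directly on the volatile struct. */
static unsigned ctrl_is_enabled(const volatile struct ctrl_reg *reg)
{
    return reg->enable;
}

/* Multi-field update: one volatile read, edits on a plain local copy,
 * then one volatile write of the complete struct. */
static void ctrl_configure(volatile struct ctrl_reg *reg,
                           unsigned mode, unsigned divider)
{
    assert(mode < 8u && divider < 256u);

    struct ctrl_reg local = *reg;  /* volatile read of the whole struct */
    local.mode = mode;             /* non-volatile: compiler may optimize */
    local.divider = divider;
    *reg = local;                  /* volatile write of the whole struct */
}
```

On a host machine the functions can be exercised against an ordinary struct standing in for the register; on the target, `reg` would point at the mapped address.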
Posted Sep 27, 2013 9:54 UTC (Fri) by mpr22 (subscriber, #60784)
> What I am saying is that ten lines of #define to write a memory map register do not scale; once the single block works, FPGA teams just put 2048 of them on one corner of the FPGA.
It seems to me that dealing with an FPGA containing 2048 instances of the same functional block should only require defining two or three more macros than dealing with an FPGA containing one instance of that block. If it doesn't... you need to have a quiet word or six with your FPGA teams about little things like "address space layout".
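As a rough sketch of what "two or three more macros" might look like, assuming the instances are laid out at a constant stride (the base address, stride, offsets and all names below are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* All values are hypothetical; a real map comes from the FPGA team. */
#define FROB_BASE        0x40000000u  /* address of instance 0 */
#define FROB_STRIDE      0x100u       /* bytes between consecutive instances */
#define FROB_COUNT       2048u        /* number of instances */

/* Register offsets within one instance (the single-block definitions). */
#define FROB_CTRL_OFF    0x00u
#define FROB_STATUS_OFF  0x04u

/* Address of register `off` in instance `n`. */
#define FROB_ADDR(n, off) (FROB_BASE + (uint32_t)(n) * FROB_STRIDE + (off))

/* Volatile access to that register; only meaningful on the real target. */
#define FROB_REG(n, off) \
    (*(volatile uint32_t *)(uintptr_t)FROB_ADDR((n), (off)))

/* Bounds-checked variant of the address computation. */
static uint32_t frob_addr(unsigned n, uint32_t off)
{
    assert(n < FROB_COUNT && off < FROB_STRIDE);
    return FROB_ADDR(n, off);
}
```

With this layout there is no per-instance constant to get wrong; the cost is that it only works if the hardware address map really is regular.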
A perf ABI fix
Posted Sep 27, 2013 11:33 UTC (Fri) by etienne (subscriber, #25256)
> require defining two or three more macros
In that case the 10000s of lines of #define are automatically generated, while "compiling" the VHDL, by some TCL command that nobody is really interested in reading.
As a software engineer you have the choice either to use that file or not; if you do not use it, what do you replace it with?
For me, having an array of 2048 structures, each containing one hundred different control/status bits and a few read and write buffers, fully memory mapped and with most of the area not even declared volatile, leads to source code that is ten times smaller with a lot fewer bugs.
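A cut-down sketch of that approach, with a handful of bits instead of one hundred, might look like this. Every field name, size and function is invented for illustration, and whether the buffers can safely be left non-volatile depends on how the hardware behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status and control words; layouts are invented. */
struct frob_status {
    uint32_t ready    : 1;
    uint32_t error    : 1;
    uint32_t level    : 6;
    uint32_t reserved : 24;
};

struct frob_ctrl {
    uint32_t enable   : 1;
    uint32_t channel  : 4;
    uint32_t reserved : 27;
};

/* One functional block: only the registers are volatile, the buffers are not. */
struct frob_block {
    volatile struct frob_status status;
    volatile struct frob_ctrl   ctrl;
    uint32_t rx_buf[7];
    uint32_t tx_buf[7];
};

_Static_assert(sizeof(struct frob_block) == 64, "one block must span 64 bytes");

#define FROB_COUNT 2048u
/* On the target the array would sit at the mapped address, for example
 * ((struct frob_block *)0x40000000u); that address is hypothetical. */

/* Count blocks that are ready and flag an error: one volatile read per block. */
static unsigned frob_count_errors(const struct frob_block *blocks, unsigned count)
{
    assert(blocks != NULL && count <= FROB_COUNT);
    unsigned errors = 0;
    for (unsigned i = 0; i < count; i++) {
        struct frob_status s = blocks[i].status;  /* whole-struct read */
        if (s.ready && s.error)
            errors++;
    }
    return errors;
}

/* Enable one block on a given channel: read, modify the local copy, write back. */
static void frob_enable(struct frob_block *block, unsigned channel)
{
    assert(block != NULL && channel < 16u);
    struct frob_ctrl c = block->ctrl;
    c.enable = 1;
    c.channel = channel;
    block->ctrl = c;
}
```

Indexing the array replaces every per-instance #define, so a mistake of the "FROBNICATE_42" kind has nowhere to hide.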
Obviously my knowledge of the preprocessor is sufficient to use the 10000-line file and "concat" names with counters in macros to access all the defines, if my employer wants me to. I can do so for the 20 different parts of the VHDL chip, on each of the chips.
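One way the "concat" approach could look is token pasting plus a list macro that turns the generated names into a table. The four defines below only stand in for the generated file; their values, the helper macros and the checking function are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generated header: names follow the thread, values invented. */
#define FROBNICATE_1 0x1000u
#define FROBNICATE_2 0x1002u
#define FROBNICATE_3 0x1004u
#define FROBNICATE_4 0x1006u

/* Paste a prefix and an index into one identifier, e.g. FROBNICATE_3. */
#define FROB_CAT(prefix, n) prefix##n
#define FROB(n)             FROB_CAT(FROBNICATE_, n)

/* List of indexes, expanded once per use with a caller-supplied macro. */
#define FROB_INDEXES(X) X(1) X(2) X(3) X(4)
#define FROB_ENTRY(n)   FROB(n),

/* Table of all generated addresses, indexable at run time. */
static const uint32_t frob_table[] = { FROB_INDEXES(FROB_ENTRY) };
#define FROB_TABLE_LEN (sizeof frob_table / sizeof frob_table[0])

/* Return the index of the first entry that breaks the expected stride,
 * or len if the table is regular. */
static size_t frob_first_irregular(const uint32_t *table, size_t len,
                                   uint32_t stride)
{
    assert(table != NULL);
    for (size_t i = 1; i < len; i++) {
        if (table[i] - table[i - 1] != stride)
            return i;
    }
    return len;
}
```

A check like `frob_first_irregular` is one way to catch a hand-edited entry that forgot to increment, at the cost of still depending on the generated file.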
Note that there is always an exception to every rule, and someone will modify the file automatically generated by TCL at some point in the future.