LWN: Comments on "Little-endian PowerPC" http://lwn.net/Articles/408845/ This is a special feed containing comments posted to the individual LWN article titled "Little-endian PowerPC". Little-endian PowerPC http://lwn.net/Articles/409999/rss 2010-10-14T16:58:58+00:00 mrpippy <div class="FormattedComment"> As I remember, Virtual PC (emulating x86 on PowerPC Macs) used this feature as a core part of its emulation engine. This was a big problem when the PPC 970 (G5) came out without the little-endian mode.<br> </div> Little-endian PowerPC http://lwn.net/Articles/409654/rss 2010-10-12T16:11:07+00:00 jzbiciak <P>I came here to say this. You're tilting at windmills if you're worrying too much about the cycle cost of byte swapping on modern hardware in most situations. About the only place it matters deeply is when you have a huge amount of data in non-native byte order (as would be the case for a frame buffer, hence the motivation for LE PPC).</P> <P>Most of the time, the real cost is making sure you haven't unwittingly introduced endian dependencies into your code, and that you've managed them properly where you have introduced them. Correctness is the tricky part.</P> Little-endian PowerPC http://lwn.net/Articles/409227/rss 2010-10-08T15:29:17+00:00 jengelh <div class="FormattedComment"> <font class="QuotedText">&gt;but I did not find a way to make GCC use it (i.e. either inline the bswap() or recognise the three shift sequence) on an i386 host...</font><br> <p> Works here. At -O2 the optimizer recognizes int s = ((((argc) &amp; 0xff000000) &gt;&gt; 24) | (((argc) &amp; 0x00ff0000) &gt;&gt; 8) | (((argc) &amp; 0x0000ff00) &lt;&lt; 8) | (((argc) &amp; 0x000000ff) &lt;&lt; 24)); and issues a bswap for it.<br> <p> Alternatively, you could use the same trick as glibc's htonl: int s = __bswap_32(argc). 
Or even use __builtin_bswap32 directly, as documented in gcc.info.<br> </div> Little-endian PowerPC http://lwn.net/Articles/409221/rss 2010-10-08T14:25:02+00:00 etienne <div class="FormattedComment"> <font class="QuotedText">&gt; I doubt that</font><br> <p> GCC has known about the bswap instruction since 4.0 (I just checked the source), but I did not find a way to make GCC use it (i.e. either inline the bswap() or recognise the three shift sequence) on an i386 host...<br> </div> Little-endian PowerPC http://lwn.net/Articles/409148/rss 2010-10-08T01:19:46+00:00 busterb <div class="FormattedComment"> Also, the IXP425 has proprietary firmware that runs on the microengines that drive the networking subsystem. These always run big-endian, so it makes sense to have the whole chip be big-endian (the microengines are not byte-swappable like the ARM core is).<br> <p> I did a lot of coding for the IXP2855, which is the IXP425's discontinued big brother.<br> </div> Little-endian PowerPC http://lwn.net/Articles/409135/rss 2010-10-07T23:07:38+00:00 jhhaller <div class="FormattedComment"> Byte swapping is not a particularly expensive operation. Pulling the data from RAM is the biggest delay in modern CPUs, and the extra instructions are only a minor component. Since those instructions tend to stay in the cache, they don't detract significantly from the speed. Memory access dominates program execution speed for many if not most programs. 
We got this result when comparing the execution speed of software with and without byte-swapping instructions, while evaluating how to port some big-endian software to a little-endian processor.<br> <p> </div> Little-endian PowerPC http://lwn.net/Articles/409104/rss 2010-10-07T18:55:07+00:00 daniel <div class="FormattedComment"> <font class="QuotedText">&gt;Last time I looked gcc was not able to generate (did not know about) the bswap instruction.</font><br> <p> I doubt that; however, what GCC does need is support for endianness attributes on variables, with appropriate code generation. Last I looked, GCC has no such feature.<br> </div> Little-endian PowerPC http://lwn.net/Articles/409071/rss 2010-10-07T16:51:11+00:00 linuxjacques <div class="FormattedComment"> <p> The NSLU2 SoC, an Intel IXP425, defaulted to BE because it's a<br> "network processor" and network byte order is BE.<br> <p> There was a slight but measurable network throughput gain when<br> running in BE vs LE.<br> <p> </div> Little-endian PowerPC http://lwn.net/Articles/409070/rss 2010-10-07T16:46:55+00:00 linuxjacques <div class="FormattedComment"> <p> The NSLU2 (like most, maybe all, ARMs) can run either endian.<br> <p> I've run my slugs both ways.<br> <p> </div> Little-endian PowerPC http://lwn.net/Articles/409049/rss 2010-10-07T15:38:31+00:00 etienne <div class="FormattedComment"> <font class="QuotedText">&gt; Perhaps you need to update from that ancient gcc version ;-)</font><br> <p> I don't have GCC 4.5 here yet. On ia32, if your example no longer calls the function "htonl" (which with GCC 4.4.5 is a library function hand-written in assembler; I just checked the whole calling sequence with objdump), that is a very welcome improvement!<br> <p> <font class="QuotedText">&gt; Well, wouldn't that be just AND r0, 0xFF.</font><br> <p> Unfortunately a register does not have an address to pass to a function, so you need to allocate some temporary space on the stack and copy your register there...<br> </div> Little-endian PowerPC 
http://lwn.net/Articles/409040/rss 2010-10-07T14:47:03+00:00 jengelh <div class="FormattedComment"> <font class="QuotedText">&gt;Last time I looked gcc was not able to generate (did not know about) the bswap instruction.</font><br> <p> Perhaps you need to update from that ancient gcc version ;-)<br> <p> int main(int argc, char **argv) { printf("%d\n", htonl(argc)); return 0; } compiled with gcc-4.5 -O3 -static on x86_64 gives me a bswap in objdump. bswap has been there since the 80486.<br> <p> <font class="QuotedText">&gt;some assembly instructions to cast the value 3 in a 16-bit word to the value 3 in a byte,</font><br> <p> Well, wouldn't that be just AND r0, 0xFF.<br> </div> Little-endian PowerPC http://lwn.net/Articles/409026/rss 2010-10-07T14:21:04+00:00 etienne <div class="FormattedComment"> Last time I looked gcc was not able to generate (did not know about) the bswap instruction.<br> I do not think gcc knows about the PPC "load/store word byte-reversed" instructions either.<br> On a BE CPU, you still need some assembly instructions to cast the value 3 in a 16-bit word to the value 3 in a byte, and pass its address... at least in gcc you need an explicit temporary.<br> It is easy to assume (the source of a lot of bugs) that when a constant integer contains the value 15, you can cast the address of that integer to a char or short pointer and still read 15; to get rid of those bugs, it seems people prefer little endian.<br> But basically, the byte order you want depends directly on the endianness of the whole environment, not only the network interface.<br> The "best" I worked with was a big-endian main processor connected to a little-endian coprocessor connected to a big-endian FPGA: by the time you know that the message in the coprocessor contains a 32-bit value and that you should byte-swap it, you have to byte-swap it back to write it to the FPGA... 
Basically, do not byte-swap it at all except for display/debug.<br> <p> </div> Little-endian PowerPC http://lwn.net/Articles/409017/rss 2010-10-07T13:11:03+00:00 jengelh <div class="FormattedComment"> <font class="QuotedText">&gt;Not all the fields, just the 32-bit fields, while the 16-bit fields have to be rotated by 8</font><br> <p> Well, for 32-bit fields you use the "BSWAP reg32" instruction and for 16-bit fields "ROR reg16, 8", so for the purposes of the argument, they are equally costly.<br> <p> But my point was that BE CPUs need not do either of these for most networking protocols.<br> </div> Little-endian PowerPC http://lwn.net/Articles/408998/rss 2010-10-07T12:57:27+00:00 etienne <div class="FormattedComment"> <font class="QuotedText">&gt; since LE archs have to byteswap all the fields in network packets.</font><br> <p> Not all the fields, just the 32-bit fields, while the 16-bit fields have to be rotated by 8 and the bytes have to be left untouched.<br> What is costly is not the byte swapping but all the code needed to handle field lengths (each field's length and position is known only in the protocol layer where the field is defined, so you cannot write one big messages_swap() function at all).<br> I would like to have an attribute((big-endian)) in gcc, but I guess the market share of big-endian processors no longer makes the improvement worthwhile...<br> </div> Little-endian PowerPC http://lwn.net/Articles/408968/rss 2010-10-07T08:55:45+00:00 jond <div class="FormattedComment"> ARM comes in both flavours. The Linksys NSLU2 is a big-endian ARM, for example.<br> </div> Little-endian PowerPC http://lwn.net/Articles/408954/rss 2010-10-07T05:50:27+00:00 ikm <div class="FormattedComment"> <font class="QuotedText">&gt; the fact that one obscure architecture - x86 - is little-endian</font><br> <p> Not to mention ARM. 
Together they hold the lion's share, so large that endian-correctness in programs isn't too big a deal anymore.<br> </div> Little-endian PowerPC http://lwn.net/Articles/408952/rss 2010-10-07T05:22:49+00:00 jengelh <div class="FormattedComment"> <font class="QuotedText">&gt;little-endian apparently feels cheap</font><br> <p> Apparently, LE is anything but cheap, since LE archs have to byteswap all the fields in network packets. Costly, costly.<br> </div>
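Several comments in this thread turn on how a 32-bit byte swap is written and whether the compiler turns it into a single instruction. The following is a minimal sketch in C that assumes a POSIX system for htonl(). The names swap32(), swap16() and host_is_little_endian() are illustrative helpers, not library functions. Whether GCC folds the shift-and-mask form into one bswap depends on the compiler version and optimization level. The 16-bit case is the "rotate by 8" mentioned in the thread.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl(): POSIX */

/* Portable 32-bit byte swap using the shift-and-mask idiom.
   Unsigned arithmetic avoids the signed-overflow problem that
   "x << 24" can hit when x is a plain int. Recent GCC at -O2
   typically emits a single bswap for this on x86, but that is an
   optimization, not a guarantee. */
static uint32_t swap32(uint32_t x)
{
    return ((x & 0xff000000u) >> 24) |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x000000ffu) << 24);
}

/* 16-bit byte swap: equivalent to rotating the value by 8 bits. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Returns 1 on a little-endian host, 0 on a big-endian one.
   memcpy inspects the first byte in memory without a pointer cast. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

With these helpers, swap32(0x12345678) is 0x78563412 on any host. On a little-endian host such as x86, htonl() returns that same swapped value. On a big-endian host it returns its argument unchanged, which is the point made above about BE CPUs and network protocols. GCC and Clang also provide __builtin_bswap32(), which always performs the swap.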
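etienne warns about casting the address of an integer to a char or short pointer. The thread also observes that the cost lies in per-field handling rather than in the swap itself. Both points can be illustrated by reading network-order fields byte by byte. This is a sketch, not code from the thread: load_be16(), load_be32(), store_be32() and first_byte_of() are hypothetical helpers. Because values are assembled with shifts on individual bytes, the same source gives the same results on big- and little-endian CPUs. Only first_byte_of(), which copies out whichever byte comes first in memory, exposes the host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read a big-endian (network byte order) 16-bit field from a buffer.
   Built from individual bytes, so no explicit byte swap is needed on
   either kind of host. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit field from a buffer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value into a buffer in big-endian order. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* The endian-dependent shortcut: treat the first byte in memory of a
   16-bit integer as "the value in a byte". It yields the low byte on
   little-endian hosts and the high byte on big-endian ones. */
static unsigned char first_byte_of(uint16_t v)
{
    unsigned char c;
    memcpy(&c, &v, 1);
    return c;
}
```

For example, with the bytes 00 50 12 34 56 78 at the start of a buffer, load_be16() returns 80 and load_be32(buf + 2) returns 0x12345678 on any host. By contrast, first_byte_of(3) returns 3 on a little-endian CPU but 0 on a big-endian one, which is exactly the assumption the comment warns about.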
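Two commenters wish for an endianness attribute in GCC. Later GCC releases (6 and newer) provide the scalar_storage_order type attribute for structs and unions. With it, the compiler inserts any needed byte swaps whenever fields are accessed. The sketch below assumes GCC 6 or newer; the attribute is a GCC extension, and other compilers may not support it. struct wire_hdr, encode_hdr() and decode_seq() are hypothetical names, and the field layout is only illustrative. GCC rejects taking the address of a scalar field in such a struct, so the fields are only read and written by value here.

```c
#include <stdint.h>
#include <string.h>

/* GCC >= 6 only: scalar fields of this struct are stored in big-endian
   (network) byte order in memory, whatever the host byte order is.
   Reads and writes of h.seq / h.port use ordinary C syntax; GCC emits
   any required byte swap itself. */
struct __attribute__((scalar_storage_order("big-endian"))) wire_hdr {
    uint32_t seq;   /* bytes 0..3 in memory, most significant first */
    uint16_t port;  /* bytes 4..5 in memory, most significant first */
};

/* Fill a header and copy its raw bytes out, e.g. into a packet buffer.
   Taking the address of the whole struct is allowed; taking the address
   of a scalar field such as &h.seq is not. */
static void encode_hdr(unsigned char *out, uint32_t seq, uint16_t port)
{
    struct wire_hdr h;
    memset(&h, 0, sizeof h);
    h.seq = seq;
    h.port = port;
    memcpy(out, &h, sizeof h);
}

/* Read a header back from raw bytes; the field value comes out in
   host byte order. */
static uint32_t decode_seq(const unsigned char *in)
{
    struct wire_hdr h;
    memcpy(&h, in, sizeof h);
    return h.seq;
}
```

This moves the per-field swapping that etienne describes into the type definition: each protocol layer declares its own header struct, and the code that touches the fields stays free of explicit htonl()/ntohl() calls.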