Posted Jun 25, 2012 19:57 UTC (Mon) by butlerm (subscriber, #13312)
Parent article: Pettenò: Debunking x32 myths
The writer makes some interesting points, but ultimately his analysis fails because he neglects to address the main reason why anyone would use x32 instead of x86-64 in the first place: the reduction in size and cache impact of pointer-heavy, dynamically allocated data structures.
This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers. Is that enough for a distribution to support x32? I don't know, but it is certainly not something to scoff at.
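As a rough illustration of the size argument, here is a minimal sketch that computes the padded size of a linked-list node under 8-byte and 4-byte pointers. The node layout, the natural-alignment rule and the 64-byte cache line are assumptions made for the example, not figures taken from the x32 benchmarks, and allocator overhead is ignored.

```python
# Hypothetical node: two pointers plus a 32-bit payload.
# Assumption: each field is aligned to its own size and the struct is
# padded to a multiple of its largest field alignment.

def struct_size(field_sizes):
    """Return the padded size of a struct with naturally aligned fields."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


def node_size(pointer_bytes):
    # struct node { struct node *next; struct node *prev; int32_t value; };
    return struct_size([pointer_bytes, pointer_bytes, 4])


CACHE_LINE = 64  # bytes; assumed, typical for current x86 parts

for name, ptr in (("x86-64", 8), ("x32", 4)):
    size = node_size(ptr)
    print(f"{name}: {size} bytes per node, "
          f"{CACHE_LINE // size} whole nodes per cache line")
```

Under these assumptions the node shrinks from 24 to 12 bytes, so more than twice as many nodes fit in a cache line. Whether that translates into a measurable speedup for a real program is exactly what the benchmarks would have to show.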
One worthwhile point he makes is that most x32 applications will run on an x86-64 kernel, so kernel performance will not be improved at all. I don't know how many embedded systems are kernel performance bound, but if it is important enough surely some way could be found to support an x32 native kernel as well.
Posted Jun 25, 2012 20:22 UTC (Mon) by scientes (guest, #83068)
> but if it is important enough surely some way could be found to support an x32 native kernel as well.
That won't, and should not, happen.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 21:54 UTC (Mon) by slashdot (guest, #22014)
It would be a good thing for virtual machines and embedded devices with little RAM.
Not sure if the Linux x86 maintainers would accept it though, but it could be a fun project.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 2:52 UTC (Tue) by ringerc (subscriber, #3071)
These days "little RAM" is still often 256MB or more. x32 kernels would be a rather short-term proposition, given that the comfortable limit of 32-bit is 2GB and the maximum is 4GB, so typical "little RAM" systems are only 8-9 times below that comfortable limit.
Even phones and tablets are pushing the 2GB mark.
Sure, truly small embedded Linux devices continue to exist and will for a long time to come. They're rarely x86 or x64, and aren't likely to be, so x32 is irrelevant for them.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 3:37 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)
x32 can use almost 4GB _per_ _process_; it's not limited to 2GB, since there's no user/kernel address space split (the kernel is completely 64-bit). And 4GB _per_ _process_ is still pretty big. Even most games use less RAM (thanks to RAM-starved consoles).
Right now the biggest process on my development machine is a Java process running IntelliJ IDEA with a large project (about 1MLOC) opened. It's a whopping 600MB monster using 1096MB of address space.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 5:04 UTC (Tue) by ringerc (subscriber, #3071)
The context probably got lost as it's several parent posts up. I wasn't referring to x32 in general being pointless, but to the development of an x32 kernel. I don't see the notion of a kernel that lives in the lower 4GB and uses mostly 32-bit pointers while using the native x64 mode to be particularly useful.
I can maybe see x32 with a 64-bit kernel, which is the only thing the x32 folks ever proposed, being useful.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 6:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)
Oh, yes, sorry.
Of course, I completely agree with you given the context of x32 kernel.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 7:26 UTC (Tue) by elanthis (guest, #6227)
The hilarious part here is that I'm pretty sure that poster was simply talking about the kernel maintainers accepting the x32 personality (necessary for an x86_64 kernel to run x32 programs). :)
Pettenò: Debunking x32 myths
Posted Jun 27, 2012 16:05 UTC (Wed) by butlerm (subscriber, #13312)
No, I actually think an x32 native kernel (or the equivalent for ARM64) would be an excellent idea for a large class of embedded systems - routers and file servers in particular. It would also be promising for use with hosted virtual machines, where the impact of running dozens of kernels starts to add up.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 3:38 UTC (Tue) by dlang (✭ supporter ✭, #313)
One thing you are missing: you use x32 with a 64-bit kernel, so the kernel doesn't have memory limits. Your only limit is 4G of address space per process (and the kernel needs almost none of that).
As a result, any system up to 4G is perfectly happy as x32, and if the system is doing more than one thing, you could easily get 16G or larger systems without needing 64-bit binaries.
And if you are doing VMs, this is the size of the VM, not the size of the overall system.
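The arithmetic behind the 4G and 16G figures can be sketched as follows. This is a simplification that ignores the kernel's own memory, the page cache and shared mappings; the function names are made up for the example.

```python
GIB = 2**30


def x32_address_space_bytes():
    # 32-bit pointers can name 2**32 distinct byte addresses.
    return 2**32


def max_ram_usable(processes):
    """Upper bound on the RAM that `processes` x32 processes could
    address between them (kernel and shared memory ignored)."""
    return processes * x32_address_space_bytes()


print(x32_address_space_bytes() // GIB)  # GiB per process
print(max_ram_usable(4) // GIB)          # GiB across four processes
```

With four independent processes the bound is already 16 GiB, which is the point: the 4G ceiling applies to each process, not to the machine.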
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 5:06 UTC (Tue) by ringerc (subscriber, #3071)
The context of the reply was someone proposing an "x32" kernel.
x32 userspace with an x64 kernel makes sense (ish) and that's all the x32 folks themselves ever proposed.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 20:33 UTC (Mon) by tialaramex (subscriber, #21167)
I think the idea of this post was mainly "Hey, ricers, this is not for you" and it's just that it looks weird here on LWN out of that context.
I guess this because it's Gentoo (which is infested with ricers) and because the "debunking" seems to spend a lot of time on things nobody I know assumed had anything to do with x32, like shrinking binaries on disk.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 20:48 UTC (Mon) by mikemol (subscriber, #83507)
I've been reading Diego's blog for a few years. It's not generally about performance tuning or "ricing", but about low-level things like ABI, systemic testing, autotools and a bunch of stuff.
His focus is on systemic compile-time cleanliness and stability.
Incidentally, a _lot_ of stuff breaks under x32.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 21:20 UTC (Mon) by Flameeyes (subscriber, #51238)
Uhm no, again, let's remember that Intel themselves are considering x32 a "closed system" ABI, an embedded ABI, something that Gentoo is very useful for (I know that from experience, having worked multiple times before on embedded Gentoo Linux devices).
Please see the other linked article as well, it's easier to see the two of them together.
Also, as far as "binaries on disk" are concerned, the comparison between the two libc files is actually done on the _allocated_ sizes, which is memory, not disk. The fact that it refers to files on disk is definitely not the point.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 22:47 UTC (Mon) by tialaramex (subscriber, #21167)
If my earlier interpretation was wrong then this is an even more disappointing use of LWN's space than I thought.
Putting the two articles together I am even more inclined to think it's supposed to be targeted at ricers. Who else could imagine that recompiling some assembly-heavy video codecs for x32 was somehow even remotely relevant? At no point in these articles does it appear that you've really grasped what the people who proposed, developed and shipped this ABI were trying to achieve. You almost touch on it in the second article, but only long enough to lurch onto the afore-mentioned tangent about the C standard library and files on disk.
A lot of the "myths" look more like strawmen. They aren't common misconceptions anywhere, some aren't even mentioned in the not-so-bright comments to the previous article.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 5:25 UTC (Tue) by cmccabe (guest, #60281)
I'm not sure why you're being so hard on Diego. The whole point of the x32 ABI was increased performance. Diego is questioning whether the ABI will actually achieve that. You may disagree with his conclusions, but there isn't any need for name-calling.
You don't have to be a "ricer" (whatever that is) to be interested in performance. You could just be an engineer who gets paid to make servers or embedded devices go faster and use less battery. Or someone who is interested in the topic in general.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 17:08 UTC (Tue) by mikemol (subscriber, #83507)
"ricer" is a pejorative term intended to describe (and be dismissive of) someone who is interested in speed, does things to get more speed, but has no actual understanding of what it is they're doing.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 19:06 UTC (Tue) by dlang (✭ supporter ✭, #313)
From how I've seen the term thrown around, there's no attempt to know whether the person has any understanding of what they are doing. There is just the assumption that it's a waste of time.
Pettenò: Debunking x32 myths
Posted Jun 28, 2012 17:34 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246)
In any case, whether it's true or not, the perception that "ricers prefer Gentoo" is far from new. There's an entire website devoted to mocking them, and that website is ooooooollld. (Almost as old as Gentoo.)
There are plenty of folk who know what they're doing, I'm sure. At the same time, there is (or at least was) a visible crowd whose competence seems to barely rise above "script kiddie," applying the equivalent of "go faster stripes" to their computer. (20 years ago, I probably would have been no different, I must admit.) If the latter crowd thinks x32 is the latest "go faster stripes" that will magically make their computer a gazillion times faster, then they need to be told why that's not the case.
Anyway, I don't really have an opinion on whether Diego's post was aimed at that crowd. I saw some valid criticisms, and some odd attention to multiply latency. *shrug*
What I'd really like to see is some comprehensive benchmarks. Now *that* might be interesting. One I'd particularly like to see is "memory footprint of Firefox after loading these 100 tabs." ;-) These days, that seems to be the most common resource hog on my own machine, cycle- or RAM-wise.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 0:30 UTC (Tue) by rich0 (guest, #55509)
You really should read more of Diego's blog. He could be wrong, but I wouldn't be so quick to just write him off, especially when talking about ABIs and such. He is VERY competent when it comes to C, ABIs, ELF, autotools, and a number of fairly low-level details. By all means feel free to disagree, but you should weigh his arguments carefully.
I never got the whole Gentoo ricers thing. I'd say that Gentoo isn't populated with ricers so much as with people who tend to do unusual things with their Linux systems, including obsessing over performance, running Gentoo Prefix, coming up with x32, or designing embedded systems. Gentoo tends to be a very malleable distro, and it tends to appeal to those for whom the typical 99% solution just isn't good enough. If Ubuntu or whatever floats your boat, by all means enjoy using it. I wouldn't be so quick to write off those who invest in Gentoo - if you ever run into an oddball problem one of them might be able to help you out...
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 18:38 UTC (Tue) by butlerm (subscriber, #13312)
I don't doubt that. The problem here is only that he apparently didn't actually benchmark anything, or seriously address the only issue likely to make a substantive difference - data cache impact.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 21:17 UTC (Mon) by Flameeyes (subscriber, #51238)
> This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers.
Where are you reading that 40%? And what are you comparing that against? And how do you know it's non-trivial?
I like to be thorough in what I do and to check whether I made mistakes, so if I missed a non-trivial benchmark that shows a 40% improvement over the best possible situation before x32 came to be, I'd be thrilled to see it.
Unfortunately, the LPC talk by Intel's engineers from September 2011 lists a 5-11% increase in performance _against i386_ and a 5-8% against amd64 on the SPEC2k benchmark, which is by far not what I'd call a "non-trivial benchmark". (Note: some previous papers do refer to a much bigger improvement, but that was against x86 as well, not amd64, _and_ even Intel is downplaying those numbers now.)
The problem is that the only ones touting benchmark numbers are the very same guys who're trying to "sell" the idea, and taking their word by default is never a good idea.
As for the size of dynamically allocated pointer-heavy structures: it might make a substantial difference, but I honestly don't think so. I noted something about it in the post before that one; it's something people keep referring to, but nobody has numbers for it.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 21:45 UTC (Mon) by mansr (guest, #85328)
>> This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers.
>
> Where are you reading that 40%?
This is a pointer-chasing benchmark where the entire data set consists of pointers, which is hardly a realistic test case. It is furthermore an amazingly poorly written piece of software. In many places it needlessly uses global variables in inner loops, which can thwart compiler optimisations (particularly aliasing-based ones). It also uses 'long' exclusively where any sane programmer would use 'int', possibly leading to more expensive 64-bit operations being used where there really is no need.
Moreover, the website does not mention which compiler, let alone which compiler flags, were used, nor does it provide any raw numbers from the benchmark run. A lone percentage figure as presented there means absolutely nothing whatsoever.
Finally, out of all the spec2k modules, they chose to showcase only two, presumably because those two showed the most favourable results. The second one is 186.crafty, showing a meagre 3% improvement (4% on Atom). This leaves one wondering what the results of the remainder looked like. Something tells me they showed improvements of less than 3%, if any at all.
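The remark about 'long' depends on the data model of each ABI. A small sketch of the type sizes involved is below; the sizes are those of the ILP32 and LP64 models, while the table layout and helper name are just for illustration.

```python
# Sizes in bytes of basic C types under the three ABIs being compared.
DATA_MODELS = {
    "i386 (ILP32)":  {"int": 4, "long": 4, "long long": 8, "pointer": 4},
    "x32 (ILP32)":   {"int": 4, "long": 4, "long long": 8, "pointer": 4},
    "x86-64 (LP64)": {"int": 4, "long": 8, "long long": 8, "pointer": 8},
}


def types_that_differ(model_a, model_b):
    """Return the C types whose size changes between two data models."""
    a, b = DATA_MODELS[model_a], DATA_MODELS[model_b]
    return sorted(t for t in a if a[t] != b[t])


print(types_that_differ("x32 (ILP32)", "x86-64 (LP64)"))
print(types_that_differ("x32 (ILP32)", "i386 (ILP32)"))
```

Both 'long' and pointers shrink when moving from x86-64 to x32, so a benchmark that uses 'long' everywhere may be measuring narrower integers as much as narrower pointers.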
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 22:05 UTC (Mon) by butlerm (subscriber, #13312)
181.mcf doesn't sound like a toy benchmark to me:
"For the considered single-depot case, the problem can be formulated as a large-scale minimum-cost flow problem that we solve with a network simplex algorithm accelerated with a column generation. The core of the benchmark 181.mcf is the network simplex code 'MCF Version 1.2 -- A network simplex implementation'. For this benchmark, MCF is embedded in the column generation process." http://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf...
In any case, if 181.mcf is so poorly written, perhaps someone could do us the favor of benchmarking other pointer intensive code instead of dismissing x32 without bothering to conduct a single relevant benchmark.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 21:48 UTC (Mon) by JoeBuck (subscriber, #2330)
To give an example from one field (though one that matters very much to the proponents of this ABI, I suspect): the current state of affairs is that it is common for EDA users to use both a 32-bit and a 64-bit version of the same application (simulator, model checker, synthesis tool, etc) on x86-64 architecture, preferring the former if the problem size will fit into 32 bits. That's because these applications are pointer-heavy, data access heavy and limited by what will fit into physical memory (it's not just the cache size, though of course that is a factor as well). The result is often that the 32-bit version is faster, at least for a certain range of problem size that is rather common, because it takes less memory. The 64-bit version is often reserved for test cases that require more than 4GB, because the performance of a server farm running lots of simulations is often constrained by how many jobs will fit into physical memory.
With x32 there would still be two different versions of the executable. For that reason, the relevant comparison is between the x32 version and a traditional x86 32-bit version. There are a number of wins: the larger register set means much less penalty for PIC code, 64-bit operations are available, and there is much less register-spilling code. I would still recommend a 64-bit kernel, but I find x32 to be very interesting and don't think that the author of this piece really understands why the proponents' employer might be investing heavily in this.
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 5:18 UTC (Tue) by alonz (subscriber, #815)
I wonder – are there any benchmarks based on Icarus Verilog? (It may not be a state-of-the-art EDA tool, but it is open-source, therefore accessible for a benchmark... And its operation is very pointer-heavy, esp. if you load a significant model)
Unfortunately I don't have access to an x32 system (nor the time to start tinkering with Gentoo), so I can't just run the benchmark myself :(
Big Advantage of x32 myths
Posted Jun 29, 2012 10:56 UTC (Fri) by brianomahoney (subscriber, #6206)
LLP 64 causes a LOT of old software to break (P>I). Now, fixing that is (a) right and (b) good, but ALWAYS a pain; the x32 option will make this problem go away.
MFG, omb
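The "P>I" breakage is the old habit of stashing a pointer in an 'int'. A minimal sketch of why that fails with 64-bit pointers but not under x32 follows. It models the truncation as a plain bit mask (real C would also involve sign extension on the way back), and both addresses are hypothetical.

```python
INT_BITS = 32


def store_pointer_in_int(address):
    """Model `int i = (int)ptr;` by keeping only the low 32 bits."""
    return address & ((1 << INT_BITS) - 1)


def survives_round_trip(address):
    """True if the address is unchanged after passing through an int."""
    return store_pointer_in_int(address) == address


low_address = 0x0804_8000        # hypothetical address below 4 GiB
high_address = 0x7FFF_DEAD_B000  # hypothetical address above 4 GiB

print(survives_round_trip(low_address))   # True
print(survives_round_trip(high_address))  # False
```

Under x32 every user-space address fits in 32 bits, so code with this bug keeps working by accident; on x86-64 the upper bits are lost as soon as an allocation lands above 4 GiB.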
Big Advantage of x32 myths
Posted Jun 29, 2012 12:46 UTC (Fri) by mansr (guest, #85328)
That was a problem when 64-bit CPUs first came out. Now, 20 years later, all remotely important software has been fixed, so this isn't really an argument any more.