
My kid hates Linux (ZDNet)

Posted Apr 14, 2008 19:03 UTC (Mon) by JoeBuck (subscriber, #2330)
In reply to: My kid hates Linux (ZDNet) by BenHutchings
Parent article: My kid hates Linux (ZDNet)

It's short-sighted to believe that the only reason for wanting to run a 32-bit executable on a 64-bit platform is to run non-free code. For a program that must manipulate a huge pointer-heavy data structure, such as an electronic design automation program or mechanical CAD application, the 64-bit version needs nearly 2x the memory, and a 32-bit app whose data fits in memory beats a 64-bit app that page-faults a lot, by a factor of 10 or more.
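
A minimal sketch of the pointer-size effect described above, assuming a hypothetical node layout (four links plus two 32-bit fields). The struct and the `node_bytes` helper are illustrative assumptions, not taken from any real EDA or CAD code base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical graph node: four links plus a small payload.
 * The layout is an assumption chosen only to show the size ratio. */
struct node {
    struct node *parent;
    struct node *left;
    struct node *right;
    struct node *next;
    uint32_t     id;
    uint32_t     flags;
};

/* Bytes one such node occupies for a given pointer width.
 * This layout needs no padding at either width. */
static size_t node_bytes(size_t pointer_bytes)
{
    assert(pointer_bytes == 4 || pointer_bytes == 8);
    return 4 * pointer_bytes + 2 * sizeof(uint32_t);
}
```

With this layout, `node_bytes(4)` is 24 and `node_bytes(8)` is 40, so the same structure grows by about two thirds when rebuilt with 64-bit pointers. A node that held nothing but pointers would double.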



My kid hates Linux (ZDNet)

Posted Apr 14, 2008 19:34 UTC (Mon) by andikleen (subscriber, #39006) [Link]

Also, you might just not want to recompile your old software, even if it's free.

Why should you be forced to, when both the CPU and the kernel have no problem at all
executing it at basically native speed? I often
copy over old binaries from other systems that I compiled ages ago.
Why should I redo that work?

This "32-bit compat is only for non-free software" excuse really doesn't
make much sense.

My kid hates Linux (ZDNet)

Posted Apr 14, 2008 19:42 UTC (Mon) by pizza (subscriber, #46) [Link]

For "generic" 64 vs 32-bit comparisons, I'd agree, but we're talking about x86 and x86_64
here, and the latter has many many architectural improvements over the former (such as double
the number of registers).  As a result, most software tends to run slightly faster due to less
register contention, despite the additional overhead of larger pointers.  Additionally, if
you're doing lots of 64-bit math (as CAD/EDA is wont to do), things get *much* faster.

And yes, I've benchmarked this for myself.  The one thing that's trivial for me to recompile
now (dcraw) gives me an 11% improvement when built as a 64-bit binary.

My kid hates Linux (ZDNet)

Posted Apr 14, 2008 23:47 UTC (Mon) by djabsolut (guest, #12799) [Link]

Not to disparage the generally good idea of moving towards x86_64, but an improvement of 11% is not really worth the hassle of incompatibilities. What are the speedups like on average?

(AFAIK, modern processors "translate" the crufty x86_32 code into their own internal code, and along with a large cache this makes issues such as lack of registers not really a problem. The only practical reason one would want to use x86_64 is larger available memory space and/or 64 bit math -- the number of applications needing this is dwarfed by plain-jane applications).

My kid hates Linux (ZDNet)

Posted Apr 15, 2008 3:44 UTC (Tue) by jwb (subscriber, #15467) [Link]

There are major differences with x86_64 that show up everywhere, not just for math.  The
calling conventions on the 64-bit system are far cleaner.  More arguments can be passed to
functions in registers (6, I think) than on 32-bit systems, where the extra arguments have
to be placed on the stack.  Stack management operations on x86 are not free; they take one or a
few cycles during every function call and function return.  This can add up.

x86_64 also allows more and better ways of addressing data that can save an explicit load to
register.

These are not theoretical improvements.  Lots of programs run much better on x86_64 than on
plain old x86.

The 64-bit systems do still have the problem of larger pointers which can crowd the cache, but
some programmers find ways around this.  BEA, for example, uses short heap pointers in their
JVM, which gives them all the speedups of the x86_64 programming model (described above)
without paying the cost of 64-bit pointers.

Sorry, Mr. pizza ...

Posted Apr 15, 2008 1:17 UTC (Tue) by JoeBuck (subscriber, #2330) [Link]

... but I wasn't speaking theoretically. I work in electronic design automation.

The doubled-memory effect really does overwhelm the effect of having more registers, 64-bit math and a better machine architecture in many real cases, particularly when the program's working set is in the gigabytes. The time to move that data through the CPU overwhelms all other considerations. The 64-bit executable wins when the working set exceeds the 32-bit address space, of course, but in the range where the 32-bit program requires 1-2 Gbytes and the 64-bit program needs nearly double that, the 32-bit executable is often the faster one.

For this reason, many EDA applications are available in both 32-bit and 64-bit versions, and the recommendation to the customer is to use the 32-bit version even on the 64-bit machine except where the problem is too large.

Sorry, Mr. pizza ...

Posted Apr 15, 2008 6:20 UTC (Tue) by motk (subscriber, #51120) [Link]

Counterpoint, RAM is pretty cheap these days. Just Add More.

Of course, you do come across motherboard limitations occasionally.

RAM is not the problem

Posted Apr 15, 2008 14:46 UTC (Tue) by GreyWizard (subscriber, #1026) [Link]

CPU cache and bandwidth limitations are the issue here, not RAM size.

Sorry, Mr. pizza ...

Posted Apr 15, 2008 6:36 UTC (Tue) by bronson (subscriber, #4806) [Link]

If 64 bit pointers are really that big a deal, how come the EDA guys don't use 4GB memory
pools with 32-bit offsets?  That way you get the speed and huge memory space of 64 bits with
the space efficiency of 32 bit.  Seems like a win-win.

It's been quite a while since I've done EDA (some VLSI layout and simulation back in 2003).  I
remember some seriously crufty software produced by vendors who would do anything to avoid an
update.  Some of the tools I used were written *and compiled* pre-1998!  It was a nightmare
trying to get that junk to run.  I eventually got the toolchain working and then I never let
anybody touch that box again.  Not so much as a security update or a package upgrade lest it
break anything.

So...  If the EDA industry is indeed pushing back against 64 bit, there might be more to it
than just pointer size inflating the working set.  :)

Sorry, Mr. pizza ...

Posted Apr 17, 2008 20:34 UTC (Thu) by im14u2c (subscriber, #5246) [Link]

"If 64 bit pointers are really that big a deal, how come the EDA guys don't use 4GB memory pools with 32-bit offsets? That way you get the speed and huge memory space of 64 bits with the space efficiency of 32 bit. Seems like a win-win."

Sounds like a maintenance nightmare to me, particularly if the code base is shared between 32-bit and 64-bit worlds, and if any portion of the data set has an index larger than 2^32-1. The reason I say "index" is that these pools could be homogeneous pools of structures, and so the addressed memory in that pool could actually be as large as 2^32 * sizeof(struct whatever), rather than just 2^32 bytes.

Sure, on 64-bit machines you get the compact representation. But, on all machines that share that code base, you add an additional indirection to compute your final pointer, and you've thrown up partitions in your memory map based on where these pools are. If your problem doesn't partition into pools nicely, you're hosed.
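
The pool scheme debated above can be sketched roughly as follows. This is a hedged illustration only: `struct cell`, `struct pool` and the `pool_*` helpers are hypothetical names, the pool is a simple bump allocator, and nothing here is taken from an actual EDA tool or JVM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NIL UINT32_MAX   /* sentinel index meaning "no cell" */

/* Hypothetical list cell that links by 32-bit index instead of by pointer. */
struct cell {
    uint32_t next;    /* index of the next cell in the pool, or NIL */
    uint32_t value;
};

struct pool {
    struct cell *base;      /* the only full-width pointer */
    uint32_t     used;
    uint32_t     capacity;
};

/* Returns nonzero on success. */
static int pool_init(struct pool *p, uint32_t capacity)
{
    p->base = malloc((size_t)capacity * sizeof *p->base);
    p->used = 0;
    p->capacity = capacity;
    return p->base != NULL;
}

/* Every link traversal pays this extra step: base + index. */
static struct cell *pool_deref(const struct pool *p, uint32_t idx)
{
    assert(idx < p->used);
    return &p->base[idx];
}

/* Allocates a cell in front of `head`; returns the new head index,
 * or NIL if the pool is full (the caller must check before reassigning). */
static uint32_t pool_push(struct pool *p, uint32_t head, uint32_t value)
{
    if (p->used == p->capacity)
        return NIL;
    uint32_t idx = p->used++;
    p->base[idx].next = head;
    p->base[idx].value = value;
    return idx;
}

static uint64_t list_sum(const struct pool *p, uint32_t head)
{
    uint64_t sum = 0;
    for (uint32_t i = head; i != NIL; i = pool_deref(p, i)->next)
        sum += pool_deref(p, i)->value;
    return sum;
}

static void pool_free(struct pool *p)
{
    free(p->base);
    p->base = NULL;
    p->used = p->capacity = 0;
}
```

Each cell stays 8 bytes regardless of pointer width, at the price of the `pool_deref` indirection on every access and of confining each list to a single pool, which is the partitioning concern raised above.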

Sorry, Mr. pizza ...

Posted Apr 15, 2008 13:21 UTC (Tue) by pizza (subscriber, #46) [Link]

Fair enough; your particular daily-use EDA app (proprietary?  you've never actually mentioned
what it is) performs worse.  You use what best supports your needs, after all.

However, my daily-use apps perform significantly better under 64-bit.  That 11% improvement
with dcraw was the only one I could recreate the benchmarks on immediately, as it's trivial to
recompile.

My main daily use app (GCC cross-compiler building a multi-million line codebase) runs
considerably faster under x86_64.  However, I no longer have an identical 32-bit system for
comparison, so I can't supply benchmarks without blowing half a day on it.  (The
64-bit gnome desktop *feels* faster too, but that's obviously subjective)

One of the folks I work with has also raved about the improvements he saw using the 64-bit
versions of the particular FPGA synthesizer tools.  

Not to mention the speedup one gets by not needing bounce buffers (and other games) for I/O.

64-bit performance

Posted Apr 14, 2008 21:57 UTC (Mon) by epa (subscriber, #39769) [Link]

Do you have any documented cases of x86_64 code running slower than i386 or needing more
memory?  After all 32-bit ints are still available on x86_64.  Some code might use twice the
memory when pointers are twice as big, but you only really care about memory usage for
memory-heavy apps, and those are the exact ones where you really want a 64-bit system to allow
more than 4 GiB of addressable memory for each app.

64-bit performance

Posted Apr 15, 2008 4:39 UTC (Tue) by JoeBuck (subscriber, #2330) [Link]

Take a box with 2 GB of memory. Run a program that requires a 1.5 GB working set to avoid paging with 32 bit code, where the in-memory structure is heavy with pointers. Now recompile with -m64. Voila, it now needs maybe 2.5 GB to avoid paging. You might well see the 64 bit code run 100 times slower.

We run workloads like that all the time.
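
The arithmetic behind that scenario can be sketched as below. The helper names and the `pointer_percent` parameter are assumptions introduced for illustration. The 67% figure is simply back-derived from the 1.5 GB and 2.5 GB numbers above, not a measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Estimated 64-bit working set, given the 32-bit working set and the
 * share of it (in percent) that consists of pointers. Each pointer
 * doubles from 4 to 8 bytes, so that share is counted twice. */
static uint64_t footprint64_mib(uint64_t ws32_mib, unsigned pointer_percent)
{
    assert(pointer_percent <= 100);
    return ws32_mib + ws32_mib * pointer_percent / 100;
}

/* True when the working set fits in physical memory without paging. */
static bool fits_in_ram(uint64_t ws_mib, uint64_t ram_mib)
{
    return ws_mib <= ram_mib;
}
```

For a 1536 MiB working set that is about 67% pointers, `footprint64_mib(1536, 67)` gives 2565 MiB. The 32-bit build fits in 2048 MiB of RAM; the 64-bit build does not and starts paging.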

64-bit performance

Posted Apr 15, 2008 6:08 UTC (Tue) by bronson (subscriber, #4806) [Link]

I suppose it's true that if you use an artificial 2.5 GB dataset and impose a 2 GB memory
limit, 32 bit would be faster.

In the real world, why wouldn't you just spend $50 for a 2GB memory upgrade?  Then the 64 bit
box would fly.  If you're not convinced, let's try this exercise again with a hypothetical 3.2
GB data set.  :)

In my experience, modern 64 bit boxes stuffed with lots of RAM are really cheap and
really damn fast.  I can't think of any reason to deploy 32 bit for servers/HPC these days.

64-bit performance

Posted Apr 15, 2008 18:38 UTC (Tue) by JoeBuck (subscriber, #2330) [Link]

"If you're not convinced, let's try this exercise again with a hypothetical 3.2 GB data set."

You've just answered your own question. Now you can run a problem that requires a 3.2GB working set with your 32 bit executable quickly (if you can squeeze it into the 4GB address space, and you'll need what Red Hat used to call the hugemem patch to make it work), but it takes maybe 5GB with the 64 bit executable, and the 32 bit version runs quicker.

Of course, you need the 64 bit executable when the problem size exceeds 4GB. The point is, it is useful for the developer to provide the user with both executables, to run on an operating system that can run both.

64-bit performance

Posted Apr 15, 2008 7:32 UTC (Tue) by laf0rge (subscriber, #6469) [Link]

The entire Linux networking stack, and especially netfilter/iptables with connection tracking,
runs 10-15% slower on x86_64 kernels than on i386 kernels.

The main reason being that all pointers are suddenly twice as large, and thus most data
structures need at least one more cache line, resulting in significantly less of the working
set being present in cache, increasing cache misses, etc.

I think any code that has a lot of pointers in data structures should see the same effect.
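
A small sketch of the cache-line effect, assuming 64-byte cache lines and a made-up record of ten pointers plus 24 bytes of other data. These numbers are hypothetical and do not describe the real conntrack structures.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64u   /* assumption: common x86 cache line size */

/* Size of a record with `npointers` pointers and `payload_bytes` of
 * other fields, ignoring any alignment padding. */
static size_t record_bytes(size_t npointers, size_t payload_bytes,
                           size_t pointer_bytes)
{
    assert(pointer_bytes == 4 || pointer_bytes == 8);
    return npointers * pointer_bytes + payload_bytes;
}

/* Number of cache lines needed to hold `bytes`, rounded up. */
static size_t cache_lines(size_t bytes)
{
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

With these assumed numbers the record is 64 bytes and fits one line with 4-byte pointers, but grows to 104 bytes and spills into a second line with 8-byte pointers, so the same cache holds half as many records.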

64-bit performance

Posted Apr 15, 2008 19:39 UTC (Tue) by bronson (subscriber, #4806) [Link]

That's very interesting.  Has anybody tried working around this?

One solution would be to convert hot 64-bit ptr fields to 32-bit offsets pointing into a single
memory pool.  I'm not familiar enough with the networking code to know how traumatic this
would be.  (I'm definitely not saying do this everywhere; just where it really matters).

This topic might make a fairly fascinating paper.  :)

64-bit performance

Posted Apr 18, 2008 6:07 UTC (Fri) by alankila (subscriber, #47141) [Link]

We should have pointers of a size intermediate between 32 and 64 bits, let's say 40-bit
pointers. The point being that it'd be large enough to address the RAM necessary but
doesn't waste so much space.

I really don't think we'll ever grow to the point where we'll use all of the 64-bit pointer
address space, and pointers with top 20-30 bits unused are just wasted space.

Too bad that the whole world thinks in 2^n.
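
One way such a 40-bit field could be emulated in software today is sketched below. `struct ptr40` and its helpers are hypothetical, and the little-endian byte order is an arbitrary choice for the illustration; no hardware or compiler support is implied.

```c
#include <assert.h>
#include <stdint.h>

/* A 5-byte field holding a 40-bit address or offset, stored
 * least-significant byte first. 40 bits can address 1 TiB. */
struct ptr40 {
    uint8_t b[5];
};

static struct ptr40 ptr40_pack(uint64_t v)
{
    assert(v < (UINT64_C(1) << 40));   /* value must fit in 40 bits */
    struct ptr40 p;
    for (int i = 0; i < 5; i++)
        p.b[i] = (uint8_t)(v >> (8 * i));
    return p;
}

static uint64_t ptr40_unpack(struct ptr40 p)
{
    uint64_t v = 0;
    for (int i = 0; i < 5; i++)
        v |= (uint64_t)p.b[i] << (8 * i);
    return v;
}
```

The field saves three bytes per pointer, but every access pays for packing or unpacking, and the 5-byte fields no longer sit on natural alignment boundaries, which may be part of why power-of-two sizes prevail.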

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds