A Gentoo x32 release candidate

Posted Jun 6, 2012 17:05 UTC (Wed) by gmaxwell (guest, #30048)
In reply to: A Gentoo x32 release candidate by gmaxwell
Parent article: A Gentoo x32 release candidate

> Moreover, the programs that use tons of pointers where x32 would be a big savings aren't using much memory to begin with.

Gah. I meant to say that they're either not using much memory, in which case the savings don't matter, or they are, and they're the kind of workload where 64 vs 32 has scaling implications (e.g. the browser).



A Gentoo x32 release candidate

Posted Jun 6, 2012 17:31 UTC (Wed) by mikemol (guest, #83507) [Link] (13 responses)

I very intentionally didn't mention the browser as an application you'd want to be 32-bit. I thought about Chrome's model of one-process-per-tab, and decided I still liked the larger address space for mmap and IPC purposes. The browser (or, at least, most of it) should be 64-bit. Perhaps there'd be sufficiently low overhead to have just the JS engine 32-bit.

Your browser is only one out of hundreds (if not thousands) of programs on your computer. Many (most?) of them only run for a few moments, or otherwise don't consume (or don't derive meaningful benefit from) huge amounts of memory.

Take the 'dd' command. top. ls. bash. dash. cp. mv. echo. cat. tee. cupsd. dbus-daemon. lpr. grep. find. xargs.

The programs you spend hours every day staring at? Yeah, those probably benefit from having a 64-bit address space. The programs you don't think about, often when you're not even actively using them? They probably don't.

A Gentoo x32 release candidate

Posted Jun 6, 2012 17:39 UTC (Wed) by gmaxwell (guest, #30048) [Link] (9 responses)

Yes, and how much benefit is there from making dd, top, ls, bash, cp, mv, echo, cat, tee, etc. x32 instead of x86_64? They have (and should have) very few pointers. So there should be very little memory savings and very little CPU cycle reduction from memcpying smaller pointers. (And if not, those programs should be fixed; certainly it would be easier to fix them not to copy huge pointer arrays than it would be to fix the big tools not to need a lot of VM.)

But they do link shared libraries, at least libc, which is rather large. So if you're going to have a mix of x32 and x86_64 programs running, you're going to end up with another copy of libc in memory for those things, passing through your caches, etc., which should easily offset the tiny gains from making those programs x32.

A Gentoo x32 release candidate

Posted Jun 6, 2012 18:13 UTC (Wed) by mikemol (guest, #83507) [Link]

Anything that uses linked-lists or tree data structures stands to benefit. And if you're dealing in dense packs of pointers in a data structure, you'll probably benefit from that fitting more tightly into a cache line.
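As a rough illustration of the cache-line argument, here is a back-of-the-envelope sketch. It assumes a 64-byte cache line and C-like natural alignment; the helper names (`struct_size`, `nodes_per_line`) and the node shapes are hypothetical, chosen only for this example, and the numbers model layout only, not measured performance.

```python
CACHE_LINE = 64  # assumption: typical x86 cache-line size in bytes


def struct_size(field_sizes):
    """Size of a C-like struct whose scalar fields are naturally aligned."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size  # natural alignment: a field is aligned to its own size
        offset = (offset + align - 1) // align * align
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def nodes_per_line(pointer_size, pointers, payload_sizes):
    """Return (node size, whole nodes per cache line) for a given pointer width."""
    size = struct_size([pointer_size] * pointers + list(payload_sizes))
    return size, CACHE_LINE // size


# Singly linked list node: one 'next' pointer plus a 4-byte payload.
print(nodes_per_line(8, 1, [4]))  # x86-64: (16, 4)
print(nodes_per_line(4, 1, [4]))  # x32:    (8, 8)
# Binary tree node: two child pointers plus a 4-byte key.
print(nodes_per_line(8, 2, [4]))  # x86-64: (24, 2)
print(nodes_per_line(4, 2, [4]))  # x32:    (12, 5)
```

Under these assumptions the pointer-heavy nodes roughly halve in size, so about twice as many fit in each cache line.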

A Gentoo x32 release candidate

Posted Jun 6, 2012 18:47 UTC (Wed) by and (guest, #2883) [Link]

I don't want to hurt anyone's feelings, but I'm working on CFD simulation code. The problem I encounter on a daily basis is that these programs are _very_ clearly CPU-bound (read: they eat up all your CPU time and still use way below 1GBit per core). Thus I'm really enthusiastic to try x32. (Once it's available in a mainstream distribution, that is. I gave up on Gentoo a few years ago...)

A Gentoo x32 release candidate

Posted Jun 6, 2012 22:27 UTC (Wed) by butlerm (subscriber, #13312) [Link] (6 responses)

> which should easily offset the tiny gains from making those programs x32.

Those programs, yes. There is a significant class of other programs that can be sped up by as much as 40% compared to x86-64. The advantage is so great that x32 is reasonably likely to predominate over the latter in the future, outside a relatively narrow set of applications.

A Gentoo x32 release candidate

Posted Jun 6, 2012 23:28 UTC (Wed) by andrel (guest, #5166) [Link] (5 responses)

I'll bite -- what are the classes of programs for which x32 gets a 40% speedup over x86-64?

A Gentoo x32 release candidate

Posted Jun 7, 2012 0:34 UTC (Thu) by dlang (guest, #313) [Link] (2 responses)

pointer heavy programs where the smaller pointer size lets more data fit in the cpu cache instead of the app having to wait for the data to be read in from memory.

I don't know any specific programs, but there are people who have reported that using 32 bit apps on 64 bit systems results in better performance than using 64 bit apps.

This seldom applies on the AMD64 architecture as 64 bit mode also gives you twice as many registers to use, but on Sparc and Power* systems this is a very common situation.

x32 is creating an equivalent architecture for the AMD64 systems.

A Gentoo x32 release candidate

Posted Jun 7, 2012 9:06 UTC (Thu) by dvandeun (guest, #24273) [Link] (1 responses)

I develop an interpreter for a toy language in Haskell on an old i3 540 MacBook with 32 bit ghc. When I compile it on a development server at the university, with fast Xeons and lots of cache and RAM, and 64 bit ghc, it is not faster on a quicksort benchmark. (This is of course a double effect: Haskell code uses lots of pointers, and quicksort on linked lists uses lots of pointers. On other benchmarks of my interpreter, the 64 bit server does better than the MacBook, but not spectacularly better.)

A Gentoo x32 release candidate

Posted Jun 10, 2012 3:48 UTC (Sun) by vonbrand (subscriber, #4458) [Link]

... not to mention that quicksort (which is designed for arrays) makes next to no sense on lists...

A Gentoo x32 release candidate

Posted Jun 7, 2012 21:48 UTC (Thu) by paulj (subscriber, #341) [Link]

I've measured the v8 JavaScript JIT to be slightly faster with i686 than AMD64, on JavaScript benchmarks. I'd expect x32 to be slightly faster again. Anything where memory usage is dominated by pointer-rich data structures (e.g. complex indices over small units of data) will be faster with x32, if it doesn't need the 64-bit address space.

Also, as overall system memory usage is generally lower with x32, it allows, e.g., more VMs to be run for the same amount of memory.

A Gentoo x32 release candidate

Posted Jun 8, 2012 20:54 UTC (Fri) by butlerm (subscriber, #13312) [Link]

> I'll bite -- what are the classes of programs for which x32 gets a 40% speedup over x86-64?

The specific example I had in mind is 181.mcf, part of the SPEC 2000 CPU benchmark.

http://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf...

I imagine that many Perl, Python, and Java programs will show comparable improvements, in addition to compilers, linkers, web browsers, xml processors, interpreters, x32 native kernels, and garbage collected languages in general.

With support for near and far pointers it is conceivable one could dramatically improve kernel performance as well, making an x32/x86-64 hybrid kernel perform nearly as well as an x32 native one, without losing the ability to support 64 bit applications.

A Gentoo x32 release candidate

Posted Jun 7, 2012 13:54 UTC (Thu) by foom (subscriber, #14868) [Link]

Chrome on Windows is only available as a 32-bit binary. Chrome on Linux is likely only available as x86-64 because 32-bit libraries are not always readily available on x86-64 Linux distributions, so it was necessary.

Why do you think that Chrome on Linux would actually need the 64-bit address space when the vast majority of the installs (Windows) are all 32bit and work great?

A Gentoo x32 release candidate

Posted Jun 18, 2012 7:47 UTC (Mon) by massimiliano (subscriber, #3048) [Link]

> I very intentionally didn't mention the browser as an application you'd want to be 32-bit. I thought about Chrome's model of one-process-per-tab, and decided I still liked the larger address for mmap and IPC purposes. The browser (or, at least, most of it) should be 64-bit. Perhaps there'd be sufficiently low overhead to have just the JS engine 32-bit.

Well, for most of the world "Chrome" means "Chrome on Windows", and "Chrome on Windows" means "the 32bit Chrome build".

And since Chrome works pretty well on Windows I guess a 32bit build should work well also on our beloved Linux desktops...

In fact here (V8 development team) we work on 64-bit Linux hosts, but we develop and test 32-bit x86 before anything else, and then make sure that amd64 and arm also work perfectly. But when we look at performance numbers, we do it mainly on the 32-bit builds.

A Gentoo x32 release candidate

Posted Jun 18, 2012 11:29 UTC (Mon) by hummassa (subscriber, #307) [Link]

The browser works by storing the DOM in a data structure that is crowded with pointers. Add to that the fact that if you have one sandbox with over 3GB of data you are pretty much in the insane corner, and I would guesstimate Chrome/Chromium as benefitting deeply from being 32-bit.

A Gentoo x32 release candidate

Posted Jun 6, 2012 20:23 UTC (Wed) by jpnp (guest, #63341) [Link] (5 responses)

The issue is not the 4GB RAM limit, but the precious few MB of cache. I have data-structure-heavy (pointer-heavy) code which has moderate memory requirements but requires a lot of manipulation. Smaller pointers mean better cache locality.

I'm confident it would benefit, as it already benchmarks better running as 32-bit code on AMD64; adding the extra registers from the 64-bit ABI can only aid the compiler.

Mind you, I don't see a great need for the whole OS to be X32, just support for X32 applications running on AMD64 for those workloads where it helps.

A Gentoo x32 release candidate

Posted Jun 7, 2012 0:43 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (4 responses)

> The issue is not the 4Gb RAM limit, but the few precious few Mb of cache. I have data structure (pointer) heavy code which has moderate memory requirements but requires a lot of manipulation. Smaller pointers equals better cache locality.

You've probably already considered this, but for workloads like this, why not pre-allocate a moderate-sized pool of memory for this data and store just the offsets? That seems like a less intrusive solution than requiring multiple copies of system libraries to support amd64 and x32 side-by-side.
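A minimal sketch of the pool-plus-offsets idea, assuming a singly linked list of integers. The class `ListPool`, the `NIL` sentinel and the method names are hypothetical and exist only for this illustration. Links are stored as indices in an `array("I")` (an unsigned int, 4 bytes on mainstream platforms) instead of native pointers; unlike the fixed pre-allocated pool described above, this version simply grows on demand.

```python
from array import array

NIL = 0  # slot 0 is reserved so that index 0 can serve as the "null" link


class ListPool:
    """Linked-list nodes kept in parallel arrays and linked by index."""

    def __init__(self):
        self.values = array("i", [0])   # payloads; slot 0 is the unused sentinel
        self.next = array("I", [NIL])   # links stored as small integer offsets

    def push_front(self, head, value):
        """Allocate a node in the pool and return its index as the new head."""
        self.values.append(value)
        self.next.append(head)
        return len(self.values) - 1

    def iterate(self, head):
        """Walk the list starting at index 'head'."""
        while head != NIL:
            yield self.values[head]
            head = self.next[head]


pool = ListPool()
head = NIL
for v in (3, 2, 1):
    head = pool.push_front(head, v)
print(list(pool.iterate(head)))  # [1, 2, 3]
```

Every access goes through an extra indexing step (`self.next[head]` rather than following a pointer directly), which is the kind of source-level intrusion this approach implies.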

Also, is it too much to ask that x32 applications be capable of interacting with amd64 libraries? Perhaps merge x32 and x86_64 into a single ABI with "near" and "far" pointers? If mixed code always limits itself to a 32-bit address space, and x32 code uses the 64-bit system call ABI, then it should be possible to convert between "near" and "far" pointers transparently and use a single set of libraries for both modes. The only remaining issue that I can see is making sure the compiler knows which pointers need to be "far" pointers even when compiled in a x32 context (e.g. shared library header files).

A Gentoo x32 release candidate

Posted Jun 7, 2012 2:51 UTC (Thu) by butlerm (subscriber, #13312) [Link] (3 responses)

> You've probably already considered this, but for workloads like this, why not pre-allocate a moderate-sized pool of memory for this data and store just the offsets?

You can recompile well written programs for an ABI like this without any source code changes. Manually adding offsets, on the other hand, is slower and makes for unusually ugly looking code.

> Also, is it too much to ask that x32 applications be capable of interacting with amd64 libraries?

It is conceivable that shims could be provided for some 64-bit libraries, but in the general case (C++ libraries for example) it is not even practical.

Most initial x32 systems are likely to be x32 only. I wouldn't expect a desktop distribution to come with full libraries for both x32 and x86-64; one would probably either have x32 releases that come with a handful of 64-bit packages, or x86-64 releases that come with a handful of x32 packages.

A Gentoo x32 release candidate

Posted Jun 7, 2012 3:57 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (2 responses)

> It is conceivable that shims could be provided for some 64-bit libraries, but in the general case (C++ libraries for example) it is not even practical.

I wasn't actually talking about providing shims. Rather, shared libraries would be compiled just as they are now in amd64 mode. The x32 programs would use 64-bit pointers in shared data structures and APIs, and 32-bit pointers in their own internal structures and APIs. Obviously, for this to work either the x32 parts or the dual-mode parts have to be marked somehow, e.g. with an attribute or a pragma line, so the compiler knows to use the larger pointers when compiling shared APIs for x32. Since any application with x32 components is guaranteed to run in a 32-bit address space, converting between the 64-bit and 32-bit pointers is trivial--the most significant 32 bits of the full-size pointers are always zero. Apart from marking the boundaries, the compiler can do all of the work.
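The conversion described here can be modelled in a few lines. This is only a sketch of the arithmetic, assuming little-endian storage; `widen` and `narrow` are hypothetical helper names, and nothing here reflects an actual compiler feature.

```python
import struct


def widen(near_bytes):
    """Zero-extend a 4-byte 'near' pointer to an 8-byte 'far' pointer."""
    (addr,) = struct.unpack("<I", near_bytes)
    return struct.pack("<Q", addr)


def narrow(far_bytes):
    """Truncate an 8-byte pointer; valid only if the upper 32 bits are zero."""
    (addr,) = struct.unpack("<Q", far_bytes)
    if addr >> 32:
        raise ValueError("address above 4 GiB cannot be a near pointer")
    return struct.pack("<I", addr)


near = struct.pack("<I", 0x7FFF1234)
assert narrow(widen(near)) == near  # the round trip is lossless below 4 GiB
```

The value itself is easy to convert; the sketch says nothing about pointers embedded inside structures that cross the boundary.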

> I wouldn't expect a desktop distribution to come with full libraries for both x32 and x86-64, one would probably either have x32 releases that come with a handful of 64 bit packages, or x86-64 releases that come with a handful of x32 packages.

The problem is the dependencies. Add just one moderately complex "foreign" package and you may end up needing duplicates of most of the system libraries. Some packages are relatively standalone, but what if you wanted, say, an x32 build of Chromium on an amd64 system? You'd need x32 builds of around 133 other packages[1] just to provide that one application.

[1] Estimated with: apt-cache depends --recurse -i chromium|awk '/^\s*Depends:\s+lib/{print $2;}'|sort -u

A Gentoo x32 release candidate

Posted Jun 7, 2012 4:15 UTC (Thu) by mikemol (guest, #83507) [Link]

The x32 ABI is, in part, a redefinition of how the C and C++ languages operate on x86-64. You're telling the compiler that hey, pointers and 'long' are 32-bit.
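To see which data model a given process was built for, one can inspect the sizes of `long` and `void *`. The sketch below does this for the running Python interpreter through `ctypes`; the `DATA_MODELS` table and the `data_model` function are hypothetical names for this example. Note that the sizes alone cannot tell i386 from x32, since both are ILP32.

```python
import ctypes

# (sizeof(long), sizeof(void *)) -> conventional data-model name
DATA_MODELS = {
    (4, 4): "ILP32 (e.g. i386 or x32)",
    (8, 8): "LP64 (e.g. x86-64 Linux)",
    (4, 8): "LLP64 (e.g. 64-bit Windows)",
}


def data_model():
    """Return the (long, pointer) sizes of this process and a model name."""
    key = (ctypes.sizeof(ctypes.c_long), ctypes.sizeof(ctypes.c_void_p))
    return key, DATA_MODELS.get(key, "unknown")


sizes, name = data_model()
print(f"long={sizes[0]} bytes, pointer={sizes[1]} bytes: {name}")
```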

You're *not* going to be able to interlink x32 and x86-64 binaries while sharing headers unless you make those headers aware of the differing binary representations of the types...and if you do that, you're making things significantly more complicated over a broad cross-section of code. That means tons of bugs.

As for having per-arch copies of the same binaries...that's already status quo on multilib systems. Not that big of a problem, really. x32 is poised to replace the old 32-bit ABI, with its segmented memory model and relatively limited register and CPU instruction set, with a 32-bit ABI with more registers and a higher-level guaranteed minimum for CPU instruction set availability. x32, in a sense, represents the new "i686" minimum compiler target for x86 systems with a 32-bit ABI.

A Gentoo x32 release candidate

Posted Jun 8, 2012 7:34 UTC (Fri) by khim (subscriber, #9252) [Link]

> Apart from marking the boundaries, the compiler can do all of the work.

Nope. Think about the standard library. memcpy quite obviously does not need to convert pointers, but aio_read needs to do that. And if you pass structures with pointers to functions around then it becomes real ugly real fast.
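A small model of why structures are the hard part. It uses a made-up request record, `{ void *buf; uint32_t len; }`, not the real layout of any libc structure; the format strings and `misread_pointer` are assumptions for illustration only. A caller fills in the record with 4-byte pointers, and a callee compiled for 8-byte pointers reads the same bytes.

```python
import struct

# Hypothetical record: { void *buf; uint32_t len; }, little-endian.
LAYOUT_X32 = "<II"      # 4-byte pointer + 4-byte length = 8 bytes
LAYOUT_AMD64 = "<QI4x"  # 8-byte pointer + 4-byte length + 4 bytes padding = 16 bytes


def misread_pointer(buf_addr, length):
    """Pack with the x32 layout, then read the pointer field as amd64 code would."""
    record = struct.pack(LAYOUT_X32, buf_addr, length)
    (pointer_seen,) = struct.unpack_from("<Q", record, 0)
    return pointer_seen


print(struct.calcsize(LAYOUT_X32), struct.calcsize(LAYOUT_AMD64))  # 8 16
print(hex(misread_pointer(0x1000, 512)))  # 0x20000001000: length leaked into the pointer
```

In this model the length field ends up in the upper half of the pointer the callee sees, so every such structure would need an explicit layout conversion at the boundary, not just a widened pointer value.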

x86-64 NaCl is an independent reimplementation of the x32 architecture (we plan to rebase our changes on top of x32 once it is stable), and for initial benchmarks we used the standard x86-64 glibc linked with our x32-like binary. This was a disaster: it was possible to compile and run a few of the simpler SPEC CPU2000 benchmarks this way, but things like 253.perlbmk just refused to work properly.

When we finally got the loader and libc ported, we dropped this mixed mode like a hot potato. It's not worth it, believe me.

> Some packages are relatively standalone, but what if you wanted, say, an x32 build of Chromium on an amd64 system? You'd need x32 builds of around 133 other packages[1] just to provide that one application.

Right. This is a lot of work. But it's still simpler than trying to stitch Chromium together from x32 pieces and x86-64 pieces.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds