
The 3.4 merge window is closed

Linus has released the 3.4-rc1 prepatch and closed the merge window for this development cycle. He had said this merge window could run a little long, but that's not how it turned out: "And yes, if you actually counted, it has only been 13 days. And if somebody delayed their pull request until the last day of the merge window, I'm sure they'll be even happier to delay it another two months until the next merge window. Yay!" Perhaps the most interesting thing to be merged since last week's merge window summary is support for the x32 ABI. There is still a lot of work to be done at the higher levels of the system, but x32 should eventually provide a higher-performing mode for x86-64 processors.

The 3.4 merge window is closed

Posted Apr 1, 2012 15:15 UTC (Sun) by zdzichu (subscriber, #17118) [Link]

X32 ABI? I hate April Fools' Day.

The 3.4 merge window is closed

Posted Apr 1, 2012 15:44 UTC (Sun) by Zenith (subscriber, #24899) [Link]

Read the linked article from LWN on x32, it seems legit.

Legit?

Posted Apr 1, 2012 15:55 UTC (Sun) by corbet (editor, #1) [Link]

x32 is indeed legit; I expect distributions to start supporting it, but not for a little while yet.

OTOH, this is not legit. Neither are this or this.

UUIDs for all!

Posted Apr 1, 2012 21:18 UTC (Sun) by jzbiciak (✭ supporter ✭, #5246) [Link]

Regarding that last one, perhaps they can integrate it with GConf like this.

the gconf registry

Posted Apr 2, 2012 2:17 UTC (Mon) by abartlet (✭ supporter ✭, #3928) [Link]

Honestly, we tried, but we just could not make it work (or make sense ;-)

Legit?

Posted Apr 3, 2012 13:29 UTC (Tue) by jengelh (subscriber, #33263) [Link]

>https://lwn.net/Articles/490042/ [replace static names by UUID]

Why, this has some merit even if not legit. (For certain values of merit.) After all, people are already using e.g. /dev/disk/by-uuid/. Might as well turn the symlinks into real block device nodes :-)

The 3.4 merge window is closed

Posted Apr 1, 2012 16:15 UTC (Sun) by gmaxwell (subscriber, #30048) [Link]

I wish. The bad old days of software being limited to 4GiB of RAM by default, having two sets of libraries, and lots of code not working when sizeof(int)!=sizeof(void*) are on their way back. :(

The 3.4 merge window is closed

Posted Apr 1, 2012 21:32 UTC (Sun) by jzbiciak (✭ supporter ✭, #5246) [Link]

I'm a little confused. We already have two sets of libraries on most 64-bit systems. (At least, the ones I use do.) This adds a third set, actually.

I happen to like the idea of something like this, personally. Pointer-heavy programs blow up in size on 64-bit machines with little to show for it if they don't actually need 2+ gigabytes. (I say 2GB, because that's the more usual cutoff for x86 32-bit programs.) I've only superficially reviewed specifics of x32, though, so I can't say for certain whether I like their approach overall. What warts I did see mainly seem to be driven by practical issues, so it's hard for me to argue against them. It looks promising.

I've wanted a "small mode 64-bit" environment for a while: that is, 32-bit pointers, but otherwise you get all the other goodies such as native 64-bit arithmetic when you need it, more registers to play with, and a modernized calling convention.

Of course, you could always rewrite your pointer-heavy application to manage memory manually and maintain "pointers" as offsets relative to the application heap. But, you lose out on a ton of language help here making it work, and debuggers won't know what you're up to either. Sure, in C++ you can make a type that overrides operator* and friends, but every abstraction is leaky, and that one would be leakier than most.

The 3.4 merge window is closed

Posted Apr 1, 2012 21:49 UTC (Sun) by gmaxwell (subscriber, #30048) [Link]

I don't know what you're running— but Fedora runs fine without having any 32-bit libraries at all. I know that debian was seriously behind on the initial x86_64 transition but I assume that they would have caught up by now.

Of course, there are programs that actually need less than 2 gigabytes— but perhaps fewer than you think because memory usage is a runtime requirement for most non-trivial programs. I like the fact that my browser no longer crashes due to running out of VM when I have many hundreds of tabs open, thank you!

(And I like the fact that when I want to work on larger data sets than the author of the software expected I usually don't have to waste hours porting the software to 64-bit anymore...)

There are savings, but what's the set of pointer-heavy programs which are large enough that the savings of halving pointer sizes matter but which are small enough that the loss of virtual memory space isn't a material restriction? And is that set big enough to offset the constant cost of yet another copy of all the common shared libraries in memory the moment you open something that can make use of more memory (like a browser)?

(And I fully expect that the cost of yet another copy of all the libraries will be enough to cause a lot of things which really should be built as 64-bit to be built as x32, but I suppose we'll see.)

Of course you could rewrite your uses-lots-of-memory application to manage memory more intelligently or do its own swapping. But just like the pointer compression, that often makes a mess, and it's development work that isn't likely to happen.

I too see the advantages of having a small memory model, but I think the realities of shared libraries don't make it a realistic tradeoff on a whole system level. I'll be happy to have my negative expectations disproven.

The 3.4 merge window is closed

Posted Apr 1, 2012 22:06 UTC (Sun) by jzbiciak (✭ supporter ✭, #5246) [Link]

Compilers, for one thing, are quite pointer heavy. Granted, recent features like LTO can really blow up the memory usage. (I hear GCC recently trimmed the footprint for LTOing Firefox from monumental 8GB to a merely staggering 3GB.) But for my own work, I don't remember ever having a compile fail due to exhaustion of the virtual address space on a 32-bit machine. I have, however, had compiles fail due to exhausting physical memory, when trying to build a GNU toolchain some years back on a PC 7300. That was a bit different.

Interpreters and simulators are also both pointer heavy, but not necessarily memory heavy. I run a ton of interpreted code (Perl scripts, mostly), and none of that really benefits from the large address space. My perl scripts are either moving files around, or streaming information through. I also run (and occasionally write) instruction set simulators. (We make new processors at work.) Lots of pointers there, especially function pointers.

As for what system I'm running: Ubuntu at home, RHEL and SLES at work. All three have 32-bit libraries installed. At home, it's for convenience--I have some 32-bit binaries kicking around. At work, it's because we have many, many binary-only packages that are 32-bit. Even though we may have upgraded 64-bit versions available also, we often need to carry multiple versions around for long-running projects that fix on a tool version. And some have no 64-bit version at present. So, yeah, x32 would add a third set of libraries on those systems.

The 3.4 merge window is closed

Posted Apr 2, 2012 5:05 UTC (Mon) by JoeBuck (subscriber, #2330) [Link]

Nevertheless, x32 can be a large enough win that many might find it to be worth the pain. What x32 gives you is 32-bit pointers, but the large register set as well as the 64-bit operations provided by the x86-64 architecture. The performance is significantly better than that of traditional x86 32-bit code, without the 2x size penalty for pointer-heavy programs. It's getting a big push from Intel, and that's what HJ Lu is spending much of his time on these days.

If you're an architect for a distro, you might want to spend a few cycles thinking about how you will eventually accommodate it.

The 3.4 merge window is closed

Posted Apr 2, 2012 5:51 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

I think you and I are in 100% agreement here. I'm OK with the third set of binaries, really. Actually, I just wish Linux would get over itself and allow for fat binaries, or at least fat libraries.

The 3.4 merge window is closed

Posted Apr 2, 2012 5:52 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

errr.... third set of libraries.

The 3.4 merge window is closed

Posted Apr 2, 2012 7:24 UTC (Mon) by HelloWorld (guest, #56129) [Link]

What for? Debian's Multiarch works just fine, there's no need for fat binaries.

The 3.4 merge window is closed

Posted Apr 3, 2012 13:32 UTC (Tue) by jengelh (subscriber, #33263) [Link]

What will be the MultiArch tuple for x32 binaries, btw? And what will we use for ./configure --host=? In the SPARC land, "sparcv9" seems to have become the tag to use when referring to ELF32 binaries able to use 64-bit instructions.

The 3.4 merge window is closed

Posted Apr 3, 2012 16:51 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

it would seem to me that 'x32' in place of 'amd64' or 'i386' would be the obviously right answer (they may do something else, but why?)

The 3.4 merge window is closed

Posted Apr 3, 2012 17:02 UTC (Tue) by jengelh (subscriber, #33263) [Link]

Just wanted to be sure. After all, they already chose amd64 instead of x86_64 :-) [Then again, amd64 would be a lot more credit-giving were it to appear in uname -m]

The 3.4 merge window is closed

Posted Apr 3, 2012 17:10 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

Remember that at the time they named it amd64, Intel was very opposed to it, only AMD was shipping it, and there wasn't an x86 architecture; instead there were i[3456]86 architectures.

The 3.4 merge window is closed

Posted Apr 2, 2012 11:21 UTC (Mon) by farnz (guest, #17727) [Link]

For those of us with MIPS experience, this is a lot like o32/n32/n64 ABIs. The established i386/x86-32 ABI is equivalent to o32. The amd64/x86-64 ABI is equivalent to n64; x32 is an attempt to add an n32 equivalent to x86-64.

For those without MIPS experience; o32 is the legacy ABI for 32-bit only processors, and is only used on systems that can't run n64 binaries. n64 is the full fat 64-bit ABI. n32 is equivalent to n64 (and interworking between the two ABIs is simple, with care over pointers coming from the n64 to n32 world), but with 32-bit pointers instead of 64-bit pointers, and is reasonably common as a result.

The 3.4 merge window is closed

Posted Apr 2, 2012 10:21 UTC (Mon) by ballombe (subscriber, #9523) [Link]

> I don't know what you're running— but Fedora runs fine without having any 32-bit libraries at all. I know that debian was seriously behind on the initial x86_64 transition but I assume that they would have caught up by now.

Actually, initial Debian x86_64 releases only included the 64-bit libraries and almost no 32-bit libraries, so a system can certainly run without 32-bit libraries at all.

The 3.4 merge window is closed

Posted Apr 2, 2012 11:13 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

Ok, so I'll withdraw my comment about "most systems." My experience with 64-bit Linux has only been with systems that did install quite a few 32-bit compatibility libs. They weren't there for the OS itself, but rather for everything we were running on them.

Then again, the machines at work are part of a heterogeneous compute farm (mixture of Linux and Solaris, 32-bit and 64-bit), so it's not too surprising. We also have plenty of binary-only 32-bit apps kicking around.

(All that said, our newest SLES systems don't seem to have quite as many 32-bit compat libraries installed. I believe we're all being nudged in the direction of full 64-bit environments. I still have a 32-bit machine under my desk, though.)

The 3.4 merge window is closed

Posted Apr 2, 2012 11:44 UTC (Mon) by paulj (subscriber, #341) [Link]

It's not just about absolute memory-size savings, but also about speed. If your application is memory-traffic (not size) intensive, having to traverse heavily linked data-structures, then 32bit pointers can help make your data more compact and so make more efficient use of the memory bandwidth, which can lead to your programme performing faster.

The 3.4 merge window is closed

Posted Apr 2, 2012 13:02 UTC (Mon) by james (subscriber, #1325) [Link]

...and more efficient use of cache, of course.

Multiprocess web-browsers

Posted Apr 2, 2012 16:12 UTC (Mon) by gmatht (guest, #58961) [Link]

I understand Chrome has a process per website, so it can allocate more than 4GB in total. Firefox will probably do this eventually too, as it can help security.

Personally, I usually prefer an application to crash rather than force my entire system into swap-death. There are a number of processes that have no justification for allocating even 1GB, but feel the need to emulate a "while(1){malloc(1)}". x32 could be quite a nice arch for my 2GB netbook, since it doesn't really have memory or CPU to waste, and perhaps also for some VMs I run (though I am not sure x86 is actually faster than x32/x64 code when running in a VM).

The 3.4 merge window is closed

Posted Apr 2, 2012 3:39 UTC (Mon) by kevinm (guest, #69913) [Link]

The cutoff for x86 programs running on an x86-64 kernel is actually 4GB, since the x86-64 kernel lives entirely outside of the first 32 bits worth of address space (apart from the vdso page).

The 3.4 merge window is closed

Posted Apr 2, 2012 3:50 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

I did not realize that. And, sure enough, I tried it with a simple program and was able to allocate 4GB. Consider me better informed.

elysium:/tmp$ cat alloc.c 
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int alloc = 0;

    while (malloc(4096))
        alloc++;

    printf("Allocated %llu bytes\n", (unsigned long long)alloc * 4096ull);

    return 0;
}
elysium:/tmp$ gcc -m32 alloc.c 
elysium:/tmp$ file a.out 
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
elysium:/tmp$ ./a.out 
Allocated 4282732544 bytes

Minimal DoS

Posted Apr 3, 2012 19:41 UTC (Tue) by geuder (subscriber, #62854) [Link]

Somewhat unrelated, but your program can even be made shorter to produce a minimal denial-of-service attack.
#include <stdlib.h>

int main()
{
  while (malloc(4096)) {}
  return 0;
}
When executing it my GUI freezes immediately for a couple of minutes, until the 4 GB have been allocated (in the case of a 32 bit executable) or OOM kills the process (in the case of a 64 bit executable). Well, at least on the not so brand new 2.6.37 I happened to read this on. Not sure whether the famous scheduler fix or writeback throttling or some other new feature would have prevented that experience.

Minimal DoS

Posted Apr 3, 2012 19:55 UTC (Tue) by jzbiciak (✭ supporter ✭, #5246) [Link]

I admit to not really wanting to try it in 64-bit mode on my machine. I've got 16GB of RAM and 48GB of swap, so it could be a while before my machine's usable again once it did start getting into swap. I'm on an even older kernel (2.6.35).

Minimal DoS

Posted Apr 4, 2012 10:23 UTC (Wed) by niner (subscriber, #26151) [Link]

ulimits are your friends ;) Learned that in a painful way but they simply eliminate this problem completely

Minimal DoS

Posted Apr 4, 2012 10:29 UTC (Wed) by jzbiciak (✭ supporter ✭, #5246) [Link]

Well, sure, but that doesn't really let that program go truly hog wild, now does it? Once I establish a ulimit, I'm only testing that ulimit works, right? (That is unless my limit is too high...)

Minimal DoS

Posted Apr 5, 2012 7:07 UTC (Thu) by geuder (subscriber, #62854) [Link]

> ulimits are your friends

Of course, if the goal were to forbid memory consumption over a certain limit.

But my goal would be to slow down the memory hog just enough such that the overall system remains responsive. That should be possible in a multi-tasking system. (I have 4GB of RAM and 5 GB of swap, more than enough to run the 32 bit binary without fatally impacting the rest of the system. The 64 bit binary needs to be killed at some point, so setting a ulimit of ~ 6 GB virtual memory might be appropriate. But having the system unresponsive right away when it's started is clearly suboptimal.)

Maybe I could do that with cgroups and freezing; I have not looked into it yet. My naive expectation would just have been that this has already been taken care of in a major distro.

Minimal DoS

Posted Apr 5, 2012 7:11 UTC (Thu) by geuder (subscriber, #62854) [Link]

Hmm, actually there is a ulimit, which should have killed the 64 bit executable before OOM

> $ ulimit -a | grep virt
> virtual memory (kbytes, -v) 4861680

Not sure why it did not work, no time to investigate it now (given that every attempt will render this (production) machine unusable for a couple of minutes).

The 3.4 merge window is closed

Posted Apr 1, 2012 16:50 UTC (Sun) by grobian (guest, #83608) [Link]

It's actually working already:

markus@x4 ~ % g++ -mx32 -w -O3 -ffast-math -march=native tramp3d-v4.cpp
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 5.19397

markus@x4 ~ % ldd ./a.out
linux-vdso.so.1 (0xffbff000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.0/x32/libstdc++.so.6 (0xf76a1000)
libm.so.6 => /libx32/libm.so.6 (0xf7312000)
libgcc_s.so.1 => /libx32/libgcc_s.so.1 (0xf768a000)
libc.so.6 => /libx32/libc.so.6 (0xf6fb2000)
/libx32/ld-linux-x32.so.2 (0xf7595000)

markus@x4 ~ % ll ./a.out
1572324 Apr 1 18:31 ./a.out

(64bit default, slightly slower and the binaries are bigger)
markus@x4 ~ % g++ -w -O3 -ffast-math -march=native tramp3d-v4.cpp
markus@x4 ~ % ./a.out --cartvis 1.0 0.0 --rhomin 1e-8 -n 20
...
Time spent in iteration: 5.22383

markus@x4 ~ % ll ./a.out
1634176 Apr 1 18:32 ./a.out

x16 ABI coming up

Posted Apr 2, 2012 2:47 UTC (Mon) by ncm (subscriber, #165) [Link]

While a 32-bit address space is needed for many programs, many others would run fine in a 16-bit space. With 16-bit pointers and 16-bit int, data structures would be much smaller than on x64 or x32. Better yet, the complete data sets of dozens of programs would fit in L3 cache. A machine with no RAM at all, just a CPU, would be useful in many applications where the need for separate RAM and a RAM controller add prohibitive expense.

Carefully designed, an x16 mode would enable four-slice SIMD programming on commodity hardware, using a quarter of each register for each slice. Indeed, x32 could run with two slices. A kernel confined to 4G is little inconvenienced, but a kernel that can do twice as many operations in as many cycles may be noticeably faster. Gcc already generates code for Itanic; can sliced x32 be difficult to add?

x16 ABI coming up

Posted Apr 2, 2012 3:05 UTC (Mon) by dmarti (subscriber, #11625) [Link]

"Carefully designed" -- I'm putting that in for Understatement of the Year. It would be an amazing project though.

x16 ABI coming up

Posted Apr 4, 2012 4:55 UTC (Wed) by ncm (subscriber, #165) [Link]

Probably you'd only get to use half of each register -- odd-numbered registers for odd slices, even for even -- when doing carry arithmetic.

A great advantage of an SSE ABI is that the kernel promises not to touch those registers. If it could then be persuaded not to push the other registers, context switches ought to get very quick, which is great for interrupt latency. That is, until you build the kernel using the SSE ABI, too...

x16 ABI coming up

Posted Apr 2, 2012 3:57 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

L3? A 64K working set could fit in L1 in many modern processors, and in L2 in the rest. Ok, technically it's "64K 'elements'" where an "element" might be a char, short, int, long or long long. That gives you a potential working set up to 512K in the case of long long and a potential working set of 128K for the more common int. That's still within the bounds of most L2s, though.

> Carefully designed, an x16 mode would enable four-slice SIMD programming on commodity hardware, using a quarter of each register for each slice.

Provided you figure out how to make them all branch together. ;-)

I'm sure whatever you come up with will be Lirpa 1 compliant, should you attempt it.

x16 ABI coming up

Posted Apr 2, 2012 4:44 UTC (Mon) by ncm (subscriber, #165) [Link]

Note "dozens". But yes, it's a good idea to fit your whole program and all its data structures in L1 cache.

Speaking with entire seriousness, I have read of a brilliant implementation of AES that uses 8 128-bit SSSE registers - one register per bit - and enciphers 128 bytes in 256 cycles. It's sadly obsoleted by aes-ni instructions.

x16 ABI coming up

Posted Apr 2, 2012 5:48 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

Ah yes, bitslicing. I was able to implement a certain stream cipher in about 1 cycle per bit on our DSP by doing 32 parallel blocks in a bit-slice configuration. This was about 25x as fast as the best non-bitsliced implementation. At the time (and to my knowledge), our implementation was the only software implementation they ever certified.

(I won't say which, but the company that owns the algorithm will happily sell you a synthesizeable accelerator, and their algorithm is in the standard. Furthermore, they're responsible for system certification, so... The software implementation was practical because of how I sped it up. It was certifiable because of various hardware security features we had developed.)

Still... 2 cycles/bit seems a bit slow. IIRC, our DSP achieves that on AES without bitslicing. Granted, though, that's assuming everything is all in cache a priori...

And yes, I noted "dozens." I refrained from trying to list them...

x16 ABI coming up

Posted Apr 2, 2012 5:49 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246) [Link]

I should say "2 cycles/bit seems slow for a bit-sliced AES." Of course, I've never tried to write / benchmark AES on an x86.

x16 ABI coming up

Posted Apr 3, 2012 0:22 UTC (Tue) by ncm (subscriber, #165) [Link]

Two cycles per _byte_, on the other hand, is stellar.

x16 ABI coming up

Posted Apr 3, 2012 1:30 UTC (Tue) by jzbiciak (✭ supporter ✭, #5246) [Link]

Ah yes, with that I must agree. For some reason I read that as 128 bits, not 128 bytes. Mea culpa.

One of these days I'll have to give something like that a try on one of our newer processors that have wider operations. The bitsliced stream cipher I mentioned only had 32 lanes because I used 32-bit registers. The magical thing about bitslicing algorithms like this is that you can go even faster (at least for parallelizable blocks) just by making the variables wider.

Thinking about AES specifically... The S-box must've been a bear! LUTs don't work very well in a bitslice world, and IIRC the AES S-boxes are 8-input, 8-output, so rendering them as a system of binary functions of 8 variables can also be messy. (I haven't looked to see just how reduceable they are or aren't, but I suspect they're pretty tough.)

Reducing the logic functions for S-boxes is an enterprise in its own right. For the unnamed algorithm I mentioned previously, I was able to take the total logic operations for all its S-boxes from around 280 down to around 140 using a special solver that tried to find minimal tree-like sequences of instructions to evaluate all possible boolean functions of five variables. (The 280 vs. 140 was measured across the entire set of S-boxes.) I did this after multiple compilers and synthesis tools failed to reduce the logic operation count below ~280.

Of course, I found out after-the-fact that Donald Knuth was playing in the same space at about the same time, and came up with an even better approach than mine.

Aaaaanyway... I'm horribly off topic. I'll stop now.

x16 ABI coming up

Posted Apr 4, 2012 2:32 UTC (Wed) by ncm (subscriber, #165) [Link]

To list them, use ps(1).

Probably I should have written "dozens of processes", instead.

Has anybody booted Linux on a desktop CPU with no RAM, yet, using only L3 cache for volatile storage? Maybe it's still possible to show off.

x16 ABI coming up

Posted Apr 4, 2012 4:13 UTC (Wed) by Fowl (subscriber, #65667) [Link]

The bootloaders burned into most motherboards are a bit fussy about RAM being installed, unfortunately.

x16 ABI coming up

Posted Apr 4, 2012 4:45 UTC (Wed) by ncm (subscriber, #165) [Link]

Yes, but as noted below, coreboot might be made more forgiving. Probably any DMA must be avoided...

x16 ABI coming up

Posted Apr 2, 2012 7:22 UTC (Mon) by elanthis (guest, #6227) [Link]

While I'm about 87.3% sure you're foolin'... it's impossible to add, due to the nature of how the x86 instruction set actually works, and "slicing" as you imply it is impossible to do without a lot of instruction overhead to emulate it. x32 does not allow for addressing just the upper half of a 32-bit register, in particular. You'd have to copy values into another register, shift and mask them, modify them, shift them back, combine them with the destination register, and then finally store them. The extra x86_64 register space would be offset by the excess registers needed to get anything done.

Also, I really doubt that many useful applications can fit into a 16-bit address space anymore, given that the code size of many essential system libraries is already larger than 64k. The data sets you can work on are small, the algorithms small, and hence are suitable to just be written with 16-bit ints and 16-bit offsets into buffers. This is vastly different than the 32-bit world, which is still large enough to handle large data-sets and very huge, complex codebases.

x32 is just about giving the ISA improvements to applications that perform better with 32-bit addressing. x16 would be about inventing a new retarded emulated ISA for applications that would perform better with 32-bit addressing.

x16 ABI coming up

Posted Apr 2, 2012 7:29 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

You can do slicing just fine with SIMD instructions.

In fact, there's an ultra-fast XML parser that works based on this technique: http://parabix.costar.sfu.ca/

And yes, it's really really fast.

x16 ABI coming up

Posted Apr 2, 2012 13:28 UTC (Mon) by etienne (subscriber, #25256) [Link]

Maybe you would not be able to memory-map the libraries you would use in 64 Kbytes, but sometimes I miss variable-size pointers in C, like:
char *ptr; // 32 bits (in fact system default)
char *short shortptr; // 16 bits
char *long longptr; // 64 bits
The shortptr is useful for ia32 "mov (%bx),%eax" but also for RISC processors where you cannot load a 32-bit immediate value into a register in a single instruction " lis r9,0x1234 ; ori r9,0x5678 ".
Sometimes you know that the upper 16 bits will not change in two different pointers, so the lower 16 bits would be a "short pointer".
Also, in some C structures defined by a standard, some addresses are 64 bits wide even in a 32-bit environment; it would be nice to be able to declare those fields as 64-bit pointers...

x16 ABI coming up

Posted Apr 3, 2012 13:37 UTC (Tue) by jengelh (subscriber, #33263) [Link]

>char *ptr; // 32 bits (in fact system default)
>char *short shortptr; // 16 bits
>char *long longptr; // 64 bits

Why not just reuse "near char *shortptr" and "far char *longptr" :-)

x16 ABI coming up

Posted Apr 4, 2012 10:52 UTC (Wed) by etienne (subscriber, #25256) [Link]

Well I did not want to add the concept of the old FAR pointer (16 bits segments + 16 bits offsets), and pointer attributes (const, volatile) are already written at the position I proposed:
char *const constptr;
But source code would be simpler (less asm("") statements) if we had more choices of pointers, like:
pointers to I/O space (inb/outb)
pointers to MSR space (rdmsr/wrmsr; mfspr/mtspr)
pointers to PCI space
pointers to segmented space (16+32 bits gs:(%ebx) )
pointers to kernel/user space (even for x86 architecture)
pointers to physical memory vs virtual memory
Maybe the source code of GCC would not be as simple...

x16 ABI coming up

Posted Apr 4, 2012 12:42 UTC (Wed) by PaXTeam (subscriber, #24616) [Link]

gcc 4.6+ has some support for the named address space feature from the Embedded C technical report (ISO/IEC TR 18037); some on your list could be simulated that way (in PaX there's a plugin that (ab)uses this mechanism to implement __user/__kernel/etc).

x16 ABI coming up

Posted Apr 2, 2012 18:20 UTC (Mon) by njs (guest, #40338) [Link]

I think this is what the GPU programming folks are actually doing.

x16 ABI coming up

Posted Apr 2, 2012 18:50 UTC (Mon) by khim (subscriber, #9252) [Link]

Not just GPU programmers. But that's not the same. Addresses are usually kept as 32bit or 64bit in this scheme. Only data is reduced to 16bit.

x16 ABI coming up

Posted Apr 3, 2012 0:31 UTC (Tue) by ncm (subscriber, #165) [Link]

I, also, am only 87.3% sure I was fooling. An ABI that (mostly?) only used the SSSE registers could be interesting, given good compiler support.

But it's getting harder and harder to write April Fools' jokes. Perhaps the death knell for the form was The Onion's c.2000 headline "Long National Nightmare of Peace and Prosperity Finally Over".

x16 ABI coming up

Posted Apr 3, 2012 0:50 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Hm!

There IS such an ABI and a compiler. coreboot uses it for the code that initializes the RAM controller.

x16 ABI coming up

Posted Apr 3, 2012 4:26 UTC (Tue) by ncm (subscriber, #165) [Link]

See, this is exactly what I mean. There's no room for japes any more. Linux on 6502? Done. Linux on x86 emulator coded in Javascript running under Firefox? Done. Probably this is a direct corollary of Rule 34.

The 3.4 merge window is closed

Posted Apr 4, 2012 11:05 UTC (Wed) by mads (subscriber, #55377) [Link]

Pardon me for asking maybe a stupid question, but ... this ABI does have the ability to use more RAM than the 4GB limit, I hope? I mean, it's only each application's virtual memory that's limited to 4GB?

Can you have several programs occupying 4GB of virtual memory each if you have enough RAM for it?

The 3.4 merge window is closed

Posted Apr 4, 2012 11:51 UTC (Wed) by Jonno (subscriber, #49613) [Link]

Yes, it does support more than 4GB total, just not more than 4GB per process. But then, so does running x86_32 code on an x86_64 kernel.

The 3.4 merge window is closed

Posted Apr 4, 2012 12:36 UTC (Wed) by khim (subscriber, #9252) [Link]

Actually it looks like it should be possible to use >4GB from a single process. But you'll need to call syscalls directly.

The 3.4 merge window is closed

Posted Apr 5, 2012 15:11 UTC (Thu) by Jonno (subscriber, #49613) [Link]

Well, you also need to redefine any structs those syscalls use, with uint64_t instead of pointers, as the structs (and syscall number constants) in your system headers are for the 32-bit syscalls. And you will not be able to get pointers or references to that memory, so doing any math on that data is out of the question. It could be useful for caches though, as doing memory allocation and memcopies over syscalls should be doable.

The 3.4 merge window is closed

Posted Apr 6, 2012 15:25 UTC (Fri) by slashdot (guest, #22014) [Link]

Uh? Any decent OS on any decent CPU (i.e. those with either an MMU or segmentation) with any ABI can do this.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds