The x32 subarchitecture may be removed
If there are x32 users out there, now would be a good time for them to
speak up.
Posted Dec 12, 2018 17:05 UTC (Wed)
by fuhchee (guest, #40059)
[Link] (7 responses)
Posted Dec 13, 2018 15:44 UTC (Thu)
by jimuazu (guest, #129212)
[Link] (6 responses)
Posted Dec 13, 2018 23:03 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (5 responses)
It's been around for at least 6 years, and it's still not popular enough to reach Joe User. I'd say that qualifies as "not popular enough", personally - if it was worthwhile, surely Debian would have an x32 port by now?
Posted Dec 14, 2018 10:43 UTC (Fri)
by gspr (subscriber, #91542)
[Link] (4 responses)
Posted Dec 14, 2018 12:41 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link] (3 responses)
> What doesn't [work]: Gnome3, Iceweasel (WIP in #775321), Chromium (needs llvm), libreoffice (some java JNI issue). KDE works but is buggy. Sound ...
There's a warning on the site that the information might be outdated, but this might still explain why x32 isn't all that popular.
Also, the proposed Iceweasel (Firefox) patches just disable optimizations, which sort of defeats the purpose of an ABI whose sole purpose seems to be to squeeze out extra bits of performance.
Posted Dec 14, 2018 12:43 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link]
some of the proposed Iceweasel (Firefox) patches
Posted Dec 14, 2018 21:41 UTC (Fri)
by sorokin (guest, #88478)
[Link] (1 responses)
"Disabling optimizations" is a bit of an ambiguous term. One might think it refers to compiler optimizations. Apparently the patches are about disabling assembler code in libjpeg-turbo, because yasm doesn't support x32. The other change is disabling the JIT for JavaScript.
Posted Dec 16, 2018 0:51 UTC (Sun)
by louie (guest, #3285)
[Link]
Other than that, Mrs. Lincoln, how was the play?
Posted Dec 12, 2018 17:19 UTC (Wed)
by jccleaver (guest, #127418)
[Link] (3 responses)
The discussion about similarity to other architectures with different sizes brings up good points. Rather than remove x32, I feel like Rich's point about this being an important test case should provide motivation for keeping it around.
https://lwn.net/ml/linux-kernel/20181211233316.GN23599@br...
If there's funkiness in x32 *per se*, then maybe abstract it out so that it and other architecture variants that need to translate syscalls can do so in a more maintainable manner, but don't rip the whole thing out due to a perceived lack of use if other groups will still have to solve that problem somehow. That just makes more work overall.
Posted Dec 13, 2018 0:46 UTC (Thu)
by luto (guest, #39314)
[Link] (2 responses)
Posted Dec 13, 2018 16:51 UTC (Thu)
by jensend (guest, #1385)
[Link] (1 responses)
> And I really do think that a new 32-bit ABI is *much* better off trying to be compatible
> with x86-64 (and avoiding things like 2038) than it is trying to be compatible with the
> old-style x86-32 binaries. I realize that it may be *easier* to be compatible with x86-32
> and just add a few new system calls, but I think it's wrong.
>
> 2038 is a long time away for legacy binaries. It's *not* all that long away if you are
> introducing a new 32-bit mode for performance.
Posted Jan 8, 2019 18:57 UTC (Tue)
by plugwash (subscriber, #29694)
[Link]
x32 was designed (under Linus's direction) to be as similar to x86-64 as possible, differing from it only where necessary to follow POSIX. Unfortunately this led to it falling uncomfortably between two stools: its data structures end up being different from both x86 and x64, but the framework designed to support 32-bit compatibility on 64-bit kernels was only designed to support a single backward-compatibility mode.
As I understand it, arm64ilp32 instead takes the approach of making the new ilp32 mode as similar to the old 32-bit mode as possible. This approach keeps the required changes more localised, but Linus did not like it because it also carries forward legacy baggage from the 32-bit architecture (the example Linus used was the year-2038 issue).
Posted Dec 12, 2018 17:45 UTC (Wed)
by sorokin (guest, #88478)
[Link] (15 responses)
Removing x32 support completely seems like a step backward to me. The benefits of using 32-bit pointers in memory-bandwidth-limited applications are measurable, and from my understanding CPUs are becoming more and more memory-bandwidth-limited over time. Also, typically, the higher the abstraction level of a program, the more memory bandwidth it demands. I guess halving the size of a pointer should definitely help, and I'm absolutely sure that the x32 ABI is a much cleaner way of having 32-bit pointers on 64-bit machines than any other approach.
It's a pity that x32 hasn't seen wider adoption yet, but my understanding is that this is more a chicken-and-egg problem than a technical one.
Posted Dec 12, 2018 17:56 UTC (Wed)
by josh (subscriber, #17465)
[Link] (11 responses)
You can still use 32-bit pointers (or indexes) in your own application in x86-64 mode, if you want to minimize memory usage. Some applications do that.
Posted Dec 12, 2018 18:23 UTC (Wed)
by sorokin (guest, #88478)
[Link] (9 responses)
In theory, yes. This is what I referred to as the "less clean other approaches". One can replace pointers with indices if all data is stored in one big array, but not all programs are like this. GCC is an example of a pointer-heavy application that has no easy way to replace all pointers with indices.
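A minimal sketch of that index-based scheme, assuming the whole structure lives in a single arena (the Node type and names here are illustrative, not from any real program):

    #include <cstdint>
    #include <vector>

    struct Node {
        int value;
        uint32_t next;  // 4-byte index into the arena instead of an 8-byte pointer
    };

    constexpr uint32_t NIL = UINT32_MAX;  // sentinel playing the role of nullptr

    int main() {
        std::vector<Node> arena;    // the "one big array"
        arena.push_back({1, NIL});  // node 0, end of list
        arena.push_back({2, 0});    // node 1 refers to node 0 by index
        int sum = 0;
        for (uint32_t i = 1; i != NIL; i = arena[i].next)
            sum += arena[i].value;  // traversal follows indices, not pointers
        return sum == 3 ? 0 : 1;
    }

This only pays off when the program's allocation pattern fits a single arena, which is exactly why it doesn't transplant easily into a pointer-heavy codebase.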
Posted Dec 12, 2018 18:34 UTC (Wed)
by josh (subscriber, #17465)
[Link] (8 responses)
It's not necessarily a good idea, but then, most applications shouldn't be doing it.
Posted Dec 12, 2018 18:53 UTC (Wed)
by sorokin (guest, #88478)
[Link] (7 responses)
> It's not necessarily a good idea, but then, most applications shouldn't be doing it.
Definitely not a good idea. Suppose one has a pointer-heavy application and wants to move it to 32-bit pointers. One has to replace pointers (T* with something like lowmem_ptr<T>), smart pointers (unique_ptr<T> with lowmem_unique_ptr<T>), smart-pointer factories (make_unique<T> with lowmem_make_unique<T>), and containers (does specifying a lowmem_allocator work here?), and provide a lowmem_malloc (right?).
This process is invasive and not easily reversible. One can say "still, this is possible". Yes, it is possible, but it is so complicated that no one will do it. That's why I said "in theory yes". In practice it means no.
Compiling for the x32 ABI is a much cleaner solution.
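For illustration, a minimal sketch of the first of those replacements, assuming every "lowmem" allocation is guaranteed to land below 4 GiB (lowmem_ptr is the hypothetical name from the comment above, not a real library type):

    #include <cassert>
    #include <cstdint>

    template <typename T>
    class lowmem_ptr {
        uint32_t addr_;  // stores only the low 32 bits of the address
    public:
        lowmem_ptr() : addr_(0) {}
        explicit lowmem_ptr(T *p)
            : addr_(static_cast<uint32_t>(reinterpret_cast<uintptr_t>(p))) {
            // Only sound if the allocator guarantees addresses below 4 GiB,
            // e.g. by mmap()ing its pool with MAP_32BIT.
            assert(reinterpret_cast<uintptr_t>(p) <= UINT32_MAX);
        }
        T *get() const { return reinterpret_cast<T *>(static_cast<uintptr_t>(addr_)); }
        T &operator*() const { return *get(); }
        T *operator->() const { return get(); }
        explicit operator bool() const { return addr_ != 0; }
    };

And that is only the first item on the list; the smart pointers, factories, allocator, and a lowmem_malloc that actually returns low addresses all still remain.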
Posted Dec 12, 2018 23:37 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (5 responses)
It seems like the compiler should be able to make pointers 32-bit on its own, [...] I'm guessing there might be something I'm missing here, because, otherwise, I don't know why they wouldn't have done it that way to begin with.
Posted Dec 13, 2018 5:56 UTC (Thu)
by eru (subscriber, #2753)
[Link] (4 responses)
Sounds like the "memory models" of MS-DOS and 16-bit Windows C compilers. You can certainly make it work, but you must make sure that all modules are compiled with the same memory model option, and any precompiled libraries you link against are for the same memory model as your code, or you need extensions or #pragmas to define "far" function prototypes. I'm not sure it is worth it. In MS-DOS this caused all sorts of fun, but was necessary for allowing C programs access more than 64k of memory. For x32_64 there is no such compelling reason.
Posted Dec 13, 2018 7:26 UTC (Thu)
by ibukanov (subscriber, #3942)
[Link] (2 responses)
In MS-DOS and Windows 3.* there was mostly no need to annotate pointers with __near and __far keywords.
Posted Dec 13, 2018 13:15 UTC (Thu)
by eru (subscriber, #2753)
[Link] (1 responses)
True, when you compiled all your code with the same memory-model options. But you did need them for external libraries (which might or might not use the same memory model) and for low-level code. (And of course for optimizations, as you noted.) As I recall, the Microsoft compiler had four memory models, for all combinations of near/far function and data pointers. (Some compilers even had a fifth, "huge", that permitted arrays larger than 64k.) We could obviously reach the same number of models with 32- vs. 64-bit data and function pointers, and relive the memory-model mess-ups of the 1980s....
To quote one of the posts above, "Don't go there".
Posted Dec 13, 2018 14:38 UTC (Thu)
by ibukanov (subscriber, #3942)
[Link]
A memory model with 32-bit code and 64-bit data can be useful. We are still far away from 4GB executables even when accounting for JIT.
Posted Dec 13, 2018 16:26 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
But that's already true for x32 code. The ABI proposed by linuxrocks123 (32-bit pointers but x86_64 system calls) would be similar to x32 but implemented entirely in userspace, with pointer size translation at the user <-> kernel boundary.
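A sketch of what that boundary translation might look like, under the assumption that all application memory is mapped below 4 GiB; ptr32, widen, and write32 are made-up names for illustration:

    #include <cstdint>
    #include <cstring>
    #include <sys/mman.h>
    #include <unistd.h>

    using ptr32 = uint32_t;  // the application's 32-bit pointer representation

    static void *widen(ptr32 p) {
        // Zero-extend back to a real 64-bit pointer; sound only if the
        // allocator never hands out memory above 4 GiB.
        return reinterpret_cast<void *>(static_cast<uintptr_t>(p));
    }

    // Shim at the user <-> kernel boundary: widen the pointer, then make
    // the ordinary x86-64 system call. No kernel changes involved.
    ssize_t write32(int fd, ptr32 buf, size_t count) {
        return write(fd, widen(buf), count);
    }

    int main() {
        // Grab low memory so the address truly fits in 32 bits.
        char *buf = static_cast<char *>(
            mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0));
        if (buf == MAP_FAILED) return 1;
        std::memcpy(buf, "hi\n", 3);
        ptr32 p = static_cast<ptr32>(reinterpret_cast<uintptr_t>(buf));
        return write32(1, p, 3) == 3 ? 0 : 1;
    }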
Posted Jan 6, 2019 18:01 UTC (Sun)
by jwakely (subscriber, #60262)
[Link]
Yes, in theory. In practice, not all of GCC's std:: containers do the right thing yet; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57272
Posted Dec 22, 2018 13:18 UTC (Sat)
by berndp (guest, #52035)
[Link]
Posted Dec 14, 2018 4:12 UTC (Fri)
by mst@redhat.com (subscriber, #60682)
[Link] (2 responses)
Posted Dec 14, 2018 22:41 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (1 responses)
Posted Dec 15, 2018 7:52 UTC (Sat)
by jrn (subscriber, #64214)
[Link]
Posted Dec 12, 2018 18:26 UTC (Wed)
by epa (subscriber, #39769)
[Link] (18 responses)
Posted Dec 12, 2018 18:41 UTC (Wed)
by josh (subscriber, #17465)
[Link] (17 responses)
The only reason the kernel needed to care about the x32 architecture was for cases where the kernel supplies memory to userspace. Beyond that, it was an entirely userspace concern for how to compile binaries.
I half wonder if you could replace x32 support in the kernel with a prctl that makes the kernel default to MAP_32BIT for your process.
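The per-mapping flag already exists today; only the process-wide prctl default is hypothetical. A minimal demonstration of the existing flag on x86-64 Linux:

    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        // MAP_32BIT forces the mapping into the low 2 GiB of the address
        // space, so the returned address always fits in 32 bits.
        void *p = mmap(nullptr, 1 << 20, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("mapped at %p\n", p);
        munmap(p, 1 << 20);
        return 0;
    }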
Posted Dec 12, 2018 19:23 UTC (Wed)
by plugwash (subscriber, #29694)
[Link] (15 responses)
Unfortunately that is not true; if all the kernel had to care about for x32 were limiting memory allocations, we wouldn't be having this discussion.
The problem is that the data structures passed between kernel and userland are defined in terms of the standard C types, including long and pointer. The Linux kernel has a mechanism designed for handling a "backwards-compatibility architecture", but that mechanism was only designed to handle one such architecture, and x32 is different from both i386 and amd64. The result, at least according to the lkml thread that led to this article, is that the handling of x32 syscalls is a hacky mess.
An alternative approach would be to define the data structures used for kernel-to-userland interfacing without using the standard C long and pointer types. This would make the system non-POSIX-compliant and would require a compiler extension to allow compiling code with mixed 32-bit and 64-bit pointers, but it may be a more workable way forward than the current approach.
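To make the type problem concrete, here's a toy structure in the style of the kernel/userland interface (sample is an invented name; the sizes in the comments are what gcc's -m64/-m32/-mx32 produce, as I understand the three ABIs):

    #include <cstdio>

    struct sample {
        void *base;  // 8 bytes under -m64; 4 under -m32 and -mx32
        long  len;   // 8 bytes under -m64; 4 under -m32 and -mx32
    };

    int main() {
        // Prints 16 when built with g++ -m64, and 8 with -m32 or -mx32.
        // x32 matches i386 for this struct, yet its syscalls go through the
        // 64-bit syscall table, where the kernel's long is 64 bits -- so
        // neither the native path nor the single i386 compat layer fits it.
        printf("sizeof(struct sample) = %zu\n", sizeof(struct sample));
        return 0;
    }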
Posted Dec 12, 2018 20:18 UTC (Wed)
by ibukanov (subscriber, #3942)
[Link] (14 responses)
Posted Dec 12, 2018 21:32 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (6 responses)
Posted Dec 13, 2018 7:01 UTC (Thu)
by ibukanov (subscriber, #3942)
[Link] (5 responses)
Posted Dec 13, 2018 8:18 UTC (Thu)
by cladisch (✭ supporter ✭, #50193)
[Link] (4 responses)
As far as I can see, this assessment has not changed since then.
Posted Dec 13, 2018 13:31 UTC (Thu)
by plugwash (subscriber, #29694)
[Link] (3 responses)
Other than time and file offsets I can't think of much that has a pressing need to be 64-bit on 32-bit systems.
Posted Dec 13, 2018 19:02 UTC (Thu)
by jccleaver (guest, #127418)
[Link] (2 responses)
This absolutely is key. If this decision were to be rolled back in the interests of a meaningfully usable x32, it would be a great step.
Posted Dec 13, 2018 23:22 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (1 responses)
Why? 64-bit time for 32-bit architectures is worth the effort because there still exist 32-bit only systems being sold today (ARM Cortex-R range, including new designs, for example, not to mention embedded systems using older ARMv7-A cores, plus anything designed around the DM&P Vortex86 SoCs or RDC's Emkore and IAD chips which are x86 CPUs with no 64-bit support). Thus, we need to address this anyway; these chips are going to be around for a while, and saying that brand new hardware designed in 2019 (or probably 2020) is going to be worthless before 2038 isn't exactly nice.
OTOH, x32 is just a potential speedup for users who could use amd64 or i386 ABIs; it doesn't expand the user base by any significant amount, and does involve engineering effort.
Posted Dec 14, 2018 8:30 UTC (Fri)
by joib (subscriber, #8541)
[Link]
Posted Dec 12, 2018 21:37 UTC (Wed)
by ken (subscriber, #625)
[Link] (6 responses)
Posted Dec 13, 2018 8:36 UTC (Thu)
by epa (subscriber, #39769)
[Link] (5 responses)
Posted Dec 13, 2018 11:01 UTC (Thu)
by NAR (subscriber, #1313)
[Link]
Posted Dec 13, 2018 11:47 UTC (Thu)
by excors (subscriber, #95769)
[Link] (2 responses)
ARM NEON does let you split up registers like that - the 32-bit registers S0 and S1 are the two halves of the 64-bit register D0, and D0/D1 are the two halves of 128-bit Q0, and so on up to D30/D31 = Q15. But that makes it much harder for an out-of-order CPU to accurately determine dependencies between instructions and do correct register renaming, so AArch64 got rid of that aliasing - now S0/D0/Q0 share one physical register, S1/D1/Q1 share another, etc. Better to sacrifice some utilisation of register space in exchange for a simpler and more efficient microarchitecture.
Posted Dec 13, 2018 16:38 UTC (Thu)
by epa (subscriber, #39769)
[Link] (1 responses)
I agree, it doesn't really seem worth the effort in general, but perhaps in some tight inner loop that works with 32-bit arithmetic it might make a difference.
Posted Dec 13, 2018 19:36 UTC (Thu)
by excors (subscriber, #95769)
[Link]
If you have a tight inner loop where such tiny differences matter, you should be using SSE/AVX anyway so that you're loading 128/256/512 bits at once and doing all your arithmetic with SIMD instructions.
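For instance, a sketch of such a loop with SSE2 intrinsics (add_arrays is an invented example; SSE2 is baseline on x86-64):

    #include <immintrin.h>

    void add_arrays(const int *a, const int *b, int *out, int n) {
        int i = 0;
        // Four 32-bit additions per instruction, using 128-bit loads/stores.
        for (; i + 4 <= n; i += 4) {
            __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
            __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
            _mm_storeu_si128(reinterpret_cast<__m128i *>(out + i),
                             _mm_add_epi32(va, vb));
        }
        for (; i < n; ++i)  // scalar tail for leftover elements
            out[i] = a[i] + b[i];
    }

    int main() {
        int a[5] = {1, 2, 3, 4, 5}, b[5] = {10, 20, 30, 40, 50}, out[5];
        add_arrays(a, b, out, 5);
        return out[4] == 55 ? 0 : 1;
    }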
Posted Dec 13, 2018 12:01 UTC (Thu)
by joib (subscriber, #8541)
[Link]
For x86, with memory-operand instructions and register renaming, 16 GPRs are enough for most purposes.
Posted Dec 12, 2018 19:48 UTC (Wed)
by jem (subscriber, #24231)
[Link]
Or simply declare them 'int'. Integers are 32-bit on x86_64 Linux; pointers and longs are 64-bit. (On 64-bit Windows even longs are still 32-bit).
With 64-bit ints, declaring your variables int32_t wouldn't help, because of C's type conversion rules: arithmetic operands are converted to at least int before the operation.
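A small demonstration of that promotion rule as it works today, with 32-bit int (the 64-bit-int ABI above is hypothetical):

    #include <cstdint>
    #include <cstdio>

    int main() {
        int16_t a = 30000, b = 30000;
        // Both operands are promoted to int before the multiplication, so
        // the result is computed in (32-bit) int and doesn't wrap at 16 bits.
        printf("%d\n", a * b);           // 900000000
        printf("%zu\n", sizeof(a + b));  // sizeof(int), i.e. 4 on x86-64 Linux
        return 0;
    }

With a hypothetical 64-bit int, the same promotion would drag int32_t arithmetic up to 64 bits, which is jem's point.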
Posted Dec 13, 2018 17:07 UTC (Thu)
by jalla (guest, #101175)
[Link] (3 responses)
Posted Dec 19, 2018 11:29 UTC (Wed)
by arnd (subscriber, #8866)
[Link] (2 responses)
If you indeed run x32, can you please reply to the email thread and explain what distro you have, why you chose x32, and what keeps you from migrating to either x86-32 or x86-64?
Posted Jan 6, 2019 17:58 UTC (Sun)
by jwakely (subscriber, #60262)
[Link] (1 responses)
The confusion isn't helped by Windows-land using "x64" to refer to x86_64, which unsurprisingly leads to people thinking that "x32" is just another name for x86-32 aka IA32 aka x86. Sigh.
Posted Jan 6, 2019 18:33 UTC (Sun)
by excors (subscriber, #95769)
[Link]
And watch out for IA-32 (x86) vs IA-64 (Itanium) vs IA-32e (AMD64). It's like a time capsule of Intel's ambitions in the early 00s.
Posted Dec 13, 2018 17:30 UTC (Thu)
by bircoph (guest, #117170)
[Link]
https://flameeyes.blog/2012/06/23/debunking-x32-myths/
Of course in some synthetic cases the results are better, but x32's complexity and maintenance burden are not worth it.
Posted Dec 14, 2018 18:21 UTC (Fri)
by thyrsus (guest, #21004)
[Link] (1 responses)
Posted Dec 14, 2018 19:58 UTC (Fri)
by k8to (guest, #15413)
[Link]
Posted Dec 27, 2018 8:57 UTC (Thu)
by sorokin (guest, #88478)
[Link]
Perhaps it will be interesting to someone: on the net there are mentions of an architecture called arm64_32, which uses 32-bit pointers with 64-bit ARM instructions (aarch64) [1] [2]. Googling for it doesn't show many results. Presumably it is used by Apple in their smart watches [3] [4].
Here is what I understand from the few details found on the web. Programs for the Apple Watch used to be compiled for the armv7 architecture. Apple anticipated the transition to aarch64 and required all programs to be shipped not only with normal binaries, but also with a dump of the LLVM intermediate representation. At some point Apple updated their hardware to aarch64 and retranslated the intermediate representation of all programs to aarch64.
One should note here that the LLVM intermediate representation doesn't abstract away pointer sizes and is not architecture-independent. In fact it cannot be architecture-independent, because C code after preprocessing is not architecture-independent. So as I understand it, the programs were compiled as if for armv7, and therefore the layout of all structures in arm64_32 matches exactly the layout of structures in armv7, with all its upsides and downsides.
From what I can tell, they use it not for efficiency but for compatibility with existing 32-bit binaries. Still, I think it is an interesting approach to having 32-bit pointers on a 64-bit architecture.
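A tiny example of why C after preprocessing isn't target-neutral: target facts are resolved before any IR exists (__LP64__ is predefined by gcc and clang on LP64 targets such as x86-64 and aarch64 Linux):

    #include <cstdio>

    int main() {
    #ifdef __LP64__
        puts("compiled for an LP64 target");   // branch chosen by the preprocessor
    #else
        puts("compiled for a 32-bit target");
    #endif
        // The frontend also folds sizeof into a constant, so the IR for this
        // line literally contains a 4 or an 8, not something target-neutral.
        printf("sizeof(void *) = %zu\n", sizeof(void *));
        return 0;
    }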
