GCC 12.1 Released

Posted May 8, 2022 18:57 UTC (Sun) by NYKevin (subscriber, #129325)
In reply to: GCC 12.1 Released by hmh
Parent article: GCC 12.1 Released

Of course, such code was already broken on most(?) non-x86 targets because the x86 is the weirdo. But I imagine quite a few developers are of the "unless it breaks on my laptop, I don't care" mentality...

Posted May 8, 2022 21:14 UTC (Sun) by wtarreau (subscriber, #51152)

It's even worse (or better): ARM is also excellent with unaligned accesses nowadays, so if you don't run your code on a wide variety of platforms, you can have broken code that runs fine on the two most popular platforms without ever noticing.

Posted May 8, 2022 22:47 UTC (Sun) by Paf (subscriber, #91811)

It’s also not unrealistic to write code only aimed at those platforms…. I’m involved in a decent size project and we target those two plus a variant of PowerPC and that last is for weird semi-historical reasons.

For a specific software project, it’s not crazy to only aim at ARM and x86, or even just x86 or ARM depending on what you’re up to.

How many non-embedded systems aren’t one of those two? Is it even 0.1% any more? I’m sure it’s not 1%.

Posted May 8, 2022 23:38 UTC (Sun) by NYKevin (subscriber, #129325)

This is a valid position for application code to take, but library code IMHO generally should not be in the business of dictating architecture support unless it is doing something hardware-specific (e.g. if your library provides fast lock-free data structures, it's fair enough to say "the hardware must support certain atomic primitives," if your library does float math, it's fair enough to say "the hardware must conform to IEEE 754," and so on). Thing is, there's a lot of library code out there[citation needed], and it's hard to say with absolute certainty which libraries are getting used on more esoteric hardware configurations.

Posted May 15, 2022 16:53 UTC (Sun) by wtarreau (subscriber, #51152)

That's typically what I'm doing with asm or arch-specific optimizations in general: try to make sure the code works on generic platforms (since it helps detect bugs) and only make efforts on relevant ones, typically x86 and armv8 in my case.

Posted May 15, 2022 9:23 UTC (Sun) by anton (subscriber, #25547)

> Of course, such code was already broken on most(?) non-x86 targets because the x86 is the weirdo.

Of course, this is one of the claims commonly made by those who advocate that compilers break programs with undefined behaviour.

First of all, if a program works on some machine, and the compiler breaks it on that machine, the fact that earlier it may not have worked on some other machine does not help the user and is pure whataboutism.

Next, is it actually true? The surviving general-purpose architectures are AMD64, Aarch64, RV64GC, Power, s390. I just tried it on an Aarch64 (Odroid N2) and RV64GC (Starfive Visionfive) machine, and they performed the unaligned access without complaint. Power has supported unaligned accesses in big-endian mode for a long time, and AFAIK they also support it in their new little-endian mode (and the old little-endian mode has not been used in general-purpose computers). Even on the Alphas from the last century, unaligned accesses were supported in Linux by default, albeit very slowly (and with a report in dmesg), and I had to take special measures to trap unaligned accesses.

So, these days an architecture that traps on unaligned accesses is the weirdo. In particular, SSE is the weirdo (Intel did not repeat this misdesign with AVX, and AMD (but unfortunately not Intel) even supports a fix for SSE), but even SSE includes instructions that tolerate unaligned accesses, so the gcc maintainers could choose to use those to avoid the breakage.
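To make the distinction concrete, here is a minimal sketch using the SSE2 intrinsics: `_mm_loadu_si128` and `_mm_storeu_si128` correspond to the unaligned-tolerant instructions, while `_mm_load_si128` and `_mm_store_si128` require 16-byte alignment. The helper name `add16_bytes` is made up for illustration, and the scalar fallback is only there so the sketch also builds on non-x86 targets; none of this is taken from gcc or from the microbenchmark mentioned in this comment.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] += src[i] for 16 bytes, with wraparound.
 * Neither pointer needs to be 16-byte aligned. */
static void add16_bytes(uint8_t *dst, const uint8_t *src)
{
#if defined(__SSE2__)
    /* The "u" variants accept any address; _mm_load_si128 and
     * _mm_store_si128 would fault on a misaligned one. */
    __m128i a = _mm_loadu_si128((const __m128i *)dst);
    __m128i b = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi8(a, b));
#else
    /* Generic fallback so the sketch works on other architectures. */
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
#endif
}
```

Calling `add16_bytes(d + 1, s + 1)` on two 17-byte arrays exercises the misaligned case deliberately.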

Concerning the claim (not made here) that using the trap-on-unaligned-access instructions is faster, such claims usually come without any empirical support. I microbenchmarked that (with a microbenchmark based on code in a bug report where Jakub Jelinek had justified gcc's use of these instructions with this claim), and found that the claim is not true for this microbenchmark.

Posted May 15, 2022 11:23 UTC (Sun) by excors (subscriber, #95769)

> The surviving general-purpose architectures are AMD64, Aarch64, RV64GC, Power, s390.

It does get a lot easier if you exclude ARMv7, though that transition is either pretty recent or hasn't happened yet, depending on what field you're working in.

If I'm reading it right, ARMv8-A says: Unaligned accesses to Device memory (i.e. MMIO) always fault. Most loads/stores to unaligned Normal memory are okay, but multi-register loads/stores will fault if the SCTLR_ELx.A bit is set (though I believe Linux doesn't set that), and Exclusive/Acquire/Release/Atomic accesses will fault unless your CPU is ARMv8.4 (or older with an optional feature) (but even when unaligned atomics are supported, they may (unpredictably) fault if they cross a 16-byte boundary).

ARMv7-A will fault in much less obscure cases, e.g. any unaligned multi-word access (LDM, LDRD, etc) regardless of SCTLR.A. That's a problem whenever you're loading an int64_t, or even two adjacent int32_ts (because the compiler likes to merge them into one instruction), and if it's not aligned you'll need to tell the compiler with __attribute__((packed)).
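A minimal sketch of that idiom, assuming GCC or Clang (both attributes are compiler extensions): wrapping the value in a packed struct lowers its alignment to 1, so the compiler may no longer assume natural alignment when it picks load instructions. The names `unaligned_i64` and `load_i64` are hypothetical, and `may_alias` is an extra precaution added here so the access does not also run into type-based aliasing rules.

```c
#include <stdint.h>

/* Hypothetical wrapper type. "packed" drops the alignment requirement
 * to 1 byte; "may_alias" lets it legally overlay bytes of another type.
 * Both are GCC/Clang extensions, not standard C. */
struct unaligned_i64 {
    int64_t v;
} __attribute__((packed, may_alias));

/* Read an int64_t from an address with no alignment guarantee. */
static int64_t load_i64(const void *p)
{
    return ((const struct unaligned_i64 *)p)->v;
}
```

With this, `load_i64(buf + 1)` on a byte buffer is something the compiler has to handle without assuming 8-byte alignment.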

ARMv8-M also faults on unaligned multi-word accesses. An ARMv8-M Baseline implementation (which I think is the modern replacement for ARMv6-M) will even fault on unaligned single-word accesses.

Posted May 15, 2022 12:48 UTC (Sun) by anton (subscriber, #25547)

> It does get a lot easier if you exclude ARMv7, though that transition is either pretty recent or hasn't happened yet, depending on what field you're working in.

In general-purpose computers, the transition to ARMv8-A has happened quite a while ago (e.g., with Raspi3 in 2016).

However, maybe it has more to do with the instruction set. In that case, Aarch32 seems to be pretty alive on RaspiOS (although even they have started releasing an Aarch64 version). However, the Cortex-X2 and Cortex-A510 announced by ARM almost a year ago don't support Aarch32, so Aarch32 is a second-class citizen already, and I expect that there will be no hardware support on general-purpose computers for it in the not-too-distant future.

Personal experience: I just tried to run an EABI5 binary on all four ARMv8-A machines (with various distributions) we have around. On three I get "no such file or directory" (apparently the kernel does not understand the binary at all), the fourth (a Raspi4 with 64-bit Debian 10) eventually chokes on a missing library. It seems that Aarch32 is not very important for 64-bit Linux distributions.

Concerning the SCTLR_ELx.A bit, IA-32 and AMD64 have had a similar bit since the 486, which I tried to use (for portability checking in a development environment), but had to give up on, because on IA-32 the ABI puts doubles at 4-byte boundaries, and the flag would cause a fault on such accesses. Another attempt with AMD64 failed because gcc produces unaligned accesses from pairs of user-written aligned accesses. So if Linux has not set SCTLR_ELx.A in the past, setting it now would probably cause quite a bit of breakage.

Concerning atomics, they are no excuse for breaking code that does not perform atomic accesses (I doubt that the auto-vectorizer dares auto-vectorizing atomics).

ARMv8-M is irrelevant for general-purpose computers. To those who think it has anything to do with ARMv8-A: it does not. E.g., there is no Aarch64 (the headline feature of ARMv8-A) in ARMv8-M. Yes, ARM's naming is confusing.

Posted May 18, 2022 14:23 UTC (Wed) by excors (subscriber, #95769)

> In general-purpose computers, the transition to ARMv8-A has happened quite a while ago (e.g., with Raspi3 in 2016).
>
> However, maybe it has more to do with the instruction set. In that case, Aarch32 seems to be pretty alive on RaspiOS (although even they have started releasing an Aarch64 version).

True, my previous comment should have said "ARMv8-A AArch64" (not "ARMv8-A") - the rules for ARMv8-A AArch32 look essentially identical to ARMv7-A, so unaligned LDRD/LDM/etc will fault as you showed in a later comment. (And the compiler will happily transform assumedly-aligned loads into LDRD/LDM.)

> ARMv8-M is irrelevant for general-purpose computers.

Also true (well, assuming you mean the main user-visible processor and ignore the potentially dozens of microcontrollers in the same computer), but I'm not sure "general-purpose computer" is that useful a distinction in practice. There are plenty of libraries originally designed for Linux userspace that are quite usable and useful on higher-end microcontrollers, and it would be a shame if the only thing preventing them from working in that environment was an accidental reliance on misaligned data. It would also be a shame if GCC wasted performance on those microcontrollers by assuming all data might be misaligned and never using LDRD/LDM, given the vast majority of existing code does follow the alignment rules correctly and is currently benefiting from that optimisation. So I believe there's still value in following those alignment rules in new code, for portability to real systems that may realistically want to reuse your code.
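For new code that wants to stay within the alignment rules, a minimal sketch of one portable approach is to read through `memcpy`, which has no alignment requirement. The helper name `load_u32` and its bounds check are illustrative assumptions, not something from GCC or from the libraries discussed here. On targets with cheap unaligned loads, compilers commonly reduce the `memcpy` to a single load instruction, though that depends on the compiler and optimisation level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value (host byte order) starting
 * at any byte offset in buf. Well defined even when buf + offset is
 * not 4-byte aligned, unlike dereferencing a cast uint32_t pointer. */
static uint32_t load_u32(const unsigned char *buf, size_t len, size_t offset)
{
    uint32_t v;
    /* Illustrative bounds check: the 4 bytes must lie inside buf. */
    assert(offset <= len && len - offset >= sizeof v);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```

The result is in host byte order, so code parsing a wire or file format would still need an explicit endianness conversion on top of this.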

Posted May 18, 2022 21:07 UTC (Wed) by anton (subscriber, #25547)

> (And the compiler will happily transform assumedly-aligned loads into LDRD/LDM.)

I was somewhat surprised how hard it was to find ldrds in the binary in order to exercise them: only 32 non-sp/fp ldrds and 58 ldms in 19587 instructions. For comparison, an Aarch64 binary of (a later version of) the same program has 257 non-sp/fp ldps in 21745 instructions. By general-purpose I mean, e.g., the Zen3 core that's targeted by free software developers and/or ISVs, not, e.g., AMD's PSPs, which are indeed Aarch32 cores last I heard, but which we unfortunately cannot program.

> There are plenty of libraries originally designed for Linux userspace that are quite usable and useful on higher-end microcontrollers, and it would be a shame if the only thing preventing them from working in that environment was an accidental reliance on misaligned data.

Indeed, ideally already the GPL prevents them from being used in such locked-down environments. But if gcc maintainers' willingness to break programs hurts the proprietary crowd for a change, that's less of a concern to me than when they hurt free software developers and users.

> It would also be a shame if GCC wasted performance on those microcontrollers by assuming all data might be misaligned and never using LDRD/LDM, given the vast majority of existing code does follow the alignment rules correctly and is currently benefiting from that optimisation.

On the contrary, I would find it a shame if programmers who know how to get good performance by using unaligned accesses had to slow down their programs in order to cater for gcc's silliness.

Posted May 17, 2022 20:39 UTC (Tue) by anton (subscriber, #25547)

I have now managed to indeed get a SIGBUS on a Raspi4 by running 32-bit code that uses ldrd with an unaligned address (but regular ldr does not produce such an exception). So SSE/SSE2, ldrd (and friends) are the last die-hards in a general-purpose world dominated by instructions that work with unaligned addresses.