
Linker limitations on 32-bit architectures

August 27, 2019

This article was contributed by Alexander E. Patrakov

Before a program can be run, it needs to be built. It's a well-known fact that modern software, in general, consumes more runtime resources than before, sometimes to the point of forcing users to upgrade their computers. But it also consumes more resources at build time, forcing operators of the distributions' build farms to invest in new hardware, with faster CPUs and more memory. For 32-bit architectures, however, there exists a fundamental limit on the amount of virtual memory, which is never going to disappear. That is leading to some problems for distributions trying to build packages for those architectures.

Indeed, with only 32-bit addresses, there is no way for a process to refer to more than 4GB of memory. For some architectures, the limit is even less — for example, MIPS hardware without the Enhanced Virtual Addressing (EVA) extensions is hardwired to make the upper 2GB of the virtual address space accessible from the kernel or supervisor mode only. When linking large programs or libraries from object files, ld sometimes needs more than 2GB and therefore fails.

Building on 32-bit machines

This class of problems was recently brought up by Aurelien Jarno in a cross-post to multiple Debian mailing lists. Of course, he is not the first person to hit this; a good example would be in a Gentoo forum post about webkit-gtk from 2012. Let's follow through this example, even though there are other packages (Jarno mentions Firefox, Glasgow Haskell Compiler, Rust, and scientific software) where a 32-bit build has been reported to run out of virtual address space.

If one attempts to build webkit-gtk now, first the C++ source files are grouped and concatenated, producing so-called "unified sources". This is done in order to reduce compilation time. See this blog post by WebKit developer Michael Catanzaro for more details. Interestingly, a similar approach had been suggested for WebKit earlier, but it targeted reducing memory requirements at the linking stage.
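
Roughly speaking, a unified source is just a generated file that #includes several of the real source files, so that they are compiled as a single translation unit. The sketch below only illustrates the idea; the file and source names are illustrative rather than taken verbatim from WebKit's build tree:

    $ cat UnifiedSource-001.cpp
    // generated file: several real translation units compiled as one
    #include "WebDriverService.cpp"
    #include "Session.cpp"
    #include "SessionHost.cpp"
    $ g++ -c UnifiedSource-001.cpp -o UnifiedSource-001.cpp.o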

Unified sources are compiled into object code files by g++, which knows how to invoke the actual compiler (cc1plus) and assembler (as). At this stage, the most memory-hungry program is cc1plus, which usually consumes ~500-800MB of RAM, but there are some especially complex files that cause it to take more than 1.5GB. Therefore, on a system with 32-bit physical addresses, building webkit-gtk with parallelism greater than -j2 or maybe -j3 might well fail. In other words, due to insufficient memory, on "real" 32-bit systems one may not be able to take full advantage of multi-processing. Thankfully, on both 32-bit x86 and MIPS, CPU architecture extensions exist that allow the system as a whole (but not an individual process) to use more than 4GB of physical RAM.

It is a policy of Debian, Fedora, and lots of other distributions (but, notably, not Gentoo) to generate debugging information while compiling C/C++ sources. This debugging information is useful when interpreting crash dumps — it allows seeing which line of the source code a failing instruction corresponds to, observing how exactly local and global variables are located in memory, and determining how to interpret various data structures that the program creates. If the -g compiler option is used to produce debugging information, the resulting object files have sizes between several KB and 10MB. For example, WebDriverService.cpp.o takes 2.1MB of disk space. Of that, only 208KB are left if debugging symbols are discarded. That's right — 90% of the size of this particular file is taken by debugging information.
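
How much of an object file is taken up by debugging information is easy to check by stripping a copy and comparing sizes. Here is a sketch using the file mentioned above; the sizes shown are the figures quoted in the text, not a fresh measurement:

    $ du -h WebDriverService.cpp.o
    2.1M    WebDriverService.cpp.o
    $ objcopy --strip-debug WebDriverService.cpp.o WebDriverService-nodebug.o
    $ du -h WebDriverService-nodebug.o
    208K    WebDriverService-nodebug.o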

When compilation has created all of the object files that go into an executable or library, ld is invoked to link them together. It combines code, data, and debug information from object files into a single ELF file and resolves cross-references. During the final link of libwebkit2gtk-4.0.so.37, 3GB of object files are passed to ld at once. This doesn't fail on x86 with Physical Address Extension (PAE), but comes quite close. According to the report produced with -Wl,--stats in LDFLAGS:

    ld.gold: total space allocated by malloc: 626630656 bytes
    total bytes mapped for read: 3050048838
    output file size: 1727812432 bytes

Adding the first two lines gives 3.6GB allocated by ld.gold. There are some legitimate questions here regarding resource usage. First, what is all this memory (or address space) used for? Second, can ld use less memory?

Well, there are two implementations of ld in the binutils package: ld.bfd (old, but still the default) and ld.gold (newer). By default, both implementations use mmap() to read object files as inputs. That is, ld associates a region of its virtual address space with every object file passed to it so that memory operations on these regions are redirected by the kernel, through page faults, to the files. This programming model is convenient and, in theory, reduces the physical memory requirements. That is because the kernel can, at its discretion, repurpose physical memory that keeps the cached content of object files for something else and then transparently reread bytes from the file when ld needs them. Also, ld.gold, by default, uses mmap() to write to the output file. The downside is that there must be sufficient address space to assign to the input and output files or the mmap() operation will fail.
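
The difference between claimed address space and resident memory can be observed while a large link is running. Here is a minimal sketch, assuming the gold linker is running under the process name ld.gold:

    # VmPeak is the peak virtual address space; VmRSS is the memory actually resident
    $ grep -E 'VmPeak|VmRSS' /proc/$(pgrep -nx ld.gold)/status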

Both ld.bfd and ld.gold offer options that are intended to conserve memory, but actually help only sometimes:

    $ ld.bfd --help
      --no-keep-memory            Use less memory and more disk I/O
      --reduce-memory-overheads   Reduce memory overheads, possibly taking much longer
      --hash-size=<NUMBER>        Set default hash table size close to <NUMBER>
    $ ld.gold --help
      --no-keep-files-mapped      Release mapped files after each pass
      --no-map-whole-files        Map only relevant file parts to memory (default)
      --no-mmap-output-file       Do not map the output file for writing
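
These are linker options, so when the link is driven by gcc or g++ they are passed through with the -Wl, prefix. Here is a sketch of how one might try them on a problematic package; whether any of them actually helps depends on the package:

    # when linking with ld.bfd (the default linker)
    $ export LDFLAGS="-Wl,--no-keep-memory -Wl,--reduce-memory-overheads"
    # when linking with ld.gold
    $ export LDFLAGS="-Wl,--no-keep-files-mapped -Wl,--no-mmap-output-file"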

When a distribution maintainer sees a build failure on a 32-bit architecture, and one of the options listed above helps, a patch gets quickly submitted upstream so that the option is used by default in the affected application or library. For webkit-gtk, that is indeed the case. In other words, all of the low-hanging fruit has already been picked.

A package without full debugging information is still better than no package at all. So, reluctantly, maintainers are forced to compress debugging information, reduce its level of detail, or, in extreme cases, completely remove it. Still, in Debian, some packages are excluded from some architectures because they cannot be built due to running out of virtual memory. "We are at a point were we should probably look for a real solution instead of relying on tricks", Jarno concluded.
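
In practice, that means adjusting the debug-related compiler flags that the distribution passes. A rough sketch of the common knobs (the exact choices vary by distribution and package):

    # less detailed debugging information than the default -g (which means -g2)
    $ export CXXFLAGS="-O2 -g1"
    # keep full detail, but compress the DWARF sections
    $ export CXXFLAGS="-O2 -g -gz=zlib"
    # or drop debugging information entirely as a last resort
    $ export CXXFLAGS="-O2 -g0"

Compressed debug sections mostly save disk space; the linker may still have to decompress them while it works, so the relief in address space can be smaller than the size reduction suggests.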

Cross-compiling

A reader with some background in embedded systems might ask: why does Debian attempt to build packages on weak and limited 32-bit systems at all? Wouldn't it be better to cross-compile them on a much larger amd64 machine? Indeed, that's what the Yocto Project and Buildroot do, and they have no problem producing a webkit-gtk package with full debugging information for any architecture. However, cross-compilation (in the sense used by these projects) invariably means the inability to run compiled code. So packages that test properties of the target system by building and running small test programs are no longer able to do so; they are forced to rely on external hints, hard-coded information, or (often pessimistic) assumptions. Moreover, the maintainer is no longer able to run the test suite as part of the package build. Such limitations are unacceptable to the Debian release team, and therefore native builds are required. Running tests under QEMU-based emulation is out of the question, too, because of possible flaws in the emulator itself.

Jarno proposed to use Debian's existing support for "multiarch", which is the ability to co-install packages for different architectures. For example, on a system with an amd64 kernel, both i386 and amd64 packages can be installed and both types of programs will run. The same applies to 32-bit and 64-bit MIPS variants. It would therefore be enough to produce cross-compilers that are 64-bit executables but target the corresponding 32-bit architecture. For gcc, it is as easy as creating a wrapper that adds the -m32 switch. However, gcc is not the only compiler — there are also compilers for Haskell, Rust, OCaml, and other languages not supported by gcc; they may need to be provided as well.
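
A minimal sketch of such a wrapper for the i386 case, assuming an amd64 build machine with the 32-bit multiarch development libraries installed; the name and path are arbitrary, and a real Debian wrapper would need to be more careful about include and library paths:

    $ cat /usr/local/bin/i686-linux-gnu-g++
    #!/bin/sh
    # Run the 64-bit compiler, but make it emit 32-bit x86 code.
    # The compiler and linker themselves run as 64-bit processes,
    # so they are not constrained by the 32-bit address space.
    exec g++ -m32 "$@"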

If one were to implement this plan, the question arises of how to make those compilers available. That is, should a package that demands extraordinary amounts of memory at build time explicitly depend on the cross-compiler, or should it be provided implicitly? There is also an obstacle on RISC-V: a 64-bit CPU does not implement the 32-bit instructions, so one cannot assume that just-compiled 32-bit programs will run. The same is sometimes true for Arm: there is arm64 hardware that does not implement 32-bit instructions. However, this obstacle is not really relevant, because Debian does not support 32-bit RISC-V systems anyway, and arm64 hardware that supports 32-bit instructions is readily available, is not going away, and thus can be used on the build systems. So, for the cases where this approach can work, Ivo De Decker (a member of the Debian release team) expressed readiness to reevaluate the policy.

On the other hand, Sam Hartman, the current Debian Project Leader, admitted that he is not convinced that it is a good idea to run the build tools (including compilers and linkers) on 32-bit systems natively. Of course, this viewpoint is at odds with the release team position, and the problem of getting accurate build-time test results still needs to be solved. One solution mentioned was a distcc setup, but this turned out to be a false lead, because with distcc, linking is done locally.

There was also a subthread started by Luke Kenneth Casson Leighton, in which he brought up ecological concerns — namely, that perfectly good hardware is forced to go to landfills because of the resource-wasteful algorithms now employed in the toolchain. In Leighton's view, the toolchain must be fixed so that it can deal with files bigger than the available virtual memory, especially because it was able to do so decades ago.

Right now, 32-bit systems are still useful and can run general-purpose operating systems. Only a few big packages are directly affected by the limitation of available address space at build time. But ultimately, we have to agree with Hartman's conclusion: "if no one does the work, then we will lose the 32-bit architectures" — at least for Debian.



Linker limitations on 32-bit architectures

Posted Aug 27, 2019 13:53 UTC (Tue) by X-san (guest, #133973) [Link] (14 responses)

Anyone still have 32-bit hardware? I have 32-bit chipsets, but not CPUs for some time.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 13:55 UTC (Tue) by X-san (guest, #133973) [Link]

Well, not x86 CPUs at the least.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 15:07 UTC (Tue) by arnd (subscriber, #8866) [Link] (3 responses)

32-bit ARM hardware is extremely common, it's still everywhere: low-end phones, tablets, networking equipment, industrial, etc. They are slowly moving to 64-bit, but for now they are about 50:50 even for new stuff. When I looked at distro support for y2038, I concluded that we will likely still need a Debian for ARMv7 for another 15 to 20 years. Most other 32-bit distros will give up long before, but embedded systems often hang around long after the software stops getting updated.

32-bit x86 hardware has been completely irrelevant from a commercial point of view for a while now, but some people still need to run 32-bit binaries (see https://lwn.net/Articles/791936/), or they might have old specialized hardware.

Most other CPU architectures supported in Linux are also 32-bit; many are fading away slowly. We removed the ones that are clearly unused (see https://lwn.net/Articles/748074/), and the rest have at least one person that still cares. mips32 is still popular in some markets that tend to use older chips; powerpc32 and superh have similar but smaller niches. Some configurable CPUs like arc, xtensa, microblaze and now rv32 seem to have a large customer base in special-purpose chips, but you never hear from them, either because they don't run Linux or they don't run mainline kernels.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 19:20 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

32-bit x86 hardware has been completely irrelevant from a commercial point of view for a while now
What? Intel is still making and selling 32-bit CPUs. Perhaps not at very large scale, but it seems unlikely that none of those machines are running Linux. (I'd bet that most of them are.)

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 19:56 UTC (Wed) by k8to (guest, #15413) [Link] (1 responses)

Are you sure? I thought the last atom chips were discontinued. I'm sure there's someone making a SoC still, but I thought physical 32bit x86 was pretty dead.

Of course service life for some systems is pretty long.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 20:39 UTC (Wed) by arnd (subscriber, #8866) [Link]

I tried finding anything for sale recently, with no luck.

https://www.mouser.de/Semiconductors/Embedded-Processors-... still lists Quark SoCs, and while those can run embedded Linux, normal distros like Debian typically won't work. Lots of Atom chips and boards are still being sold, but they seem to all be 64-bit in practice. https://ark.intel.com/content/www/us/en/ark/products/code... lists some embedded Atoms from 2010 that were 32-bit only and are not officially discontinued, but are basically nowhere in stock as far as I can tell, neither chips nor boards.

For non-Intel parts it looks even worse:

https://www.heise.de/preisvergleich/?cat=mbson lists two mainboards with 32-bit VIA CPUs (no Intel or AMD), but those are new old stock sold at 10x the price it was at 10 years ago. Zhaoxin took over VIA's x86 line, but they are all 64-bit now.

DM&P Vortex86DX and a few others are theoretically still around, but equally outdated and hard to find for sale anywhere, it's easier to find an ARMv4 or SH3 based system.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 17:19 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (5 responses)

I don’t even have *any* 64-bit hardware, privately. (I can use the work laptop, which is amd64 and has a Core2duo, but it doesn’t belong to me.)

If Debian stops supporting i386, someone better gift me new hardware.

If Debian stops supporting m68k, I’ll be *SERIOUSLY* pissed because I invested about three years of my life into resurrecting it.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 18:38 UTC (Tue) by pizza (subscriber, #46) [Link] (3 responses)

> If Debian stops supporting i386, someone better gift me new hardware.

Wow, all your stuff must be seriously old at this point...

> If Debian stops supporting m68k, I’ll be *SERIOUSLY* pissed because I invested about three years of my life into resurrecting it.

In all fairness, you knew at the time that m68k had no future -- The final processor of the line was released 25 years ago (68060 @ 75MHz), and while there are more modern microcontroller derivatives, they were never compatible.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 23:36 UTC (Tue) by atai (subscriber, #10977) [Link] (2 responses)

32-bit limitations? There are people running Linux on 16-bit x86
https://github.com/jbruchon/elks

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 12:51 UTC (Wed) by pizza (subscriber, #46) [Link]

Nobody is talking about running "modern" applications (much less locally compiling them) on those 16-bit x86 CPUs.

And elks is "Linux-like", not "Linux".

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 13:04 UTC (Wed) by mebrown (subscriber, #7960) [Link]

"People" might be generous. There's a guy. :)

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 8:24 UTC (Wed) by Sesse (subscriber, #53779) [Link]

> If Debian stops supporting i386, someone better gift me new hardware.

Why is it anyone else's responsibility to make sure you can run Debian?

> If Debian stops supporting m68k, I’ll be *SERIOUSLY* pissed because I invested about three years of my life into resurrecting it.

You choosing to spend your time on m68k doesn't mean anyone else is obliged to.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 18:47 UTC (Tue) by hmh (subscriber, #3838) [Link]

Many ThinkPad laptops with Pentium-M and Pentium-4M are still around and working just fine... but you would not want to build anything bigger than the kernel itself on those (I should know, I own a T43).

But x86 is easier, since you can natively build 32-bit using a 64-bit kernel and toolchain *and* run the result locally in 32-bit.

Linker limitations on 32-bit architectures

Posted Sep 5, 2019 16:04 UTC (Thu) by unprinted (guest, #71684) [Link] (1 responses)

All the netbooks here are 32-bit only: Pentium-M or early Atom.

They're not used much, but they're paid for and very portable when needed.

The older Raspberry Pis - the Pi 2 A and B - are only four years old.

Linker limitations on 32-bit architectures

Posted Sep 5, 2019 22:45 UTC (Thu) by flussence (guest, #85566) [Link]

My Atom netbook gets used every day. Software is easy to deal with - either it works on my machine, or I throw it away and find another (and as I'm the technical one in most of my friend groups, that usually causes ripple effects that push things like Electron-based IM programs out).

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 14:13 UTC (Tue) by ju3Ceemi (subscriber, #102464) [Link] (9 responses)

I do not understand the issue

Why not simply decouple the build, test, and packaging phases?

One would build the binary on a 64-bit host (make)
Then run the tests on a native host (make test)
Then build the .deb

Basically, running a pipeline, "CI"-style

Of course, this assumes that the build tools from those projects are compliant with such a setup...

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 14:38 UTC (Tue) by patrakov (subscriber, #97174) [Link] (8 responses)

The build systems of too many packages are not compliant. E.g., when you compile the PHP interpreter, you need to run ./configure, make, make test, make install (well, oversimplifying here). But at the ./configure stage, it checks whether getaddrinfo() actually works. It does so by compiling and running a test program. If it cannot run a test program (e.g. when cross-compiling), it assumes that getaddrinfo() does not work, and disables code that uses this function - even though it might, in fact, work just fine.

https://github.com/php/php-src/blob/452356af2aa66493daf8f...

Another problem is that many packages, by mistake, check properties of the host system, not the target. E.g. alsa-lib calls the "python-config" script that gets the necessary includes and libs. But that script describes the host, not the target!

https://git.alsa-project.org/?p=alsa-lib.git;a=blob;f=con...

Buildroot and Yocto carry a ton of hacks and workarounds to these classes of problems. Debian avoids them by building natively.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 2:03 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (7 responses)

One should be able to preload the cache result for that check somehow. However, having worked with build systems a lot (I work on CMake), compile tests are bad, but run tests are worse. They break cross compilation, are really slow (generally), and should be done as preprocessor or runtime checks if possible. All sizeof(builtin_type) things have preprocessor definitions available; broken platform checks should just be done once and statically placed in the code (how much energy has been wasted seeing if "send" is a valid function? Or getting sizeof(float)?). Library viability checks are more problematic, but should be handled with version checks via the preprocessor. But bad habits persist :( .

Basically: send a patch to PHP to stop doing such dumb things. Find out which platforms have a busted getaddrinfo and just #ifdef it in the code. They're not likely to be fixed any time soon anyways and when they do, someone will be throwing parties about it finally getting some love.
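
For autoconf-based packages, preloading the cache can indeed be done on the configure command line (or via a config.site file). A sketch; the cache variable name here is only a placeholder and would have to be read out of the package's configure script:

    # pre-seed the answer so configure does not need to run a test program
    $ ./configure --host=arm-linux-gnueabihf ac_cv_func_getaddrinfo=yes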

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 13:30 UTC (Wed) by Sesse (subscriber, #53779) [Link] (6 responses)

That kind of “table-driven” configure was attempted during the 80s. It's a massive pain to maintain, which led directly to GNU autoconf.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 13:51 UTC (Wed) by pizza (subscriber, #46) [Link] (4 responses)

Autoconf's insanity stems directly from the fact that it relies on the least-common denominator for, well, everything. It can't even assume the presence of a shell that supports function definitions.

But one can make a case for revisiting some of those assumptions -- After all, "Unix-ish" systems are far more hetrogenous than they used to be. Does software produced today need to care about supporting ancient SunOS, IRIX or HPUX systems? Or pre-glibc2 Linux? Or <32-bit systems?

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 15:34 UTC (Wed) by halla (subscriber, #14185) [Link] (2 responses)

I think you mean homogenous?

Linker limitations on 32-bit architectures

Posted Aug 29, 2019 22:55 UTC (Thu) by antiphase (subscriber, #111993) [Link]

Did you mean homogeneous?

Linker limitations on 32-bit architectures

Posted Aug 30, 2019 3:28 UTC (Fri) by pizza (subscriber, #46) [Link]

You are correct; I doublethunk my self into the wrong term.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 17:15 UTC (Wed) by madscientist (subscriber, #16861) [Link]

Just FYI there already was a first step towards modernizing what autoconf can support... for example configure scripts generated by autoconf these days definitely DO use shell functions. That's been true for >10 years, since autoconf 2.63.

As far as supporting older systems, some of that depends on the software. Some GNU facilities make a very conscious effort to support VERY old systems; this is particularly true for "bootstrap" software. Others simply make assumptions instead, and don't add checks for those facilities into their configure.ac. It's not really up to autoconf what these packages decide to check (or not).

Also, much modern GNU software takes advantage of gnulib which provides portable versions of less-than-portable facilities... sometimes it's not a matter of whether a particular system call is supported, but that it works differently (sometimes subtly differently) on different systems. That's still true today on systems like Solaris, various BSD variants, etc. even without considering IRIX.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 14:20 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Toolchains and platforms are much more uniform these days.

- Any significant platform differences usually need conditional codepaths *anyways* (think epoll vs. kqueue)
- POSIX exists and has been effective at the core functionality (see the above for non-POSIX platforms)
- Broken platforms should fix their shit (your test suite should point this stuff out), but workarounds can be placed behind #ifdef for handling such brokenness (with a version constraint when it is fixed)
- Compilers are much more uniform because new compilers have to emulate one of GCC, Clang, or MSVC at the start to show that they are actually working with existing codebases

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 15:21 UTC (Tue) by naptastic (guest, #60139) [Link] (3 responses)

Two questions. First, will this help?

https://lwn.net/Articles/795384/

Second: why is libwebkit2gtk-4.0 so big?

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 15:40 UTC (Tue) by patrakov (subscriber, #97174) [Link] (1 responses)

First: yes, if the distro concludes that the debug info captured in the new format is good enough. Second: it's debug info. A stripped version of libwebkit2gtk-4.0.so.37 takes only 42 megabytes.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 17:21 UTC (Tue) by mirabilos (subscriber, #84359) [Link]

-gstabs helps, but Aurélien indicated that the gcc/ld problems are the least ones.

As long as i386 et al. are release architectures, software that fails to link on 32-bit architectures is just RC-buggy… and it ought to be treated as such. Approach upstreams to “fix it or it gets removed”… but this needs more community power behind it.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 19:33 UTC (Wed) by nix (subscriber, #2304) [Link]

It could, in theory, iff people were happy with just type info (and, soon enough, backtrace info -- enough to do a bt and chase down argument types and inspect the args in the debugger). However, there is a tradeoff here: it is harder to link a CTF section than to link most other sections because we don't just append them to each other, we merge them together type-by-type and will soon deduplicate the types as we go. This consumes memory, though I hope to make it less memory to link than would be required to load all the CTF at once and then concatenate it. At the moment, we have no deduplicator to speak of, so it actually needs *more* memory than a concatenator would because of internal hashes for name lookup etc on top of the raw file data. This is a worst case and things will improve very soon.

Essentially we choose to trade off memory in favour of disk space. This is the same decision that most parts of the toolchain take (it takes much more memory to compile or link a program than the size of the resulting binary) -- but I too sometimes build on small machines, and will try not to make the situation too much worse!

I certainly hope to make linking CTF use much less memory than linking DWARF -- but that is mostly because DWARF is usually bigger, not because linking CTF is especially memory-efficient. However, this won't really help, since I expect that distros that adopt CTF will usually build with both CTF *and* DWARF, stripping the DWARF out to debuginfo packages and keeping the CTF around so that simpler stuff works without needing to install debuginfo packages. I can't imagine general-purpose distros abandoning DWARF generation entirely, so adding an extra format isn't going to reduce linker memory usage for them.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 15:52 UTC (Tue) by bored (subscriber, #125572) [Link] (1 responses)

Some of the no-mmap options are misleading. The ld.gold --no-mmap-output-file flag actually still allocates large mappings, and then reads/writes the entire image rather than mmap'ing it directly. This of course defeats the purpose of attempting to restrict the RAM usage. So while much of the code is written with a request-access/release-access model that maps to read()/write(), much of it is also written with the knowledge that it's just accessing a large mmap region and doesn't rigorously follow the request/release model.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 19:34 UTC (Tue) by k8to (guest, #15413) [Link]

Probably those no-mmap options were included for environments where mmap itself has significant limitations, rather than memory limits.

Personally I've hacked up programs to reduce the memory space that building uses, notably old versions of UAE with "newer" compilers that went over the limits of my then x86 hardware. Obviously that's not a general solution, but just to say the experience is not brand new.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 16:06 UTC (Tue) by mads (subscriber, #55377) [Link]

You decide for yourself whether you want to keep debug symbols on Gentoo; that's kinda the point of the whole meta-distribution thing. You're the one deciding how you want stuff done.

https://wiki.gentoo.org/wiki/Debugging

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 20:31 UTC (Tue) by roc (subscriber, #30627) [Link] (10 responses)

> And running tests under QEMU-based emulation is out of question, too, because of possible flaws in the emulator itself.

I don't understand the logic here. The probability of a user-space test accidentally passing due to a QEMU bug has to be very remote. The probability of a user-space test accidentally failing due to a QEMU bug is higher, but still very low (and if/when it occurs, it would be detected and QEMU could be fixed). And because even CPUs nominally for the same architecture have lots of differences, these issues also occur when you're not using QEMU (and you can't fix the hardware).

A stronger argument for not running tests under QEMU is surely that it's much slower.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 20:54 UTC (Tue) by pizza (subscriber, #46) [Link] (3 responses)

> A stronger argument for not running tests under QEMU is surely that it's much slower.

Surely that depends a great deal on the underlying system that's being emulated?

Using m68k as an example, no released 680x0 processor clocked over 75MHz, and Apple's built-in 68k emulator (clock for clock) had a ~6x performance penalty.

A modern x86_64 CPU clocking in over 3GHz (40x the raw clock speed of the fastest m68k, plus much better sustained IPC rates thanks to vastly superior I/O and memory subsystems) ought to do much better than real hardware ever could.

(Granted, Apple's emulator was equivalent to qemu-user rather than a full system emulator, but that's needed in this context..)

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 21:07 UTC (Tue) by pizza (subscriber, #46) [Link]

Blargh. That last sentence should end:

...that's NOT needed in this context.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 21:47 UTC (Tue) by dezgeg (subscriber, #92243) [Link]

If the fastest m68k processor really clocks at 75MHz, I have to wonder how much use it is to spend time getting modern bloatware like webkit-gtk to pass its testsuite there... it doesn't sound like the final package would be usable on a real system anyway...

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 22:20 UTC (Tue) by roc (subscriber, #30627) [Link]

Yes, I agree that in some cases QEMU being slow would also be a bad argument.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 22:16 UTC (Tue) by dezgeg (subscriber, #92243) [Link] (1 responses)

I wonder what the author meant by "QEMU-based emulation": does it potentially include full system emulation (qemu-system-FOO), or was it specific to process-level emulation (qemu-FOO-static)? Because from the build-system point of view, process-level emulation is especially tempting (just replace the call to 'make check' with something like 'qemu-FOO-static make check') compared to full system emulation (where you also need a kernel image, a root filesystem image, a way to copy the build tree to the VM, etc.). But the process-level emulation also has way more opportunities for bugs and weird behaviour.

For example, if a shell script run under qemu-arm-static runs `cat /proc/cpuinfo` to check for some CPU capabilities, what will happen? Does QEMU have the smarts to notice that a read() system call is being made for /proc/cpuinfo and substitute some emulated values for some ARM CPU, or will it just read the /proc/cpuinfo from the host machine resulting in values for some x86 cpu?

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 22:24 UTC (Tue) by roc (subscriber, #30627) [Link]

> But the process-level emulation also has way more opportunities for bugs and weird behaviour.

Actually I would guess full-system emulation (i.e. running the native kernel) would be more likely to show bugs and weird behavior. The CPU+hardware behavior exposed to the kernel is a lot more complicated and less well tested in general than that exposed to user-space.

> For example, if a shell script run under qemu-arm-static runs `cat /proc/cpuinfo` to check for some CPU capabilities, what will happen? Does QEMU have the smarts to notice that a read() system call is being made for /proc/cpuinfo and substitute some emulated values for some ARM CPU, or will it just read the /proc/cpuinfo from the host machine resulting in values for some x86 cpu?

That is a good question. Issues like that could be fixed outside QEMU by running the process in a chroot environment. You may be doing that for cross-compilation anyway.

Linker limitations on 32-bit architectures

Posted Aug 27, 2019 22:28 UTC (Tue) by foom (subscriber, #14868) [Link] (1 responses)

Qemu has "fails to crash" bugs. They tend to increase emulation speed, by not checking unimportant edge cases.

The most annoying one to me is that it doesn't check pointer alignment for load/store instructions which fail when misaligned on real hardware. E.g. ldm and ldrd on armv7 require 4-byte alignment (even though ldr does not), but qemu does not check this.

It's unfortunately pretty easy to screw up alignment in C code, and cause your program to only crash on real hardware...

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 0:34 UTC (Wed) by roc (subscriber, #30627) [Link]

Interesting, thanks. Sounds like it would be fairly easy to fix with an option.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 8:31 UTC (Wed) by chris_se (subscriber, #99706) [Link] (1 responses)

> The probability of a user-space test accidentally passing due to a QEMU bug has to be very remote.

That's not quite true - for example, if you try to emulate a platform that faults on unaligned memory accesses on another platform that doesn't do that (or at least doesn't do it for all cases where the former faults), then the emulated version will typically not fault, for performance reasons.

For example, arm64 typically supports unaligned access, but most arm32 systems don't, so running arm32 in an emulator on either e.g. arm64 or x86_64 will typically not catch these kinds of bugs.

And that's just one example of a type of bug that qemu isn't able to catch.

Linker limitations on 32-bit architectures

Posted Aug 28, 2019 11:01 UTC (Wed) by roc (subscriber, #30627) [Link]

That's a good point.

Note, however, that x86 can check alignment via the Alignment Check flag: https://stackoverflow.com/questions/1929588/how-to-catch-...
so it might be possible to emulate alignment faults with no overhead.

ecological concerns and old hardware

Posted Aug 27, 2019 22:38 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (9 responses)

It is sometimes argued that we should keep using old machines out of concern for the environment, but this ignores the high power consumption (both direct and for cooling) that is often required for a meager return in compute power. By continuing to use the old machine we don't have e-waste to dispose of, which is a good thing, but the electric bill and the waste heat could wind up being vastly more than a newer low-end machine would require.

ecological concerns and old hardware

Posted Aug 28, 2019 7:38 UTC (Wed) by vadim (subscriber, #35271) [Link] (2 responses)

Yup. Modern hardware can pack into 10W what an old machine couldn't into 200W.

There's no point in keeping that old Pentium around. Get yourself a Pi 4 or an Atom which will be much faster, have much more memory, support virtualization, and be supported by current software just fine all while having a much smaller power bill. Even a modern, desktop CPU is probably better power-wise due to all the advances in power saving.

ecological concerns and old hardware

Posted Aug 28, 2019 20:44 UTC (Wed) by arnd (subscriber, #8866) [Link] (1 responses)

If you only turn it on once a week to check for email, keeping the old Pentium 4 makes sense ecologically and economically. For any daily use, it does not.

ecological concerns and old hardware

Posted Aug 29, 2019 5:55 UTC (Thu) by eru (subscriber, #2753) [Link]

This is how most old people I know use a computer. They turn it on for some specific tasks, usually for electronic banking in addition to the email, then turn it off and do something in the real world. And they are very annoyed or even scared when the bank web site starts complaining that the browser is too old, or certificates have expired and scary-looking warnings pop up. Fixing this may then require upgrading the browser, which may require upgrading the OS, which may require a new computer...

ecological concerns and old hardware

Posted Aug 28, 2019 13:47 UTC (Wed) by eru (subscriber, #2753) [Link] (5 responses)

This argument may make sense for old desktop PCs, but not so much for laptops and other portable devices that had low power requirements to start with.

Asking a user to ditch a perfectly working computer just because of bloated new software feels wrong. In many cases the advances in the new software are marginal. You can say the user should then stick to old versions, but this is usually not sustainable for other reasons, like no more security fixes for old software, or a network protocol change that makes it no longer interoperable. (Of course the resulting upgrade cycle is what keeps the computer industry humming, so I really should keep my mouth shut.)

It may also be the user would prefer to spend his dollars or euros on something else. But in modern societies, one is almost forced to have a computing device that can access the net. And the web pages keep bloating too, and adopting features supported only on the newer browsers, thus contributing to the upgrade treadmill.

It would be interesting to see estimates about the energy needed to make a laptop, and how it compares to its lifetime power usage. Computing devices contain some extremely refined raw materials, and etching and packaging the chips also takes energy.

</rant>

ecological concerns and old hardware

Posted Aug 28, 2019 20:10 UTC (Wed) by k8to (guest, #15413) [Link]

A goal that requires people to stop caring about new features as a driver for adoption and payment seems doomed to failure.

Sadly.

ecological concerns and old hardware

Posted Sep 3, 2019 15:11 UTC (Tue) by mstone_ (subscriber, #66309) [Link] (3 responses)

The argument for keeping old hardware running fails to address the fact that far newer hardware is already being thrown out regardless of what any other individual is doing with their own hardware. A person can upgrade by simply replacing a 15-year-old machine with a 5-year-old machine that was heading to the dump, and take advantage of the improved power consumption, speed, and memory, without any net increase in the number of machines in the dump.

ecological concerns and old hardware

Posted Sep 5, 2019 10:25 UTC (Thu) by davidgerard (guest, #100304) [Link] (2 responses)

I just had to replace my otherwise perfectly good 2013 desktop at the office with a new box, because the old box can't take more than 8 GB RAM.

I blame this entirely on web page bloat, where what was once HTML is now a virtual machine running several megabytes of JavaScript to lovingly render a few kilobytes of text.

I expect another raft of obsolescence when current CPUs start hitting the 38-bit address limit.

ecological concerns and old hardware

Posted Sep 5, 2019 12:42 UTC (Thu) by HelloWorld (guest, #56129) [Link] (1 responses)

8 GB is plenty for a web browser even nowadays. Most phones have less than that and are perfectly capable of displaying most websites.

I was using a machine with 8 GB until recently and it was perfectly capable of running KDE Plasma, Firefox and IntelliJ at the same time. It wasn't fast (though usable), but that was due to the slow CPU, not lack of RAM. I'd still be using it if it weren't for the fact that I had an opportunity to get a faster (used) machine for free.

ecological concerns and old hardware

Posted Sep 28, 2019 7:56 UTC (Sat) by sammythesnake (guest, #17693) [Link]

I just upgraded from 8GiB to 24 specifically because my web browser was constantly at 20GiB+ of virtual memory.

8GiB for a single web page is plenty, but even with discarded tabs, it's not enough for my usage patterns...

Linker limitations on 32-bit architectures

Posted Sep 3, 2019 13:27 UTC (Tue) by dave4444 (subscriber, #127523) [Link]

Use -Wl,--hash-size=xxxx. That's the knob GNU ld has to adjust the memory/performance tradeoff. A high hash size results in a larger memory footprint for ld and faster link times for large executables; a low hash size results in a lower memory footprint but longer link times.

Having done large links for 32-bit builds, I can say this makes quite a difference in link time (minutes), and if you've got to fit into a 2GB or 3GB virtual address space it can also help (at the cost of a longer link time).
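
For example, passed through the compiler driver (a sketch; the value is arbitrary and needs tuning per package):

    $ export LDFLAGS="-Wl,--hash-size=65521"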

Linker limitations on 32-bit architectures

Posted Sep 12, 2019 22:31 UTC (Thu) by frostsnow (subscriber, #114957) [Link]

This reminds me of the effort I had to put into the mfgtools' uuu flasher in order to flash the Librem5 from my 32-bit ARM device (the flasher was trying to mmap a >3GB file): https://www.frostsnow.net/blog/2019-08-04.html

Linker limitations on 32-bit architectures

Posted Sep 13, 2019 15:37 UTC (Fri) by marcH (subscriber, #57642) [Link] (1 responses)

> However, cross-compilation (in the sense used by these projects) invariably means the inability to run compiled code.

So, developers of embedded systems don't regularly run tests on real hardware? That doesn't seem to make sense...

Linker limitations on 32-bit architectures

Posted Sep 13, 2019 15:41 UTC (Fri) by marcH (subscriber, #57642) [Link]

Going back I see I missed one of the comment threads above, sorry for the noise.

Linker limitations on 32-bit architectures

Posted Nov 12, 2019 19:19 UTC (Tue) by mcfrisk (guest, #40131) [Link]

Chromium on Debian has not been compiling on i686 hardware since 2014 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=765169), so this isn't a new problem. The solution approved by the release and other teams was that i686 was compiled on amd64 machines and kernels in a 32-bit chroot. Solutions would have been nice back then, but very few people cared enough. Heck, even on amd64 with modern C++ and tens of gigabytes of physical RAM, memory often runs out and the OOM killer wrecks builds. Finding reliable parallel build options is still a black art, since tools only count threads and cores, not available memory.
Just ask anyone who bitbakes a lot :)


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds