
Hardware-assisted Arm VMs for s390

By Daroc Alden
May 5, 2026

A recent patch set from Steffen Eiden and others has set the groundwork for allowing hardware-assisted emulation of Arm CPUs on s390 CPUs. Version two of the posting fixes a handful of smaller problems, but does not differ much. The patches were welcomed by the Arm maintainers, pending some discussion of how the collaboration between the architectures could be structured to prevent maintainability problems on the Arm side. When those details are resolved, the patches could pave the way for transparently running Arm-based virtual machines (VMs) on s390 hosts at native or near-native speeds.

The core of the feature is a patch that adds support for a new s390 instruction called "Start Arm Execution" (SAE). It performs a similar function to the existing "Start Interpretive Execution" instruction on s390 that is used to enter a hardware-assisted virtual machine while keeping the virtual CPU state separate from the host CPU. Both instructions take a pointer to a "control block" that describes how the virtual CPU should be set up and entered. The difference is that the instruction pointer in an SAE control block points to memory containing Arm instructions, which the CPU then interprets as Arm code rather than as s390 instructions.

In theory, this allows someone with an s390 CPU new enough to support the feature to run an Arm virtual machine directly. There clearly has to be a translation from Arm machine code to s390 machine code at some point, but the CPU handles that internally. How exactly it does that is not clear from the patch set.
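For readers less familiar with hardware-assisted virtualization: instructions like SIE are normally used in a loop, where the host enters the guest, the hardware hands control back when the guest needs something (a hypercall, for example), and the host handles that before re-entering. The sketch below assumes SAE is driven the same way. The control-block fields, the exit reasons, and the `fake_start_arm_execution()` stand-in are all hypothetical, since the article does not describe the real control-block layout.

```c
#include <stdint.h>

/* Hypothetical exit reasons; the real SAE interface is not described here. */
enum exit_reason { EXIT_HYPERCALL, EXIT_HALT };

/* Hypothetical control block, NOT the actual s390 layout. */
struct control_block {
    uint64_t pc;              /* guest instruction pointer (here: a step counter) */
    uint64_t x0;              /* guest register carrying call number / result */
    enum exit_reason reason;  /* filled in when the guest stops running */
};

/* Stand-in for the SAE instruction: "runs" the guest until it needs the
   host. This fake guest makes two hypercalls and then halts. */
static void fake_start_arm_execution(struct control_block *cb)
{
    if (cb->pc < 2) {
        cb->x0 = 0x1234;      /* arbitrary hypothetical hypercall number */
        cb->pc++;
        cb->reason = EXIT_HYPERCALL;
    } else {
        cb->reason = EXIT_HALT;
    }
}

/* Host side: enter the guest, handle whatever made it exit, re-enter. */
static int run_vcpu(struct control_block *cb)
{
    int hypercalls = 0;
    for (;;) {
        fake_start_arm_execution(cb);
        if (cb->reason == EXIT_HALT)
            return hypercalls;
        /* EXIT_HYPERCALL: service the call, pass a result back in x0. */
        cb->x0 = 0;
        hypercalls++;
    }
}
```

The point of the loop structure is that the guest's Arm code runs without host involvement until an exit; only the exits (such as hypercalls) need host-side code that understands the Arm conventions.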

On the kernel side, while the idea is simple, the implementation is a bit more complex. When virtual machines make calls to their hypervisor, they do so using an architecture-specific interface. If Arm virtual machines are to run on s390 unmodified, the KVM code in an s390 kernel needs to be able to interpret Arm hypercalls. To support that, Eiden's patch set moves the interface definition and related header files to include/uapi/arch/arm64/, allowing other architectures to reference them. Amusingly, this allowed some duplicated code to be cleaned up; the patches ended up removing more lines of code than they added.
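Arm guests conventionally issue hypercalls using the Arm SMC Calling Convention (SMCCC), in which a 32-bit function identifier encodes the call type, the calling convention, and the owning entity. As a rough illustration of what an s390 host would need to understand, here is a minimal standalone decoder for those fields. It is not the kernel's own code (Linux keeps its definitions in `include/linux/arm-smccc.h`), and it is not necessarily what the patch set shares; the struct and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an SMCCC function identifier (names are hypothetical). */
struct smccc_id {
    bool fast_call;     /* bit 31: fast call vs. yielding call */
    bool smc64;         /* bit 30: 64-bit calling convention */
    unsigned owner;     /* bits 29:24: owning entity number */
    unsigned function;  /* bits 15:0: function number */
};

/* Split a function ID into its fields; bits 23:16 are not decoded here. */
static struct smccc_id smccc_decode(uint32_t id)
{
    struct smccc_id d = {
        .fast_call = (id >> 31) & 1,
        .smc64     = (id >> 30) & 1,
        .owner     = (id >> 24) & 0x3f,
        .function  = id & 0xffff,
    };
    return d;
}
```

For example, 0xC4000003 (the 64-bit PSCI `CPU_ON` call) decodes as a fast call using the 64-bit convention, owner 4 (standard secure services), function 3.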

There is additional work to be done once these patches are merged, however. The s390 architecture is also gaining instructions for manipulating Arm registers, handling interrupts, and so on. Eiden hopes to add support for those over the coming months in conjunction with the rest of the s390 developers.

There was relatively little feedback on the patch set, but Marc Zyngier had a lengthy response written in consultation with Will Deacon. They were supportive of the effort, but worried that moving some of the Arm-specific code to a shared directory was messy and could interfere with the code's maintainability. They suggested that using symbolic links, relative paths, or code generation could allow the existing Arm code to remain in its accustomed location without unduly restricting the s390 code.

Eiden's reply explained that he had originally tried using symbolic links for header files, but feared the approach wouldn't gain support since it was somewhat messy. Some of the code is also automatically generated, reusing the Makefile rules from the Arm code. But he agreed to prototype an alternative that makes use of symbolic links and post that for comparison. That patch set was sent on April 28 and has not received any review as of the time of writing.

A more serious concern was how s390 would keep up with changes to the Arm virtual-machine-guest API, which is still evolving, Zyngier said. For example, there are some CPU-vulnerability mitigations that require cooperation from guests. When a new one is discovered, Arm assigns a hypercall function number for the new interface needed, and KVM implements it. The s390 code will presumably need something similar, both adding stubs for new Arm-specific mitigations and working with Arm to add any s390-specific interfaces needed. Those should be limited to mitigations, however, since the Arm KVM code "makes a point of forbidding any use of implementation specific instruction or system registers," and Zyngier expects the s390 code to do the same.
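The mitigation workflow Zyngier describes can be sketched roughly as follows: a guest asks whether a workaround exists by calling `SMCCC_ARCH_FEATURES` with the workaround's function ID, and the hypervisor must answer "not supported" for anything it does not know about. The function IDs and return codes below come from the SMCCC specification; the `implemented[]` table and `handle_hypercall()` are hypothetical, and the real per-workaround return values are more nuanced than in this simplified version.

```c
#include <stddef.h>
#include <stdint.h>

#define SMCCC_VERSION            0x80000000u
#define SMCCC_ARCH_FEATURES      0x80000001u
#define SMCCC_ARCH_WORKAROUND_1  0x80008000u
#define SMCCC_RET_SUCCESS        0
#define SMCCC_RET_NOT_SUPPORTED  (-1)

/* Hypothetical list of workarounds this host implements. When Arm assigns
   a new function ID, supporting it means adding an entry and a handler. */
static const uint32_t implemented[] = { SMCCC_ARCH_WORKAROUND_1 };

static int64_t handle_hypercall(uint32_t fn, uint64_t arg1)
{
    switch (fn) {
    case SMCCC_VERSION:
        return 0x10001;                   /* (major << 16) | minor: v1.1 */
    case SMCCC_ARCH_FEATURES:
        /* arg1 holds the function ID the guest is asking about. */
        for (size_t i = 0; i < sizeof implemented / sizeof implemented[0]; i++)
            if ((uint32_t)arg1 == implemented[i])
                return SMCCC_RET_SUCCESS;
        return SMCCC_RET_NOT_SUPPORTED;
    case SMCCC_ARCH_WORKAROUND_1:
        /* A real host would apply the mitigation here. */
        return SMCCC_RET_SUCCESS;
    default:
        return SMCCC_RET_NOT_SUPPORTED;   /* refuse unknown function IDs */
    }
}
```

Refusing unknown IDs by default is what lets a guest probe safely for a mitigation that a given host (Arm or s390) has not implemented yet.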

Eiden was not overly worried about this, saying that a primary goal of the project was to be able to run unmodified Arm guests, and so he and his fellow s390 developers would be treating the Arm code as the source of truth. As such, they had no plans to introduce s390-specific features or abrogate the usual process.

Zyngier and the other Arm maintainers are, understandably, not familiar with the details of s390. So, Zyngier and Deacon also asked for help with documentation, testing, and debugging, to ensure that changes to the Arm code do not have adverse effects on s390.

Finally, we feel it would be beneficial for both projects to swap prisoners and have cross-reviewers in MAINTAINERS, so that there is an s390 reviewer added to KVM/arm64, and an arm64 reviewer added to KVM/s390.

Eiden readily agreed, and suggested that cross-compilation of the kernel for s390 was a good starting point for testing. He offered access to s390 virtual machines hosted by IBM for doing native builds, as well. Eiden plans to add Arm's continuous-integration-testing branches to s390's build infrastructure, to catch any breakage early. He also thought an exchange of maintainers made sense, and added himself as an Arm KVM reviewer in the second revision of the patch series.

Eiden and Zyngier made plans to meet up with the other s390 and Arm kernel developers at the Linux Plumbers Conference. Hopefully an in-person meeting will be enough to settle any remaining details; since everyone seems happy enough with the change, that seems quite likely.

[ Thanks to Andi Holmes for bringing this topic to our attention. ]


Index entries for this article
KernelArchitectures/Arm
KernelArchitectures/s390
KernelKVM



Like the NEC V30

Posted May 5, 2026 15:43 UTC (Tue) by rwmj (subscriber, #5474) [Link] (10 responses)

Memories of the NEC V30, an 8086 clone chip that could enter 8080 mode (e.g. to run CP/M)! Of course, those two chips were very similar to each other; I believe it was basically just different microcode. That's rather unlike S390 vs Arm, which have completely different microarchitectures and even different endianness!
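The endianness point has a concrete consequence for anything that handles Arm code on s390: A64 instruction fetches are always little-endian, while s390 is big-endian. How the hardware deals with this is not described in the patches. The sketch below only shows, in software terms (as an emulator or debugging tool might do it), how to read an A64 instruction word portably regardless of host byte order; the function name is hypothetical.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit A64 instruction from guest memory.
   Using shifts instead of a native 32-bit load makes the result the same
   on a little-endian host and on a big-endian host such as s390. */
static uint32_t fetch_a64_insn(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

For example, the A64 `NOP` encoding 0xD503201F is stored in memory as the bytes 1F 20 03 D5; a naive native load on a big-endian host would instead yield 0x1F2003D5.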

Like the NEC V30

Posted May 5, 2026 16:38 UTC (Tue) by joib (subscriber, #8541) [Link] (1 responses)

Considering the cost of an IBM mainframe system, one wonders if the "ARM emulation" isn't implemented by a separate ARM core (cost of this is likely a rounding error compared to the total cost of the system?), and then just the necessary instructions to the main CPU to kick off the ARM CPU.

Like the NEC V30

Posted May 5, 2026 18:02 UTC (Tue) by ballombe (subscriber, #9523) [Link]

Memory access would be extremely slow.

Like the NEC V30

Posted May 5, 2026 18:57 UTC (Tue) by pbonzini (subscriber, #60935) [Link] (7 responses)

The 8080 and 8086 were very different internally, but the 8080 was enough of a subset of the 8086 (it was designed to be source-translatable) that it was probably a separate *decode PLA* producing signals compatible with the 8086 front end. Thus, the separate hard-wired decoder fed a shared microcode engine (plus ALU, registers, etc.).

Like the NEC V30

Posted May 5, 2026 20:18 UTC (Tue) by rwmj (subscriber, #5474) [Link] (6 responses)

Indeed, because of course the microcode wouldn't work with the different instruction format.

I'm half tempted to buy the NEC V30 on eBay right now for £12 and send it to Ken Shirriff :-) Unfortunately I don't have a convenient way to test that it isn't fake.

Like the NEC V30

Posted May 6, 2026 6:17 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

The microcode actually would! 8086 microcode does little or no decoding, it has "indirect" registers that point to whatever comes out of the decode PLA. For example the microcode subroutine for MOV is essentially "A<-B" and the decoder tells the microcode engine what A and B mean. As long as a hypothetical 8080 decode PLA generates the same signals for LD A,B that the 8086 PLA generates for MOV AL,CL you can reuse the same microcode for both.
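pbonzini's description can be modeled in a few lines: two decoders turn their own instruction encodings into the same control signals, and a single shared micro-routine performs the move. This is a toy sketch, not how the real chips were wired. The register mapping (8080 A to AL, C to CL, and so on) is an assumption for illustration, the example uses 8080 `MOV A,C` so that it lines up with 8086 `MOV AL,CL`, and only register-to-register `MOV` is handled.

```c
#include <stdint.h>

/* Internal register numbers shared by both decoders (8086 byte-register order). */
enum reg { AL, CL, DL, BL, AH, CH, DH, BH, NREGS };

/* The "control signals" a decode PLA hands to the microcode engine. */
struct ctl { enum reg dst, src; };

/* One shared micro-routine, "A <- B": the decoder supplies what A and B are. */
static void ucode_move(uint8_t regs[NREGS], struct ctl c)
{
    regs[c.dst] = regs[c.src];
}

/* 8086 decoder: MOV r/m8, r8 (opcode 0x88) with a register-direct ModRM byte. */
static int decode_8086_mov(uint8_t opcode, uint8_t modrm, struct ctl *out)
{
    if (opcode != 0x88 || (modrm >> 6) != 3)
        return 0;                             /* only register operands modeled */
    out->dst = (enum reg)(modrm & 7);         /* r/m field: destination */
    out->src = (enum reg)((modrm >> 3) & 7);  /* reg field: source */
    return 1;
}

/* Assumed 8080-to-8086 register mapping; index 6 (M, memory) is unused. */
static const enum reg map_8080[8] = {
    [0] = CH, [1] = CL, [2] = DH, [3] = DL, [4] = BH, [5] = BL, [7] = AL,
};

/* 8080 decoder: MOV r1,r2 is encoded as 01 DDD SSS. */
static int decode_8080_mov(uint8_t opcode, struct ctl *out)
{
    unsigned d = (opcode >> 3) & 7, s = opcode & 7;
    if ((opcode & 0xC0) != 0x40 || d == 6 || s == 6)
        return 0;                             /* memory operand or HLT: not modeled */
    out->dst = map_8080[d];
    out->src = map_8080[s];
    return 1;
}
```

Both `decode_8086_mov(0x88, 0xC8, ...)` and `decode_8080_mov(0x79, ...)` produce the same `{ .dst = AL, .src = CL }`, so `ucode_move()` never needs to know which instruction set it is serving.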

Like the NEC V30

Posted May 6, 2026 8:56 UTC (Wed) by eru (subscriber, #2753) [Link] (4 responses)

> I'm half tempted to buy the NEC V30 on eBay

Unless you have some unusual PC clone with an 8086, you probably need a NEC V20, which was the 8088 clone and slots right into most PC/XT clones. The V30 was the 8086 clone.

I used to have a "Bondwell" PC/XT clone at home, and at one point I swapped a V20 into it. It ran slightly faster, mainly because the MUL instruction is faster. I also found a CP/M emulator on some FTP site that took advantage of the chip.

Like the NEC V30

Posted May 6, 2026 19:03 UTC (Wed) by jmalcolm (subscriber, #8876) [Link] (1 responses)

> V30 was the 8086 clone
> you probably need a NEC V20, which was the 8088 clone

The original comment by @rwmj was about the V30, which was able to enter an "8080 mode". Like the S390 being discussed in this article, the V30's 8080 emulation was activated by a special assembly instruction from 8086 mode (BRKEM, apparently).

Like the NEC V30

Posted May 7, 2026 6:34 UTC (Thu) by eru (subscriber, #2753) [Link]

Yes, and the same on V20. The V20/V30 had another enhancement over 8088/8086: It implemented the instructions added in 80186, like ENTER/LEAVE and improvements to IMUL. I could compile C programs with MS-C 186 target option, making them run a bit faster.

Like the NEC V30

Posted May 7, 2026 7:41 UTC (Thu) by rwmj (subscriber, #5474) [Link] (1 responses)

The V30 was a popular upgrade for the Amstrad PC1512/1640 line of computers that used an 8086 running at 8MHz normally. These were very common in the UK and probably the first "PC" that most people had ("PC" in quotes because they were not very compatible).

Like the NEC V30

Posted May 7, 2026 9:09 UTC (Thu) by knewt (subscriber, #32124) [Link]

We had a PC1512 back in the day, indeed. Upgraded with a V30, and also an 8087 floating point co-processor! Plus we had a 20MB 'hard card'. You had to manually run a park heads command in DOS before turning the computer off :)

The CPU probably just has different decoders

Posted May 5, 2026 18:16 UTC (Tue) by DemiMarie (subscriber, #164188) [Link] (8 responses)

I doubt the CPU is translating one kind of instruction to another. Instead, I suspect that the CPU just has different decoders that feed a shared backend.

The CPU probably just has different decoders

Posted May 5, 2026 18:53 UTC (Tue) by jmalcolm (subscriber, #8876) [Link] (7 responses)

You are likely correct that this is what "hardware assisted" means.

The ARM instructions are probably being transformed directly into S390 micro-ops. Most of the chip would have no idea if it was executing ARM or S390 code.

It would not make much sense to have an in-hardware ARM -> S390 transpiler.

The CPU probably just has different decoders

Posted May 5, 2026 20:09 UTC (Tue) by WolfWings (subscriber, #56790) [Link] (3 responses)

Wouldn't be the first such chip widely available either, with the RP2350 series having switchable (PER CORE) Arm/RISC-V as options.

Variant decoders feeding the same ALU backends being present even on sub-USD$1 chips says how widely available the tech is these days.

The CPU probably just has different decoders

Posted May 5, 2026 22:41 UTC (Tue) by muase (subscriber, #178466) [Link]

> with the RP2350 series having switchable (PER CORE) Arm/RISC-V as options [...] Variant decoders feeding the same ALU backends [...]

That is interesting; I always was under the impression that the RP2350 has different CPU cores (2x ARM and 2x RISC-V)? And that those cores are just not cooperative in the sense that the surrounding fabric only supports one pair at a time?

The overall wording suggests the same; e.g. "The unique dual-core, dual-architecture capability of RP2350 allows users to choose between a pair of industry-standard Arm Cortex-M33 cores, and a pair of open-hardware Hazard3 RISC-V cores" [1] or "There are two sockets for cores to attach to the system bus [...] The processor plugged into each socket is selectable at boot time. [...] Whichever processor is unused is held in reset with its clock gated at the top level. Unused processors use zero dynamic power." [2]

Do you have any sources that both instruction sets share the same ALU?

[1] https://pip-assets.raspberrypi.com/categories/1214-rp2350...
[2] https://pip-assets.raspberrypi.com/categories/1214-rp2350...

The CPU probably just has different decoders

Posted May 6, 2026 18:56 UTC (Wed) by jmalcolm (subscriber, #8876) [Link] (1 responses)

> the RP2350 series having switchable (PER CORE) Arm/RISC-V

A core that implemented both the RISC-V and ARM ISA would be very cool. However, is that what is happening in the RP2350?

Wikipedia describes the RP2350 as "Two different CPU designs sharing the same computer bus and a 150 MHz clock"
https://en.wikipedia.org/wiki/RP2350

The ARM Cortex-M33F is a core design that is licensed from ARM. It would be very surprising if ARM would allow this design to be modified to feed the core with a RISC-V decoder. It also seems unlikely that the microarchitecture could be fed directly by a RISC-V decoder without modification.

Similarly, the RP2350 is described as using Hazard3 on the RISC-V side. Like the M33F, this is a specific core design. It would be strange to describe a core that could handle both ARM and RISC-V ISA specifically as a Hazard3. Even if the Hazard3 could handle an ARM ISA decoder, ARM itself is very unlikely to have licensed its ISA to be used in this way.

https://github.com/wren6991/hazard3

Both Hazard3 and M33F are 3-stage processors so I suppose it is at least possible. But it seems very unlikely without further evidence.

This is further reinforced by the fact that, on the RP2350, you have to select which cores to activate at boot time. To switch cores, you have to reset and boot again. Compare this to the S390 implementation described in this story, where the processor can be switched to ARM mode via a specialized S390 instruction at runtime.

I think the M33F and Hazard3 cores on the RP2350 are physically distinct from each other.

The CPU probably just has different decoders

Posted May 13, 2026 15:24 UTC (Wed) by anton (subscriber, #25547) [Link]

> It also seems unlikely that the microarchitecture could be fed directly by a RISC-V decoder without modification.

Given the small number of architectural features of RISC-V, this does not appear that problematic to me (certainly not as bad as the other way round, where a microarchitecture designed without a flags register would have to accommodate an ISA that has flags). Thinking about the differences, one thing that comes to mind is that ARM has a different division-by-zero result than RISC-V (I don't remember which architecture produces 0 and which -1). Another thing that comes to mind is that 32-bit instructions on RV64 usually sign-extend, and on ARM A64 zero-extend. Depending on the encoding of the actual microops, this may or may not require an additional sign-extending microop for every such RISC-V instruction.

That being said, I also doubt that the RP2350 is implemented as sharing execution units between decoders.
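The two differences anton mentions can be made concrete. A64 `UDIV`/`SDIV` return 0 on division by zero, while RISC-V `DIVU` returns all ones and `DIV` returns -1. And an A64 instruction that writes a 32-bit W register zero-extends into the full 64-bit register, while RV64 word instructions such as `ADDW` sign-extend their 32-bit result. A shared back end would need a per-ISA tweak for each, roughly as in this hypothetical sketch, where the `enum isa` parameter stands in for whatever mode signal real hardware would use.

```c
#include <stdint.h>

enum isa { ISA_A64, ISA_RV64 };   /* hypothetical per-instruction mode signal */

/* Shared unsigned divider with an ISA-specific divide-by-zero result. */
static uint64_t shared_udiv(uint64_t a, uint64_t b, enum isa isa)
{
    if (b == 0)
        return isa == ISA_A64 ? 0 : UINT64_MAX;  /* A64: 0, RISC-V: all ones */
    return a / b;
}

/* 32-bit add whose result lands in a 64-bit register. */
static uint64_t shared_add32(uint32_t a, uint32_t b, enum isa isa)
{
    uint32_t r = a + b;                          /* wraps modulo 2^32 */
    if (isa == ISA_A64)
        return r;                                /* W-register write: zero-extend */
    /* RV64 ADDW: sign-extend bit 31 into the upper half. */
    return (r & 0x80000000u) ? (uint64_t)r | 0xFFFFFFFF00000000u : (uint64_t)r;
}
```

Whether the sign extension costs an extra micro-op, as anton suggests, depends on whether the back end's result path can do it for free, which this model does not try to capture.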

The CPU probably just has different decoders

Posted May 6, 2026 12:28 UTC (Wed) by maxfragg (subscriber, #122266) [Link] (2 responses)

Given the close relationship between S/390 and POWER at times, I had long suspected that the main difference between those two cores was a different frontend/decoder, probably microcode driven.
Any hints that S/390 CPUs can directly execute ARM code just reinforce this suspicion.
Given the mandatory complexity of all the S/390 instruction set variants, keeping around support for ARM in the same decoder probably doesn't really hurt.

The CPU probably just has different decoders

Posted May 6, 2026 22:22 UTC (Wed) by willy (subscriber, #9762) [Link] (1 responses)

That's not even close to reality. The IBM POWER and zCPU teams were completely separate and designed very different CPUs. Read, for example, https://www.realworldtech.com/z196-mainframe/

The CPU probably just has different decoders

Posted May 7, 2026 9:35 UTC (Thu) by maxfragg (subscriber, #122266) [Link]

POWER6 and z10 are very close siblings and the article you linked also frequently mentions similarities between the z196 cores and POWER4-7 core designs.

"The binary FPU was originally derived from the POWER6 design, with added support for hexadecimal data. It is a fully pipelined unit that is 9 stages deep and can perform a 64-bit multiply-accumulate. However, there are 2 extra pipeline stages to convert from the native zArchitecture data formats to the internal execution format (which was optimized for PowerPC)."

From what I understand, the POWER6/z10 generation was probably one of the more similar ones, while later some parameters in the design were changed depending on whether the goal was a z-Series or a POWER CPU, but they clearly still stem from the same basic design.
IBM never had the manpower to design two independent high-end CPU cores.
Obviously, there is a lot of uncore/I/O stuff that's very different, but the cores themselves have a lot of shared history and commonalities.

Meeting plans?

Posted May 5, 2026 21:50 UTC (Tue) by mzyngier (subscriber, #32898) [Link] (2 responses)

> Eiden and Zyngier made plans to meet up with the other s390 and Arm kernel developers at the Linux Storage, Filesystem, Memory Management, and BPF Summit in Croatia in early May.

Have we? I wish someone had told me. LPC, maybe. KVM Forum at a push. But LSFMMBPFS is definitely not on my acronym list.

Meeting plans?

Posted May 6, 2026 6:53 UTC (Wed) by seiden (subscriber, #156657) [Link]

That's news to me as well :)

Meeting plans?

Posted May 6, 2026 7:04 UTC (Wed) by daroc (editor, #160859) [Link]

... huh. I really thought the mailing list said that. Let me find the quote ...

Whoops! The mailing list says 'LPC', and I misread it. I'll correct the article.

Transmeta?

Posted May 5, 2026 21:59 UTC (Tue) by Klaasjan (subscriber, #4951) [Link] (1 responses)

Am I the only/first one to think of the Transmeta Crusoe cpu(s)?

Transmeta?

Posted May 6, 2026 17:52 UTC (Wed) by guillemj (subscriber, #49706) [Link]

Ah, Transmeta also came to my mind, but only after the Arm Jazelle extension (to execute Java bytecode), which seemed more closely related. :)

April fool?

Posted May 6, 2026 6:38 UTC (Wed) by cpitrat (subscriber, #116459) [Link] (3 responses)

The patch set is from April 2nd and reading it really makes me wonder whether this is a late April fool joke. Hasn't s390 been discontinued since the early 2000s? Aren't all s390 users using virtualization for it already?

April fool? No...

Posted May 6, 2026 6:44 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

IBM's z/Architecture is still called s390 in the Linux kernel, for historical reasons, and referred to that way by Linux developers. It's certainly modern Z hardware that's supported here.

April fool?

Posted May 6, 2026 12:14 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

It's absolutely not an April fool joke, and it was not posted on April 1st (partly) to avoid equivocating. Though it would have been awesome in my opinion to leave people hanging!

April fool?

Posted May 6, 2026 16:11 UTC (Wed) by neggles (subscriber, #153254) [Link]

in this case s390 means s390x, the modern 64-bit ISA

Not really so exotic

Posted May 6, 2026 19:36 UTC (Wed) by jmalcolm (subscriber, #8876) [Link] (1 responses)

I am laughing at myself at the moment for being so fascinated with the idea of the S390 decoding the ARM ISA into a shared back-end, and even with the V30 described above with its ability to execute both 8086 and 8080 code in a similar way. Why?

Well, I am typing this on a computer with an Intel i7 in it. And that chip behaves in exactly the same way if we are being honest.

Intel x86-64 chips implement both the x86 (32 bit) and x86-64 (64 bit) ISA. They are not the same. In fact, they also execute 8086 code just like the V30! And just like the chips here, switching between different ISA options is done in machine language (assembly).

"Real-mode" is 8086 emulation. "Protected mode" is Pentium emulation. "Long mode" is the full x86-64 ISA. I would say that "long mode" is the native ISA but of course, as we all know, the system actually starts in "real mode".

Like the S390 being described in this article, a modern Intel chip supports all of these ISA modes with a common back-end.

Ok, my description above is inaccurate in that "real-mode" is not really 8086 emulation. And the 386 and above have a VM86 mode in addition to "real mode" that is meant to emulate an 8086 while still providing protected mode features. The ISA available in real-mode is much closer to the full Pentium ISA, including FPU, MMX instructions, additional registers and 32 bit math. The Intel real-mode ISA never existed in a chip that did not also have a "protected mode" available. There are still different ISAs to choose from; it is just that the least capable is still more advanced than a real 8086.

Not really so exotic

Posted May 7, 2026 11:44 UTC (Thu) by farnz (subscriber, #17727) [Link]

I would argue that most modern CPUs don't have an exposed native ISA at all.

Instead, they have a chip-specific internal execution model (parallel EUs executing µops as soon as possible, and tracking to ensure that the exposed architectural state does not get out-of-sync with the instructions that have been executed in the ISA model), and a front end decoder that translates ISA instructions to this chip's internal execution model.

The ISA is no longer native, but instead is there because the perfect set of µops for this chip is not only changing generation to generation, but also for different designs in each generation. Basically, we sacrifice significantly less than 1% of peak performance in return for not having to build separate binaries for 12th generation Core, 13th generation Core, Xeon, Celeron, Pentium etc, and not having to rebuild everything for each new CPU that comes out.

Don't forget in your list of modes that your CPU also has the VT-x mode, where it emulates itself in the same sense that VM86 emulates an 8086, and also SMM, where it runs in yet another execution context and may have yet more instructions you can run. And neither real mode nor protected mode emulate older chips; instead, they're distinct variant modes of the current chip's design.
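The "front end decoder that translates ISA instructions" into an internal execution model can be illustrated with a toy example: an x86-style read-modify-write `add [addr], reg` is cracked into load, add, and store micro-ops, which a back end executes without knowing which instruction produced them. The µop format, the hidden temporary register, and all names here are hypothetical; real µop encodings are undocumented and, as noted above, vary between chips.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal micro-ops. */
enum uop_kind { UOP_LOAD, UOP_ADD, UOP_STORE };
struct uop { enum uop_kind kind; unsigned dst, src; uint32_t addr; };

enum { TMP = 7 };  /* hidden temporary register, not visible in the ISA */

/* Front end: crack an architectural "add [addr], reg" into three µops.
   Returns the number of µops written to out. */
static size_t crack_add_mem_reg(uint32_t addr, unsigned reg, struct uop out[3])
{
    out[0] = (struct uop){ UOP_LOAD,  TMP, 0,   addr };  /* tmp <- mem[addr] */
    out[1] = (struct uop){ UOP_ADD,   TMP, reg, 0    };  /* tmp <- tmp + reg */
    out[2] = (struct uop){ UOP_STORE, 0,   TMP, addr };  /* mem[addr] <- tmp */
    return 3;
}

/* Back end: executes µops with no knowledge of the originating ISA. */
static void execute(const struct uop *u, size_t n, uint64_t regs[8], uint64_t mem[16])
{
    for (size_t i = 0; i < n; i++) {
        switch (u[i].kind) {
        case UOP_LOAD:  regs[u[i].dst] = mem[u[i].addr];  break;
        case UOP_ADD:   regs[u[i].dst] += regs[u[i].src]; break;
        case UOP_STORE: mem[u[i].addr] = regs[u[i].src];  break;
        }
    }
}
```

Because only the front end knows the ISA encoding, a chip designer can change the µop set between generations without breaking existing binaries, which is the trade-off described in the comment.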


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds