An introduction to RISC-V
LWN has covered the open RISC-V ("risk five") processor architecture before, most recently in this article. As the ecosystem and tools around RISC-V have started coming together, a more detailed look is in order. In a series of two articles, I will look at what RISC-V is and follow up with an article on how we can now port Linux distributions to run on it.
The words "Free and Open RISC Instruction Set Architecture" are emblazoned across the web site of the RISC-V Foundation along with the logos of some possibly surprising companies: Google, hard disk manufacturer Western Digital, and notable ARM licensees Samsung and NVIDIA. An instruction set architecture (ISA) is a specification for the instructions or machine code that you feed to a processor and how you encode those instructions into a binary form, along with many other precise details about how a family of processors works. Modern ISAs are huge and complex specifications. Perhaps the most famous ISA is Intel's x86; that specification runs to ten volumes.
More importantly, ISAs are covered by aggressive copyright, patent, and trademark rules. Want to independently implement an x86-compatible processor? Almost certainly you simply cannot do that without making arrangements with Intel — something the company rarely does. Want to create your own ARM processor? You will need to pay licensing fees to Arm Holdings up front and again for every core you ship.
In contrast, open ISAs, of which RISC-V is only one of the newest, have permissive licenses. RISC-V's specifications, covering both the user-space instructions and the privileged instructions, are licensed under a Creative Commons license (CC BY 4.0). Furthermore, researchers have determined that all RISC-V instructions have prior art and are now patent-free. (Note that this is different from saying that implementations will be open or patent-free; almost certainly the highest-end chips will be closed and their implementations patented.) There are also several "cores" (code that compiles to Verilog and can be programmed into an FPGA or, with a great deal more effort, made into a custom chip) licensed under the three-clause BSD license.
What sets RISC-V apart from earlier open ISAs is that it is scalable and that it is primarily a specification that allows for multiple implementations. RISC-V starts with a choice of 32-, 64-, or 128-bit integer-only specifications called "RV32I", "RV64I", or "RV128I". (I'm not going to cover the 128-bit ISA any further in this article because it is still in the design phase and there is only one software implementation, written by the inimitable Fabrice Bellard.) The "I" stands for "integer" and includes the basic processor features like loads, stores, jumps, and integer arithmetic. The architecture is scalable, however, and extensions on top of that base are common. Most Linux-capable RISC-V chips will be "RV32IMAFDC" or "RV64IMAFDC" where the letters mean:
Letter | Meaning |
---|---|
I | Integer and basic instructions |
M | Multiply and divide |
A | Atomics |
F | IEEE floating point (single precision) |
D | IEEE floating point (double precision) |
C | Compressed instructions |
For convenience "IMAFD" can be written "G" (for "general purpose") and so you will more commonly see those chips described as "RV32GC" or "RV64GC".
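
The naming scheme is regular enough to pick apart mechanically. The following is only a sketch: the `parse_isa` helper and the `EXTENSIONS` table are names invented here for illustration, and it handles just the letters listed above plus the "G" shorthand, not the full naming rules in the specification.

```python
import re

# Extension letters as described in the article; "G" is shorthand for "IMAFD".
EXTENSIONS = {
    "I": "Integer and basic instructions",
    "M": "Multiply and divide",
    "A": "Atomics",
    "F": "IEEE floating point (single precision)",
    "D": "IEEE floating point (double precision)",
    "C": "Compressed instructions",
}


def parse_isa(name):
    """Split a string such as 'RV64GC' into (register width, extension letters).

    Hypothetical helper: only the letters in EXTENSIONS and 'G' are accepted.
    """
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognized ISA string: {name!r}")
    width = int(match.group(1))
    # Expand the general-purpose shorthand before validating the letters.
    letters = match.group(2).replace("G", "IMAFD")
    unknown = [c for c in letters if c not in EXTENSIONS]
    if unknown:
        raise ValueError(f"unknown extension letters: {unknown}")
    return width, list(letters)


if __name__ == "__main__":
    width, letters = parse_isa("RV64GC")
    print(width, [EXTENSIONS[c] for c in letters])
```

Expanding "G" first means "RV64GC" and "RV64IMAFDC" produce the same result, which matches how the two spellings are used interchangeably.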
Most Linux-capable designs have skipped 32-bit variants entirely; in the second article I will describe Fedora on RISC-V, which is entirely concentrating on RV64GC. For completeness I should also say there is a cut-down embedded specification called "RV32E" that has half the number of general-purpose registers but is otherwise identical to RV32I. Since RV32E machines are likely to have only a few kilobytes of RAM and lack a "supervisor" mode, they are unlikely to ever run Linux.
RISC-V has 31 general purpose registers (15 for RV32E), approximately double the number visible to the programmer on x86-64. This simple unoptimized loop counting to 1000 demonstrates some features of the instruction set:
         - binary -  - mnemonic -
         fe042623    sw      zero,-20(fp)    # store zero into stack slot
         a031        j       L2              # compressed jump
    L1:  fec42783    lw      a5,-20(fp)      # load stack slot into a5
         2785        addiw   a5,a5,1         # compressed increment
         fef42623    sw      a5,-20(fp)      # store back to stack
    L2:  fec42783    lw      a5,-20(fp)      # load stack slot into a5
         0007871b    sext.w  a4,a5           # sign extend a5 into a4
         3e700793    li      a5,999          # load immediate
         fee7d5e3    ble     a4,a5,L1        # compare and branch
Registers are named x1 through x31 (with x0 being logically wired to zero), but the assembler provides a set of names like a0-a7 for function arguments and return values, t0-t6 for temporaries, fp for the frame pointer, sp for the stack pointer, zero for the zero register, and others. These are just aliases for the x-names. The floating-point extensions (if present) add 32 more registers, and it is expected that future extensions like vectorization will add more.
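
To see how the aliases and the binary encodings in the listing line up, here is a small sketch that splits a 32-bit word into the fields of the "I-type" format used by loads and by immediate arithmetic. The field positions and the register-name table follow the published specification and calling convention as I understand them; `decode_i_type` and `ABI_NAMES` are names made up for this example.

```python
# ABI aliases for x0..x31, in register-number order (x8 is both s0 and fp).
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0/fp", "s1"]
    + [f"a{i}" for i in range(8)]        # x10-x17
    + [f"s{i}" for i in range(2, 12)]    # x18-x27
    + ["t3", "t4", "t5", "t6"]           # x28-x31
)


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its fields.

    Illustrative only: it does not check that the opcode really is I-type.
    """
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:              # the 12-bit immediate is sign-extended
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": ABI_NAMES[(word >> 7) & 0x1F],
        "funct3": (word >> 12) & 0x7,
        "rs1": ABI_NAMES[(word >> 15) & 0x1F],
        "imm": imm,
    }


if __name__ == "__main__":
    print(decode_i_type(0x3E700793))   # "li a5,999" from the listing
    print(decode_i_type(0xFEC42783))   # "lw a5,-20(fp)" from the listing
```

Decoding `3e700793` this way shows that `li a5,999` is really an add-immediate with the zero register as its source, which is one reason a hard-wired zero register is useful.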
Instructions are variable length, with the basic length being 32 bits. Many common instructions can be compressed to 16 bits when using the compressed extension (that is expected to be present in all Linux-class chips). Longer instructions are possible too, with the more obscure extensions expected to use them. Unlike x86, variable length does not have to mean "horribly complex to decode". The encoding ensures that the processor can easily see the length of every instruction in its prefetch queue by decoding a few bits in a uniform location. This is even the case where the code is using extensions that the processor does not understand (e.g. for handing them off to a co-processor or to trap and emulate them).
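
As a rough illustration of how little work the length check needs, the sketch below inspects only the low bits of the first 16-bit parcel of an instruction. The bit patterns are the ones given in the user-level specification as I understand it; the function name `instruction_length` is invented here, and encodings longer than 64 bits are deliberately left unhandled.

```python
def instruction_length(first_parcel):
    """Return the instruction length in bytes, given its first 16-bit parcel.

    Returns None for the longer encodings, which this sketch does not handle.
    """
    if first_parcel & 0b11 != 0b11:
        return 2                      # compressed (16-bit) instruction
    if first_parcel & 0b11100 != 0b11100:
        return 4                      # standard 32-bit instruction
    if first_parcel & 0b111111 == 0b011111:
        return 6                      # 48-bit encoding
    if first_parcel & 0b1111111 == 0b0111111:
        return 8                      # 64-bit encoding
    return None


if __name__ == "__main__":
    # Instructions are stored little-endian, so the first parcel is the low
    # 16 bits of each word as printed in the loop listing.
    for word in (0xFE042623, 0xA031, 0x2785, 0x0007871B):
        print(hex(word), instruction_length(word & 0xFFFF))
```

No other field of the instruction has to be understood to find where the next one starts, which is what lets a processor skip over extensions it does not implement.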
Although the architecture is (by design) simple, boring, and similar to others that have gone before, one interesting area is the approach to complex instructions such as specialized instructions for string handling, video decoding, or encryption. Some of these may be implemented in future extensions. For others, the designers have expressed a preference not to add complex instructions to the specification but instead to rely on macro-op fusion for performance. (Note there is a patent claim on a limited version of this technique, although it expires in 2022.) Processors are expected to detect sequences of simpler instructions that together perform some complex operation (e.g. copying a string) and fuse them together at run time into a single more efficient macro operation. How this wish will meet reality is yet to be seen, but it does mean that, for now, writing a RISC-V emulator is relatively easy because there are only simple instructions.
To make a real computer you need a lot more than just a core, and RISC-V is at least beginning to supply more of those pieces. Code is available for an L1 cache; a cache-coherence and inter-core communication protocol called TileLink; ChipLink, which is an inter-socket version of TileLink; an external hardware debugging interface; and the beginnings of an interrupt controller. But there are many missing pieces: everything from DDR4 interfaces for memory, to Ethernet, to GPUs. In the first silicon, and perhaps for a long time to come, these will all be proprietary even if paired with open-source CPUs.
Linux kernel 4.15 added basic RISC-V support, which is sufficient to boot but not much else (there are no interrupts and hence no significant device support). For now you have to use the out-of-tree riscv-linux kernel, although it is expected that most things will be upstream by 4.17. GCC and binutils support has been upstream for over a year, but it is recommended that you use at least GCC 7.3.1 and binutils 2.30.
The final missing piece for Linux was a stable glibc ABI, which was added in February 2018 with glibc 2.27. This allows Linux distributions to start to compile packages, knowing that we won't have to recompile everything from scratch if there's a change to the glibc ABI.
And finally, where can you get RISC-V hardware to run Linux on? At the time of this writing almost no hardware is available. A few lucky people have SiFive's HiFive Unleashed development board, which has four 64-bit application cores (RV64GC) plus a power management core (RV32IMAC), but costs at least $999. However, QEMU 2.12 includes RISC-V support that can be used to run Fedora. There are also plenty of FPGA implementations, although you will find that they run much more slowly than QEMU and have limited RAM and device support.
It's expected that the hardware landscape will change quickly in the coming year, with much cheaper iterations of the HiFive Unleashed and several other companies announcing hardware. One surprise though: you might have a RISC-V chip in your PC in the near future. Western Digital has announced that it will transition the cores used in its hard disks and other storage devices to RISC-V; currently it ships over a billion cores each year.
Look for the second article in this series, where I will cover how Fedora was ported to RISC-V.
Index entries for this article | |
---|---|
GuestArticles | Jones, Richard W.M. |
Posted Mar 14, 2018 16:58 UTC (Wed) by JoelSherrill (guest, #43881)
Posted Mar 14, 2018 18:31 UTC (Wed) by willy (subscriber, #9762)
Posted Mar 14, 2018 18:41 UTC (Wed) by JoelSherrill (guest, #43881)
Posted Mar 15, 2018 0:05 UTC (Thu) by flussence (guest, #85566)
Just wondering aloud, as I have no idea where to begin to look this up: does it resemble UTF-8 with a unary length prefix (but without UTF-8's other inefficiencies)? Or is it something different? I'm curious what works best for hardware with no legacy compat to worry about.
(I'll take “horrible to decode” at face value in any case - I can't even make sense of x86's mnemonic names most of the time!)
Posted Mar 15, 2018 3:57 UTC (Thu) by roc (subscriber, #30627)
You can figure out the length of an instruction by decoding the first 16 bits, and 16 bits is the minimum instruction length. (This might change if they ever decide to add instructions more than 24 bytes long.)
The bytes of an instruction after the first 16 bits can be anything, so there is no way, given an arbitrary address, to reliably find the start or end of the instruction, unlike UTF8 where given a pointer you can find the start and end of the character. This seems like an OK tradeoff though.
Posted Mar 15, 2018 20:08 UTC (Thu) by flussence (guest, #85566)
Posted Mar 15, 2018 7:58 UTC (Thu) by rwmj (subscriber, #5474)
>I'll take “horrible to decode” at face value in any case
The Linux kernel is capable of decoding the length of an x86 instruction, given the first byte. The code is fairly intricate: arch/x86/lib/insn.c
Initial 8086 chips didn't have to worry about instruction boundaries because the microcode fetched, decoded and executed bytes one at a time. Over time x86 encoding has piled complexity on complexity. Now we know that instruction prefetch queues are a thing and that you have to split on instruction boundaries early so it's possible to design something to make this much simpler.
Posted Mar 17, 2018 13:53 UTC (Sat) by ianmcc (subscriber, #88379)
Posted Mar 17, 2018 14:50 UTC (Sat) by rwmj (subscriber, #5474)
An introduction to ugly RISC-V assembler
Posted Mar 22, 2018 13:02 UTC (Thu) by kragil (guest, #34373)
One example:
    move.w #$500,d0
moves the word hex 500 to d0. The list would go on and on. It was so much more readable and nicer than x86-assembler.
But I guess they wanted to look like ugly and stupid x86 for some reason (IMNSHO).
Posted Mar 22, 2018 14:42 UTC (Thu) by deater (subscriber, #11746)
I don't want to get involved in an assembly-language beauty discussion, but if you know anything about the history of RISC-V and the people involved it's pretty clear RISC-V assembly language is more or less exactly the same as MIPS.
The main thing I have against MIPS assembly is that there are two names for each register (the generic one, and then the mnemonic one (such as a0, s0, etc.)) and it can get really confusing trying to remember the mapping between them.
Posted Mar 22, 2018 15:10 UTC (Thu) by farnz (subscriber, #17727)
RISC-V looks a lot more like MIPS or PowerPC to me than it does to x86 or 68k.
Posted Mar 23, 2018 11:20 UTC (Fri) by rwmj (subscriber, #5474)
Yes I find the MIPS-inspired asm to be annoying, particularly the fact that there's no common concept of source and destination, eg:
    li a0, immediate   # dst, src
    lw a0, addr        # dst, src
    sw a0, addr        # src, dst
I also programmed the 68k and Z80 as a commercial programmer back in the day, but I recognize that few people are hand coding large volumes of asm these days, even for embedded platforms.
In fact with complex rules for immediate loads, pervasive use of pseudo instructions (both lw and sw above aren't "real" instructions, they expand to one or two base instructions), "linker relaxation", compressed instructions, superscalar, macro-op fusion etc I doubt it's really feasible.
Posted Mar 25, 2018 14:53 UTC (Sun) by Jonno (subscriber, #49613)
In RISC-V assembler the destination (if any) always come first. However, note that the non-atomic store instructions (sd, sw, sh, and sb) does not have a destination, only two operands (a value and an address) and a side effect (a memory write)...