
An introduction to RISC-V

March 14, 2018

This article was contributed by Richard W.M. Jones

LWN has covered the open RISC-V ("risk five") processor architecture before, most recently in this article. As the ecosystem and tools around RISC-V have started coming together, a more detailed look is in order. In this first of two articles, I will look at what RISC-V is; the second will cover how we can now port Linux distributions to run on it.

The words "Free and Open RISC Instruction Set Architecture" are emblazoned across the web site of the RISC-V Foundation along with the logos of some possibly surprising companies: Google, hard disk manufacturer Western Digital, and notable ARM licensees Samsung and NVIDIA. An instruction set architecture (ISA) is a specification for the instructions or machine code that you feed to a processor and how you encode those instructions into a binary form, along with many other precise details about how a family of processors works. Modern ISAs are huge and complex specifications. Perhaps the most famous ISA is Intel's x86 — that specification runs to ten volumes.

More importantly, ISAs are covered by aggressive copyright, patent, and trademark rules. Want to independently implement an x86-compatible processor? Almost certainly you simply cannot do that without making arrangements with Intel — something the company rarely does. Want to create your own ARM processor? You will need to pay licensing fees to Arm Holdings up front and again for every core you ship.

In contrast, open ISAs, of which RISC-V is only one of the newest, have permissive licenses. RISC-V's specifications, covering both the user-space instructions and the privileged instructions, are licensed under a Creative Commons license (CC BY 4.0). Furthermore, researchers have determined that all RISC-V instructions have prior art and are now patent-free. (Note that this is different from saying that implementations will be open or patent-free; almost certainly the highest-end chips will be closed and their implementations patented.) There are also several "cores" (code that compiles to Verilog and can be programmed into an FPGA or, with a great deal more effort, made into a custom chip) licensed under the three-clause BSD license.

What sets RISC-V apart from earlier open ISAs is that it is scalable and that it is primarily a specification that allows for multiple implementations. RISC-V starts with a choice of 32-, 64-, or 128-bit integer-only specifications that we call "RV32I", "RV64I", or "RV128I". (I'm not going to cover the 128-bit ISA any further in this article because it is still in the design phase and there is only one software implementation, written by the inimitable Fabrice Bellard.) The "I" stands for "integer" and includes the basic processor features like loads, stores, jumps, and integer arithmetic. The architecture, however, is scalable, and other extensions are common. Most Linux-capable RISC-V chips will be "RV32IMAFDC" or "RV64IMAFDC", where the letters mean:

I Integer and basic instructions
M Multiply and divide
A Atomics
F IEEE floating point (single precision)
D IEEE floating point (double precision)
C Compressed instructions

For convenience "IMAFD" can be written "G" (for "general purpose") and so you will more commonly see those chips described as "RV32GC" or "RV64GC".
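As a small illustration of the naming scheme just described, the following Python sketch splits a name such as "RV64GC" into its register width and extension letters, expanding "G" into "IMAFD". The function name and return format are made up for this example; it handles only the single-letter extensions mentioned in this article and is not a full parser for the naming rules in the specification.

```python
def parse_isa(name):
    """Split a name such as 'RV64GC' into (width, list of extension letters).

    Hypothetical helper for illustration; only single-letter extensions
    as described in the article are handled.
    """
    name = name.upper()
    if not name.startswith("RV"):
        raise ValueError("ISA names start with 'RV'")
    rest = name[2:]
    for width in ("128", "64", "32"):
        if rest.startswith(width):
            letters = rest[len(width):]
            break
    else:
        raise ValueError("expected a width of 32, 64, or 128")
    # 'G' is shorthand for the general-purpose set IMAFD.
    letters = letters.replace("G", "IMAFD")
    return int(width), list(letters)


if __name__ == "__main__":
    print(parse_isa("RV64GC"))
    print(parse_isa("RV32IMAC"))
```

With this sketch, "RV64GC" and "RV64IMAFDC" come out as the same width and the same list of letters, which is the equivalence the shorthand is meant to express.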

Most Linux-capable designs have skipped 32-bit variants entirely; in the second article I will describe Fedora on RISC-V, which is entirely concentrating on RV64GC. For completeness I should also say there is a cut-down embedded specification called "RV32E" that has half the number of general-purpose registers but is otherwise identical to RV32I. Since RV32E machines are likely to have only a few kilobytes of RAM and lack a "supervisor" mode, they are unlikely to ever run Linux.

RISC-V has 31 general-purpose registers (15 for RV32E), approximately double the number visible to the programmer on x86-64. This simple unoptimized loop counting to 1000 demonstrates some features of the instruction set:

       - binary -    - mnemonic -

        fe042623    sw     zero,-20(fp) # store zero into stack slot
        a031        j      L2           # compressed jump
    L1:
        fec42783    lw     a5,-20(fp)   # load stack slot into a5
        2785        addiw  a5,a5,1      # compressed increment
        fef42623    sw     a5,-20(fp)   # store back to stack
    L2:
        fec42783    lw     a5,-20(fp)   # load stack slot into a5
        0007871b    sext.w a4,a5        # sign extend a5 into a4
        3e700793    li     a5,999       # load immediate
        fee7d5e3    ble    a4,a5,L1     # compare and branch
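To connect the two columns of the listing, here is a minimal Python sketch that pulls the fields out of two of the 32-bit words above. It assumes the "I-type" layout from the base specification (immediate in bits 31-20, source register in bits 19-15, a function code in bits 14-12, destination register in bits 11-7, and the opcode in bits 6-0); the function names are invented for this example. In the standard assembler naming, fp is x8 and a5 is x15.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_i_type(word):
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": sign_extend(word >> 20, 12),
    }


if __name__ == "__main__":
    # lw a5,-20(fp): load into x15 from x8 with offset -20
    print(decode_i_type(0xFEC42783))
    # li a5,999: really "addi x15, x0, 999"
    print(decode_i_type(0x3E700793))
```

The second word shows that "li" with a small constant is just an add-immediate using the zero register as its source.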

Registers are named x1 through x31 (with x0 being logically wired to zero), but the assembler provides a set of names like a0-a7 for function arguments and return values, t0-t6 for temporaries, fp for the frame pointer, sp for the stack pointer, zero for the zero register, and others. These are just aliases for the x-names. The floating-point extensions (if present) add 32 more registers, and it is expected that future extensions like vectorization will add more.
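The aliasing can be written down as a simple table. The Python sketch below builds the mapping from x-register number to assembler name as I understand the standard calling convention; the function name is made up for this example, and the "ra", "gp", "tp", and "s" (saved) names are among the "others" not spelled out above.

```python
def abi_register_names():
    """Return a dict mapping x-register number to its usual assembler alias."""
    names = {0: "zero", 1: "ra", 2: "sp", 3: "gp", 4: "tp"}
    names.update({5 + i: f"t{i}" for i in range(3)})        # x5-x7   -> t0-t2
    names.update({8: "s0", 9: "s1"})                        # x8 is also called fp
    names.update({10 + i: f"a{i}" for i in range(8)})       # x10-x17 -> a0-a7
    names.update({18 + i: f"s{i + 2}" for i in range(10)})  # x18-x27 -> s2-s11
    names.update({28 + i: f"t{i + 3}" for i in range(4)})   # x28-x31 -> t3-t6
    return names


if __name__ == "__main__":
    for number, alias in sorted(abi_register_names().items()):
        print(f"x{number:<2} {alias}")
```

Note that the hardware knows nothing about these names; they exist only in the assembler and in the calling convention that compilers follow.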

Instructions are variable length, with the basic length being 32 bits. Many common instructions can be compressed to 16 bits when using the compressed extension (that is expected to be present in all Linux-class chips). Longer instructions are possible too, with the more obscure extensions expected to use them. Unlike x86, variable length does not have to mean "horribly complex to decode". The encoding ensures that the processor can easily see the length of every instruction in its prefetch queue by decoding a few bits in a uniform location. This is even the case where the code is using extensions that the processor does not understand (e.g. for handing them off to a co-processor or to trap and emulate them).
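The length rule is simple enough to sketch in a few lines of Python. This follows the encoding in the user-level specification as it stood when this article was written (version 2.2); the formats longer than 32 bits in particular may be revised later, and the function name is invented for this example. The input is the first 16-bit parcel of an instruction, which holds its lowest-numbered bits.

```python
def instruction_length(parcel):
    """Return the instruction length in bytes given its first 16-bit parcel.

    Returns None for the encoding reserved for very long instructions.
    """
    if parcel & 0b11 != 0b11:
        return 2                      # compressed, 16 bits
    if parcel & 0b11100 != 0b11100:
        return 4                      # standard, 32 bits
    if parcel & 0b111111 == 0b011111:
        return 6                      # 48 bits
    if parcel & 0b1111111 == 0b0111111:
        return 8                      # 64 bits
    nnn = (parcel >> 12) & 0b111
    if nnn != 0b111:
        return (80 + 16 * nnn) // 8   # 80 to 176 bits
    return None                       # reserved for 192 bits and longer


if __name__ == "__main__":
    # First parcels of three instructions from the loop shown earlier.
    for word in (0xFE042623, 0xA031, 0x2785):
        print(hex(word), instruction_length(word & 0xFFFF))
```

Nothing here depends on knowing what the instruction does, which is why a processor can find instruction boundaries even for extensions it does not implement.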

Although the architecture is (by design) simple, boring, and similar to others that have gone before, one interesting area is the approach to complex instructions such as specialized instructions for string handling, video decoding, or encryption. Some of these may be implemented in future extensions. For others, the designers have expressed a preference not to add complex instructions to the specification but instead to rely on macro-op fusion for performance. (Note there is a patent claim on a limited version of this technique, although it expires in 2022.) Processors are expected to detect sequences of simpler instructions that together perform some complex operation (e.g. copying a string) and fuse them together at run time into a single more efficient macro operation. How this wish will meet reality is yet to be seen, but it does mean that, for now, writing a RISC-V emulator is relatively easy because there are only simple instructions.
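As a toy model of the idea only, the Python sketch below scans a list of instructions and fuses one specific pair: a "lui" (load upper immediate) followed by an "addi" on the same register, which together build a 32-bit constant. The choice of pair, the tuple format, and the "load_const32" name are all assumptions made for this example; real fusion happens in the processor's decode hardware, and the sketch ignores 64-bit sign extension.

```python
def fuse_lui_addi(instructions):
    """Fuse adjacent ("lui", rd, imm20) + ("addi", rd, rd, imm12) pairs.

    Other instructions are passed through unchanged. Purely illustrative.
    """
    fused = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if (nxt is not None and cur[0] == "lui" and nxt[0] == "addi"
                and nxt[1] == cur[1] and nxt[2] == cur[1]):
            # lui places imm20 in the upper bits; addi adds a signed 12-bit value.
            value = ((cur[2] << 12) + nxt[3]) & 0xFFFFFFFF
            fused.append(("load_const32", cur[1], value))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


if __name__ == "__main__":
    program = [
        ("lui", "a0", 0x12345),
        ("addi", "a0", "a0", 0x678),
        ("addi", "a1", "a1", 1),
    ]
    for op in fuse_lui_addi(program):
        print(op)
```

The point of the technique is that software still sees only the two simple instructions, while an implementation that recognizes the pair can execute it as one operation.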

To make a real computer you need a lot more than just a core, and RISC-V is at least beginning to supply more of those pieces. Code is available for an L1 cache, a cache-coherence and inter-core communication protocol called TileLink, an inter-socket version of TileLink called ChipLink, an external hardware debugging interface, and the beginnings of an interrupt controller. But there are many missing pieces: everything from DDR4 interfaces for memory, to Ethernet, to GPUs. In the first silicon, and perhaps for a long time to come, these will all be proprietary even if paired with open-source CPUs.

Linux kernel 4.15 added basic RISC-V support, which is sufficient to boot but not much else (there are no interrupts and hence no significant device support). For now you have to use the out-of-tree riscv-linux kernel, although it is expected that most things will be upstream by 4.17. GCC and binutils support has been upstream for over a year, but it is recommended that you use at least GCC 7.3.1 and binutils 2.30.

The final missing piece for Linux was a stable glibc ABI, which was added in February 2018 with glibc 2.27. This allows Linux distributions to start to compile packages, knowing that we won't have to recompile everything from scratch if there's a change to the glibc ABI.

And finally, where can you get RISC-V hardware to run Linux on? At the time of this writing almost no hardware is available. A few lucky people have SiFive's HiFive Unleashed development board, which has four 64-bit application cores (RV64GC) plus a power management core (RV32IMAC) but costs at least $999. However, QEMU 2.12 includes RISC-V support that can be used to run Fedora. There are also plenty of FPGA implementations, although you will find that they run much more slowly than QEMU and have limited RAM and device support.

It's expected that the hardware landscape will change quickly in the coming year, with much cheaper iterations of the HiFive Unleashed and several other companies announcing hardware. One surprise though: you might have a RISC-V chip in your PC in the near future. Western Digital has announced that it will transition the cores used in its hard disks and other storage devices to RISC-V; currently it ships over a billion cores each year.

Look for the second article in this series, where I will cover how Fedora was ported to RISC-V.





An introduction to RISC-V

Posted Mar 14, 2018 16:58 UTC (Wed) by JoelSherrill (guest, #43881) [Link] (2 responses)

Thanks for the article. Linux is not the only free OS supporting RISC-V. The free real-time operating system RTEMS (RTEMS.org) also has a port to the 32 and 64-bit RISC-V.

An introduction to RISC-V

Posted Mar 14, 2018 18:31 UTC (Wed) by willy (subscriber, #9762) [Link] (1 responses)

There's also at least a FreeBSD port to RISC-V:
https://wiki.freebsd.org/riscv

An introduction to RISC-V

Posted Mar 14, 2018 18:41 UTC (Wed) by JoelSherrill (guest, #43881) [Link]

Sorry for not pointing that out also. RTEMS uses the FreeBSD TCP/IP stack among other pieces.

An introduction to RISC-V

Posted Mar 15, 2018 0:05 UTC (Thu) by flussence (guest, #85566) [Link] (5 responses)

> Unlike x86, variable length does not have to mean "horribly complex to decode". The encoding ensures that the processor can easily see the length of every instruction in its prefetch queue by decoding a few bits in a uniform location.
Just wondering aloud, as I have no idea where to begin to look this up: does it resemble UTF-8 with a unary length prefix (but without UTF-8's other inefficiencies)? Or is it something different? I'm curious what works best for hardware with no legacy compat to worry about.

(I'll take “horrible to decode” at face value in any case - I can't even make sense of x86's mnemonic names most of the time!)

An introduction to RISC-V

Posted Mar 15, 2018 3:57 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

https://riscv.org/specifications/ chapter 12.

You can figure out the length of an instruction by decoding the first 16 bits, and 16 bits is the minimum instruction length. (This might change if they ever decide to add instructions more than 24 bytes long.)

The bytes of an instruction after the first 16 bits can be anything, so there is no way, given an arbitrary address, to reliably find the start or end of the instruction, unlike UTF8 where given a pointer you can find the start and end of the character. This seems like an OK tradeoff though.

An introduction to RISC-V

Posted Mar 15, 2018 20:08 UTC (Thu) by flussence (guest, #85566) [Link]

Oh, it's right there in section 12.7: 16 bit instructions unless the bottom two bits are set, then it's a long instruction. That looks pretty similar to how websockets encodes length, makes sense to me now. Thanks.

An introduction to RISC-V

Posted Mar 15, 2018 7:58 UTC (Thu) by rwmj (subscriber, #5474) [Link] (2 responses)

>I'll take “horrible to decode” at face value in any case

The Linux kernel is capable of decoding the length of an x86 instruction, given the first byte. The code is fairly intricate: arch/x86/lib/insn.c

Initial 8086 chips didn't have to worry about instruction boundaries because the microcode fetched, decoded and executed bytes one at a time. Over time x86 encoding has piled complexity on complexity. Now we know that instruction prefetch queues are a thing and that you have to split on instruction boundaries early so it's possible to design something to make this much simpler.

An introduction to RISC-V

Posted Mar 17, 2018 13:53 UTC (Sat) by ianmcc (subscriber, #88379) [Link] (1 responses)

I am surprised it is possible to determine the length of an x86 instruction after just one byte. What about prefixes?

An introduction to RISC-V

Posted Mar 17, 2018 14:50 UTC (Sat) by rwmj (subscriber, #5474) [Link]

It should read "address of the first byte". It's definitely not possible to determine this from the first byte of an x86 instruction, eg. the LOCK prefix is a counterexample.

An introduction to ugly RISC-V assembler

Posted Mar 22, 2018 13:02 UTC (Thu) by kragil (guest, #34373) [Link] (4 responses)

I wish they would have used assembler like the Motorola 68xxx. It was so much nicer!
One example:
move.w #$500,d0
moves the word hex 500 to d0. The list would go on and on. It was so much more readable and nicer than x86-assembler.
But I guess they wanted to look like ugly and stupid x86 for some reason (IMNSHO).

An introduction to ugly RISC-V assembler

Posted Mar 22, 2018 14:42 UTC (Thu) by deater (subscriber, #11746) [Link]

> But I guess they wanted to look like ugly and stupid x86

I don't want to get involved in an assembly-language beauty discussion, but if you know anything about the history of RISC-V and the people involved it's pretty clear RISC-V assembly language is more or less exactly the same as MIPS.

The main thing I have against MIPS assembly is that there are two names for each register (the generic one, and then the mnemonic one (such as a0, s0, etc.)) and it can get really confusing trying to remember the mapping between them.

An introduction to ugly RISC-V assembler

Posted Mar 22, 2018 15:10 UTC (Thu) by farnz (subscriber, #17727) [Link]

RISC-V looks a lot more like MIPS or PowerPC to me than it does x86 or 68k.

An introduction to ugly RISC-V assembler

Posted Mar 23, 2018 11:20 UTC (Fri) by rwmj (subscriber, #5474) [Link] (1 responses)

Yes I find the MIPS-inspired asm to be annoying, particularly the fact that there's no common concept of source and destination, eg:

li a0, immediate  # dst, src
lw a0, addr   # dst, src
sw a0, addr   # src, dst

I also programmed the 68k and Z80 as a commercial programmer back in the day, but I recognize that few people are hand coding large volumes of asm these days, even for embedded platforms.

In fact with complex rules for immediate loads, pervasive use of pseudo instructions (both lw and sw above aren't "real" instructions, they expand to one or two base instructions), "linker relaxation", compressed instructions, superscalar, macro-op fusion etc I doubt it's really feasible.

An introduction to ugly RISC-V assembler

Posted Mar 25, 2018 14:53 UTC (Sun) by Jonno (subscriber, #49613) [Link]

> Yes I find the MIPS-inspired asm to be annoying, particularly the fact that there's no common concept of source and destination, eg:

In RISC-V assembler the destination (if any) always comes first. However, note that the non-atomic store instructions (sd, sw, sh, and sb) do not have a destination, only two operands (a value and an address) and a side effect (a memory write)...


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds