FPGAs and free software
The problems with field-programmable gate arrays (FPGAs) are not exactly an obvious talk topic for a graphics-related conference like the 2019 X.Org Developers Conference (XDC). Ben Widawsky acknowledged that, but said that he sees parallels between the situation with FPGA support in the free-software world and the situation with graphics hardware support in the past. It is his hope that the tools for developing with FPGAs can make the same journey that graphics drivers have made over the last two decades or so.
Widawsky began by saying that attendees must think he knows some pretty important people to be able to present on a non-graphics topic; "I guess you'd be right", he said with a grin—to laughter. By way of an introduction, he said that he worked on the graphics stack at Intel from 2010 until 2018 when he moved into other work at the company. He is now using some of his free time "looking at FPGAs and trying to make a more open stack for FPGAs".
Architecture
He spent some time describing the building blocks that make up FPGAs, starting out with lookup tables (LUTs). He gave an example (as can be seen in the PDF slides or YouTube talk video) of a two-input XNOR gate, which is true (1) if both inputs are the same and false (0) otherwise. He then showed how an XNOR gate can be built with three multiplexers (MUXes) and some static RAM (SRAM) holding the four entries of the XNOR truth table. That same technique could be used to build any two-input gate of interest.
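To make the idea concrete, here is a minimal Python model of such a two-input LUT. It assumes the SRAM is simply a list holding the truth table, indexed by `(a << 1) | b`; the `mux2` and `lut2` names are hypothetical, and real LUT circuitry is not built this literally:

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """A 2:1 multiplexer: pass in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0


def lut2(sram: list[int], a: int, b: int) -> int:
    """Two-input LUT built from three 2:1 MUXes.

    sram holds the truth table, indexed by (a << 1) | b. The first level
    of MUXes is selected by b, the second (single) MUX by a.
    """
    low = mux2(b, sram[0], sram[1])   # table rows where a == 0
    high = mux2(b, sram[2], sram[3])  # table rows where a == 1
    return mux2(a, low, high)


# XNOR truth table for (a, b) = 00, 01, 10, 11
XNOR_SRAM = [1, 0, 0, 1]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut2(XNOR_SRAM, a, b))
```

Swapping in a different four-entry table, such as `[0, 1, 1, 0]` for XOR, turns the same three MUXes into a different gate.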
One problem with the LUT-based mechanism is that it is slower than using a "real" gate, since each level of MUX (e.g. two levels for the two-input XNOR) adds some delay. The FPGA vendors have some optimizations for the mechanism, he said, so the delays are not as bad as they could be with a simple implementation; a four-input truth table, with four levels of MUX, will not take four times as long as a single level, for example. Meanwhile, LUTs can be combined to create "functions" of larger numbers of inputs.
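The same structure generalizes. The sketch below, using hypothetical names and the same list-as-SRAM assumption, evaluates a k-input LUT as a MUX tree, then combines two four-input LUTs with one more MUX to get a five-input function (Shannon expansion), which is one standard way larger functions are assembled from smaller LUTs:

```python
def lut(sram: list[int], inputs: list[int]) -> int:
    """Evaluate a k-input LUT as a binary tree of 2:1 MUXes.

    sram is the truth table, indexed by the inputs read as a binary number
    with inputs[0] as the most significant bit. Each pass through the loop
    is one level of MUXes, so k inputs means k levels and 2**k - 1 MUXes.
    """
    if len(sram) != 2 ** len(inputs):
        raise ValueError("truth table size must be 2**k")
    level = list(sram)
    # The least significant input drives the MUXes closest to the SRAM.
    for sel in reversed(inputs):
        level = [level[i + 1] if sel else level[i] for i in range(0, len(level), 2)]
    return level[0]


def lut5(sram_lo: list[int], sram_hi: list[int], inputs: list[int]) -> int:
    """Five-input function from two 4-input LUTs plus one MUX:
    the extra input, inputs[0], picks which LUT's result is passed through."""
    rest = inputs[1:]
    return lut(sram_hi, rest) if inputs[0] else lut(sram_lo, rest)


# Example: five-input parity built from two four-input parity tables.
PARITY4 = [bin(i).count("1") % 2 for i in range(16)]
INVERTED = [1 - p for p in PARITY4]
print(lut5(PARITY4, INVERTED, [1, 0, 1, 1, 0]))  # three ones -> 1
```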
FPGAs aren't just LUTs, however; there is a lot more to them, he said. There are flip-flops for holding state, along with adders and carry chains that can be used to do arithmetic. All of those elements are collected into logic array blocks (LABs); that is Intel's terminology, and other FPGA vendors have their own terms for these elements. Multiple LABs are conceptually laid out in a grid with mesh routing between them. One can start to see the complexity inherent in FPGAs by looking at the process of choosing how the functionality is arranged in the FPGA, he said; if functionality that goes together is placed far apart in signal-routing terms, more delay will be introduced.
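The cost of distance can be sketched with a toy routing model. Assuming the LABs sit on a simple grid where each hop to a neighboring LAB counts as one unit of delay (real routing fabrics have wires of several lengths and programmable switch boxes), a breadth-first search in the style of Lee's maze-routing algorithm finds the shortest available path; `route` is a hypothetical helper:

```python
from collections import deque


def route(width, height, src, dst, blocked=frozenset()):
    """Breadth-first (Lee-style) maze routing on a grid of LABs.

    Returns the list of grid cells from src to dst, or None if no path
    exists. Each hop stands in for one routing segment, so a longer path
    means more delay. Cells in `blocked` are already in use.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < width and 0 <= ny < height
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None


print(len(route(4, 4, (0, 0), (3, 0))) - 1)  # 3 hops
walls = frozenset({(1, 0), (1, 1), (1, 2)})
print(len(route(4, 4, (0, 0), (3, 0), walls)) - 1)  # 9 hops
```

Blocking part of the grid, as signals already routed there would, forces a detour and lengthens the path, which is why placement and routing are so intertwined.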
There is, of course, more to it, Widawsky said. The LAB "network" is connected to general-purpose I/O (GPIO) or other pins on the chip to facilitate data transfer in both directions. Even though you can implement CPUs and other complex devices simply using the programmable logic available in the LABs, today's FPGAs have "hard" CPUs, typically Arm cores, that can be "wired" into the design. There may also be RAM, DSPs, and other hard devices available in the FPGA.
The title of his talk is "Everything Wrong with FPGAs", he said, so what's wrong with the FPGA architecture he described? To start with, the programmable nature of the devices makes them less efficient than a straight hardware implementation. That is inherent in the nature of FPGAs, though, so it is both a blessing and a curse: it allows doing things more easily and flexibly but at the cost of speed.
The problem that really bothers him, though, is that each FPGA vendor does things so completely differently from the others that it is nearly impossible to move to a new vendor. Beyond that, the standards for hardware description languages (HDLs) are loose, unlike the standards for graphics; there is not the same kind of conformance testing, so different tools will accept their own dialects of the languages. Designers will often get a library from their vendor that implements a device they need (e.g. a memory controller); once they get that integrated into their design, they are locked into using that vendor's FPGAs. It is too expensive to do the redesign necessary to switch vendors.
FPGAs differ wildly, but there is not even a standard way to describe the differences between them: some kind of "FPGA floor plan" description. He believes there is an opportunity to come up with an intermediate representation that would allow FPGA users to move more freely between vendors. He likened it to the NIR intermediate representation that is used by Mesa. "You could think of it as the NIR for FPGAs", he said.
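No such format exists yet, so the following is only a guess at its flavor: a hypothetical, vendor-neutral floor-plan description in Python that lists the placeable sites on a device and serializes them to JSON so that different tools could share it. The `Site` and `FloorPlan` names, the fields, and the site kinds are all invented for illustration:

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class Site:
    """One placeable resource on the die, e.g. a LAB, RAM block, DSP, hard CPU, or I/O pad."""
    kind: str
    x: int
    y: int


@dataclass
class FloorPlan:
    """Hypothetical vendor-neutral floor plan: device name, grid size, and sites."""
    device: str
    width: int
    height: int
    sites: list[Site] = field(default_factory=list)

    def resources(self) -> dict[str, int]:
        """Count the available sites of each kind."""
        return dict(Counter(site.kind for site in self.sites))

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "FloorPlan":
        data = json.loads(text)
        sites = [Site(**s) for s in data.pop("sites")]
        return cls(sites=sites, **data)


toy = FloorPlan("toy-device", width=3, height=2, sites=[
    Site("lab", 0, 0), Site("lab", 1, 0), Site("dsp", 2, 0), Site("io", 0, 1),
])
print(toy.resources())  # {'lab': 2, 'dsp': 1, 'io': 1}
print(FloorPlan.from_json(toy.to_json()) == toy)  # True
```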
Tools
Moving on to the tools for FPGAs, Widawsky said that they are closer to a user-space driver for FPGAs than to development aids like debuggers. In the FPGA world, "vendor tools are the norm"; if you are doing anything in production, he said, you will be using proprietary vendor tools. These tools will provide the entire process needed to get a design into an FPGA: synthesis, timing, placing and routing, and emitting bitstreams. He would go into more detail about those steps a bit later in the talk.
He did want to note something that Xilinx is doing that is not open source, exactly, but is "friendly to open source". The company is allowing third-party tools to do the place and route step. That same mechanism can be used by open-source tools.
There are some third-party proprietary tools targeting the front end of the process (synthesis and timing), with Cadence, Mentor, and Synopsys being the big three. Those functions can be performed in a quasi-FPGA-independent way, which is where open-source tools can also fit in. He pointed to Yosys, which is "an incredible project" that has reverse engineered two Lattice FPGAs such that you can "run pretty complex designs on them with an entirely open toolchain".
As of the 5.2 kernel, there are now "kernel interfaces to let you program these FPGAs". In Widawsky's opinion, "this is where things are becoming really terrible". The interface being pushed by Intel is DFL (device feature list, though he cautioned that some of the acronyms have been changing), which allows an FPGA vendor to "list general properties of the FPGA if you wanted to use it as an accelerator". There are kernel drivers to read this data and to expose it to user space. But, as far as he knows, DFL is only used by Intel.
That means there is a "giant framework" in drivers/fpga that is only there to serve one vendor, he said. DFL gets wrapped with the open programmable acceleration engine (OPAE) interface; in graphics terms, DFL is the direct rendering manager (DRM), while OPAE is libdrm, providing a thin wrapper around DFL/DRM and providing easier access to the underlying interface. Beyond that are roughly 20 different FPGA access drivers—providing read, write, and erase functionality—for specific FPGA models. "The directory is a bit of a nightmare, I think."
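For a sense of what the kernel side exposes, the generic FPGA manager class in drivers/fpga publishes per-device attributes such as `name` and `state` under /sys/class/fpga_manager/. The sketch below simply reads them; it assumes a kernel built with FPGA manager support, does not touch DFL or OPAE at all, and the `fpga_managers` helper is hypothetical:

```python
from pathlib import Path


def fpga_managers(root: str = "/sys/class/fpga_manager") -> dict[str, dict[str, str]]:
    """Return {device: {"name": ..., "state": ...}} for each FPGA manager under root.

    Missing attributes are skipped; a missing root directory yields an
    empty dict (no FPGA manager support, or no devices, on this system).
    """
    result: dict[str, dict[str, str]] = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for dev in sorted(base.iterdir()):
        info = {}
        for attr in ("name", "state"):
            path = dev / attr
            if path.is_file():
                info[attr] = path.read_text().strip()
        result[dev.name] = info
    return result


print(fpga_managers())
```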
He then moved on to the process of turning HDL code into the bitstream needed to program an FPGA. You start with HDL, likely Verilog or SystemVerilog, which gets turned into a netlist in the synthesis step. That netlist, in turn, gets placed and routed, then turned into a bitstream that can be downloaded to the FPGA. Two of those operations are quite complex: synthesis is NP-complete, while "place and route in the real world is NP hard". That means there are quite a few interesting technical challenges for any compiler writers in the room who might be getting tired of shader compilers, he said with a grin.
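With the open tools, those steps map onto separate programs. The sketch below builds (but does not run) the command lines for the Yosys, nextpnr, and Project IceStorm flow for a Lattice iCE40 part; the `blinky` design name, the `hx8k`/`ct256` device and package, and the `.pcf` pin-constraint file are assumptions about a particular board, and `ice40_flow` is a hypothetical helper:

```python
import shlex


def ice40_flow(top: str, device: str = "hx8k", package: str = "ct256") -> list[list[str]]:
    """Command lines for an open-source iCE40 flow:
    synthesis (Yosys) -> place and route (nextpnr) -> bitstream (icepack)."""
    return [
        # Synthesis: Verilog in, JSON netlist out.
        ["yosys", "-p", f"synth_ice40 -top {top} -json {top}.json", f"{top}.v"],
        # Place and route against the board's pin constraints (.pcf file).
        ["nextpnr-ice40", f"--{device}", "--package", package,
         "--json", f"{top}.json", "--pcf", f"{top}.pcf", "--asc", f"{top}.asc"],
        # Bitstream assembly: textual configuration in, binary bitstream out.
        ["icepack", f"{top}.asc", f"{top}.bin"],
    ]


for cmd in ice40_flow("blinky"):
    print(shlex.join(cmd))
```

On a machine with the tools installed, each list could be passed to `subprocess.run(cmd, check=True)` in order.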
He went through a simplified example of synthesis and then described the place-and-route process. Handling the placement and routing of the logic is where the "secret sauce" for FPGA vendors is. It is a complex task and vendors have tuned their devices for specific workloads in order to have the right internal components on the fast path for their customers' designs.
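As a separate, much-simplified illustration (not the example from the talk), the last step of synthesis for a LUT-based FPGA amounts to turning each small piece of logic into a truth table that fills one LUT. The hypothetical `lut_init` below computes that table as an integer mask; the bit ordering is an assumption, since each vendor defines its own:

```python
from itertools import product
from typing import Callable


def lut_init(func: Callable[..., int], k: int) -> int:
    """Truth table of a k-input Boolean function packed into an integer.

    Bit i of the result is func's output for input pattern i, where the
    first argument is the most significant bit of i.
    """
    init = 0
    # product() yields (0,...,0), (0,...,1), ... in ascending binary order.
    for i, bits in enumerate(product((0, 1), repeat=k)):
        if func(*bits) & 1:
            init |= 1 << i
    return init


# A full adder maps onto two 3-input LUTs: one for the sum, one for the carry.
sum_mask = lut_init(lambda a, b, cin: a ^ b ^ cin, 3)
carry_mask = lut_init(lambda a, b, cin: (a & b) | (a & cin) | (b & cin), 3)
print(hex(sum_mask), hex(carry_mask))  # 0x96 0xe8
```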
Placing is the process of finding a location on the FPGA for each part of the logic in the design. Where things can go is constrained by the timing between them, but also by the goal of minimizing the FPGA area the design occupies. There is also a need to determine how to use the hard CPUs and other on-chip devices. Routing is the process of hooking everything up: making the connections between LABs, to and from the I/O pins, and so on.
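Placement is often attacked with simulated annealing, as in the academic VPR tool. The sketch below is a heavily simplified, hypothetical version: every site on the grid is interchangeable, the only cost is half-perimeter wirelength (a common proxy for routing delay), and timing, hard blocks, and I/O constraints are ignored:

```python
import math
import random


def hpwl(placement, nets):
    """Half-perimeter wirelength: for each net, half the perimeter of the
    bounding box around the cells it connects."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_place(cells, nets, width, height, steps=20000, temp=5.0,
                 cooling=0.9995, seed=0):
    """Place cells on a width x height grid (one cell per site) by
    simulated annealing: propose a move or swap, keep it if it shortens
    wiring, and sometimes keep a worse one early on to escape local minima."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    if len(cells) > len(sites):
        raise ValueError("not enough sites")
    rng.shuffle(sites)
    placement = dict(zip(cells, sites))
    occupant = {site: cell for cell, site in placement.items()}
    cost = hpwl(placement, nets)
    for _ in range(steps):
        cell = rng.choice(cells)
        target = (rng.randrange(width), rng.randrange(height))
        old = placement[cell]
        other = occupant.get(target)
        # Apply the move, swapping with the occupant if the site is taken.
        placement[cell] = target
        if other is not None:
            placement[other] = old
        new_cost = hpwl(placement, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            occupant[target] = cell
            if other is not None:
                occupant[old] = other
            else:
                occupant.pop(old, None)
        else:
            # Reject: undo the move.
            placement[cell] = old
            if other is not None:
                placement[other] = target
        temp = max(temp * cooling, 1e-3)
    return placement, cost


cells = ["a", "b", "c", "d", "e", "f"]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
placement, cost = anneal_place(cells, nets, width=3, height=2)
print(cost, placement)
```

For this six-cell chain on a 3x2 grid, the best possible cost is 5 (every connected pair adjacent).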
There is no one optimal solution, so the tools typically allow the user to specify what they want to optimize: the smallest area, the highest performance (which typically means the shortest distances), or the lowest power usage. The information needed to make those kinds of decisions (the FPGA floor plan) is not public, so much of the effort by free tools is spent trying to figure it out, he said.
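One hypothetical way to express such a goal is as weights on normalized metrics, with the tool keeping whichever candidate implementation scores best. The candidate numbers below are made up, and real tools expose these choices through their own settings rather than an explicit formula like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    area: float       # logic blocks used
    delay_ns: float   # critical-path delay
    power_mw: float   # estimated power draw


def pick(candidates: list[Candidate], weights: dict[str, float]) -> Candidate:
    """Return the candidate with the lowest weighted cost.

    Each metric is divided by the best (smallest) value any candidate
    achieves, so metrics in different units can be added together.
    Assumes all metric values are positive.
    """
    best = {m: min(getattr(c, m) for c in candidates) for m in weights}

    def cost(c: Candidate) -> float:
        return sum(w * getattr(c, m) / best[m] for m, w in weights.items())

    return min(candidates, key=cost)


CANDIDATES = [
    Candidate("compact", area=100, delay_ns=8.0, power_mw=40),
    Candidate("fast", area=160, delay_ns=5.0, power_mw=55),
    Candidate("low-power", area=130, delay_ns=7.0, power_mw=30),
]

GOALS = {
    "area":        {"area": 1.0, "delay_ns": 0.1, "power_mw": 0.1},
    "performance": {"area": 0.1, "delay_ns": 1.0, "power_mw": 0.1},
    "power":       {"area": 0.1, "delay_ns": 0.1, "power_mw": 1.0},
}

for goal, weights in GOALS.items():
    print(goal, pick(CANDIDATES, weights).name)
# area compact / performance fast / power low-power
```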
The bitstream assembly is pretty straightforward. The results of the previous phases are encoded into the bitstream in unsurprising ways. There are some special cases, including encrypted bitstreams, which might be controversial with some, and compressed bitstreams, which are important because the bitstreams can be large and take a long time to transfer. There is some fanciness that can happen, he said, such as uploading part of the design that implements the decompression algorithm and sending the rest of the bitstream compressed. Overall, though, it is a "pretty linear translation" of the output of the earlier stages into the bitstream.
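The "linear translation" and the compression step can both be illustrated with a toy. Assuming configuration bits are packed most-significant-bit first into bytes, and using simple run-length encoding as a stand-in for whatever scheme a given vendor actually uses, a mostly-empty design shrinks dramatically; all names here are hypothetical:

```python
def pack_bits(bits: list[int]) -> bytes:
    """Pack configuration bits into bytes, most significant bit first, zero-padded."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def rle_compress(data: bytes) -> bytes:
    """Run-length encode as (count, value) byte pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value, run = data[i], 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decompress(data: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for j in range(0, len(data), 2):
        count, value = data[j], data[j + 1]
        out += bytes([value]) * count
    return bytes(out)


# A mostly-unused device: 8,004 configuration bits with only three set.
bits = [0] * 4000 + [1, 0, 1, 1] + [0] * 4000
packed = pack_bits(bits)
compressed = rle_compress(packed)
print(len(packed), len(compressed))  # 1001 10
print(rle_decompress(compressed) == packed)  # True
```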
Returning to his "what's wrong with that?" theme, he noted that the tools are proprietary and are "drastically different" between hardware vendors. That means it takes a lot of effort to get up and running with a particular vendor's tools—which helps lock people in once that investment is made. The tools also require running some binary-blob license server, which is cumbersome and somewhat worrisome from a security perspective.
Customer support for the tools is provided by the hardware vendor, which can make it difficult to solve any problems you might have, he said. You can try to strace the tools, for example, but it doesn't help much in his experience. A look around the web will also easily find some horror stories about the support provided by the vendors. Customers cannot improve or even debug the tool flow; they are at the mercy of the vendor. Any kind of automation requires using Tcl-based interfaces that are not well documented and are not good interfaces in his opinion.
Beyond all of that, the high barrier to entry means that students typically only learn one vendor's tools—whichever vendor gave free hardware and software to their university. If they start to work for a company that uses a different FPGA vendor, they may not really be up to speed on that toolchain. That sometimes leads companies to buy most of the design in the form of IP blocks from the vendor, which is not inherently a bad thing, Widawsky said, but it does limit the scope of the designs if the designers are mostly relegated to connecting IP blocks together.
Use cases
He put up a graph of the growth of the FPGA market, but cautioned that it was really only the shape of the graph that was of interest, since that seems to be roughly consistent while the overall numbers vary depending on the source. The growth was fairly flat from 1984, when Xilinx introduced its first FPGA, until the late 1990s, when it started to pick up some. There is a clear inflection point at 2010, which is roughly when the recent artificial intelligence push got going. While the increase in transistor density has helped vendors create highly capable FPGAs, he believes the main cause of the growth acceleration is that a problem was finally found where FPGAs were really needed. It was, to some extent, a solution looking for a problem prior to that.
The traditional (vertical) use case for an FPGA is as part of a device that gets shipped or for a proof of concept. If you are shipping something in a low-enough volume (he has heard three million units, but is not sure that number is right), it makes sense to use an FPGA rather than creating an application-specific integrated circuit (ASIC). It also makes sense when the hardware is implementing a standard that is still changing, such as a media codec; a field upgrade can be done to the FPGA to pick up any changes. Another common use case is to validate the design of an ASIC before committing it to silicon; FPGAs are much faster than software simulators.
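The volume threshold comes down to simple arithmetic: an ASIC has a large up-front non-recurring engineering (NRE) cost but a cheaper unit price. Ignoring the FPGA's own up-front costs, the break-even volume n satisfies NRE + n * c_asic = n * c_fpga. The sketch below solves that; the dollar figures are invented for illustration and are not from the talk:

```python
import math


def breakeven_units(asic_nre: float, asic_unit: float, fpga_unit: float) -> float:
    """Volume at which total ASIC cost equals total FPGA cost.

    Solves asic_nre + n * asic_unit == n * fpga_unit for n, treating the
    FPGA's own up-front cost as zero. Above this volume the ASIC is cheaper.
    """
    if fpga_unit <= asic_unit:
        return math.inf  # the ASIC never earns back its NRE
    return asic_nre / (fpga_unit - asic_unit)


# Hypothetical figures: $2M of ASIC NRE, $10 per ASIC, $50 per FPGA.
print(breakeven_units(2_000_000, 10, 50))  # 50000.0
```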
Another use case is the "FPGA as an accelerator" (or FPGAAAA). The idea is that "there is an FPGA somewhere and it helps you do things fast". There is runtime support available, such as Intel's OpenVINO, OpenStack Cyborg, or Kubernetes device plugins, to allow access to parts of the FPGA by an application in order to accelerate it. The runtimes facilitate getting the accelerator logic into the device and routing the calls needing acceleration to the FPGA; there is, he thinks, some overlap with GPU-based acceleration techniques. In addition, these days cloud providers have started offering raw access to FPGAs.
"FPGA as an AI accelerator" (or FPGAAAAA, he said to laughs) is less common these days. He pointed to Microsoft's Project Brainwave as an example. The company has created an FPGA-based neural processing unit (NPU) that can be accessed via the Azure cloud. One of the problems with this approach is that it can take on the order of a week to do synthesis for an NPU, which makes the overhead rather daunting. He believes there is a large potential for this use case in the future.
But there are a number of problems with all of that, he said. Even though the FPGA market is projected to be $10 billion in 2022, that is still a pretty small market. Because of that, it tends to be expensive to get real hardware to experiment with, so open-source people are not able to crack the market without spending an enormous sum. The vertical FPGA developers have already learned a particular toolchain and are not interested in having different tools, even if they are better, so the interest in open-source solutions is fairly low.
Beyond that, OPAE is a nice open standard; trying to standardize the interface so that FPGAs can be used more is the right idea, he said. Unfortunately, the interface itself depends on proprietary stuff underneath. In graphics terms, there was no requirement that the user-space tools be open source before merging support for a particular FPGA, as Dave Airlie required for the Linux graphics stack. Since there was no requirement, all of the user-space tools are proprietary.
The tool makers within companies like Intel believe that the open-source community is not capable of building something as complex as an FPGA toolchain, Widawsky said. There is a lot of misinformation floating around within companies. For example, many insiders believe that the proprietary compilers sold by Intel and Arm are popular and make lots of money; when you try to explain that, overall, developers prefer to use GCC and LLVM, they think you are either lying or don't know what you are talking about, he said. The "FPGA way" has not changed much since it began in the mid-1980s; nobody has really been able to shake that community up and get it moving in an open-source direction.
Comparisons with GPUs
He then compared FPGAs with devices much better known at XDC: graphics processing units (GPUs). He began with the similarities between them. As with a GPU, an FPGA is a dense set of transistors providing more processing power than a CPU. There is a small set of vendors that produce FPGAs and there is a lot of proprietary software involved. In fact, all of the forums where questions are answered about FPGAs refer to the proprietary tools and how to use them. Aside from the folks working on Yosys, no one is trying to improve the process or tools; they are all just trying to use them to kick out a product.
The documentation is lacking for what is needed to create tools, such as the floor-plan information that has to be reverse engineered. In his case, he could get the documentation for Intel products, but not share it and he was not allowed to reverse engineer his own company's products; so even if there are sympathetic employees at a particular vendor, their hands may be tied.
Nearly all of the code uploaded to FPGAs is a binary blob of one sort or another; customers have no idea how most of it works. But there is a dedicated community that is reverse engineering the floor plans of various FPGAs. There are a few different projects, including Project Trellis that targets Lattice FPGAs. There is also a substantial academic interest in making better tools.
There are some dissimilarities as well. For one, there are no strict standard APIs for FPGAs, so there are no real conformance tests, which is unlike the graphics world. Yosys and nextpnr (for place and route) should be the Mesa of FPGAs; they would define an interface that would work across all of the different FPGA hardware. It is not that there are alternative open-source projects that vendors would rather rally around, Widawsky said; there just aren't enough people pushing for open source in this space at all. Something that would help is to have a piglit-like test suite for the FPGA space: something that can be run quickly to show that the tools can make a design that runs on a particular FPGA.
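At its core, such a suite is a list of small designs with known-good input/output vectors. In the hypothetical sketch below, `device_under_test` stands in for the hard part (synthesizing, placing, routing, programming a board, and reading back the outputs); here it is stubbed with Python functions, one of them deliberately wrong. This is not piglit's actual interface:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = tuple[tuple[int, ...], int]  # (inputs, expected output)


@dataclass
class ConformanceTest:
    name: str
    vectors: Sequence[Vector]


def run_suite(tests: Sequence[ConformanceTest],
              device_under_test: Callable[[str, tuple[int, ...]], int]) -> dict[str, str]:
    """Run every test's vectors through the device and record pass/fail."""
    results = {}
    for test in tests:
        ok = all(device_under_test(test.name, inputs) == expected
                 for inputs, expected in test.vectors)
        results[test.name] = "pass" if ok else "fail"
    return results


SUITE = [
    ConformanceTest("xnor2", [((a, b), int(a == b)) for a in (0, 1) for b in (0, 1)]),
    ConformanceTest("and3", [((a, b, c), a & b & c)
                             for a in (0, 1) for b in (0, 1) for c in (0, 1)]),
]

# Stub "hardware": a real harness would run the whole toolchain and a board.
STUB_DESIGNS = {
    "xnor2": lambda a, b: int(a == b),
    "and3": lambda a, b, c: a | b | c,  # deliberately buggy: OR instead of AND
}


def stub_device(name: str, inputs: tuple[int, ...]) -> int:
    return STUB_DESIGNS[name](*inputs)


print(run_suite(SUITE, stub_device))  # {'xnor2': 'pass', 'and3': 'fail'}
```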
The kernel interfaces for FPGAs are in their infancy, while those for graphics devices are pretty well established at this point. Intel has taken the lead to try to consolidate the interfaces, but no one else is really following along. Separate from having open-source implementations of various accelerators (e.g. gzip), he would like to see the interface to that code be common across vendors. The code that runs on the devices is a complete black box and the emphasis has been on getting the black box to the device, not on what's in the black box—or at least having a common way to interface to the black boxes.
There is little pull for open tools, he said. He was told that no big customer was asking for open tools, so why would a vendor spend time making them? "That's a fair point", he said, but large companies will find it much easier to create their devices if they can have visibility into the tools. He has heard anecdotally that SpaceX had serious trouble debugging its Xilinx-based FPGA designs. He suggested that large customers of the FPGA vendors start asking for open tools to help avoid those kinds of problems.
There are lessons that can be learned from graphics, he said. He touched on some of them earlier, like enforcing an open user space before accepting patches for a driver and not adding interfaces that only support one vendor's hardware. It is important for the FPGA vendors to recognize that the open-source community can solve difficult engineering problems—it already has in many areas.
Call to action
He ended with a "call to action" suggesting that those interested give Yosys a try ("do something cool") and help improve it. He said that, by a handwavy estimate of the processing power available in an FPGA, current FPGAs could theoretically implement an Intel Sandy Bridge processor. "Go do that, find the bugs, make it work with the open tools." The Yosys community is easy to work with, though he didn't mean the talk to be a "rah rah Yosys talk". But he was able to get a few simple patches into the tool; it is something that anyone interested in open FPGA tools should look at.
Another area that needs work is to improve place and route, which in practice means to do more reverse engineering of FPGAs. There are multiple projects out there doing that. Also, if writing standards is of interest, the bitstreams and floor plans could use some work of that sort; maybe standards are simply not possible for those things, but no one has really looked into it. There is also the linux-fpga mailing list for those that want to keep an eye on what's going on in the kernel; it is a low-traffic list, so it is not hard to follow. With that, Widawsky was done with his talk; he took a few questions and comments from attendees, most of which were pointing to additional efforts in this space.
[I would like to thank the X.Org Foundation and LWN's travel sponsor, the Linux Foundation, for travel assistance to Montréal for XDC.]
| Index entries for this article | |
|---|---|
| Conference | X.Org Developers Conference/2019 |
