FPGAs can crunch the numbers too

Researchers at the Computer Science Department of Colorado State University want to allow FPGAs to be programmed in a more traditional, algorithmic language.

Virtually all FPGAs presently used are programmed using one of the various hardware description languages, such as VHDL (VHSIC Hardware Description Language), or directly with logic circuit diagrams.

However, the Cameron project at Colorado State University aims to change all that. Its researchers want to make it easier to exploit FPGAs as computing devices by allowing them to be programmed in a more traditional, algorithmic (as opposed to circuit-based) language.

To this end, they have developed a variant of the C programming language and an optimising compiler that maps high-level programs directly onto FPGAs; the language and compiler have been tested on a variety of image processing (and other) applications.

Over the last several years, FPGAs and other adaptive computers have evolved from simple ‘glue logic’ chips into powerful coprocessors. Furthermore, because the logic blocks are connected via a reconfigurable mesh of wires, they can be configured into massively parallel circuits that directly implement algorithms.

As a result, they can achieve very high computational densities. In fact, for many applications, FPGAs have grown more powerful than the general-purpose processors they were originally meant to serve.

Field programmable gate arrays were invented in the mid-1980s as devices that resembled Application Specific Integrated Circuits (ASICs) but could be programmed after the chip was manufactured. An FPGA consists of a matrix of programmable logic cells, with a grid of interconnecting lines and switches between them. I/O cells exist around the perimeter, providing an interface between the interconnect lines and the chip’s external pins.

The exact functionality of a logic cell varies with the manufacturer of the chip, but it typically comprises a small amount of functional logic and/or some register storage. Programming (configuring) an FPGA consists of specifying the logic function of each cell and the settings of the switches in the interconnecting lines.
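To make the idea concrete, the following C sketch models a hypothetical logic cell as a 4-input look-up table (LUT) with an optional output register; the 16-bit truth table and the register flag stand in for the cell's configuration, and the details are assumptions rather than any particular vendor's design.

    #include <stdint.h>
    #include <stdbool.h>

    /* A minimal sketch of a hypothetical FPGA logic cell: a 4-input
     * look-up table (16 configuration bits) feeding an optional output
     * register. Real cells differ from vendor to vendor. */
    typedef struct {
        uint16_t lut;        /* truth table: one bit per input combination  */
        bool     registered; /* route the LUT output through the flip-flop? */
        bool     ff;         /* current flip-flop state                     */
    } logic_cell;

    /* Combinational evaluation: the four inputs index the truth table. */
    static bool lut_eval(const logic_cell *cell, bool a, bool b, bool c, bool d)
    {
        unsigned idx = (unsigned)a | ((unsigned)b << 1) |
                       ((unsigned)c << 2) | ((unsigned)d << 3);
        return (cell->lut >> idx) & 1u;
    }

    /* One clock tick: capture the LUT output in the flip-flop (if used)
     * and return the cell's output for this cycle. */
    static bool cell_clock(logic_cell *cell, bool a, bool b, bool c, bool d)
    {
        bool out = lut_eval(cell, a, b, c, d);
        if (cell->registered) {
            cell->ff = out;
            return cell->ff;
        }
        return out;
    }

Configuring the device then amounts to filling in thousands of such truth tables and setting the routing switches between them.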

Both a conventional RISC microprocessor and an FPGA, at a low level, consist of gates that implement Boolean logic. The conventional processor uses its logic to implement a fixed instruction set, in which each instruction has been chosen based on its usefulness to compilers in generating code for general purpose computing. The instructions are very simple, and performance is achieved by devising ways in which they can be rapidly loaded and executed.

The (re)configurability of an FPGA makes it possible to depart from the idea of a fixed instruction set. Instead, application-specific ‘instructions’ can be specified, and these instructions can be orders of magnitude more complex than those found in RISC processors. The instructions can make heavy use of both breadth-wise and pipeline parallelism.
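As an illustration (not an example taken from the project) of what such an ‘instruction’ might look like, consider an eight-tap filter kernel: expressed in C it is an ordinary sequential loop, but as an FPGA configuration all eight multiplications can run side by side (breadth-wise parallelism) while successive samples stream through the summation logic one stage per clock (pipeline parallelism).

    #include <stdint.h>

    /* An eight-tap filter treated as a single application-specific
     * "instruction" (illustrative example). On a conventional processor
     * this loop runs sequentially; configured onto an FPGA, the eight
     * multiplies can execute in parallel and the summation can be
     * pipelined so that a new result emerges every clock cycle. */
    int32_t fir8(const int16_t x[8], const int16_t coeff[8])
    {
        int32_t acc = 0;
        for (int i = 0; i < 8; i++)
            acc += (int32_t)x[i] * coeff[i];
        return acc;
    }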

The tradeoff is in the load time of the instruction. Whereas a RISC instruction on a conventional processor can load in a few nanoseconds, the load time of an FPGA instruction can be measured in milliseconds.

This makes FPGAs suitable only for those parts of applications that are highly regular and operate on large data sets. For such applications, however, reconfigurable architectures provide roughly an order of magnitude higher raw computational density than conventional programmable architectures.

In the past, FPGAs have been programmed in hardware description languages, such as Verilog or VHDL. This puts them off limits to applications programmers who have been trained in traditional algorithmic programming languages such as C, C++, Java or Fortran. Worse still, even skilled circuit designers find it difficult to embed complex algorithms directly in VHDL or Verilog circuits, or to modify existing circuits to accommodate algorithmic changes. This increases the design time and ultimately the cost of any application exploiting FPGAs.

The goal of the Cameron project is to remove the circuit design barrier and make FPGAs accessible to a wider audience. In particular, the researchers want to raise the level of abstraction for FPGAs from a circuit based design model to an algorithmic programming one. To this end, they have designed a high-level language that can be automatically compiled to circuits, and they have implemented an optimising compiler that produces FPGA configurations directly from high-level source code. As a result, they have created a one-step process for compiling applications directly to FPGAs, in much the same way as a traditional compiler maps high-level source code to a series of machine instructions.

The high-level algorithmic language is a variant of the well-known C programming language. It is called ‘Single Assignment C’, or SA-C (pronounced ‘sassy’) for short. SA-C is deliberately kept close to C so that the language is easy to learn and use.
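As a rough sketch of the style the language encourages, written here in ordinary C rather than actual SA-C syntax, consider a 3x3 mean filter: every value is defined exactly once, there are no pointers, and each output pixel is a pure expression over a window of the input. The image dimensions below are illustrative.

    /* Single-assignment, window-oriented style, approximated in plain C. */
    void mean3x3(const unsigned char in[64][64], unsigned char out[62][62])
    {
        for (int r = 0; r < 62; r++) {
            for (int c = 0; c < 62; c++) {
                /* each output pixel is one expression over a 3x3 window */
                unsigned sum = in[r][c]     + in[r][c + 1]     + in[r][c + 2]
                             + in[r + 1][c] + in[r + 1][c + 1] + in[r + 1][c + 2]
                             + in[r + 2][c] + in[r + 2][c + 1] + in[r + 2][c + 2];
                out[r][c] = (unsigned char)(sum / 9);
            }
        }
    }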

However, the C language is intimately tied to the traditional von Neumann model of processors and memories, in which every variable corresponds to a memory location and functions reside on a calling stack. As a result, it is easy to write C programs that cannot readily be mapped onto circuits or FPGAs, for example programs that directly manipulate addresses. Other researchers have tried to map C programs onto circuits, with only limited success; the Colorado State developers simplified the problem by designing a more appropriate language.
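A small example (an assumption for illustration, not code from the project) shows the kind of C that resists translation to a circuit: it depends on a flat, byte-addressable memory and on pointer arithmetic computed at run time.

    #include <stddef.h>

    /* Hard to map onto an FPGA: the loop walks memory through computed
     * addresses, assuming the flat address space of a von Neumann machine. */
    void scale_in_place(float *buf, size_t n, float k)
    {
        float *end = buf + n;
        for (float *p = buf; p != end; ++p)
            *p *= k;    /* write through a run-time-computed address */
    }

SA-C sidesteps such constructs by design, which is what makes a direct mapping to circuits tractable.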

The SA-C compiler provides a one-step compilation from SA-C programs to adaptive computer systems. The general model is that of a programmer at a workstation with access to an FPGA-based adaptive coprocessor. Programmers write programs for the coprocessor in SA-C, and debug and test them on their workstation. When they are ready, the SA-C compiler translates the source code into an optimised FPGA configuration (or configurations). The compiler also generates the host code to download the configuration(s) onto the adaptive coprocessor, download the data and any run-time parameters, start the coprocessor, and upload the resulting data. As a result, programmers can treat the adaptive coprocessor like any other target machine, without knowing about circuit design or the intricacies of FPGAs.
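The generated host code follows a simple download-run-upload sequence. The sketch below shows that sequence in C; the driver functions and buffer names are hypothetical stand-ins, declared here only so the example is self-contained, and do not represent any real board API.

    #include <stddef.h>

    /* Hypothetical driver interface, for illustration only. */
    int fpga_load_config(const void *bitstream, size_t len);
    int fpga_write(int buffer, const void *src, size_t len);
    int fpga_run(void);
    int fpga_wait_done(void);
    int fpga_read(int buffer, void *dst, size_t len);

    enum { IN_BUFFER, PARAM_BUFFER, OUT_BUFFER };

    int run_on_coprocessor(const void *bitstream, size_t bits_len,
                           const unsigned char *in, size_t in_len,
                           unsigned char *out, size_t out_len,
                           int threshold)
    {
        if (fpga_load_config(bitstream, bits_len) != 0)         /* download the configuration */
            return -1;
        fpga_write(IN_BUFFER, in, in_len);                      /* download the input data */
        fpga_write(PARAM_BUFFER, &threshold, sizeof threshold); /* ...and run-time parameters */
        fpga_run();                                             /* start the coprocessor */
        fpga_wait_done();                                       /* wait for completion */
        fpga_read(OUT_BUFFER, out, out_len);                    /* upload the resulting data */
        return 0;
    }

In practice this sequencing is produced automatically by the compiler, so the programmer never writes it by hand.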

There is also a second kind of user: those who know a great deal about FPGAs and want to control how their programs are mapped onto configurations. For these users, the compiler can generate files of its intermediate data flow graph representations, and these files can be viewed using the tools provided. Simulators can also simulate the execution of the graphs on data.

The Colorado system has already been used to develop a coprocessor for ARAGTAP, an automatic target recognition (ATR) system for synthetic aperture radar (SAR) images developed by the US Air Force. The ARAGTAP pre-screener, the front end of the system, is a focus-of-attention (FOA) stage designed to detect possible targets in SAR images and return regions of interest (ROIs) to be verified and identified.

The ARAGTAP pre-screener relies on morphology operators, including eight repetitions of image dilation (alternating between a square and cross-shaped kernel) and eight repetitions of image erosion (again, alternating kernels).
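A minimal C sketch of one such pass, assuming a binary image and 3x3 kernels (both illustrative choices, not details taken from ARAGTAP), looks like this:

    #define W 128   /* illustrative image size */
    #define H 128

    static const int square_kernel[3][3] = {{1,1,1},{1,1,1},{1,1,1}};
    static const int cross_kernel [3][3] = {{0,1,0},{1,1,1},{0,1,0}};

    /* One binary dilation pass: an output pixel is set if any input pixel
     * under a set kernel position is set. The pre-screener alternates the
     * square and cross kernels across its eight passes; image borders are
     * skipped here for brevity. */
    void dilate(const unsigned char in[H][W], unsigned char out[H][W],
                const int kernel[3][3])
    {
        for (int r = 1; r < H - 1; r++) {
            for (int c = 1; c < W - 1; c++) {
                unsigned char v = 0;
                for (int dr = -1; dr <= 1; dr++)
                    for (int dc = -1; dc <= 1; dc++)
                        if (kernel[dr + 1][dc + 1] && in[r + dr][c + dc])
                            v = 1;
                out[r][c] = v;
            }
        }
    }

Erosion is the dual operation, requiring every pixel under the kernel to be set.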

Using the SA-C compiler, the researchers fused all eight dilations (and separately all eight erosions) into a single FPGA configuration. Since this did not fill up the FPGA, they then used the compiler to apply vertical stripmining to increase parallelism and decrease I/O traffic.

The run-times of the programs on a reconfigurable processor were then compared with those of the original C programs. In general, the reconfigurable system was found to run about ten times faster. The researchers at Colorado State have also studied other types of imaging applications, which can be viewed at the web site below:

http://www.cs.colostate.edu/cameron/applications.html