Supercomputer on a chip

University of Texas researchers are collaborating with IBM to develop an adaptive, high-performance microprocessor.

Dr. Doug Burger and Dr. Stephen Keckler at the University of Texas at Austin are collaborating with IBM on the development of an adaptive, high-performance microprocessor based on a new architecture called TRIPS (the Tera-Op Reliable Intelligently Adaptive Processing System).

To address the semiconductor scaling challenges of high-performance processors, particularly in instruction selection, execution, and bypass, the TRIPS team has proposed a new class of processor organisation called Grid Processor Architectures (GPAs). A GPA is composed of a tightly coupled array of ALUs connected via a thin network, onto which large blocks of instructions are scheduled and mapped. To mitigate on-chip communication delays, applications are scheduled so that their critical dataflow paths are placed along nearby ALUs.
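The placement idea can be illustrated with a small sketch. It is not the TRIPS scheduler: it is a hypothetical greedy heuristic that places each instruction in the free grid slot closest, in Manhattan hops, to the slots of the instructions producing its operands. The function names, the hop-count cost model, and the 2x2 grid are all assumptions made for illustration. A real scheduler would also prioritise the critical path, whereas this sketch treats every dependency equally.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two (row, col) grid slots (assumed cost model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(instructions, deps, rows, cols):
    """Greedily map instructions onto a rows x cols grid of ALUs.

    instructions: instruction names in topological (producer-first) order.
    deps: maps an instruction to the list of instructions producing its inputs.
    Returns a dict from instruction name to its (row, col) slot.
    """
    if len(instructions) > rows * cols:
        raise ValueError("more instructions than ALU slots")
    free = set(product(range(rows), range(cols)))
    placement = {}
    for instr in instructions:
        producers = [placement[p] for p in deps.get(instr, [])]
        # Choose the free slot with the fewest total hops from all producers;
        # iterating in sorted order makes tie-breaking deterministic.
        best = min(sorted(free),
                   key=lambda slot: sum(manhattan(slot, p) for p in producers))
        placement[instr] = best
        free.remove(best)
    return placement


# Hypothetical four-instruction dependency chain: a -> b -> c -> d.
chain = ["a", "b", "c", "d"]
deps = {"b": ["a"], "c": ["b"], "d": ["c"]}
placement = place(chain, deps, rows=2, cols=2)
hops = [manhattan(placement[p], placement[c])
        for c, ps in deps.items() for p in ps]
print(placement)
print(hops)
```

For this chain, every producer and consumer end up on adjacent ALUs, so each value travels a single hop.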

The TRIPS architecture is designed to be configurable to meet the needs of a variety of workloads and environmental conditions. Both the grid processors and the on-chip memory system are configurable, so that the chip can efficiently run workloads as diverse as control-bound integer codes, highly parallel threaded codes, and regular, computationally intensive streaming codes.

The allocation of ALUs within the grid, the instruction mapping onto the grid, the number of executing threads, and the flow of instructions across the grid are all exposed to the system, compiler, and application software for maximum flexibility.

To respond to changing workloads and conditions, a TRIPS chip provides on-chip sensors and a lightweight software layer called ‘morphware’, which monitors power, temperature, memory performance, and ALU usage. The morphware layer controls the runtime operation of the execution resources, mediating between the requirements of running applications, the capabilities of a specific TRIPS implementation, and the operating environment of the system.

TRIPS is intended to support a variety of runtime workloads, including desktop, scientific, streaming, and server workloads. Desktop applications are characterised by irregular integer operations, scientific applications by their large data sets, streaming applications by their regularity and predictability, and server applications by their non-uniform workloads, independent thread execution, and real-time response requirements.

Presently, the university scientists are working closely with IBM to develop a prototype. Mr. Charles Moore, a senior research fellow at the university and a former chief engineer of IBM’s POWER4 processor, will help with the prototype and will lead the effort to commercialise the technology. The team includes researchers at IBM’s Austin Research Lab, who are developing long-range technologies necessary for the industrial success of this approach, and engineers at IBM’s World-Wide Design Center, which is expected to be the fabrication partner for the TRIPS processor prototypes.

The prototype will contain up to four processor cores, each capable of executing 16 operations per clock cycle, and a uniquely partitioned cache structure. The chip will contain more than 250 million transistors and will operate at 500 MHz. The scientists’ goal is to demonstrate the feasibility of a full-scale industrial development that could offer a 10 GHz chip capable of executing more than a trillion instructions per second.
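The arithmetic behind these figures can be checked with a short calculation. This sketch takes the upper bound of four cores and, as a simplifying assumption, treats "operations" and "instructions" as the same unit. The variable names are illustrative only.

```python
# Figures quoted for the prototype (upper bound of four cores).
CORES = 4
OPS_PER_CYCLE_PER_CORE = 16
PROTOTYPE_CLOCK_HZ = 500_000_000          # 500 MHz


def peak_ops_per_second(cores, ops_per_cycle, clock_hz):
    """Peak rate, assuming every core issues its maximum on every cycle."""
    return cores * ops_per_cycle * clock_hz


prototype_peak = peak_ops_per_second(
    CORES, OPS_PER_CYCLE_PER_CORE, PROTOTYPE_CLOCK_HZ
)

# Figures quoted for the long-term goal.
TARGET_CLOCK_HZ = 10_000_000_000          # 10 GHz
TARGET_OPS_PER_SECOND = 10**12            # one trillion per second

# Operations that must complete per cycle to reach the target rate.
required_ops_per_cycle = TARGET_OPS_PER_SECOND / TARGET_CLOCK_HZ

print(prototype_peak)          # 32000000000
print(required_ops_per_cycle)  # 100.0
```

On these assumptions the prototype peaks at 32 billion operations per second, while the 10 GHz goal implies sustaining more than 100 operations per clock cycle across the chip.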

The TRIPS project is supported by a total of $11.1 million in funding from DARPA. The scientists expect to have TRIPS prototype chips and systems running in their laboratory by December 2005.