The heart of the matter

The processors have run out of steam and the future lies in software optimisation. Dave Wilson discovers that there is no longer any free lunch in the software development business.


In 1981, four years after the launch of the $150,000 32-bit VAX minicomputer, IBM debuted its first Personal Computer – for $1500. At its heart was a 4.77 MHz Intel 8088 microprocessor supported by 16 kilobytes of memory and two floppy disk drives. It ran a rather rudimentary operating system called DOS supplied by Microsoft.


To many in research and development, the traditional home of the mini and mainframe computers, the new machine was no more than a toy. Yet within the space of twenty years, the Personal Computer has become every bit as powerful as the VAX of thirty years ago.


As the PC made the tortuous journey from an 8-bit to a full 32-bit machine, and processor speeds moved from a lethargic 4.77 MHz to over 3 GHz, it also came to support virtual memory management and large address spaces, just like the VAX of old. Its performance was enhanced year after year by increased clock speeds, execution optimisation and bigger on-chip caches. Modern 32-bit microprocessor-based PCs are now as powerful as the minicomputers of the 1970s, yet rather than fill a room, they fill no more than a briefcase!


Today’s microprocessor manufacturers aren’t resting on their laurels. But both Intel and AMD, the main suppliers of 32-bit and 64-bit microprocessors, have, over the past few years, turned away from outracing each other in the GHz speed stakes. Making chips faster and faster simply makes them too hot to handle.


Instead of pursuing that course, they have created multi-core processors that contain two, or four, processing cores residing on one chip that can perform calculations on two or four streams of data, increasing efficiency when running multiple programs or multi-threaded software.


“Moore’s law for single processor x86-style architecture ran out of steam at around 3-4 GHz,” says University of Cambridge researcher Dr. Alan Mycroft. “However, Moore’s law continues to hold true as processor vendors place two, and four, 3 GHz processors on one die instead of one.”


In the early days of computing, there were two kinds of multiprocessing. On asymmetrical systems, one or more processors were exclusively dedicated to specific tasks, such as running the operating system. The remaining processors were available for all other tasks, such as user applications.


But modern symmetrical multiprocessing multi-core systems on a chip offer software programmers the ability to distribute their computational tasks more fairly. In such systems, any thread, or piece of a program, whether from the operating system or the user application, can run on any processor in the system. In this way, the entire computing load is spread evenly across all computing resources.


To take advantage of the power of single processing cores and, more importantly, symmetrical multi-core processors, today’s operating systems, such as Windows XP, need to, and do, support multithreading. Like multitasking, this is where the computer handles two or more tasks simultaneously. The difference is that here the tasks are components, or “threads”, of a single application rather than separate programs.


But for the user to benefit from multithreading, the programs that run on the machine need to be multithreaded too. That is, rather than being developed as a single long sequence of instructions, they are broken up into logical units whose execution is controlled by the mainline of the program.


And, of course, there are different kinds of threading that programmers can exploit. If a software package comprises multiple subsystems, such as a physics engine and an AI engine, functional threading will allow a developer to dedicate each of these functions to its own processing resource. Alternatively, for compute-intensive tasks, data parallel threading can be employed to divide up a single task across multiple threads, so that each thread handles a different part of the same data set.
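
The two styles can be sketched in a few lines of Python. This is a minimal illustration only, not taken from any product mentioned here: the functions physics_step and ai_step are hypothetical stand-ins for real subsystems, and the sum-of-squares workload is an arbitrary example. Note too that in the standard CPython interpreter, threads running pure Python code do not execute on several cores at once, so the sketch shows the structure of each approach rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical subsystems, standing in for a real physics engine and AI engine.
def physics_step(positions, velocities, dt):
    """Advance every position by its velocity over one time step."""
    return [p + v * dt for p, v in zip(positions, velocities)]


def ai_step(positions, target):
    """Return the index of the entity closest to the target."""
    return min(range(len(positions)), key=lambda i: abs(positions[i] - target))


def functional_threading(positions, velocities, dt, target):
    """Functional threading: each subsystem gets its own thread."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        physics = pool.submit(physics_step, positions, velocities, dt)
        ai = pool.submit(ai_step, positions, target)
        return physics.result(), ai.result()


def sum_of_squares(chunk):
    """The work done by one thread on its share of the data."""
    return sum(x * x for x in chunk)


def data_parallel_sum_of_squares(data, workers=4):
    """Data parallel threading: one task, split into chunks of the data set."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread handles a different chunk; the partial sums are combined.
        return sum(pool.map(sum_of_squares, chunks))
```

In the first function the threads do different jobs on the same input; in the second they do the same job on different slices of the input, and a final sequential step adds up the partial results.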


As a result, multithreading, and the issues and opportunities that it brings, will become very important for software developers over the next ten years. Basic multi-core hardware is now out there and, in the course of another five years, all personal computers will be sporting multi-core chips – four, eight, or even sixteen processor core devices may become the norm.


That means, of course, that every programmer who works on a new project will need an intimate understanding of how to develop multi-threaded code. Next generation software will need to be structured to take advantage of the performance of the new processors – programmers will need to be able to write multi-threaded software from the outset.


Along with the fundamental change in computer architecture, programming has to change too. New tools will be needed to assist programmers with the optimisation process. Other tools will be needed to take the code and distribute it optimally amongst the cores on the silicon. And developers will need tools that help them to debug the software they have written once it is running.


But it’s not going to be easy. Citing Amdahl’s law, Photoshop co-architect Russell Williams writes in an on-line blog that almost all algorithms that can be ‘parallelised’ also have some portion that must be done sequentially – setup (deciding how to divide the problem up among multiple cores) or synchronization, or collecting and summarizing the results.


At those times each step depends on the step before being completed. As you add processors and speed up the parallel part, the sequential part inevitably takes up a larger percentage of the time. And some algorithms are just really hard or impossible to parallelize: calculating text layout on a page is a commonly cited example.
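
Amdahl’s law puts a number on this. If a fraction p of a program’s run time can be parallelised and the rest cannot, the best possible speed-up on n cores is 1 / ((1 - p) + p / n). The short Python sketch below works through that arithmetic; the function names are chosen here for illustration, and the model assumes the parallel part divides perfectly across the cores with no overhead, which real programs rarely achieve.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speed-up on `cores` cores under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def serial_share(parallel_fraction, cores):
    """Share of the total run time taken by the sequential part on `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed example: 90 per cent of the work can be parallelised.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 2), round(serial_share(0.9, n), 2))
```

With 90 per cent of the work parallelisable, four cores give a speed-up of roughly three rather than four, and no number of cores can push it beyond ten, because the remaining ten per cent always runs at single-core speed.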


“Programs have to be broken into reasonably equal sized (size in terms of CPU consumption) tasks which can run on separate engines. It is not good enough for most of the tasks to wait or sleep for most of the time – they must all be doing useful things on the critical path to speed things up,” concurs University of Cambridge’s Dr. Mycroft.


He believes that the first programmers to exploit the power of dual-core processors will be games developers, as they are most caught up in the “speed and detail” arms race. He argues that a games program could be designed such that its key computational functions, such as the world model processor, AI processor and physics model, could each be run on individual processors to achieve performance increases.


Laurent Bernardin is the VP of Engineering at US software house Maplesoft. He notes that writing programs for multi-core processors is definitely harder than writing for a single core. But, he says, “the problem is not new. Even on a single core machine, developers still needed to program in terms of multiple threads so the processor could perform multiple computations simultaneously.”


Bernardin’s mathematical/scientific package Maple has a user interface that’s written in Java, while the computational engine for the package is written in C and C++. “On the Java end, it is reasonably easy to work with multiple threads; when multi-core processors became available that same code worked unchanged and took advantage of the multiple cores. But the same was not true of our core computational engine, which is written in C and C++,” he adds.


Ultimately, of course, what would be desirable from a developer’s perspective is a software package that could abstract the entire idea of threading and parallelism so that the developer could focus on developing software from a functional standpoint and not worry about how it will run on the platform. That, however, may be some way off.


From a software user’s perspective, however, things are a little different. National Instruments’ Ian Bell claims that the reason it has been easy for users to take advantage of parallelism in his company’s LabVIEW data acquisition, development and control software is that it is graphically oriented: software is developed visually by connecting functional elements of software subsystems on screen.


Bell claims that once users of the software define the functional elements of the system they require, they can then see how to parallelise them quickly on screen. The underlying code in the LabVIEW package takes care of the rest. And although that underlying code is written in C, it’s up to the software developers at National Instruments to figure out how to optimise that for multi-core systems, not the end-user of the package.


Bell hints that other issues of importance to application developers, such as presenting synchronisation and timing issues between the software modules in the LabVIEW environment, will be addressed in future versions of the software, making it even easier to develop multi-core code.


He believes that in the next ten years the sheer complexity of programming for multiple processors will mean that all software developers, whatever software they are developing, will come to depend on graphical programming languages and that these will replace traditional programming languages such as C.


Whether students are developing those languages themselves, or building applications based upon them, there will still be plenty of opportunities in industry for computer science graduates fluent in high-level languages such as C, C++, Java, and C#, to find work developing new systems to take advantage of the new breed of processors.


But they also will need to become familiar with basic concurrent programming and threading methods, such as the OpenMP Application Program Interface (API) that supports multi-platform shared-memory parallel programming, POSIX threads (also referred to as Pthreads), or the Win32 threading API.


“Experience with multi-threaded computation and an awareness of the issues that can arise, such as synchronisation of processes, race conditions and deadlock, are important to us when we hire programmers,” acknowledges Maplesoft’s Bernardin.
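
A race condition is easiest to see with a shared counter. When several threads each read a value, add one and write it back, two of them can read the same old value and one update is lost. The Python sketch below shows the usual remedy, a lock that lets only one thread at a time perform the read-modify-write. The class and function names are invented for this illustration, and Python’s standard threading module simply stands in for whichever threading API a project actually uses.

```python
import threading


class SafeCounter:
    """A counter whose updates are serialised by a lock (illustrative only)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and one of the two increments would be lost: a race condition.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_parallel(threads=4, increments=10_000):
    """Have several threads increment one shared counter and return the total."""
    counter = SafeCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()  # Synchronise: wait for every worker before reading the total.
    return counter.value
```

Locks bring their own hazard. If one thread holds lock A while waiting for lock B, and another holds B while waiting for A, neither can proceed and the program deadlocks. A common precaution is for every thread to acquire multiple locks in the same fixed order.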


National Instruments’ Ian Bell advises graduates in the UK who are interested in gaining some specialist postgraduate education in the field to check out the courses available at the web site of CEESI (Continuing Education in Electronics Systems Integration Training). The postgraduate-level courses, accessible mainly via the internet, cater for people who are looking for continuing professional development (CPD) in a particular field without necessarily being concerned about working towards a qualification.