Centralised solutions for supercomputers

Scientists from US universities and Department of Energy laboratories are to collaborate on a $15 million, five-year project to create a Scalable Systems Software Centre.

Supercomputers provide researchers with powerful tools, but operating them can also be problematic, says an Oak Ridge National Laboratory (ORNL) researcher who heads a team working to fix the problem.

Through a $15 million, five-year project, ORNL and a team from universities and other Department of Energy laboratories will create the Scalable Systems Software Centre.

The centre, funded through DOE’s Scientific Discovery through Advanced Computing initiative, will address the lack of software for effective management of terascale computational resources like the ones being installed at ORNL and other sites around the US.

‘DOE operates many of the largest computers in the world and some of the largest computer centres,’ said Al Geist of ORNL’s Computer Science and Mathematics Division. ‘But today, each computer centre uses ad hoc and homegrown systems software solutions to, for example, schedule jobs and monitor the health of the supercomputers.’

With the centre, solutions developed at one DOE computer centre could be reused at other centres.

‘The Scalable Systems Software Centre provides the opportunity to create and support a common set of systems software for large computer centres across the country,’ Geist said. ‘It’s a problem that the computer industry isn’t going to solve because business trends push the industry toward smaller systems aimed at Web serving, database farms and departmental-sized systems.’

The vision and goal of the centre are to bring together a team of experts who, with industry involvement, can agree on and specify standardised interfaces between system components.

Another goal is to produce a fully integrated set of systems software and tools to effectively use terascale computational resources.

Researchers also plan to study and develop more advanced versions of the system tools to meet the needs of future – and even larger – supercomputers.

According to a statement, Scientific Discovery through Advanced Computing is an integrated programme that will help create a new generation of scientific simulation codes.

The codes will take full advantage of computers capable of performing trillions of calculations per second to address increasingly complex problems.