Software to digitise library text

Researchers at Salford University are developing software that will help libraries convert millions of pages of newspapers and books into digital records, preserving them for posterity.

At present, libraries are struggling to digitise the hundreds of years’ worth of printed material they hold, and are fighting a battle to save from decay newspapers that were only ever designed to last a few days or weeks.

The researchers from the university’s School of Computing, Science and Engineering have been given £1m to develop the software, which will be able to accurately analyse images of documents that are often stained, yellowed and written in obsolete typefaces.

The project – called IMPACT (Improving Access to Text) – is part of a huge European scheme led by the Royal Library of the Netherlands and involves the British Library and national libraries from France, Germany and Austria, as well as a handful of other universities.

Dr Apostolos Antonacopoulos is leading the Salford part of the project and the overall image enhancement task.

‘Current scanning and recognition technology is designed to convert modern printed material and typically only picks up about 40 per cent to 60 per cent of the words in old documents correctly,’ he said.

‘This means that libraries have to employ typists in the developing world to copy every word, which costs up to £1 per page. Over millions of pages, this isn’t sustainable, so we’re working to slash this cost,’ added Antonacopoulos.

The most important benefit of the new software is that it will allow text to be searched reliably – a major advantage for historians or people researching their family trees.