Team aims intelligence at voice-recognition systems

A group of three UK universities is spearheading efforts to create more intelligent conversation-recognition and speech-synthesis systems.

The technology could open up a host of new applications such as fully automated meeting transcription, voice-activated domestic devices for the elderly and expressive assisted speech for people with conditions such as motor neuron disease.

Speaking to The Engineer, project collaborator Dr Thomas Hain of Sheffield University acknowledged that voice recognition had come a long way in recent years, but said that a ‘flexible and adaptable system’ was still lacking.

‘You have applications that seem to work well, for example, Google Voice Search, but these are tied to a specific purpose: you need to have lots and lots of data, and then you can build a recognition system around it that just works for that one single purpose.

‘But where it’s about natural speech — people having a normal conversation — these applications still have very poor performance.’

The £6.2m EPSRC-funded Natural Speech Technology project will aim to develop intelligent systems that can be adapted for various uses. The project involves researchers from Cambridge, Sheffield and Edinburgh universities and will have four main phases.

First, the team will develop the base models and algorithms for synthesis and recognition that are able to learn from, and adapt to, new scenarios and contexts almost instantaneously.

Second will come the conversation-recognition software that can detect ‘who spoke what, when, and how’ in any acoustic environment.

Third, the team will hone speech synthesisers that are capable of generating the full expressive diversity of natural speech.

Last, it will look for suitable applications; to this end, it is working with a range of partners, including NHS Trusts, health charities and the BBC.

Indeed, Hain’s team, which recently won an international competition for its transcription technology, will be working with the BBC to automatically transcribe audio and video footage from its vast archive.

However, the main driving force will likely be lifestyle and well-being applications, such as voice-controlled devices for the home, which could help older people stay independent, and voice assistance for people with motor neuron disease and Parkinson’s disease.

‘This research could open the door to computer-speech technology becoming commonplace throughout our lives — at home, at work, and in our leisure time,’ project lead Prof Steve Renals of Edinburgh commented.