New research could lead to voice-recognition systems that teach themselves to better understand speakers amid background noise and with different accents.
Scientists at Cambridge, Edinburgh and Sheffield universities have received a £6.2m grant from the EPSRC to develop speech-recognition software that can adapt to uncontrolled environments, such as when several people can be heard talking at once.
This could make it easier to create applications that use voice recognition, such as the Siri speech interface on the new Apple iPhone 4S, without having to feed as much initial data into the program to train it to understand different voices.
For example, the project could lead to devices that are able to automatically record a person’s speech throughout the day and transcribe their conversations, Cambridge’s Prof Phil Woodland told The Engineer.
‘The systems we have at the moment are very data-intensive and typically you have to collect data of the type that you want to recognise,’ he said. ‘That’s commercially quite important in being able to field new applications.
‘[The new software] would learn to recognise you and also the people you interact with, and be able to track your location and understand information about the context. But rather than having to have lots of data from that situation we would try to use adaptive technology.
‘One aspect being worked on in Sheffield is assistive technology for the old and disabled at home. So there are those sorts of applications that are able to adapt to particular people’s situations and particular ways they speak.’
The research could also enable more realistic speech-synthesis programs, for example for people who are losing their voice through serious illness or surgery.