Emotional-speech analysis

Researchers at Dublin Institute of Technology’s Digital Media Centre (DMC) are developing technology that recognises emotion conveyed in a person’s speech.

The DMC’s Emovere project aims to identify the most appropriate acoustic characteristics to use in emotional-speech analysis and investigate advanced machine learning techniques for the recognition of emotion in speech.

It is anticipated the technology could be used in applications as diverse as animation and automated telephone helplines.

Researchers Sarah Jane Delany and Charlie Cullen will spend four years processing recordings from volunteers who will verbally express emotions such as anger, sadness and elation.

Cullen said they will focus on the shape and rhythm of the recordings’ acoustic signals through techniques inspired by music.

Their methods differ from those of other groups that have studied emotional-speech recognition, he said. While others have considered the shapes of signals, no one has yet developed a way to classify them.

‘We take an approach similar to what is done in music,’ he said. ‘We only take four points: the first, last, highest and lowest. That gives us the discrete shape for the speech track. We then compare several hundred of these shapes so we can delineate between different emotions.’
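The idea can be illustrated with a minimal sketch: reduce a pitch contour to its first, last, highest and lowest values and compare the resulting shapes. The function names, the use of a pre-extracted pitch contour and the distance measure below are illustrative assumptions, not details of the Emovere system.

```python
def four_point_shape(contour):
    """Reduce a pitch contour (one value per frame) to its first, last,
    highest and lowest values, each paired with its relative position."""
    n = len(contour)
    return {
        "first": (0.0, contour[0]),
        "last": (1.0, contour[-1]),
        "highest": (contour.index(max(contour)) / (n - 1), max(contour)),
        "lowest": (contour.index(min(contour)) / (n - 1), min(contour)),
    }

def shape_distance(a, b):
    """Toy distance between two shapes: summed differences in value and position."""
    return sum(abs(a[k][1] - b[k][1]) + abs(a[k][0] - b[k][0]) for k in a)

# Example: a rising contour versus a falling one produce clearly different shapes.
rising = four_point_shape([100, 120, 150, 180, 200])
falling = four_point_shape([200, 180, 150, 120, 100])
print(shape_distance(rising, falling))
```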

Cullen explained the group will classify the rhythm of a person’s voice using a patent-pending method known as the vowel-stress tagging framework.

‘When someone gets excited you get more clusters of vowels,’ he said. ‘We want to know the relative length between the vowels and the position of the vowels. What we’re trying to do there is effectively take the same kind of principles in music notation and use them for speech notation.’
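As a rough sketch of that rhythm idea, suppose the vowel segments in a recording have already been located (their start and end times in seconds): their relative positions, lengths and the gaps between them can then be compared across utterances. The segment data and normalisation below are assumed for illustration and do not reproduce the patent-pending framework itself.

```python
def vowel_rhythm(vowel_segments, total_duration):
    """Return relative vowel positions, durations and inter-vowel gaps
    for a list of (start, end) vowel segments, normalised by track length."""
    positions = [start / total_duration for start, _ in vowel_segments]
    durations = [(end - start) / total_duration for start, end in vowel_segments]
    gaps = [
        (nxt_start - prev_end) / total_duration
        for (_, prev_end), (nxt_start, _) in zip(vowel_segments, vowel_segments[1:])
    ]
    return {"positions": positions, "durations": durations, "gaps": gaps}

# Excited speech tends to cluster vowels, giving shorter relative gaps.
excited = vowel_rhythm([(0.10, 0.25), (0.30, 0.45), (0.50, 0.62)], total_duration=1.0)
calm = vowel_rhythm([(0.10, 0.30), (0.80, 1.00), (1.60, 1.80)], total_duration=2.0)
print(excited["gaps"], calm["gaps"])
```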

Cullen and Delany hope this data will help them develop algorithms for emotion-recognition software.

Delany explained that initial applications for the technology could be in animation: their software could process a digital animated character’s voice and match its movements to the emotion expressed.

A prototype has been developed to demonstrate this, but Delany said applications for the technology could go well beyond animation.

‘In any sort of telephone help-desk scenario, if you had something monitoring the user’s voice, it could send alarms or triggers if something got heated,’ she added.
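One way such monitoring might work, sketched here purely for illustration, is to watch a stream of per-utterance emotion labels (produced by some recogniser, not implemented below) and raise an alert when angry utterances cluster within a short window. The window size and threshold are assumed values.

```python
from collections import deque

def monitor(labels, window=5, threshold=3):
    """Yield the index of any utterance at which at least `threshold` of the
    last `window` utterances were labelled as anger."""
    recent = deque(maxlen=window)
    for i, label in enumerate(labels):
        recent.append(label)
        if sum(1 for l in recent if l == "anger") >= threshold:
            yield i

# Example: the alert fires once three angry utterances fall within the window.
call = ["neutral", "anger", "neutral", "anger", "anger", "sadness"]
print(list(monitor(call)))
```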

Siobhan Wagner