Lip service

George Bush Snr may not have been as quotable as his son, but the now-famous soundbite, ‘Read my lips: no new taxes’, probably helped him win the presidency. He knew voters would get the message loud and clear if they used their eyes and ears together to work out what he was saying.

Now, Dr Richard Harvey of the University of East Anglia and colleagues are developing computers with a similar skill: reading anyone's lips in a variety of languages.



Camera links

They have won a grant from the EPSRC for a three-year project to automate language-independent lip-reading by linking new algorithms to camera systems such as webcams and CCTV.

‘There’s no way we could do African click and whistle languages or Chinese and Japanese,’ admitted Harvey. ‘But we hope to do a selection of European languages and standard modern Arabic.’ These rank highly among the top 20 languages spoken globally.

There are numerous potential commercial applications for the technology. In-car speech recognition systems start failing as soon as the microphone picks up other sounds, from the stereo or through open windows. A lip-reading camera on the dashboard would help and has already been investigated by Siemens.

A similar camera built into a mobile phone handset could help the device understand speech more accurately, allowing it to pre-process the audio and send a cleaner signal.

‘The grant has been awarded under the crime-fighting initiative and the Home Office Scientific Development Branch is giving us a lot of help and expertise,’ said Harvey. ‘Lip-reading from surveillance footage, for example, has been used to solve crimes. In some situations it may not be safe or feasible to place a microphone close enough to hear voices, but a long-range camera might still be able to see faces.’

Harvey and his team have carried out preliminary work and know some of the challenges they face. ‘We have to track the head accurately over a variety of poses then extract numbers, or features, that describe the lips and then learn what features correspond to what text,’ he said.

That is why he has teamed up with Dr Richard Bowden and colleagues at the Centre for Vision, Speech & Signal Processing at Surrey University. ‘One of the deficiencies of our previous system was that we had to track the head by hand,’ said Harvey. ‘Richard Bowden is an expert on tracking people and will work on knowing where the head is in the frame. Then we plan to find the lip region and extract the features.’ Automating this step would be a first in the development of computerised lip-reading technology.

Once the computer has been programmed to recognise lips whichever way the head is tilted or moving, further software will have to watch and recognise their fast and subtle movements. Two approaches, called ‘active shape models’ and ‘grey-scale sieving’, will be combined to track changes in the shape of the lips.
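The three-stage pipeline Harvey describes (track the head, extract numbers that describe the lips, learn which features correspond to which text) can be sketched in miniature. Everything below is hypothetical: the data, the feature choices and the nearest-neighbour matcher are illustrative stand-ins, not the UEA system.

```python
# Toy sketch of a lip-reading pipeline: track head -> extract
# lip features -> match features to text. All data is invented.

def track_head(frame):
    """Hypothetical tracker: return the (x, y, w, h) head box."""
    return frame["head_box"]

def extract_lip_features(frame, head_box):
    """Hypothetical feature step: reduce the lip region to numbers
    describing its shape (here, mouth width, height and ratio)."""
    w, h = frame["lip_width"], frame["lip_height"]
    return (w, h, h / w)  # aspect ratio hints at open vs closed mouth

def recognise(feature_seq, model):
    """Toy recogniser: nearest-neighbour match of a feature sequence
    against previously learned example sequences."""
    def dist(seq_a, seq_b):
        return sum(abs(x - y)
                   for fa, fb in zip(seq_a, seq_b)
                   for x, y in zip(fa, fb))
    return min(model, key=lambda word: dist(model[word], feature_seq))

# Invented "learned" feature sequences for two words.
model = {
    "yes": [(40, 10, 0.25), (42, 22, 0.52)],
    "no":  [(38, 30, 0.79), (36, 8, 0.22)],
}

# Two invented video frames with pre-computed measurements.
frames = [
    {"head_box": (0, 0, 100, 120), "lip_width": 41, "lip_height": 11},
    {"head_box": (2, 1, 100, 120), "lip_width": 43, "lip_height": 21},
]

features = [extract_lip_features(f, track_head(f)) for f in frames]
print(recognise(features, model))  # -> yes (closest match for this input)
```

In a real system each of these toy functions is a hard research problem in its own right, which is why the project splits them between the two groups.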

‘We’re also going to do some work on language identification. It’s not been tackled before,’ said Harvey.



Unpredictable speech

But why go to all this trouble when humans can lip-read, as everybody who has ever held a conversation in a noisy bar or at a party can testify? ‘Almost everyone who claims they are an expert lip-reader actually has terrible performance,’ said Harvey. ‘Even so-called trained lip-readers can’t do speech that is unpredictable, even though it may be syntactically correct.

‘Having said that, there are a few star performers out there who are often used for forensic purposes. The man who did the lip-reading of the silent footage of Hitler was amazing. You needed to have a special background and do it with an enormous amount of care and concentration.’

Another reason is the fall in the number of people who have the skill. ‘The number of trained lip-readers is diminishing,’ said Harvey. ‘The really skilful people are the ones who learned very early on, who had learned to speak a language before they lost their hearing and had not been taught to sign but had been taught lip-reading intensively.’

This set of circumstances will become less common as signing grows in popularity.

It is not realistic to expect Harvey, Bowden and their colleagues to have perfected, by the end of the project, a system that can monitor the screens at any European airport and understand every passenger’s utterances.

‘We hope to have produced a system that will demonstrate the ability to lip-read in more general situations than we have done so far,’ said Harvey. ‘And we’ll get some definitions of the boundaries of performance.’

Max Glaskin