A system designed by scientists at Oxford and Leeds universities can learn British Sign Language (BSL) signs from overnight television broadcasts by matching subtitled words to the hand movements of an on-screen interpreter.
The work is a crucial step towards a system that can automatically recognise BSL signs and translate them into words.
A major challenge in recognising signs is tracking the signer’s hands as they move on screen. This is no mean feat: the hands can get lost in the background, blur or cross each other, and the arms can assume a vast number of configurations.
The new system tackles this problem by overlaying a model of the upper body onto the video frames of the signer, looking for probable configurations, finding the large number of frames where these can be correctly identified and then ‘filling in the gaps’ to infer how the hands get from one position to another.
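The ‘filling in the gaps’ step can be pictured with a small sketch. The article does not say how the system infers the missing positions, so the code below assumes plain linear interpolation between the nearest confidently tracked frames; the function name `fill_gaps` and the `(x, y)` representation of a hand position are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) hand position in pixels


def fill_gaps(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Fill frames with no confident hand position (None) by linear
    interpolation between the nearest confident frames on either side.

    This is an assumed stand-in for the system's inference step,
    not the researchers' actual method.
    """
    filled = list(track)
    # Indices of frames where the hand position was confidently identified.
    known = [i for i, p in enumerate(track) if p is not None]
    for left, right in zip(known, known[1:]):
        (x0, y0), (x1, y1) = track[left], track[right]
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # fraction of the way from left to right
            filled[i] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    # Gaps before the first or after the last confident frame stay None.
    return filled


track = [(0.0, 0.0), None, (2.0, 4.0), None]
print(fill_gaps(track))
```

In this toy track the hand is found in the first and third frames, so the second frame is placed halfway between them, while the final frame has no confident neighbour on its right and is left unfilled.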
Another big challenge was to match a target word that appears in a subtitle to the corresponding sign. This is particularly difficult because the word and its sign are often separated in time, and because a word can be signed in many different ways, so the expected sign may not appear at all.
To overcome this problem, the system compares a small number of sequences in which the target word appears in the subtitles with a large number of sequences in which it does not.
Within this footage it then finds the seven to 13 frames that appear often in the ‘target word’ sequences and infrequently in the ‘no target word’ ones. This enables it to learn to match more than 100 target words to signs automatically.
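The idea of comparing ‘target word’ footage with ‘no target word’ footage can be illustrated with a toy example. The sketch below is an assumption-laden simplification: each video sequence is reduced to a list of hypothetical per-frame labels, windows have a fixed length, and a window is scored by how much more often it occurs in the target-word sequences than in the others. The names `windows` and `best_window` and the scoring rule are invented for illustration; the real system works on hand shape and motion in seven to 13 video frames.

```python
from typing import List, Sequence, Set, Tuple

Window = Tuple[str, ...]  # hypothetical run of per-frame labels


def windows(seq: Sequence[str], length: int) -> Set[Window]:
    """Return every contiguous window of the given length in a sequence."""
    return {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}


def best_window(positives: List[Sequence[str]],
                negatives: List[Sequence[str]],
                length: int) -> Window:
    """Pick the window seen most often in sequences subtitled with the
    target word and least often in sequences without it.

    The score (positive rate minus negative rate) is an assumed,
    simplified criterion, not the one used by the researchers.
    """
    if not positives:
        raise ValueError("need at least one target-word sequence")
    pos_sets = [windows(s, length) for s in positives]
    neg_sets = [windows(s, length) for s in negatives]
    candidates = set().union(*pos_sets)
    if not candidates:
        raise ValueError("no window of that length in the target-word sequences")

    def score(w: Window) -> float:
        pos_rate = sum(w in s for s in pos_sets) / len(pos_sets)
        neg_rate = sum(w in s for s in neg_sets) / len(neg_sets) if neg_sets else 0.0
        return pos_rate - neg_rate

    # Sorting first makes the choice deterministic when scores tie.
    return max(sorted(candidates), key=score)


positives = [list("abXYcd"), list("eXYfg"), list("XYhh")]
negatives = [list("abcd"), list("efgh"), list("hhab")]
print(best_window(positives, negatives, length=2))
```

Here the pair `X`, `Y` occurs in every target-word sequence and in none of the others, so it is selected as the likely sign, whereas common runs such as `a`, `b` are penalised for also appearing in the ‘no target word’ footage.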
Andrew Zisserman of Oxford University’s Department of Engineering Science led the work, with Patrick Buehler at Oxford and Mark Everingham of Leeds University’s School of Computing.
Zisserman said: ‘This is the first time that a computer system has been able to learn signs on its own and on this scale in this way – with just the information available in the broadcast’s subtitle information and video frames and without the need for humans to give it annotated examples of what each sign looks like.’
Everingham added: ‘It demonstrates the sort of very tough problems that advanced image-recognition technology is starting to be able to solve. These technologies have the potential to revolutionise the automated searching, classifying and analysis of moving and still images.’
The research was supported by the Engineering and Physical Sciences Research Council, Microsoft and the Royal Academy of Engineering.
Source: Oxford Science Blog