By recognising both visual and audio cues, a self-aiming camera being developed at the University of Illinois can tell the difference between an airplane and an albatross.
The camera system, which could serve as an intelligent sentinel in sensitive military applications, was originally built to demonstrate the versatility of a simulated neural network that the researchers modelled on the superior colliculus of the human brain.
‘The superior colliculus serves as the visual reflex centre of the brain,’ said Sylvian Ray, a UI professor of computer science and a researcher at the Beckman Institute for Advanced Science and Technology. ‘It is the primary agent for deciding which direction to turn the head in response to sensory stimuli such as visual and auditory cues.’
To demonstrate the effectiveness of their neural network, Ray and his colleagues – molecular and integrative physiology professor Thomas Anastasio, postdoctoral research associate Paul Patton, and graduate research assistants Samarth Swarup and Alejandro Sarmiento – constructed a camera and microphone system that supplies visual and auditory cues to the model and responds to its directives.
One camera looks for motion by comparing successive video frames, while the system monitors audio signals from a pair of omnidirectional microphones. A sound-location algorithm analyses the sounds and passes its estimate of where they came from to the neural network. The model then works out the target's direction and turns a second camera, equipped with a long-focus lens, to acquire the target. This target image can be transmitted to a human operator for further analysis.
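The article does not describe these algorithms in detail. As a rough illustration only, the two front-end steps described above might look something like the Python sketch below: frame differencing to flag moving pixels, and a time-difference-of-arrival (TDOA) estimate of a sound's bearing from a two-microphone pair. The function names, the pixel threshold, the microphone spacing and the far-field (distant source) assumption are all hypothetical choices, not details of the Illinois system.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees C (assumed)


def motion_mask(prev_frame, frame, threshold=25):
    """Flag pixels whose grey level changed by more than `threshold` between frames.

    Frames are assumed to be 2-D uint8 greyscale arrays of the same shape.
    """
    # Widen to int16 so the subtraction cannot wrap around as uint8 would.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold


def motion_centroid(mask):
    """Return the (row, col) centre of the changed pixels, or None if nothing moved."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())


def tdoa_bearing(left, right, sample_rate, mic_spacing):
    """Estimate a source bearing in degrees from broadside for two microphones.

    The lag of the cross-correlation peak is taken as the time difference of
    arrival. Positive angles point towards the right-hand microphone, i.e. the
    sound reached the right microphone first. Assumes a distant source.
    """
    corr = np.correlate(left, right, mode="full")
    # Index 0 of the 'full' output corresponds to a lag of -(len(right) - 1).
    lag = np.argmax(corr) - (len(right) - 1)  # samples by which left lags right
    tau = lag / sample_rate
    # Clip to [-1, 1] so noise cannot push arcsin outside its valid domain.
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

For example, a sound that reaches the right microphone five samples before the left, recorded at 48 kHz with microphones 0.2 m apart, gives a bearing of roughly 10 degrees towards the right-hand microphone. A pair of microphones only resolves the angle in one plane, which is one reason a real system would lean on the visual cue as well.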
‘While the system can be attracted by either sight or sound, the combination of the two offers a much stronger stimulus,’ Ray said. ‘By using look-up libraries of sight and sound, the system can differentiate between an aircraft on the horizon and a flock of birds.’
During infancy, the superior colliculus helps a baby’s brain associate external direction with an internal visual reference grid – mapping a mother’s moving lips to the sound of her voice, for example. In a similar fashion, the researchers’ model learns to align its sound-source location processing with an embedded visual map.
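The alignment idea can be pictured with a far simpler model than the researchers' superior colliculus network. In the hypothetical sketch below, a single linear unit trained with the delta (least-mean-squares) rule learns to correct a sound-location estimate so that it lands on the visual bearing of the same event, such as a moving mouth and the voice it produces. The class name, learning rate, input scaling and the assumption that auditory and visual bearings are related by a straight line are all illustrative choices, not the method described in the article.

```python
import numpy as np


class AuditoryToVisualMap:
    """Hypothetical linear unit mapping an auditory bearing onto the visual map.

    Trained online with the delta (LMS) rule whenever a sound and a visual
    event coincide. Bearings are in degrees; they are divided by `scale`
    internally so that the weight updates stay small and stable.
    """

    def __init__(self, learning_rate=0.1, scale=90.0):
        self.w = 1.0  # start by trusting the raw auditory estimate
        self.b = 0.0  # no offset correction yet
        self.lr = learning_rate
        self.scale = scale

    def predict(self, auditory_deg):
        """Map an auditory bearing (degrees) to a bearing on the visual map."""
        x = auditory_deg / self.scale
        return self.scale * (self.w * x + self.b)

    def update(self, auditory_deg, visual_deg):
        """One delta-rule step; returns the pre-update error in degrees."""
        x = auditory_deg / self.scale
        target = visual_deg / self.scale
        error = target - (self.w * x + self.b)
        # Nudge weight and bias in the direction that reduces the squared error.
        self.w += self.lr * error * x
        self.b += self.lr * error
        return error * self.scale
```

If, say, the microphones' estimates were consistently compressed and offset relative to what the camera sees, repeated coincident sight-and-sound events would pull the weight and bias towards values that cancel that distortion; afterwards an auditory estimate on its own could be used to aim the camera.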
‘As the system learns to correctly locate both sound and visual sources, it also learns what types of objects are preferred targets,’ Ray said. ‘We want to teach it to ignore common objects and focus on unusual sounds or visual motions.’
Besides the obvious security uses, the self-aiming camera could also find a role in distance learning, Ray said. ‘One camera could follow the speaker. Another camera could point at the audience, and automatically zero in on a student raising a hand to ask a question.’
The work was originally funded by a UI Critical Research Initiatives grant. Additional funding to develop the intelligent sentinel concept came from the Office of Naval Research.