New software uses a model of how the brain interprets shapes to locate, analyse and report text in any orientation and font in a photographic image.
A digital photograph of a busy street scene contains an enormous variety of text in differing fonts and orientations. Shop signs, adverts on buses, street names, road markings and signposts are all present, but none of it is square-on to the camera. The human eye can easily recognise and understand all the text on display, yet this ability has proven very difficult to replicate on a computer. SceneReader, a software tool made by break-step productions, can take such a photograph as input and return all the text it contains in a usable format.
Patrick Andrews, Managing Director of break-step productions, said: “The idea for SceneReader came to me when I was a research student with the late Fergus Campbell, FRS. It occurred to me to write a program that would coarsely emulate the functions which we know certain cells in the visual cortex of primates perform. The original version was implemented using a simple spreadsheet. Since then it has been professionally developed and extended by our in-house technical team.”
SceneReader uses three inputs: the photographic image, a flat-file dictionary of allowed words and a pluggable font knowledgebase. It uses Foveola shape recognition technology, a wholly new approach inspired by Nobel-prizewinning research into the primate visual system.
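The three-input design could be sketched roughly as below. All names here (`SceneReaderSketch`, `DetectedWord`, and so on) are hypothetical illustrations of the described architecture, not the product's actual API, and the shape-recognition stage is reduced to a placeholder:

```python
from dataclasses import dataclass

# Hypothetical sketch of the three-input design described in the article:
# an image, a flat-file dictionary of allowed words, and a pluggable font
# knowledgebase. None of these names come from the actual product.

@dataclass
class DetectedWord:
    text: str
    x: int
    y: int
    angle_degrees: float  # text may sit at any orientation in the scene

@dataclass
class SceneReaderSketch:
    dictionary: set        # flat-file dictionary of allowed words
    font_knowledge: dict   # pluggable font knowledgebase

    def read(self, image: str) -> list:
        """Locate candidate word regions, then keep only dictionary words."""
        candidates = self._locate_text_regions(image)
        return [w for w in candidates if w.text.lower() in self.dictionary]

    def _locate_text_regions(self, image: str) -> list:
        # Stand-in for the Foveola shape-recognition stage; here the "image"
        # is just a whitespace-separated string of detected tokens.
        return [DetectedWord(text=t, x=0, y=0, angle_degrees=0.0)
                for t in image.split()]
```

The dictionary acts as a final filter, so spurious shape matches that do not form allowed words are discarded before output.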
The Foveola system employs a general model of how shapes may be represented in the brain, unrelated to the conventional probabilistic neural-network approach. “When we show the system a new shape it can not only recognise similar shapes at once, but the new knowledge doesn’t interfere with whatever has been learnt before,” said Andrews.
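The "no interference" property Andrews describes is characteristic of exemplar-style memories, where each learned shape is stored independently and recognition is a nearest-neighbour lookup, in contrast to shared-weight neural nets, where retraining shifts weights that encode earlier shapes. The sketch below illustrates that general property only; it is an assumption for demonstration, not Foveola's actual mechanism:

```python
import math

# Illustrative exemplar memory: each learned shape is stored on its own,
# so learning a new shape cannot overwrite or degrade earlier knowledge.
# This demonstrates the "no interference" property in the abstract; it is
# NOT a description of how Foveola actually works.

class ExemplarShapeMemory:
    def __init__(self):
        self.exemplars = {}  # label -> list of feature vectors

    def learn(self, label, features):
        # Appending an exemplar touches no other stored shape.
        self.exemplars.setdefault(label, []).append(features)

    def recognise(self, features):
        # Nearest-neighbour lookup over all stored exemplars.
        best_label, best_dist = None, float("inf")
        for label, vectors in self.exemplars.items():
            for v in vectors:
                d = math.dist(features, v)
                if d < best_dist:
                    best_label, best_dist = label, d
        return best_label
```

Because `learn` only appends, a query that matched "circle" before a new shape was taught still matches "circle" afterwards, which is exactly the behaviour Andrews contrasts with conventional nets.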
SceneReader also incorporates a model of how words appear within images and uses a bank of adaptive techniques to detect and interpret word structure.
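One way such word-structure interpretation can work, sketched below under assumptions of my own (SceneReader's actual techniques are not public), is dictionary-constrained decoding: each character position carries several scored hypotheses (such as the O/0 or I/1 ambiguity), and the allowed word whose letters best fit those hypotheses wins:

```python
# Hedged illustration of dictionary-constrained word interpretation:
# given ambiguous per-character readings, pick the allowed word whose
# letters best match the evidence. This shows the general idea only,
# not SceneReader's actual adaptive techniques.

def interpret(char_hypotheses, dictionary):
    """char_hypotheses: one dict per character position, mapping each
    candidate letter to a confidence score in [0, 1]."""
    def word_score(word):
        if len(word) != len(char_hypotheses):
            return 0.0
        score = 1.0
        for letter, hyps in zip(word, char_hypotheses):
            score *= hyps.get(letter, 0.0)  # zero if letter not hypothesised
        return score
    return max(dictionary, key=word_score)
```

The dictionary constraint lets weak evidence at one position be rescued by strong evidence at the others, which is useful for the low-quality phone-camera images mentioned below.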
The company is hoping to license the SceneReader technology to third parties for use in image search services online and for building text interpretation into new and existing imaging products. Examples include handheld assistive tools for the blind and sign translation. Robots could be guided by reading signs, for example in airports. Unmanned aerial vehicles could identify particular buildings or vehicles by their signage, and cars could read and act on road signs.
SceneReader can work with relatively poor-quality images such as those from phone cameras, as well as high-quality images such as scanned photographs and high-end digital camera output. It can also monitor camera systems for the appearance of printed signs, cutting false alarms whilst reducing bandwidth and personnel costs.
break-step productions won a DTI SMART award in connection with an application of Foveola intended to allow blind people to read environmental text, such as labels on cans in a cupboard.