‘Smart ears’ for machines

A major project is under way to give machines ‘smart ears’. The aim is to develop digital systems that can understand the sounds they hear so they can react appropriately.

A traffic camera, for example, would know where to zoom when it heard a crash. At a football match a search engine would find a video clip of a goal by listening for the crowd’s roar. A computer would print out a symphony score after enjoying the live performance.

‘There are many potential applications for machine listening, just as there already are for machine vision,’ said project leader Dr Mark Plumbley, of Queen Mary, University of London. ‘I want to establish machine listening as a key enabling technology to improve our ability to interact with the world, leading to advances in areas such as health, security and the creative industries.’

There are already niche systems that give dedicated machines the ability to recognise some elements of the sounds they hear, such as speech recognition and industrial process monitoring. But these applications are narrowly specialised and of limited use in the wider world.

‘We want to identify underlying principles so they can be applied to many different applications,’ said Plumbley.

Sound, however, only makes real sense to living creatures. Our brains have developed techniques to screen out unwanted noise and focus on the important events, while still remaining alert to other key sounds.

We can adjust our hearing according to background noise, foreground volume, rhythm, pitch and timbre. No computer can do that yet because there is just too much information that is not easily separable.

Plumbley believes that new approaches will provide the basis for achieving this. ‘My idea is to introduce new methods for machine listening of general audio scenes,’ he said.

‘I will develop new interdisciplinary collaborations with both the machine vision and biological sensory research communities to investigate and develop general organisational principles for machine listening. One such principle that looks very promising is that of sparse representations.’

This is an analysis method based on the principle that an observation should be represented by only a few items chosen from a much larger set of possible items.
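As a rough illustration of the idea (a toy sketch, not Plumbley's own method), the following Python snippet uses a simple greedy algorithm known as matching pursuit: a signal is approximated using only a handful of 'atoms' picked from a large dictionary, so almost all of its coefficients end up as zero — a sparse representation.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedily approximate `signal` with at most `n_atoms` columns
    (atoms) of `dictionary` -- a toy sparse representation."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        # Pick the atom most correlated with what is still unexplained.
        correlations = dictionary.T @ residual
        best = np.argmax(np.abs(correlations))
        coeffs[best] += correlations[best]
        residual -= correlations[best] * dictionary[:, best]
    return coeffs, residual

# Demo: a 64-sample signal built from just 3 atoms of a 256-atom dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms
true_coeffs = np.zeros(256)
true_coeffs[[10, 50, 200]] = [3.0, -2.0, 1.5]
x = D @ true_coeffs

coeffs, residual = matching_pursuit(x, D, n_atoms=10)
# Only a few of the 256 coefficients are non-zero: a sparse code.
print(np.count_nonzero(coeffs), np.linalg.norm(residual))
```

The dictionary, signal and parameter choices here are invented for illustration; real machine-listening systems would learn the dictionary from audio data rather than draw it at random.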

It is thought by some that this is how our brain manages to make sense of life’s general cacophony, and Plumbley thinks his project could help neuroscientists improve their understanding of our auditory processing.

‘I plan to use sparse representations to explore new biologically-inspired machine listening methods, and in turn improve our understanding of biological hearing systems,’ said Plumbley.

Plumbley has won funding for five years from the EPSRC under its Leadership Fellowship scheme which, unusually within academia, will allow him to work full time on the project.

He has received interest from industrial concerns, including Bang & Olufsen, Kodak, the BBC and Google, plus hearing aid manufacturers Oticon and Phonak, all of which recognise the commercial possibilities of practicable machine listening.

Plumbley said it is an opportunity for the UK to take an international lead in a technology which has enormous potential to transform the way we interact with the world.

‘There is no Machine Listening equivalent of the British Machine Vision Association or the UK Industrial Vision Association,’ he said. Even internationally, machine listening researchers are often divided into separate areas such as speech processing, music processing or acoustics.

‘Yet new analysis techniques from the worlds of musical audio analysis and pervasive/ubiquitous computing mean that these approaches are ripe for exploitation in a wider range of applications.

‘So there is an urgent need for a new interdisciplinary community in machine listening and an important part of my work is to create one,’ he said.

Plumbley will be promoting seminars, research workshops and knowledge transfer events to stimulate the development of machine listening and sparse representations, initially within the UK and then internationally.

Condition monitoring, security, safety, cochlear implants and hearing aids, music and video, even computer games, all stand to benefit from machine listening.

Max Glaskin