Sound in the round
The latest developments in audio technology could bring the 3D cinema experience to your ears as well as your eyes. Stephen Harris immerses himself in the new soundscape
A helicopter roars overhead as the sound of sirens appears in the distance, the cars drawing gradually closer before circling the building. You start to hear rain hitting the roof above you and trickling slowly to the ground. Then an almighty explosion engulfs the theatre, passing from front to back as bullets fly past your seat. Luckily, none of this is real. These are the sounds of the next generation in immersive entertainment to hit cinemas: 3D audio.
We’ve already become used to watching images pop out of the screen or disappear into the distance when we’re enjoying the latest blockbuster movie with 3D visuals. But most people have yet to experience 3D sound, even though it was invented decades ago – a fact that could be about to change. In some forms, it’s an evolution of the surround sound we’re used to encountering in cinemas or home entertainment systems. However, the concept of 3D audio isn’t necessarily about placing more speakers around an audience; it’s about recreating sounds in a way that tricks listeners’ brains into thinking they are hearing them live rather than as a recording.
Inside a small, specially designed room in London’s Soho is one of just two examples in the UK of perhaps the most advanced commercial cinema sound system in the world: Dolby Atmos. Actually, it’s a room within a room, created to insulate screenings from the vibrations of the nearby Tottenham Court Road station (currently being rebuilt to accommodate Crossrail). Rows of speakers not only adorn the walls and hide behind the screen but also hang from the ceiling, ready to envelop the audience in sound from (almost) all directions.
The clever part of the technology, however, is invisible. As well as traditional dialogue, music and background sounds sent to different speakers, films mixed for Dolby Atmos include a number of individual sound effects known as “objects” designed to move between speakers as smoothly as if you were hearing them move in real life.
Typically, sound files such as movie soundtracks are mixed to a specific number of tracks or channels, each feeding a different signal to a separate speaker or array of speakers in a different part of the room. This number has grown over the years from mono (1), to stereo (2), to the now commonly used 5.1 (which includes an additional bass signal to a subwoofer), and on through a number of formats up to 22.2.
Rather than adding to this number again just to accommodate more speaker signals, Dolby Atmos uses the first 10 channels for the traditional soundtrack and a further 118 for the moveable objects. Each object signal includes extra information known as metadata that tells the speakers’ control system the three-dimensional coordinates of where in the room that sound should play from at any one time. As the coordinates change, the system uses different speakers to recreate the sounds in the right location, producing the illusion of a moving audio source.
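The article doesn't describe how Dolby's renderer works internally, but the general idea of turning an object's coordinates into speaker feeds can be sketched with a simple distance-based weighting. Everything in this example is an assumption made for illustration: the speaker layout, the `object_gains` function and the `rolloff` value are hypothetical and are not taken from Atmos.

```python
import math

# Hypothetical speaker positions (x, y, z) in metres; not a real Atmos layout.
SPEAKERS = {
    "front_left": (-2.0, 3.0, 1.0),
    "front_right": (2.0, 3.0, 1.0),
    "rear_left": (-2.0, -3.0, 1.0),
    "rear_right": (2.0, -3.0, 1.0),
    "ceiling": (0.0, 0.0, 3.0),
}


def object_gains(position, speakers=SPEAKERS, rolloff=2.0):
    """Return a gain per speaker for a sound object at `position`.

    Speakers closer to the object receive more of the signal. Gains are
    normalised so that the squared gains sum to 1 (constant total power).
    """
    weights = {}
    for name, spk in speakers.items():
        d = math.dist(position, spk)
        weights[name] = 1.0 / (max(d, 1e-6) ** rolloff)  # avoid dividing by zero
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: w / norm for name, w in weights.items()}


# The object's metadata: coordinates that change over time.
trajectory = [(-2.0, 3.0, 1.0), (0.0, 0.0, 2.5), (2.0, -3.0, 1.0)]
for pos in trajectory:
    gains = object_gains(pos)
    loudest = max(gains, key=gains.get)
    print(pos, "->", loudest)
```

Because the gains are computed from coordinates rather than fixed channel assignments, the same trajectory could be replayed over a different `SPEAKERS` dictionary without remixing the object.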
‘For the first time now we’re opening up the possibility of having very precise sound coming from around the room, and those sat in the front will get a different experience from those sat in the back because as the sound pans around it will get remarkably quieter [as it gets further away],’ said Julian Pinn, director of cinema marketing at Dolby Europe, speaking at a recent demonstration of Dolby Atmos for the Audio Engineering Society.
“True 3D sound systems can aid the sense of reality and open up many new creative possibilities”
Dave Hunt, Immersive Audio
This precise level of control is what really differentiates 3D audio from conventional surround sound. And the technological breakthroughs that are enabling it, including Dolby Atmos, are creating real excitement among sound engineers who are ready to move their craft to a new level. ‘True 3D sound systems allow much better reproduction of spatial direction of sounds, moving sources and an acoustic environment which contains them,’ said Dave Hunt, a sound engineer from the recently founded firm Immersive Audio, set up to provide 3D sound installations to performance venues and live events. ‘It can aid the sense of reality (or artfully constructed hyper-reality), and opens up many new creative possibilities.’
So far, several films including Disney Pixar’s Brave and more recently Warner Bros’ The Hobbit have been released with an Atmos-mixed soundtrack to a handful of test cinemas, in advance of the system’s commercial launch in April this year. Because the object information is contained in the metadata rather than the main sound signal, Dolby doesn’t need to produce two versions of the soundtrack. Plus, the system doesn’t require a special cinema design: each object signal is encoded with spatial coordinates rather than sent to a specific speaker channel, so the Atmos software will render it to the best location regardless of the overall speaker setup.
‘With that in mind it gives us the ability to tailor the number of speakers that a given space will have depending on the size and shape of that auditorium,’ explained Pinn. In Dolby’s Soho screening room there are 26 speakers; in the Leicester Square Empire – the UK’s other Atmos-equipped cinema – there are around 40. The Dolby Theatre (formerly the Kodak Theatre) in Hollywood, meanwhile, essentially uses a hybrid of two and a half Atmos systems to cope with the auditorium’s balcony.
Although Dolby has so far only focused on the cinema market, this adaptability of object-based 3D audio systems means they could transition to domestic use relatively easily. In fact, a variety of techniques for producing 3D sound could offer similar flexibility, and these could find their way not only into home entertainment systems but also onto the growing number of computers, tablets and mobile phones used to watch TV.
With this in mind, the BBC has set up an Audio Research Partnership with the universities of Surrey, Salford, Southampton, York and Queen Mary, University of London, in order to explore the possibilities for bringing 3D or spatial audio to television programme making, broadcasting and watching. ‘The most important thing for the BBC is to deliver very high quality to the audience and always look for new experiences,’ said BBC head of audio research Frank Melchior. ‘If we go back in history and look at what are the developments in audio formats, there hasn’t been much in the last decade. We would like to deliver a new experience to the audience, but this time it has to be flexible enough to deal with the habits of the audiences and multiple devices.’
As with Dolby Atmos, the key to this flexibility is the ability to take the different tracks of a sound file and use software to render and optimise them for the specific speaker set-up or device the viewer is using – the opposite of current 5.1 broadcasts that are optimised for speakers in set positions. ‘You can also imagine if you have these separate elements in your scene not mixed together then you can have also room for changing and adapting them, for example bringing in personal preferences or just changing the language of a movie,’ said Melchior. ‘So they open up a whole new dimension of new experiences, not only 3D where it’s really helpful but also there is a lot of new potential within these new representations of audio.’
3D audio theories
Three-dimensional audio goes back almost to the start of recorded sound and a variety of approaches have been developed to produce it. One method the BBC and its partners have been examining is binaural sound, a very old technique where two microphones are used to capture the same sound signals a person hears through their left and right ears, and the subsequent recording is played back through headphones. The signals are often recorded using microphones placed inside the head of a mannequin to get as close as possible to those picked up by human ears.
Another spatial audio approach known as wave field synthesis (WFS) dates back to the late 1980s. It uses loudspeaker arrays to synthesise wave fronts in such a way that you can control where the sound appears to be coming from. The advantage of this method is that – unlike other 3D sound techniques including ambisonics – it doesn’t create a sweet spot, outside of which the spatial audio effects break down. WFS systems have been installed in a small number of cinemas worldwide in the last few years, including in the famous Chinese Theatre on Hollywood Boulevard, although the large number of speakers needed to make it work may have limited its spread.
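The principle behind synthesising a wave front can be sketched in a few lines: each speaker in a line array replays the source signal with a delay and attenuation matching its distance from a virtual source, so the combined output approximates the wave front that source would have produced. This is a simplified delay-and-attenuate model rather than a full WFS driving function, and the array geometry, the `driving_parameters` name and the speed-of-sound value are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)


def driving_parameters(source, speaker_xs, array_y=0.0):
    """Delay (s) and relative gain per speaker for a virtual point source.

    `source` is an (x, y) position behind a line of speakers placed along
    y = array_y; it must not sit on the array itself. Delays are relative
    to the closest speaker, so the smallest delay is 0.
    """
    distances = [math.hypot(x - source[0], array_y - source[1]) for x in speaker_xs]
    nearest = min(distances)
    params = []
    for d in distances:
        delay = (d - nearest) / SPEED_OF_SOUND
        gain = math.sqrt(nearest / d)  # simple 1/sqrt(r) amplitude decay
        params.append((delay, gain))
    return params


# Eight speakers 0.25 m apart; virtual source 1 m behind the array centre.
xs = [i * 0.25 for i in range(8)]
for x, (delay, gain) in zip(xs, driving_parameters((0.875, -1.0), xs)):
    print(f"x={x:.2f} m  delay={delay * 1000:.3f} ms  gain={gain:.3f}")
```

Moving the virtual source changes only the delays and gains, which is why the perceived position doesn't depend on the listener sitting in one particular seat. It also hints at why so many closely spaced speakers are needed.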
One of the techniques the BBC is experimenting with to produce 3D audio is known as ambisonics, which attempts to go further than just controlling the signals received by the listener’s ears and instead create or recreate an entire sound field within a space. Invented by British academic Michael Gerzon in the 1970s, ambisonics involves combining a number of sound signals recorded or synthesised from different directions, based on a mathematical theory called spherical harmonics. Recording basic ambisonic sound requires three mics in the X, Y and Z directions and a fourth omnidirectional device; higher order recordings use additional microphone signals, processed to give even more sophisticated directional information. Specialist software can then combine the signals and use them with an array of loudspeakers to produce a sound field very similar to the one where they were originally captured.
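To make the four-signal idea concrete, here is a minimal sketch of first-order encoding and a very basic decode. It assumes the traditional weighting of the omnidirectional signal (W) by 1/√2, angles in radians with positive azimuth to the listener's left, and a decoder that treats each loudspeaker as a virtual cardioid microphone pointing in its own direction. The function names are hypothetical, and practical decoders are considerably more sophisticated.

```python
import math


def encode_first_order(sample, azimuth, elevation):
    """Encode a mono sample arriving from (azimuth, elevation) into four
    first-order signals: W (omnidirectional) plus X, Y and Z."""
    w = sample / math.sqrt(2.0)  # traditional weighting for the W signal
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z


def decode_to_speaker(bformat, azimuth, elevation):
    """Feed for one speaker, treated as a virtual cardioid microphone
    pointing in the speaker's direction (a basic decoder)."""
    w, x, y, z = bformat
    ux = math.cos(azimuth) * math.cos(elevation)
    uy = math.sin(azimuth) * math.cos(elevation)
    uz = math.sin(elevation)
    return 0.5 * (math.sqrt(2.0) * w + x * ux + y * uy + z * uz)


# A source 45 degrees to the left, decoded to a square of four speakers.
signal = encode_first_order(1.0, math.radians(45), 0.0)
for deg in (45, 135, -135, -45):
    feed = decode_to_speaker(signal, math.radians(deg), 0.0)
    print(f"speaker at {deg:+4d} deg: {feed:.3f}")
```

The speaker nearest the source direction receives the strongest feed and the one directly opposite receives essentially none. The same four encoded signals could be decoded to a different set of speaker directions without re-recording.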
‘You can see ambisonics as something in between the channel- and object-based approaches,’ said Melchior. ‘You come up with a kind of scalable representation of the spatial audio scene and adapt that to a different speaker layout by decoding it, knowing the speaker positions and trying to adapt the representation to that layout, but you no longer have access to single elements in the scene. It’s not like the object-based approach where you can just take out the dialogue, for example. But you can make sure that if the centre speaker in the setup is not in the right position you can adapt the whole scene to that case and get the correct spatial reproduction.’
“We would like to deliver a new experience to the audience, but this time it has to be flexible enough to deal with the habits of the audiences and multiple devices”
Frank Melchior, BBC R&D
One of the obvious uses for replicating an entire sound field is in live music recordings, especially of large bands or orchestras performing in unique venues. Sound engineers have long tried to reproduce on record the atmosphere of a concert, complete with echo, reverberation and background noise. Ambisonics adds to this a precise sense of exactly where all the musicians are playing in the room, allowing a listener at home to close their eyes and imagine they are at the concert. The BBC has already experimented with ambisonic recordings made at the Last Night of the Proms at the Royal Albert Hall and at a performance by award-winning band Elbow in Manchester Cathedral.
However, ambisonics is just one approach to creating this kind of spatial audio. Part of the challenge for the BBC Audio Research Partnership is to experiment with capturing and mixing 3D audio signals in a variety of different recording locations, whether that’s a controlled radio studio, a more unpredictable outdoor film set or an entirely synthesised environment. These signals will then need to be transmitted to and reproduced through relatively affordable and practical equipment in the home.
Members of the Partnership at Southampton’s Institute of Sound and Vibration Research are working on an alternative method for generating spatial audio based on mathematical techniques derived from a theory known as inverse problems. Developed in collaboration with the Korean Electronics and Telecommunications Research Institute (ETRI), this involves recording and examining a sound field and then effectively attempting to reverse-engineer it using active sound control, where the signals from different speakers interfere to create a specific pattern of sound. The researchers say that compared to other techniques, their approach is more adaptable to almost any speaker configuration.
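The article doesn't give the Southampton team's algorithm, so the sketch below only illustrates the general shape of an inverse problem in sound reproduction: given a target sound field at a few control points, find the speaker signals whose combined output best matches it. It uses regularised least squares, a common generic approach, at a single frequency. The free-field point-source model, the geometry, the regularisation value `beta` and all function names are assumptions.

```python
import cmath
import math


def transfer(speaker, point, freq, c=343.0):
    """Free-field transfer from a point-source speaker to a point (assumed model)."""
    r = math.dist(speaker, point)
    k = 2 * math.pi * freq / c
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def speaker_signals(speakers, points, target, freq, beta=1e-6):
    """Regularised least squares: minimise |G q - target|^2 + beta |q|^2."""
    g = [[transfer(s, p, freq) for s in speakers] for p in points]
    n = len(speakers)
    rows = range(len(points))
    # Normal equations: (G^H G + beta I) q = G^H target
    a = [[sum(g[m][i].conjugate() * g[m][j] for m in rows)
          + (beta if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [sum(g[m][i].conjugate() * target[m] for m in rows) for i in range(n)]
    return solve(a, b), g


# Hypothetical setup: four speakers at the corners of a 4 m square, five
# control points near the centre, and a target field from a virtual source
# placed outside the square.
speakers = [(-2.0, 2.0), (2.0, 2.0), (-2.0, -2.0), (2.0, -2.0)]
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
freq = 500.0
target = [transfer((0.0, 4.0), p, freq) for p in points]

q, g = speaker_signals(speakers, points, target, freq)
reproduced = [sum(g[m][i] * q[i] for i in range(len(q))) for m in range(len(points))]
error = math.sqrt(sum(abs(r - t) ** 2 for r, t in zip(reproduced, target)))
scale = math.sqrt(sum(abs(t) ** 2 for t in target))
print(f"relative reproduction error: {error / scale:.3f}")
```

Nothing in the calculation depends on the speakers being in a standard arrangement: changing the `speakers` list simply changes the matrix being inverted, which is the kind of adaptability the researchers describe.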
To test their signal processing algorithms, the Southampton team, led by Prof Philip Nelson and Dr Filippo Fazi, has built a huge spherical array of 40 loudspeakers that each send out a different component signal. Inside is a microphone array that measures the resulting sound field, analysing phase and amplitude for all the different frequencies, to determine if it has been reproduced accurately. The sphere also provides a great platform for the researchers to hear what the result sounds like, which feeds into another element of their work known as psychoacoustics.
‘The physical reality of the sound field is described by a very large number of physical observables that have a very high degree of complexity,’ said Fazi. ‘We can simplify this process but obviously every time we throw information away we reduce the accuracy of the model. [But] when human beings perceive the sound scene we throw away a large amount of information.’
If the researchers can create signals that only include the information picked up by the brain then they would need much less space to store and transmit those signals, in a similar way to how the MP3 format compresses sound files with relatively little noticeable difference in quality. And to do this they are developing biologically inspired models of how sound pressure at the ears is turned into an image in the brain. ‘We don’t try to mimic the brain but try to better understand it and use that as inspiration for engineering purposes,’ said Fazi.
This issue of file size is an important one to the wider success of the technology. One of the reasons 3D audio has yet to reach the mainstream, despite techniques like ambisonics being around for decades, is the vast amount of computing power it takes to render the files in real time to whatever speaker layout the listener happens to be using. But processors have finally reached the high speeds needed to produce 3D sound at a reasonable cost, which explains some of the recent advances in the field.
The other barrier to the spread of 3D sound has been industry take-up, limited by a fear that the technology is too expensive for consumers and by the lack of an agreed standard format. So Dolby’s decision to roll out the Atmos system, encouraged by the movie industry’s desire to improve the cinema-going experience in response to piracy, is a big deal. However, just as 3D video has its detractors, there is a chance the audience might not buy into it. ‘It is definitely an important milestone,’ said Fazi. ‘If it is not well received then obviously it might impact the whole 3D audio world … We will see if this will actually lead to a 3D revolution in the audio world or will be just an attempt and will die out again.’
But even if some find 3D audio to be a gimmick, the trend in the movie world is towards a greater number of ways to watch a film, whether with 3D glasses, giant IMAX screens, or the 48 frames per second format also pioneered by The Hobbit.
And 3D sound is another option for filmmakers and audiences, said Dolby’s Julian Pinn. ‘If we can allow content creators, whether they’re using a sonic device or a visual device, to bring more of an immersive experience to the audience then that’s just one more tool in their armoury to tell the story.’
Home sound systems
One of the great advantages of both object-based and ambisonic 3D audio systems is that the sound is rendered for whatever speaker setup the listener is using, although the level of immersion is likely to improve with more speakers. This means the technology may find its way into people’s homes more quickly than previous advances such as the 7.1 system for Blu-ray players, which requires consumers to buy a new set of speakers and arrange them in a particular layout. But it will still require the perfection of complicated spatial audio algorithms and the creation of standard formats.
One potential interim alternative could come from UK firm Cambridge Mechatronics, which is developing a single speaker array that can turn ordinary stereo (or the more effective binaural sound) into a representation of a 3D sound field around the listener. The Dynasonix projects two sound beams to a listener’s ears, creating the illusion that the signals are coming from separate speakers. Using motion-tracking technology, the device can follow and project to up to six listeners at a time, using additional noise-cancelling signals so each user can only hear their own sound beams.