Blind source separation and AI are not rival approaches to noise reduction but fully compatible ones, says Dave Betts, chief science officer at AudioTelligence.
Open-plan offices often come under fire for being noisy and disruptive – making it difficult for workers to concentrate or join conference calls. Requests to work remotely tend to cite the peace and quiet of home as the key to greater productivity.
But now that the coronavirus has suddenly forced many of us to work from home, the reality is proving somewhat different.
If your partner is also working from home, the chances are they are making phone calls and taking part in videoconferences to stay in touch with colleagues and clients. Add to that a couple of young children who are off school for weeks on end – and perhaps a dog that is now cooped up indoors for much of the day – and suddenly that peace and quiet seems more like a pipe dream.
Not only do you find you can’t hear yourself think – but your colleagues can’t hear you either when you join team-building conference calls on your laptop or make a VoIP call to your boss to discuss a particular project.
So the news that Microsoft is using artificial intelligence (AI) to improve the sound quality of meetings in Teams, its communication and collaboration tool, comes as no surprise. In fact, AI has been used in VoIP conferencing systems such as Webex and Zoom for some time.
It’s one of the reasons I’m often asked why blind source separation (BSS) is better than AI for noise reduction. But that is the wrong question. The two technologies address different problems and – more importantly – are fully compatible with each other.
BSS technology is the grown-up successor to beamforming. Beamformers are spatial filters that use a microphone array to focus on a particular direction. Traditional beamformers need a lot of prior information – such as the direction of the target source and the geometry of the microphone array – and the more sophisticated ones also need precisely calibrated microphones.
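To make that dependence concrete, here is a minimal delay-and-sum beamformer, the simplest kind, written in Python with NumPy. It is a sketch under stated assumptions: a far-field source, a uniform linear array, a speed of sound of 343 m/s and a test tone whose period fits the analysis window exactly. All function names and the 8-microphone, 4 cm array are hypothetical. Note that it cannot run at all without `mic_x` (the array geometry) and `look_deg` (the target direction).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_delays(mic_x, theta_deg):
    """Relative far-field arrival delay (s) at each microphone of a linear array.

    mic_x: microphone positions (m) along the array axis (hypothetical layout).
    Positive theta_deg means the wave reaches microphones at larger x later.
    """
    return np.asarray(mic_x, dtype=float) * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND


def apply_delays(signals, fs, delays):
    """Delay each row of `signals` by the matching entry of `delays` (seconds).

    Implemented as a frequency-domain phase shift, so the delay is circular;
    that is exact for a tone with a whole number of cycles in the window.
    """
    n = signals.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=-1)
    shifted = spectra * np.exp(-2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(shifted, n=n, axis=-1)


def simulate_plane_wave(source, fs, mic_x, theta_deg):
    """What each microphone hears from one far-field source at theta_deg."""
    delays = steering_delays(mic_x, theta_deg)
    return apply_delays(np.tile(source, (len(delays), 1)), fs, delays)


def delay_and_sum(mic_signals, fs, mic_x, look_deg):
    """Undo the delays expected for the look direction, then average channels."""
    delays = steering_delays(mic_x, look_deg)
    return apply_delays(mic_signals, fs, -delays).mean(axis=0)


if __name__ == "__main__":
    fs, n = 16_000, 1_600
    t = np.arange(n) / fs
    tone = np.sin(2 * np.pi * 2_000 * t)      # 2 kHz, exactly 200 cycles
    mic_x = np.arange(8) * 0.04               # hypothetical 8-mic array, 4 cm spacing
    mics = simulate_plane_wave(tone, fs, mic_x, 60.0)  # source at 60 degrees
    for look in (60.0, 0.0):
        out = delay_and_sum(mics, fs, mic_x, look)
        gain_db = 10 * np.log10(np.mean(out**2) / np.mean(tone**2))
        print(f"look {look:4.0f} deg: {gain_db:6.1f} dB")
```

Steered at the source, the output reproduces the tone; steered at 0 degrees, the same tone comes out strongly attenuated. Every number involved, from the microphone positions to the look angle, has to be supplied in advance.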
BSS works its magic by learning from the data. For each acoustic source in the scene, it learns a spatial filter that focuses on the region containing the source and optimally eliminates all the other sources in the scene. This means you get excellent interference rejection – without knowing anything about the positions of the sources, the microphone geometry or the calibration of the microphones.
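A toy version of learning the filters from the data alone can be sketched with FastICA, a textbook independent component analysis algorithm. This is not AudioTelligence's method, which is not described here; it is a swapped-in illustration that assumes an instantaneous two-microphone mixture of two heavy-tailed (speech-like) sources. Real rooms add reverberation and propagation delays, so practical acoustic BSS commonly applies this kind of algorithm per frequency bin of a short-time Fourier transform. The function names and the mixing matrix are hypothetical.

```python
import numpy as np


def whiten(x):
    """Centre the mixtures and decorrelate them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    v = e @ np.diag(1.0 / np.sqrt(d)) @ e.T   # whitening matrix
    return v @ x, v


def sym_decorrelate(w):
    """Make the rows of w orthonormal: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(w @ w.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ w


def fastica(x, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh non-linearity.

    x: (n_mics, n_samples) instantaneous mixtures. Returns the estimated
    sources and the unmixing matrix (one row = one learned spatial filter).
    No microphone positions, source directions or calibration are used.
    """
    z, v = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w, for all rows at once
        w_new = (g @ z.T) / m - np.diag((1.0 - g**2).mean(axis=1)) @ w
        w_new = sym_decorrelate(w_new)
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    unmix = w @ v
    return w @ z, unmix


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sources = rng.laplace(size=(2, 20_000))       # heavy-tailed stand-ins for speech
    mixing = np.array([[1.0, 0.6], [0.4, 1.0]])   # hypothetical, hidden from fastica
    est, unmix = fastica(mixing @ sources)
    corr = np.abs(np.corrcoef(np.vstack([est, sources]))[:2, 2:])
    print(np.round(corr, 3))   # one entry per row should be close to 1
```

Each row of `unmix` plays the role of a learned spatial filter: it weights the microphones so that one source passes and the other is cancelled. As with any BSS method, the outputs come back in arbitrary order and scale.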
Learning from the data also means that no training is needed to learn the array characteristics for each deployment. BSS does still pick up any ambient noise arriving from the same region as the target source, but interfering signals are rejected and the overall ambient noise is reduced.
This is where AI noise reduction comes in. It is simply the latest incarnation of a long line of noise reduction technologies going all the way back to simple spectral subtraction. It uses a completely different principle from a spatial filter – it analyses the signal in the time-frequency domain and tries to identify which components are due to signal and which components are due to noise.
The advantage of this approach is that it can work with just a single microphone. The big problem is that it extracts the signal by dynamically gating the time-frequency content – and at poor signal-to-noise ratios this gating can produce unpleasant artefacts.
We’ve all heard mangled VoIP calls where the other person sounds like they’re underwater – that’s the gating eating into the voice. You simply don’t get these sorts of artefacts when using spatial filters.
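The gating can be seen in classic magnitude spectral subtraction, the simple ancestor mentioned earlier. This minimal NumPy sketch assumes a noise-only recording is available for estimating the average noise magnitude in each frequency bin; the frame size, hop, `floor` value and function names are illustrative choices, and AI suppressors replace this fixed gain rule with a learned one. The returned `gain` array is the time-frequency gate: one value between `floor` and 1 per cell.

```python
import numpy as np

FRAME, HOP = 512, 256  # illustrative STFT settings (32 ms / 16 ms at 16 kHz)


def stft(x, frame=FRAME, hop=HOP):
    """Short-time Fourier transform with a periodic Hann window."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)            # shape (n_frames, frame // 2 + 1)


def istft(spec, length, frame=FRAME, hop=HOP):
    """Overlap-add resynthesis; a periodic Hann window at 50 % overlap sums to one."""
    out = np.zeros(length)
    for i, f in enumerate(np.fft.irfft(spec, n=frame, axis=1)):
        out[i * hop:i * hop + frame] += f
    return out


def spectral_subtraction(noisy, noise_only, floor=0.05):
    """Attenuate each time-frequency cell by the estimated noise magnitude.

    noise_only: a stretch of noise without speech (an assumption), used to
    estimate the average noise magnitude per frequency bin.
    Returns the enhanced signal and the gain mask (the time-frequency gate).
    """
    noise_mag = np.abs(stft(noise_only)).mean(axis=0)
    spec = stft(noisy)
    mag = np.abs(spec)
    gain = np.maximum(1.0 - noise_mag / np.maximum(mag, 1e-12), floor)
    return istft(spec * gain, len(noisy)), gain   # noisy phase is kept


if __name__ == "__main__":
    fs = 16_000
    rng = np.random.default_rng(0)
    t = np.arange(2 * fs) / fs
    clean = np.sin(2 * np.pi * 1_000 * t)
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    enhanced, gain = spectral_subtraction(noisy, 0.3 * rng.standard_normal(t.size))
    print("fraction of cells gated to the floor:", np.mean(gain == 0.05))
```

In cells where the noise estimate approaches the noisy magnitude, that is, at poor signal-to-noise ratio, `1.0 - noise_mag / mag` drops to the floor and any speech in the cell is removed along with the noise. Cells switching on and off like this is the kind of gating that eats into the voice.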
Now for the big secret… the two technologies work really well together. Put a microphone array in front of your VoIP call and blind source separation will give you a signal with the interference rejected and the ambient noise reduced, significantly improving the signal-to-noise ratio. The AI noise reduction then finds it much easier to identify the residual ambient noise and remove it without all those unpleasant artefacts. Sounds like a winning combination to me.
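A back-of-the-envelope calculation shows why cleaning up the input first helps. The sketch below assumes the idealised magnitude spectral-subtraction gain with a perfectly estimated noise level, and speech and noise powers simply adding in each time-frequency cell. Learned AI suppressors use different gain rules, so the figures are indicative only, and the 0, 10 and 20 dB cell SNRs are hypothetical values rather than measurements of any BSS system.

```python
import math


def subtraction_gain(snr_db: float) -> float:
    """Gain that magnitude spectral subtraction applies to one time-frequency
    cell with the given speech-to-noise power ratio (idealised: powers add,
    noise magnitude estimated perfectly)."""
    snr = 10.0 ** (snr_db / 10.0)
    noisy_mag = math.sqrt(1.0 + snr)  # noise magnitude normalised to 1
    return 1.0 - 1.0 / noisy_mag


for snr_db in (0, 10, 20):
    print(f"{snr_db:2d} dB cell SNR -> gain {subtraction_gain(snr_db):.2f}")
# prints:
#  0 dB cell SNR -> gain 0.29
# 10 dB cell SNR -> gain 0.70
# 20 dB cell SNR -> gain 0.90
```

At 0 dB the speech in a cell is scaled to under a third of its amplitude, while at 20 dB it passes almost untouched, so raising the input SNR with a spatial filter leaves the noise suppressor far less reason to carve into the voice.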
Dave Betts, chief science officer at AudioTelligence, specialists in blind audio signal separation