Researchers at Columbia University in New York have created a system that translates thoughts directly into recognisable speech.
The neuroengineers behind the system, from the university’s Zuckerman Institute, claim that it marks a major step towards the development of brain-computer interfaces for patients with limited or no ability to speak, such as those living with motor neurone disease or recovering from stroke.
Research has shown that when people speak – or even imagine speaking – telltale patterns of activity appear in their brain. Distinct (but recognisable) patterns of signals also emerge when we listen to someone speak, or imagine listening.
Early efforts by the Columbia team to decode these brain signals and translate them into words focused on simple computer models that analysed spectrograms, which are visual representations of sound frequencies. However, this approach failed to produce anything resembling intelligible speech, so the team – led by Dr Nima Mesgarani – turned instead to a vocoder, a computer algorithm that can synthesise speech after being trained on recordings of people talking. “This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions,” said Dr Mesgarani.
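For readers unfamiliar with spectrograms, the idea can be sketched in a few lines of Python. This is only an illustration of what a spectrogram is, not the Columbia team's code: the function name, frame size, hop length and test tone are all assumptions chosen for the example. The signal is cut into short overlapping frames, and the strength of each frequency within each frame is measured.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame.

    Illustrative only: a naive discrete Fourier transform, with
    hypothetical default frame and hop sizes.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Taper the frame with a Hann window to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Correlate the frame with a complex sinusoid at bin k.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Assumed test signal: a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# Each bin spans sample_rate / frame_size = 125 Hz, so 1 kHz falls in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Plotted as an image, with time along one axis and frequency along the other, the magnitudes form the visual representation described above. Production code would normally use a fast Fourier transform from a numerical library rather than this slow, direct calculation.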
To teach the vocoder to interpret brain activity, Dr Mesgarani teamed up with neurosurgeon Dr Ashesh Dinesh Mehta.
“Working with Dr Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity,” said Dr Mesgarani. These neural patterns were used to train the vocoder.
Next, the researchers asked those same patients to listen to speakers reciting digits from 0 to 9, while recording brain signals that could then be run through the vocoder. The sound produced by the vocoder in response to those signals was analysed and cleaned up by neural networks, a type of artificial intelligence that mimics the structure of neurons in the biological brain.
The end result was a robotic-sounding voice reciting a sequence of numbers. To test the accuracy of the recording, Dr Mesgarani and his team asked individuals to listen to the recording and report what they heard.
“We found that people could understand and repeat the sounds about 75 per cent of the time, which is well above and beyond any previous attempts,” he said.
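As a rough illustration of how a figure like this can be calculated, the sketch below scores a listening test as the fraction of digits a listener reports correctly. The function name and the digit lists are invented for the example and are not data from the study, whose exact scoring method is not described here.

```python
def intelligibility(spoken, reported):
    """Return the fraction of items the listener reported correctly."""
    if not spoken or len(spoken) != len(reported):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for s, r in zip(spoken, reported) if s == r)
    return correct / len(spoken)


# Made-up example data: digits played to a listener and what they reported.
spoken = [3, 7, 1, 9, 0, 4, 2, 8]
reported = [3, 7, 1, 5, 0, 4, 6, 8]

score = intelligibility(spoken, reported)  # 6 of 8 correct
print(f"{score:.0%}")
```

With six of the eight invented digits matching, the score comes to 75 per cent.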
The improvement in intelligibility was especially evident when comparing the new recordings to the earlier, spectrogram-based attempts. “The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy.”
Dr Mesgarani and his team now plan to test more complicated words and sentences, and want to run the same tests on brain signals emitted when a person speaks or imagines speaking.
Ultimately, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer’s thoughts directly into words.
A paper on the research is published in the journal Scientific Reports.