The breakthrough marks the first time that speech or facial expressions have been synthesised from brain signals. The system can also decode these signals into text at nearly 80 words per minute, which improves on commercially available technology.
Edward Chang, MD, chair of neurological surgery at UCSF, hopes this latest research breakthrough will lead to an FDA-approved system that enables speech from brain signals in the near future. The research is detailed in Nature.
“Our goal is to restore a full, embodied way of communicating, which is really the most natural way for us to talk with others,” Chang said in a statement. “These advancements bring us much closer to making this a real solution for patients.”
Chang’s team previously demonstrated it was possible to decode brain signals into text in a man who had also experienced a brainstem stroke. The current study demonstrates the decoding of brain signals into the richness of speech, along with the movements that animate a person’s face during conversation.
Chang implanted a paper-thin rectangle of 253 electrodes onto the surface of the brain of the participant, a woman named Ann, over areas that are critical for speech. The electrodes intercepted the brain signals that, if not for the stroke, would have gone to muscles in her tongue, jaw and larynx, as well as her face. A cable plugged into a port fixed to her head connected the electrodes to a bank of computers.
The participant worked with the team to train the system’s artificial intelligence algorithms to recognise her unique brain signals for speech. This involved repeating different phrases from a 1,024-word conversational vocabulary until the computer recognised the brain activity patterns associated with the sounds.
Rather than train the AI to recognise whole words, the researchers created a system that decodes words from phonemes. Using this approach, the computer only needed to learn 39 phonemes to decipher any word in English, which enhanced the system’s accuracy and made it three times faster.
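The phoneme-based approach described above can be illustrated with a toy sketch. This is not the study’s actual decoder: the pronunciation dictionary, phoneme labels (ARPAbet-style) and greedy segmentation below are invented for illustration, whereas the real system classifies 39 English phonemes directly from neural activity with machine-learning models.

```python
# Illustrative sketch: decoding words from phoneme sequences rather
# than recognising whole words. Toy dictionary and greedy matching;
# the study's real decoder is a neural-network classifier.

# Toy pronunciation dictionary: word -> phoneme sequence (ARPAbet-style).
PRONUNCIATIONS = {
    "hello": ("HH", "AH", "L", "OW"),
    "world": ("W", "ER", "L", "D"),
    "how":   ("HH", "AW"),
    "are":   ("AA", "R"),
    "you":   ("Y", "UW"),
}

# Invert the dictionary so a decoded phoneme sequence maps back to a word.
PHONEMES_TO_WORD = {phones: word for word, phones in PRONUNCIATIONS.items()}

def decode_words(phoneme_stream):
    """Greedily segment a stream of decoded phonemes into known words."""
    words, start = [], 0
    while start < len(phoneme_stream):
        # Try the longest candidate segment first.
        for end in range(len(phoneme_stream), start, -1):
            segment = tuple(phoneme_stream[start:end])
            if segment in PHONEMES_TO_WORD:
                words.append(PHONEMES_TO_WORD[segment])
                start = end
                break
        else:
            start += 1  # skip an unrecognised phoneme
    return words

stream = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
print(decode_words(stream))  # -> ['hello', 'world']
```

The appeal of the approach is visible even in this toy: the decoder only ever has to distinguish a few dozen phoneme classes, yet any word whose pronunciation is known can be reconstructed from them.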
“The accuracy, speed and vocabulary are crucial,” said Sean Metzger, who developed the text decoder with Alex Silva, both graduate students in the joint Bioengineering Program at UC Berkeley and UCSF. “It’s what gives a user the potential, in time, to communicate almost as fast as we do, and to have much more naturalistic and normal conversations.”
To create the voice, the team devised a speech-synthesis algorithm, which they personalised to sound like Ann’s voice before the injury, using a recording of her speaking at her wedding.
The team animated the avatar with software that simulates and animates muscle movements of the face, developed by Speech Graphics, a company that makes AI-driven facial animation. The researchers created customised machine-learning processes that allowed the company’s software to mesh with the signals being sent from Ann’s brain as she tried to speak, converting them into movements on the avatar’s face: the jaw opening and closing, the lips protruding and pursing, the tongue moving up and down, as well as the facial movements for happiness, sadness and surprise.
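Conceptually, that mapping step takes decoded articulatory or expression units and turns them into facial-animation parameters. The sketch below is entirely hypothetical: the unit names, parameter channels and values are invented for illustration, and the actual interface between the decoder and Speech Graphics’ software is not public.

```python
# Hypothetical sketch: mapping decoded units to avatar facial
# parameters (all names and values invented for illustration).
# Each parameter is a 0..1 activation of a facial movement channel.

FACIAL_POSES = {
    # articulatory movements mentioned in the study's description
    "jaw_open":  {"jaw": 1.0, "lips": 0.2, "tongue": 0.0},
    "lip_purse": {"jaw": 0.1, "lips": 1.0, "tongue": 0.0},
    "tongue_up": {"jaw": 0.3, "lips": 0.1, "tongue": 1.0},
    # emotional expressions
    "happy":     {"mouth_corners": 1.0, "brows": 0.6},
    "sad":       {"mouth_corners": 0.0, "brows": 0.2},
    "surprise":  {"mouth_corners": 0.5, "brows": 1.0},
}

def animate(decoded_units):
    """Return one parameter dict per decoded unit to drive the avatar."""
    return [FACIAL_POSES.get(unit, {}) for unit in decoded_units]

frames = animate(["jaw_open", "lip_purse", "happy"])
print(frames[0]["jaw"])  # -> 1.0
```

In a real pipeline these parameters would be emitted at animation frame rate and smoothed over time, so the avatar’s face moves continuously rather than snapping between poses.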
The team next aims to create a wireless version that would not require the user to be physically connected to the brain-computer interface.