Ultrasound-based augmented tongue technology could improve speech therapy

A novel system that converts ultrasound scans of the movement of the tongue into an augmented reality “talking head” could be used to improve speech-therapy techniques.

Developed by a team from the National Centre for Scientific Research (CNRS) in France, the technology uses an ultrasound probe placed under the jaw to scan the movement of the tongue, palate and teeth. These movements are then processed by a specially developed machine learning algorithm that controls an “articulatory talking head”, which can be used to help speech therapists and patients better understand the physical processes occurring during speech.

Typically, a speech therapist will analyse a patient’s pronunciation and then explain, using drawings, how the patient can improve this by adjusting the placement of the tongue. However, the effectiveness of this approach depends heavily on how well the patient can understand what they are told. By enabling patients to see their articulatory movements in real time, and in particular how their tongues move, it should be easier to correct pronunciation problems.

Researchers have been using ultrasound to design biofeedback systems for a number of years. However, the images acquired using this technique are often of poor quality and therefore difficult for a patient to interpret. According to the CNRS team, the creation of a virtual clone of a real speaker brings this data to life in a way that is more readily accessible.

Ultrasound scans of a patient's tongue alongside a real-time AR representation of the mouth. Image: CNRS

This system, validated in the laboratory with healthy speakers, is now being tested in a simplified version in a clinical trial for patients who have had tongue surgery. The researchers are also developing another version of the system in which the articulatory talking head is animated automatically, not by ultrasound but directly by the user's voice.

A paper on the research is published in the journal Speech Communication.