Computer translation

Three years of work by a large interdisciplinary team at the University of Southern California has created a rudimentary but working two-way voice translation system that allows an English-speaking doctor to talk to a Persian-speaking patient.

The Transonics Spoken Dialog Translator turns a doctor’s spoken English questions into spoken Persian, and translates patients’ spoken Persian replies into spoken English.

“Fluent two-way machine voice translation is one of the holy grails of engineering,” said Shrikanth Narayanan, an associate professor of electrical engineering, computer science and linguistics at the USC Viterbi School of Engineering who directs the Speech Analysis and Interpretation Laboratory (SAIL) in the Viterbi School’s Integrated Media Systems Center.

“We are years away from perfecting it, but we hope to have something that will be useful in emergency rooms or ambulances within two years or so.”

The Transonics system runs on a laptop computer using the Linux operating system. Doctor and patient both wear headphones with attached microphones. A small keypad connected to the computer speeds and simplifies certain routine commands — switching from doctor mode to patient mode, for example.

When a doctor asks a question, the speech recognition software captures it but hedges its bets, displaying not just its best guess about what was said but a range of options. The doctor chooses the most appropriate option (some of the most-used phrases can be put in a quick-access “ready menu”), and the result is a spoken Persian question in the patient’s earphones.

Narayanan says much of the success of the interface grows directly out of analysis of a large database of some 300 English-speaking-doctor/Persian-speaking-patient dialogs created by USC medical students and Iranian-heritage USC students and Los Angeles residents. “Rather than imagining what people might say, we analyzed what people did say,” he explained, adding that recordings of the encounters were used to train and tune the system.

The system contains about 23,000 English and 9,000 Persian words, a disproportion that exists because relatively little has so far been done in machine translation of Persian (a language also called Farsi), either written or spoken.

More information on the system, including a video demonstration, is available at