Researchers at Hong Kong Polytechnic University have developed speech recognition technology that may lead to mobile phones understanding a wide variety of commands, even in noisy settings.
The ASSF (auditory spectrum-based speech feature) approach, which requires less processing power than current systems, could appear in commercial products within three to five years, enabling applications such as voice-controlled Web surfing on mobile phones or PDAs, the researchers said.
Another potential use of ASSF speech recognition is voice-controlled computer games, an application highlighted by Chuang Wen-hao of Hong Kong Polytechnic University at the Game Technology Conference 2001, currently being held in Hong Kong.
A sophisticated voice-controlled game could respond quickly to commands such as ‘Fire!’ even in noise conditions very different from those of a typical office environment, said Chuang.
Chuang said ASSF uses less processing power than the widely used MFCC (mel-frequency cepstral coefficient) technology because it examines fewer parameters when it interprets the waveforms of the user’s speech.
Instead of filtering the sound for all of those parameters, ASSF relies on more sophisticated decision rules to handle the data it gathers about the waveforms.
Unlike the complex algorithms used to interpret waveforms, the decision rules can run in memory, so ASSF can run on a system with a less powerful processor, said Chuang.
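The article does not describe ASSF’s internals, so the sketch below illustrates only the conventional MFCC front end it is compared against: window a frame, take its power spectrum, pass it through a bank of mel-spaced triangular filters, take logarithms and apply a DCT. This per-frame filtering is the kind of work ASSF is said to reduce. The function names and parameter choices (26 filters, 13 coefficients, an 8 kHz sample rate) are illustrative assumptions, and a naive DFT stands in for an FFT for readability.

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum for DFT bins 0..N/2 via a naive O(N^2) DFT (clarity over speed)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + i * (high - low) / (num_filters + 1) for i in range(num_filters + 2)]
    # Map each mel edge down to a DFT bin index.
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):   # rising edge of the triangle
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge of the triangle
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


def mfcc(frame, sample_rate, num_filters=26, num_coeffs=13):
    """MFCCs for one frame: window, power spectrum, mel filterbank, log, DCT-II."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    log_energies = []
    for filt in mel_filterbank(num_filters, n, sample_rate):
        energy = sum(w * p for w, p in zip(filt, spectrum))
        log_energies.append(math.log(max(energy, 1e-10)))  # floor avoids log(0)
    # DCT-II decorrelates the log energies; keep only the first num_coeffs terms.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_coeffs)]


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(256)]  # one 32 ms frame
    print([round(c, 2) for c in mfcc(tone, rate)])
```

A real recogniser repeats this for every overlapping frame of speech (typically every 10 ms or so), which is why the cost of the front end matters on a low-power handset.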
In a quiet setting ASSF makes more errors than MFCC, but it outperforms MFCC in a noisy environment, he said. Chuang’s team is now working to bring speech-recognition error rates, currently more than 70 percent in the noisiest setting, down to usable levels.
ASSF is also said to be best suited to the recognition of commands rather than complex statements. Even so, the technology could enable handheld devices that respond to more complex commands than today’s phones, which are generally limited to short, specific statements and can be affected by outside noise, Chuang concluded.
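The article does not say how the “more than 70 percent” figure was measured. Assuming it is a word error rate (WER), a common metric for speech recognisers, the sketch below shows how such a rate is computed: the minimum number of word substitutions, deletions and insertions needed to turn the recogniser’s output into the reference transcript, divided by the number of reference words. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("open the main menu", "open a menu"))  # 0.5
```

Because insertions are counted, WER can exceed 100 percent in very noisy conditions, where a recogniser may hallucinate extra words.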
In a separate development, Toshiba Corporation has announced the release of the TC35273XB videophone LSI.
The TC35273XB integrates an MPEG-4 video encoder and decoder (codec), a speech codec, an audio and video multiplexer and 12 megabits of DRAM on a single chip.
The new LSI is built on a 0.18-micron process and is said to meet two key design goals: lower power consumption and a smaller die, which allows an easy-to-assemble Fine Ball Grid Array (FBGA) package measuring 11 x 11 millimetres.
The gains extend to performance as well: an enhanced gated-clock design supports video conferencing with 80 milliwatts of power consumption at 60MHz operation.
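A quick back-of-the-envelope calculation puts the quoted power figure in perspective. It assumes, beyond what Toshiba stated, that 80 milliwatts is a steady average draw at the full 60MHz clock; the one-hour call is a hypothetical scenario.

```python
# Figures quoted for the TC35273XB in video-conferencing mode.
POWER_W = 80e-3   # 80 milliwatts
CLOCK_HZ = 60e6   # 60MHz

# Average energy per clock cycle (assumes 80 mW is a steady average draw).
energy_per_cycle_j = POWER_W / CLOCK_HZ

# Energy for a hypothetical one-hour call at the same average power.
CALL_SECONDS = 3600
call_energy_wh = POWER_W * CALL_SECONDS / 3600  # joules -> watt-hours

print(f"{energy_per_cycle_j * 1e9:.2f} nJ per cycle")         # 1.33 nJ per cycle
print(f"{call_energy_wh * 1000:.0f} mWh per hour-long call")  # 80 mWh
```

Real consumption will vary with the workload, so these numbers are only an order-of-magnitude guide.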
Mass production of the chip will begin towards the end of 2001.