Research Scientist, IBM
yuqing at us.ibm.com
Place: CS Conference Room in MUDD
Abstract:
The MASTOR (Multilingual Automatic Speech-to-speech TranslatOR) system is a speech translation prototype developed by IBM and funded by the DARPA Babylon program. In the first part of this talk, I will give a brief overview of the project status. In the second part, I will address Natural Language Generation, which is critical to the performance of the statistical interlingua-based speech translation approach used in MASTOR. To improve maximum-entropy-based concept generation, a set of novel features and algorithms is proposed, including features that enable model training on parallel corpora, the use of confidence thresholds, and the use of multiple feature sets. The concept generation error rate is reduced by 43% to 50% on our limited-domain speech translation corpus. Improvements are also achieved in our speech-to-speech translation experiments.
A real-time translation demo will be presented at the end of the talk.