Yuqing Gao

Research Scientist, IBM
yuqing at us.ibm.com


Title: Statistical Natural Language Generation in MASTOR: IBM Speech-to-Speech Translation System

Time: Thursday, October 23, 11:30 - 12:30

Place: CS Conference Room in MUDD

Abstract:

The MASTOR (Multilingual Automatic Speech-to-speech TranslatOR) system is a speech translation prototype developed by IBM and funded by the DARPA Babylon program. In the first part of this talk, I will give a brief overview of the project status. In the second part, I will address Natural Language Generation, which is critical to the performance of the statistical interlingua-based speech translation approach used in the MASTOR system. To improve maximum-entropy-based concept generation, a set of novel features and algorithms is proposed, including features that enable model training on parallel corpora, the use of confidence thresholds, and multiple sets of features. The concept generation error rate is reduced by 43%-50% on our speech translation corpus within limited domains. Improvements are also achieved in our experiments on speech-to-speech translation.

A real-time translation demo will be presented at the end of the talk.