Overview



We propose to develop a practical, multilingual, multidocument information tracking and summarization system. Our design integrates robust statistical techniques, shallow linguistic approaches, and machine learning to achieve scalability within languages and portability across languages. To realize these goals, we will develop methods for information tracking based on a novel algorithm for event identification; summarization across documents using information fusion and identification of key differences; summarization across languages relying on identification and translation of terms; and new methods for identifying, expanding, and translating terms. We will begin work with a language such as Spanish, but will quickly expand to include Asian languages and other non-Indo-European languages, such as Middle Eastern languages.

The key features of our approach include: