New Developments in Information Management: 
Topic Detection, Translingual Retrieval and Summarization

Jaime Carbonell
Language Technologies Institute
Carnegie Mellon University

Abstract

As information retrieval comes of age, new challenges arise: 1) Are there better ways to retrieve documents than simple relevance to a keyword query? 2) Are there better ways to present long documents, e.g., via query-relevant summaries? 3) Can one issue a query in English and also retrieve appropriate documents in other languages? 4) Is there a way to explore the contents of large text collections in less than glacial time without missing important topics? Recent advances address these issues. They include the Maximal Marginal Relevance (MMR) method for retrieval and summarization, hierarchical document clustering for event and topic detection, and several methods for cross-language retrieval: example-based query translation, bilingual pseudo-relevance feedback, and bilingual dual-vector space methods (the Generalized Vector Space Model, GVSM, and Latent Semantic Indexing, LSI). Recent results based on these methods are discussed, along with a plethora of exciting open problems.
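To make the Maximal Marginal Relevance idea concrete, the sketch below greedily picks documents that maximize λ·Sim1(D_i, Q) − (1 − λ)·max over already-selected D_j of Sim2(D_i, D_j). A high λ favors pure query relevance, and a lower λ favors novelty with respect to what has already been chosen. This is a minimal illustration under stated assumptions, not the author's implementation: both similarity functions are taken to be cosine similarity over whitespace-tokenized bag-of-words vectors, and the names `cosine`, `mmr_rank`, `lam`, and `k` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_rank(query: str, docs: list[str], lam: float = 0.7, k: int = 3) -> list[int]:
    """Greedy MMR selection (hypothetical helper).

    Assumption: Sim1 and Sim2 are both cosine over naive whitespace tokens.
    Returns indices of the selected documents, in selection order.
    """
    q = Counter(query.lower().split())
    vecs = [Counter(d.lower().split()) for d in docs]
    selected: list[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(vecs[i], q)
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)  # ties go to the earliest index
        selected.append(best)
        remaining.remove(best)
    return selected


docs = [
    "topic detection in news streams",
    "topic detection in news streams",  # exact duplicate of document 0
    "event detection and tracking",
    "cross language retrieval",
]
print(mmr_rank("topic detection", docs, lam=0.5, k=2))  # [0, 2]
```

With λ = 0.5, ranking by relevance alone would return the two identical documents 0 and 1. MMR instead skips the duplicate, because its redundancy term (cosine 1.0 against document 0) outweighs its relevance, and it selects the less relevant but novel document 2.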



Luis Gravano
gravano@cs.columbia.edu