New Developments in Information Management:
Topic Detection, Translingual Retrieval and Summarization
Abstract
As information retrieval comes of age, new challenges arise, including:
1) Are there better ways to retrieve documents than by simple relevance to
a keyword query? 2) Are there better ways to present long documents, e.g.,
via query-relevant summaries? 3) Can one issue a query in English and retrieve
appropriate documents in other languages too? 4) Is there a way to explore
the contents of large text collections in less than glacial time without
missing important topics? These issues are addressed by recent advances,
including the Maximal Marginal Relevance (MMR) method in retrieval and summarization,
hierarchical document clustering for event and topic detection, and various
methods for cross-language retrieval such as example-based query translation,
bilingual pseudo-relevance-feedback, and bilingual dual-vector space methods
(the Generalized Vector Space Model, GVSM, and Latent Semantic Indexing, LSI).
Recent results based on the above are discussed, as well
as a plethora of exciting open problems.
Luis Gravano
gravano@cs.columbia.edu