Approaches to Automatic Information Organization

James Allan
Center for Intelligent Information Retrieval
Computer Science Department
University of Massachusetts, Amherst

Wednesday, April 22, 1998 
11am-12:15pm
Interschool Lab, 7th floor, Schapiro CEPSR Bldg.

Host: Kathy McKeown

Abstract

How can someone efficiently and automatically analyze the electronic information that arrives each day? How can a system help someone make sense of all the information already saved, as well as monitor other sources for additional data that might be useful? How can that information be presented in a way that makes it possible to assimilate the new material rapidly and to understand how it connects with archival data?

The goal of Information Organization is to address those questions. It is an outgrowth of Information Retrieval (IR) technology that uses statistical and probabilistic techniques to find and describe relationships between texts. In this talk, I will discuss methods for finding and gathering new events in news, and will cover an approach to information visualization that adapts to user feedback.

The first topic is addressed by Topic Detection and Tracking (TDT), a new effort that automatically identifies and tracks new events--i.e., unknown and unanticipated happenings--within a stream of news stories. "Events" differ from traditional IR "queries" and require a different model for effective organization.

For the second topic, I will discuss how the organization of information can be visualized and how a person might provide feedback to the system to improve the usefulness of the presentation. Although automatic organization alone is useful, a model of an "interesting presentation" can significantly improve the chance that a system will produce a useful organization.



Luis Gravano
gravano@cs.columbia.edu