How to Get What You Want:

Searching for Meaning Worldwide

Charles Elkan
Computer Science and Engineering Department
University of California, San Diego

 

Abstract

Users of the World Wide Web often pose queries that are excessively general or excessively specific by the standards of traditional information retrieval research. On the one hand, queries often consist of one or two words that superficially match thousands of documents equally well. On the other hand, users often want to find specific facts, as opposed to documents about a topic. This talk describes ongoing research addressing these two challenges.

We have two methods for handling very short queries. First, the relevance of a document D to a query Q is measured as the average similarity between Q and a cluster of documents similar to D. This reduces the estimated relevance of documents that are only tangentially related to the query. Second, relevant documents that are similar to one another are represented by a single document in the list given to the user, which increases the diversity of documents shown to the user.
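
The two ideas above can be illustrated with a minimal sketch. It is not the implementation discussed in the talk. It assumes bag-of-words vectors, cosine similarity, a "cluster" defined as the k documents nearest to D (including D itself), and a fixed similarity threshold for deciding that two documents are near-duplicates. All function names and parameters (`cluster_relevance`, `diverse_ranking`, `k`, `duplicate_threshold`) are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed document representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def cluster_relevance(query, doc_index, docs, k=2):
    """Average similarity between the query and the k documents most
    similar to docs[doc_index]; the document itself is in its own cluster."""
    target = docs[doc_index]
    cluster = sorted(docs, key=lambda d: cosine(target, d), reverse=True)[:k]
    return sum(cosine(query, d) for d in cluster) / len(cluster)


def diverse_ranking(query, docs, k=2, duplicate_threshold=0.8):
    """Rank documents by cluster relevance, then keep a document only if it
    is not a near-duplicate of one already selected for display."""
    ranked = sorted(range(len(docs)),
                    key=lambda i: cluster_relevance(query, i, docs, k),
                    reverse=True)
    shown = []
    for i in ranked:
        if all(cosine(docs[i], docs[j]) < duplicate_threshold for j in shown):
            shown.append(i)
    return shown


# Toy collection: documents 0 and 1 are near-duplicates.
docs = [vectorize(t) for t in (
    "jaguar car engine speed",
    "jaguar car engine speed review",
    "jaguar cat jungle predator",
)]
query = vectorize("jaguar")
shown = diverse_ranking(query, docs)
print(shown)  # indices of the documents that would be listed
```

Averaging over a cluster means that a document matching the query word only by accident is pulled down by neighbours that do not match it, while the duplicate filter keeps one representative per group of similar documents.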

Finding facts will be handled by hidden Markov model (HMM) queries generated automatically and refined through user feedback. Given a template for a fact, for example "director X of movie Y", an initial HMM generalizing the template will be assembled from a library of HMM components for standard grammatical units such as noun phrases. The keywords appearing in the HMM will be used for a standard Boolean search, and then the HMM will be used to score sentences in the retrieved documents. The user will specify which of the highest-scoring sentences actually contain facts of interest. A training algorithm will then revise the HMM to maximize the score of these sentences, and the search process will be repeated.
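
The scoring step of this loop can be sketched as follows. This is an illustration under stated assumptions, not the software described in the talk: the two-state model, its probabilities, the emission floor for unseen words, and the per-token normalization are all invented for the example, and the names `sentence_score`, `boolean_keywords`, and `rank_sentences` are hypothetical. The sketch uses the standard forward algorithm in log space. The assembly of the HMM from a component library and the retraining step are not shown, since the abstract does not specify them.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sentence_score(tokens, start, trans, emit, floor=1e-4):
    """Per-token log-likelihood of a sentence under the HMM, computed with
    the forward algorithm. Unseen words get the emission value `floor`
    (an assumption of this sketch, not a proper smoothing scheme)."""
    if not tokens:
        return NEG_INF
    states = list(start)

    def emit_log(state, token):
        return safe_log(emit[state].get(token, floor))

    alpha = {s: safe_log(start[s]) + emit_log(s, tokens[0]) for s in states}
    for token in tokens[1:]:
        alpha = {
            s: log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in states])
            + emit_log(s, token)
            for s in states
        }
    return log_sum_exp(list(alpha.values())) / len(tokens)


def boolean_keywords(emit):
    """Words named explicitly in the HMM, usable as Boolean search terms."""
    return sorted({token for table in emit.values() for token in table})


def rank_sentences(sentences, start, trans, emit):
    """Sort sentences from highest to lowest HMM score."""
    return sorted(sentences,
                  key=lambda s: sentence_score(s.lower().split(), start, trans, emit),
                  reverse=True)


# Hypothetical two-state model: background words and template cue words.
start = {"background": 0.9, "cue": 0.1}
trans = {"background": {"background": 0.8, "cue": 0.2},
         "cue": {"background": 0.5, "cue": 0.5}}
emit = {"background": {},
        "cue": {"directed": 0.4, "director": 0.3, "by": 0.3}}

sentences = ["the weather was nice today",
             "the movie was directed by spielberg"]
ranked = rank_sentences(sentences, start, trans, emit)
print(boolean_keywords(emit))
print(ranked[0])
```

In a feedback loop of the kind described, the sentences the user marks as containing the desired fact would become training data for re-estimating `trans` and `emit` before the next round of search and scoring.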

Software for extracting facts from text using HMMs has been implemented by Tim Leek, and the methods mentioned above for handling short queries have been implemented by Onn Brandman. Lessons learned from these prototypes will be discussed.



Luis Gravano
gravano@cs.columbia.edu