Exploiting Linguistic Context in Image Interpretation and Content-Based Image Retrieval

Rohini K. Srihari
Center for Document Analysis and Recognition (CEDAR)
Dept. of Computer Science
State University of New York at Buffalo

Abstract

This research concerns the interaction of language and vision in image understanding and retrieval tasks. It is relevant to the many situations in which text and images are used jointly to communicate information; two distinct scenarios will be discussed.

The first scenario concerns the exploitation of linguistic context in vision systems. Linguistic context is qualitative in nature and is obtained dynamically. We view this as a new paradigm, a middle ground between data-driven object detection and site-model-based vision. Our theory of collateral-based vision includes goal-directed natural language processing (NLP), suitable knowledge representations, and efficient search strategies. The design and implementation of Show&Tell, a multimedia system for semi-automated image annotation, will be discussed. Show&Tell combines advances in speech recognition, NLP, and image understanding, and is designed to facilitate the work of image analysts.

The second scenario concerns the interaction of textual and photographic information in multimodal documents. The World Wide Web (WWW) may be viewed as the ultimate large-scale, dynamically changing multimedia database. Finding useful information on the WWW poses a challenge in the area of multimodal information indexing and retrieval. The word "indexing" is used here to denote the extraction and representation of semantic content. Our research focuses on improving precision and recall in a multimodal information retrieval system by interactively combining text processing with image processing. We exploit the fact that images do not appear in isolation but are accompanied by text, which we refer to as collateral text. The interaction of text and image content takes place in both the indexing and retrieval phases. An application of this research, a picture search engine that lets users retrieve pictures of people in various contexts, will be discussed.

Luis Gravano
gravano@cs.columbia.edu