Humane Interfaces to Video

Nuno Vasconcelos
MIT Media Laboratory

Wednesday, March 25, 1998 
11am-12:15pm
Interschool Lab, 7th floor, Schapiro CEPSR Bldg.

Host: John Kender

Abstract

Due to the massive amounts of imagery now appearing all over the digital communications infrastructure, there has recently been strong interest in developing interfaces to filter, retrieve, classify, browse through, and summarize video. Four components are needed in any system that catalogues and provides access to clips of motion picture information: a representation that permits ready perusal, a set of robust techniques to select appropriate footage, an interface that maps the analysis to human terms, and an application context in which to work. This talk addresses the design of such components; I will present a Bayesian framework for the construction of useful interfaces to visual libraries.

Two video representations are derived from this framework. The first stresses the combination of efficient coding, indexing, and retrieval necessary to make content-based access viable over large-scale, distributed, and unconstrained environments such as the Internet. The resulting interfaces rely mostly on queries that are visual in nature (based on images or objects). The second explores the structured nature of specific content domains to support interaction at a semantic level, leading to more meaningful characterization and summarization of the video and to appealing procedures for classification, browsing, and retrieval. In both cases, the analysis relies heavily on probabilistic modeling through procedures such as the EM algorithm, and Bayesian belief propagation is used to construct interfaces whose behavior adapts to the specifications of the user. The procedures presented here are generic and applicable to a wide variety of problems involving human-machine interaction.



Luis Gravano
gravano@cs.columbia.edu