Computer Vision Talks at Columbia University

 

Spatio-Temporal Analysis and Manipulation of Visual Information

Michal Irani
Dept. of Computer Science and Applied Math 
The Weizmann Institute of Science, ISRAEL 
 
 
1:00 pm, Monday, April 2nd, 2001 
Interschool Lab, 7th floor, Schapiro Building, Computer Science.

Host: Shree K. Nayar

 
 
 

Abstract

Video provides a continuous visual window into the space-time world. It captures the evolution of dynamic scenes in space and time, which makes video much more than just a collection of images of a scene taken from different viewpoints. In this talk I will show that by treating video as a space-time data volume, one can perform tasks that are very difficult (and often impossible) to perform when only "slices" of this information, such as individual image frames, are used. In particular, I will demonstrate the power of this approach with two example problems:
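
To make the "space-time data volume" viewpoint concrete, here is a minimal illustrative sketch (not code from the talk): it stacks the frames of a clip into a single 3-D array and cuts slices from it. OpenCV and NumPy are assumed, and "clip.avi" is a placeholder filename.

# Illustrative sketch: load a video as a space-time volume and cut slices from it.
import cv2
import numpy as np

def load_space_time_volume(path):
    """Stack all frames of a video into a single (t, y, x) intensity array."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return np.stack(frames, axis=0)            # shape: (T, H, W)

volume = load_space_time_volume("clip.avi")    # placeholder file name

# An ordinary image frame is just one spatial slice of the volume ...
frame_10 = volume[10]                          # (y, x) slice at t = 10

# ... while other slices expose the dynamics directly: an x-t slice shows how
# a single image row evolves over time (moving objects trace slanted streaks).
xt_slice = volume[:, volume.shape[1] // 2, :]  # shape: (T, W)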

(i) I will show how, by utilizing all available spatio-temporal information within the video sequences, one can align and integrate information across multiple video sequences both in time and in space. By combining spatial and dynamic visual scene information within a single alignment framework, situations that are inherently ambiguous for traditional image-to-image alignment methods are uniquely resolved by sequence-to-sequence alignment. Moreover, coherent dynamic information can sometimes be used to align video sequences even in extreme cases where there is no common spatial information across the sequences (e.g., when there is no spatial overlap between the cameras' fields of view). I will demonstrate applications of this approach to three real-world problems: (a) alignment of non-overlapping sequences for generating wide-screen movies, (b) multi-sensor image alignment for multi-sensor fusion, and (c) alignment of images (sequences) obtained at significantly different zooms (e.g., 1:10) for surveillance applications.
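The sequence-to-sequence alignment described above recovers spatial and temporal transformations jointly from the sequences themselves; as a much simpler, purely illustrative sketch of the temporal part, the snippet below estimates a frame offset between two sequences from their global dynamics alone, without using any shared spatial content. Each sequence is reduced to a 1-D per-frame activity signal (mean absolute frame difference), and the offset is found by correlating the two signals. The function names and the use of NumPy are assumptions, not part of the talk.

# Toy illustration: temporal alignment of two sequences from dynamics alone.
import numpy as np

def activity_signal(volume):
    """volume: (T, H, W) array -> 1-D per-frame measure of scene dynamics."""
    return np.abs(np.diff(volume.astype(np.float32), axis=0)).mean(axis=(1, 2))

def temporal_offset(vol_a, vol_b, max_shift=50):
    """Return the shift (in frames) that best correlates the two activity signals."""
    a = activity_signal(vol_a)
    b = activity_signal(vol_b)
    a = (a - a.mean()) / (a.std() + 1e-8)       # normalize so scores are comparable
    b = (b - b.mean()) / (b.std() + 1e-8)
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            seg_a, seg_b = a[shift:], b[:]
        else:
            seg_a, seg_b = a[:], b[-shift:]
        n = min(len(seg_a), len(seg_b))
        if n < 10:                              # require a minimal temporal overlap
            continue
        score = float(np.dot(seg_a[:n], seg_b[:n]) / n)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift  # positive: frame t of B corresponds to frame t + shift of A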

(ii) I will show how extended spatio-temporal scene representations can be used to view, browse, index into, edit and enhance video data very efficiently. In raw video, the spatio-temporal scene information is implicitly and redundantly distributed across many frames, which makes access and manipulation of the video data very difficult. However, by analyzing the redundancy of visual information within the space-time data volume, the distributed scene information can be integrated into coherent and compact scene-based visual representations. These lead to very efficient methods for accessing and manipulating the visual information in video data.
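
As a concrete, hedged example of such a compact scene-based representation (a deliberately simple one, not the representations presented in the talk), the sketch below summarizes an entire clip by a single background image, computed as the per-pixel temporal median of the space-time volume, and then isolates the dynamic (foreground) information of each frame as its residual against that background. NumPy is assumed, and the synthetic volume is only a stand-in for a real video volume such as the one built in the first sketch.

# Illustrative sketch: a static background image as a compact scene representation.
import numpy as np

def compact_background(volume):
    """volume: (T, H, W) array -> a single background image of shape (H, W)."""
    return np.median(volume, axis=0)

def dynamic_residual(volume, background, threshold=20):
    """Boolean (T, H, W) mask of pixels that differ noticeably from the background."""
    return np.abs(volume.astype(np.float32) - background) > threshold

# Placeholder data; in practice, use a real space-time volume (e.g., from the
# first sketch above).
volume = np.random.randint(0, 256, size=(100, 120, 160)).astype(np.uint8)

background = compact_background(volume)        # one image summarizes the static scene
masks = dynamic_residual(volume, background)   # per-frame foreground masks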