Abstract
Video
provides a continuous visual window into the space-time world, capturing
the evolution of dynamic scenes in both space and time. This makes video much
more than a collection of images of a scene taken from different viewpoints.
In this talk I will show that by treating video as a space-time
data volume, one can perform tasks that are very difficult (and often impossible)
when only "slices" of this information, such as individual image frames,
are used. In particular, I will demonstrate the power of this approach
through two example problems:
(i)
I will show how, by utilizing all available spatio-temporal information
within the video sequences, one can align and integrate information
across multiple video sequences both in time and in space. By combining
the spatial and dynamic visual scene information within a single alignment
framework, situations that are inherently ambiguous for traditional image-to-image
alignment methods are uniquely resolved by sequence-to-sequence alignment.
Moreover, coherent dynamic information can sometimes be used for aligning
video sequences even in extreme cases where there is no common
spatial information across the sequences (e.g., when there
is no spatial overlap between the cameras' fields of view), as sketched
below. I will demonstrate applications of this approach to three real-world
problems: (a) alignment of non-overlapping sequences for generating
wide-screen movies, (b) multi-sensor image alignment for multi-sensor
fusion, and (c) alignment of images (or sequences) obtained at significantly
different zooms (e.g., 1:10) for surveillance applications.
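
To make the space-time-volume view concrete, here is a minimal Python sketch of dynamics-based temporal alignment, assuming two grayscale sequences stored as (T, H, W) NumPy arrays. The frame-difference "motion signature" and the function names are illustrative assumptions, not the actual sequence-to-sequence alignment algorithm from the talk; they merely show how purely temporal correlation can recover a time offset even when the sequences share no spatial content.

```python
import numpy as np

def motion_signature(frames):
    """Reduce a space-time volume (T, H, W) to a 1-D signal:
    mean absolute difference between consecutive frames, a crude
    stand-in for estimated frame-to-frame scene dynamics."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs.mean(axis=(1, 2))

def temporal_offset(sig_a, sig_b, max_lag=50):
    """Recover the time shift that best aligns two motion signatures
    by maximizing their correlation over candidate lags."""
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag: sequence A started `lag` frames before B.
        a = sig_a[lag:] if lag >= 0 else sig_a
        b = sig_b if lag >= 0 else sig_b[-lag:]
        n = min(len(a), len(b))
        if n < 3 or a[:n].std() == 0 or b[:n].std() == 0:
            continue
        score = np.corrcoef(a[:n], b[:n])[0, 1]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Usage (hypothetical): seq_a, seq_b are grayscale (T, H, W) volumes.
# lag, score = temporal_offset(motion_signature(seq_a),
#                              motion_signature(seq_b))
```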
(ii)
I will show how extended spatio-temporal scene representations can
be used very efficiently to view, browse, index into, edit, and enhance
video data. In raw video, the spatio-temporal scene information
is implicitly and redundantly distributed across many frames, which
makes access and manipulation of the data very difficult. However, by
analyzing the redundancy of visual information within the space-time data
volume, the distributed scene information can be integrated into coherent
and compact scene-based visual representations. These representations lead
to very efficient methods for accessing and manipulating the visual
information in video data, as sketched below.
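
As one deliberately simple illustration of exploiting that redundancy, the sketch below, assuming a static camera and grayscale frames in a (T, H, W) NumPy array, collapses the space-time volume into a single background image and derives a per-frame activity score for indexing. Both functions are hypothetical stand-ins, far simpler than the scene-based representations described above.

```python
import numpy as np

def background_image(volume):
    """Collapse the space-time volume along time with a per-pixel
    median: static scene structure survives, transient moving
    objects largely vanish."""
    return np.median(volume.astype(np.float32), axis=0)

def activity_profile(volume, background):
    """Per-frame deviation from the static scene: a cheap
    'activity' score for browsing or indexing into the video."""
    dev = np.abs(volume.astype(np.float32) - background)
    return dev.mean(axis=(1, 2))

# Usage (hypothetical): jump straight to the most "eventful" frame.
# volume = np.stack(frames)                # grayscale, shape (T, H, W)
# bg = background_image(volume)
# t_peak = int(np.argmax(activity_profile(volume, bg)))
```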