Abstract
A central problem
in computer vision is the reconstruction of 3D scenes from multiple 2D images of
that scene. One of the most powerful visual cues in this process is the
coherence of visual motion over space and time. Since their
introduction about 10 years ago, layered motion models have been a powerful way
to describe and recover multiple coherent motions in image sequences. Although they have proved
promising for multiple motion analysis, their full use in scene reconstruction
remains an open problem. In this
talk, I will describe our research effort aimed towards obtaining layered
descriptions of a scene from multiple images.

Our recent work has focused on three aspects of this
problem. The first is an approach
for modeling the appearance and geometry of rigid, static 3D scenes from multiple views of that
scene. We model the scene as a
collection of layered 2.5D sprites. Each sprite corresponds approximately to a
"cardboard-cutout" description of a portion of the scene together
with a "parallax" component, which describes the finer variation of
the shape of that region. A
semi-automatic technique is used for recovering the layered description of a
scene from a given set of input images.

Our second effort is aimed at automatically initializing
the layer segmentation process using a statistical approach. A Bayesian formulation of the problem
is used to automatically determine the number of layers and an initial
segmentation of the scene into layers.
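
A hedged sketch of the shape such a Bayesian formulation typically takes (the
symbols below are illustrative, not necessarily the notation used in the work):
with observed images I, number of layers K, per-layer motion parameters Theta,
and pixel-to-layer labels L,

```latex
% Illustrative notation (not the authors' exact formulation):
%   I      -- observed image data
%   K      -- number of layers
%   \Theta -- per-layer motion/appearance parameters
%   L      -- pixel-to-layer assignments
P(K, \Theta, L \mid I) \;\propto\;
    P(I \mid K, \Theta, L)\, P(L \mid K)\, P(\Theta \mid K)\, P(K)
% The number of layers and the initial segmentation are the MAP estimate:
(\hat{K}, \hat{L}) \;=\; \arg\max_{K,\,L}\; \max_{\Theta}\; P(K, \Theta, L \mid I)
```

The prior on K penalizes over-segmentation, so maximizing the posterior trades
data fit against model complexity when selecting the number of layers.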

Our most recent effort focuses on decomposing multiple images of a scene
containing reflections and transparency into component layer images. All of the ideas will be illustrated
with real-image examples.
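
As a side note on the reflection/transparency decomposition mentioned above:
such decompositions are commonly posed with an additive image-formation model,
sketched here in illustrative notation (not necessarily the exact model used):

```latex
% Illustrative additive transparency model:
%   I_k     -- the k-th observed input image
%   f_j     -- the j-th unknown component layer image
%   W_k^j   -- warp mapping layer j into the k-th view
I_k(\mathbf{x}) \;=\; \sum_{j} \bigl(W_k^j \circ f_j\bigr)(\mathbf{x})
```

Given the warps, the component layer images can then be recovered by solving
this linear system over all views, e.g. via constrained least squares.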