Computer Vision Talks at Columbia University

Movies to Geometric 3D Models:
The Structure-from-Motion Problem

John Oliensis

NEC Research

Wednesday, Oct. 24, 11 AM

Interschool Lab, 7th floor, CEPSR

Host: Prof. Shree Nayar

Abstract

I describe some of my recent results on the Structure-from-Motion problem (SFM). Given a sequence of photographic images of a fixed 3D scene, taken by a camera at several unknown positions and orientations, the problem is to recover 1) a 3D geometric model of the scene (structure), 2) the camera's position and orientation for each image (motion).

One seeks estimates that optimally explain the image data; thus, SFM is an optimization problem. Formally, the goal is to find the estimate of the scene and motion that minimizes the "error" between the data predicted by the estimate and the actual image data. To understand the SFM problem, and to ensure that algorithms avoid false reconstructions, one must understand the shape of the "error surface," i.e., how the error depends on the estimate.
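As a rough illustration of what such an error can look like, the sketch below computes a sum of squared reprojection errors. It is not taken from the talk: the ideal pinhole camera with unit focal length, the convention X_cam = R X + t, and the names `rotation_y`, `project`, and `reprojection_error` are all assumptions made for this example.

```python
import math

def rotation_y(angle):
    """Rotation matrix for a rotation by `angle` radians about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(point, R, t):
    """Project a 3D point into a camera with rotation R and translation t.

    Assumption: ideal pinhole camera, unit focal length, X_cam = R * X + t.
    """
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (cam[0] / cam[2], cam[1] / cam[2])

def reprojection_error(points, cameras, observations):
    """Sum of squared distances between predicted and observed image points.

    points:       list of 3D points (the structure estimate)
    cameras:      list of (R, t) pairs, one per image (the motion estimate)
    observations: observations[k][i] is the observed 2D position of
                  point i in image k
    """
    total = 0.0
    for (R, t), image_obs in zip(cameras, observations):
        for point, (u, v) in zip(points, image_obs):
            pu, pv = project(point, R, t)
            total += (pu - u) ** 2 + (pv - v) ** 2
    return total

# Synthetic example: observations generated from a known scene and motion.
points = [(0.0, 0.0, 4.0), (1.0, 0.5, 5.0), (-1.0, 0.2, 6.0)]
true_cameras = [(rotation_y(0.0), [0.0, 0.0, 0.0]),
                (rotation_y(0.1), [0.5, 0.0, 0.0])]
observations = [[project(p, R, t) for p in points] for R, t in true_cameras]

# A wrong motion estimate: the second camera's rotation is off by 0.1 rad.
wrong_cameras = [true_cameras[0], (rotation_y(0.2), [0.5, 0.0, 0.0])]

error_at_truth = reprojection_error(points, true_cameras, observations)
error_at_wrong = reprojection_error(points, wrong_cameras, observations)
```

Evaluating `reprojection_error` over a grid of candidate estimates is one simple way to get a feel for an error surface: the value is zero at the true structure and motion for noise-free data and grows as the estimate moves away from it.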

My recent results include:
* For sequences of two images, a simple, EXACT expression for the error that depends only on the camera motion. This gives a fast optimal algorithm, since one can estimate the motion by minimizing over the motion alone, avoiding a time-consuming minimization over the many unknowns needed to describe the scene. Also, I present a solution to the stereo or triangulation problem: a simple, EXACT expression for the optimal estimate of the structure given known camera motion. I also demonstrate a new ambiguity in recovering the structure by triangulation.

* An analytic model of the error surface, giving a fairly complete understanding of the SFM problem. The model applies to planar and nonplanar scenes, which is crucial since most 3D scenes are in effect nearly planar. Using this model, one can show that the error surface has no false local minima under some conditions.

* Multi-image algorithms that compute directly from the photographic image data, without needing to iterate from an initial guess at the unknowns as in previous approaches. When such data are available, the approach can simultaneously use 3D points or lines pre-tracked over the sequence, or measurements of the affine deformations of image patches over time. It is designed for sequences where the camera makes small movements, e.g., hand-held video sequences. It is simple to implement and gives results superior to those of the Sturm/Triggs algorithm.
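
For readers unfamiliar with the triangulation problem mentioned in the first result above, the sketch below shows the textbook midpoint method for two views with known camera motion. It is a stand-in for illustration only, not the exact optimal expression presented in the talk. The pinhole model with unit focal length, the convention X_cam = R X + t, and all function names are assumptions of this example.

```python
import math

def rotation_y(angle):
    """Rotation matrix for a rotation by `angle` radians about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(point, R, t):
    """Pinhole projection with unit focal length, X_cam = R * X + t."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (cam[0] / cam[2], cam[1] / cam[2])

def transpose_times(R, v):
    """Multiply the transpose of the 3x3 matrix R by the vector v."""
    return [sum(R[j][i] * v[j] for j in range(3)) for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(obs1, cam1, obs2, cam2):
    """Midpoint of the shortest segment joining the two viewing rays."""
    rays = []
    for (u, v), (R, t) in ((obs1, cam1), (obs2, cam2)):
        center = [-x for x in transpose_times(R, t)]    # camera center in world
        direction = transpose_times(R, [u, v, 1.0])     # ray direction in world
        rays.append((center, direction))
    (c1, d1), (c2, d2) = rays
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    s = (b * e - c * d) / denom     # parameter along the first ray
    r = (a * e - b * d) / denom     # parameter along the second ray
    p1 = [c1[i] + s * d1[i] for i in range(3)]
    p2 = [c2[i] + r * d2[i] for i in range(3)]
    return [(p1[i] + p2[i]) / 2.0 for i in range(3)]

# Noise-free check: project a known point, then recover it.
point = (1.0, 0.5, 5.0)
cam1 = (rotation_y(0.0), [0.0, 0.0, 0.0])
cam2 = (rotation_y(0.1), [-0.5, 0.0, 0.0])
estimate = triangulate_midpoint(project(point, *cam1), cam1,
                                project(point, *cam2), cam2)
```

With noise-free observations the two rays intersect and the midpoint coincides with the true point. With noisy observations the midpoint is generally not the estimate that minimizes image error, which is why an exact expression for the optimal estimate is of interest.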