Years of experience and comparative testing in the film and video post-production industry by the authors and software developers have demonstrated the effectiveness and importance of our computational foundation of nonlinear and probabilistic modeling for 3D computer vision relative to competing approaches. In this industry, software based on the principles of the SfM technique described above has been used commercially in Alchemy 3D Technology's MatchMaker(TM) and Alias|Wavefront's MayaLive(TM) products and has recently been chosen to contribute to the feature-based vision subsystem of SynaPix's SynaFlex(TM) system.
In this industry, demand for computer-graphics-based special effects has grown rapidly. An emerging staple in producing such effects is "3D compositing", in which 2D-source imagery (i.e. film, video, or digital image sequences) is combined with 3D-source imagery (i.e. 3D computer graphics) in a realistic and metrically accurate fashion by first recovering an accurate 3D representation of the 2D-source imagery using computer vision techniques. The computer vision component is known as "3D matchmoving" and results in a 3D representation of camera motion, scene geometry, and camera imaging geometry.
Product developers have sought for years to develop reliable vision front-ends to meet this growing need and have considered all available published work in the field as candidate technology, including linear algebraic techniques, photogrammetric techniques, and optical-flow-based techniques. The selection of software based on our technology as the basis of several major matchmoving systems is testimony to the practical importance of the theoretical foundation, as borne out by objective field testing against competing approaches. In particular, software based on our technology has exhibited substantially greater efficiency, reliability, accuracy, flexibility, and extensibility. There is sound theoretical grounding for these observations.
Efficiency arises from the ability to combine probabilistic representations of information recursively. In typical cinematic sequences of 200-300 frames, software based on our techniques usually obtains complete solutions in 30 seconds to 8 minutes, whereas comparable solutions with photogrammetric or other nonlinear approaches on the same sequences typically require many hours, often overnight processing.
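The efficiency argument can be illustrated with a minimal sketch (not the production system): a recursive Gaussian update absorbs each new measurement in constant time, so the cost of incorporating a frame does not grow with sequence length, in contrast to batch methods that re-solve the whole problem. The 1-D state, measurement values, and variances below are hypothetical illustrations.

```python
# Minimal sketch of recursive probabilistic fusion (hypothetical 1-D state).
# Each measurement updates a running Gaussian estimate in O(1) time, so
# total cost is linear in the number of frames rather than requiring a
# large batch re-solve at every frame.

def fuse(mean, var, z, var_z):
    """Combine a Gaussian prior (mean, var) with a measurement z of variance var_z."""
    k = var / (var + var_z)           # gain: how much to trust the new measurement
    new_mean = mean + k * (z - mean)  # corrected estimate
    new_var = (1.0 - k) * var         # uncertainty shrinks with each measurement
    return new_mean, new_var

mean, var = 0.0, 1e6                  # diffuse prior: essentially no initial knowledge
for z in [1.2, 0.9, 1.1, 1.0]:        # simulated per-frame measurements
    mean, var = fuse(mean, var, z, var_z=0.04)
```

After four measurements the estimate has converged near their weighted average, and the variance has shrunk well below the per-measurement noise, all without ever revisiting earlier frames.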
Reliability arises from the stability associated with probabilistic rather than rigid linear algebraic modeling of spatial and dynamic processes. In the presence of noisy input, our techniques have proven resilient where competing methods produce nonsensical output. Long "dolly" shots (primarily translation along the z-axis) in particular have proven difficult for most solution techniques, but our techniques routinely acquire accurate solutions, including in extreme conditions, e.g. a 1550-frame helicopter shot with over 70 features and large feature turnover.
Accuracy arises from nonlinear 3D scene-based modeling. Linear algebraic techniques are fragile in the presence of real-world data and modeling imperfections and often fail to produce useful 3D Euclidean output at all. For post-production applications that depend upon useful 3D (Euclidean) output to match 3D CG representations, there is no substitute for Euclidean modeling. Optical flow methods produce dense depth maps, but since they are view-based and computed from pairs of closely spaced images, there is no easy or sufficiently general way of producing a consistent and accurate scene-based 3D description, and no general way of controlling scaling and drift.
Flexibility arises from probabilistic modeling. Since probabilistic modeling facilitates accumulation and propagation of information, it allows efficient solution of otherwise difficult sequences. Among these are sequences in which features disappear and reappear and those in which there is insufficient visual information throughout all frames. Competing systems have found it particularly difficult to solve cinematic shots in which features appear and disappear due to foreground occlusions caused by, e.g., actors and vehicles; those in which only a camera pan (pure rotation) is present; those in which almost the entire feature set changes from the start to the end; and those in which large segments of the sequence are completely unusable (e.g. due to practical effects such as steam, explosions, or blinding light). Software based on our techniques routinely solves these types of cinematic shots because probabilistic modeling can be used to account for missing information.
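How a probabilistic formulation accounts for missing information can be sketched in a toy form (the state, noise values, and occlusion pattern are hypothetical, not drawn from the production system): when a feature is occluded, the measurement update is simply skipped; the prediction step still runs, so the estimate's uncertainty grows honestly over the gap, and the track is re-acquired cleanly when the feature reappears.

```python
# Hedged sketch: tracking a hypothetical 1-D feature through occlusion.
# z is None on frames where the feature is hidden (e.g. behind an actor);
# those frames contribute no measurement, only growing uncertainty.

def step(mean, var, z, q=0.01, r=0.04):
    var += q                           # predict: uncertainty grows each frame
    if z is None:                      # feature occluded this frame
        return mean, var               # skip the measurement update entirely
    k = var / (var + r)                # gain for the observed measurement
    return mean + k * (z - mean), (1.0 - k) * var

mean, var = 1.0, 0.1
for z in [1.02, None, None, None, 0.98]:  # feature hidden on frames 2-4
    mean, var = step(mean, var, z)
```

A rigid algebraic solver would need every frame's measurement to form its system of equations; here the gap is handled by the same update rule with one branch, which is the sense in which missing information is "accounted for" rather than fatal.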
Finally, extensibility arises from probabilistic modeling. Many shots encountered in cinematic post-production do not have ideal camera motions for 3D recovery purely from 2D visual motion. In these cases, additional information about scene structure is necessary to obtain complete solutions. This information can come in many forms and must be integrated in a consistent fashion. Probabilistic modeling has long been used as the foundation for integrating information from qualitatively different sources, and this application is no exception. Since the visual process is already modeled probabilistically, the integration of, e.g., scene-based measurements with visual feature measurements takes place naturally, allowing complete solutions of shots that techniques based purely on visual relationships could not have solved.
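The mechanics of integrating qualitatively different sources reduce, in the Gaussian case, to inverse-variance weighting. Below is a hedged toy illustration (the quantities and variances are invented for exposition): a scene dimension estimated from visual features is fused with an independent on-set measurement of the same dimension, each weighted by its certainty.

```python
# Fusing two independent Gaussian estimates of the same quantity, e.g. a
# scene dimension recovered from visual features and the same dimension
# measured directly on set. In information (inverse-variance) form the
# fusion is a certainty-weighted average.

def fuse_sources(m1, v1, m2, v2):
    """Optimally combine two independent Gaussian estimates (mean, variance)."""
    w1, w2 = 1.0 / v1, 1.0 / v2       # information weights
    v = 1.0 / (w1 + w2)               # fused variance (smaller than either input)
    m = v * (w1 * m1 + w2 * m2)       # fused mean, pulled toward the surer source
    return m, v

# Hypothetical values: visual estimate 3.2 units with high uncertainty;
# direct on-set measurement 3.0 units, far more precise.
m, v = fuse_sources(3.2, 0.25, 3.0, 0.01)
```

Because the precise measurement carries far more information weight, the fused mean lands close to it, while the visual estimate still contributes; this is the sense in which non-visual constraints slot "naturally" into an already-probabilistic visual solution.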
In short, the theoretical foundation of nonlinear and probabilistic modeling for 3D computer vision has proven itself in at least one industry with an important application built on these techniques. The objective nature of the arena in which the technology has competed, and in which it now enjoys growing preference, lends credence to the fundamental practical advantages of the formulation. From an evolutionary standpoint, the observed flexibility and extensibility in particular offer the strongest indication that the technology can find an important place both in further software applications and in a larger framework for perceptual information processing.