3D Motion Model - Translation

The translational motion is represented as the 3-D location of the object reference frame relative to the current camera reference frame using the vector

$\begin{displaymath}{\bf t} = \left(t_X,t_Y,t_Z\right) \end{displaymath}$

The $t_X$ and $t_Y$ components correspond to directions parallel to the image plane, while the $t_Z$ component is the depth of the object along the optical axis. Consequently, image-plane motion is about equally sensitive to $t_X$ and $t_Y$, while its sensitivity to $t_Z$ differs by an amount that depends on the focal length of the imaging geometry.

For typical video camera focal lengths, even with "wide-angle" lenses, image-plane motion is already much less sensitive to $t_Z$ than to $(t_X, t_Y)$. As the focal length increases, this sensitivity decreases further, until in the limiting orthographic case image-plane motion is completely insensitive to $t_Z$.
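The trend described above can be made concrete with a short derivation. This is a sketch under stated assumptions: that Equation 12 has the perspective form $u = X_C/(1+Z_C\beta)$, $v = Y_C/(1+Z_C\beta)$ (consistent with the derivative of $u$ with respect to $t_Z\beta$ given later in this section), that $\beta$ can be read as the inverse focal length $1/f$ (consistent with $\beta$ approaching zero for long focal lengths), and that the point lies in front of the camera, so $1+Z_C\beta > 0$. Since $X_C$ depends on $t_X$ with unit coefficient and $Z_C\beta$ depends on $t_Z$ with coefficient $\beta$:

$\begin{displaymath} \frac{\partial u}{\partial t_X} = \frac{1}{1+Z_C\beta}, \qquad \frac{\partial u}{\partial t_Z} = \frac{-X_C\beta}{(1+Z_C\beta)^2}, \qquad \left|\frac{\partial u/\partial t_Z}{\partial u/\partial t_X}\right| = \frac{|X_C|\,\beta}{1+Z_C\beta} = \beta\,|u| = \frac{|u|}{f} \end{displaymath}$

Under these assumptions, $|u|/f$ is the tangent of the angle between the optical axis and the viewing ray in the $X$-$Z$ plane, so the relative sensitivity to $t_Z$ is bounded by the tangent of half the horizontal field of view and shrinks toward zero as $f$ grows.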

For this reason, $t_Z$ cannot be represented explicitly in our estimation process. Instead, the product $t_Z\beta$ of the depth and the focal parameter $\beta$ is estimated. The coordinate frame transformation equation

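$\begin{displaymath} \left(\begin{array}{c} X_C\\ Y_C\\ Z_C\beta \end{array}\right) = \left(\begin{array}{c} t_X\\ t_Y\\ t_Z\beta \end{array}\right) + \left(\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & \beta \end{array}\right) {\bf R} \left(\begin{array}{c} X\\ Y\\ Z \end{array}\right) \qquad (15) \end{displaymath}$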

together with Equation 12 shows that only $t_Z\beta$, and never $t_Z$ by itself, is required to express the image-plane measurements $(u,v)$ as a function of the motion, structure, and camera parameters (rotation ${\bf R}$ is discussed below).
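The claim that only $t_Z\beta$ is needed can be checked numerically. The sketch below applies Equation 15 and then assumes Equation 12 is the perspective projection $u = X_C/(1+Z_C\beta)$, $v = Y_C/(1+Z_C\beta)$. The helpers `project_tz_beta`, `project_explicit_tz`, and `rot_y`, as well as the sample points and parameter values, are hypothetical illustrations rather than the authors' code. The first function never sees $t_Z$; it agrees with a version written in terms of explicit $t_Z$, and it still works in the orthographic limit $\beta = 0$, where $t_Z$ plays no role.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the Y axis (an arbitrary, illustrative choice of R)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project_tz_beta(points, t_x, t_y, tz_beta, beta, R):
    """Image coordinates (u, v) computed from the product t_Z*beta only.

    Equation 15: X_C = t_X + (RX)_1, Y_C = t_Y + (RX)_2,
                 Z_C*beta = t_Z*beta + beta*(RX)_3.
    ASSUMED projection (Equation 12): u = X_C/(1 + Z_C*beta), v = Y_C/(1 + Z_C*beta).
    """
    rotated = np.asarray(points, dtype=float) @ R.T   # each row is R @ (X, Y, Z)
    x_c = t_x + rotated[:, 0]
    y_c = t_y + rotated[:, 1]
    zc_beta = tz_beta + beta * rotated[:, 2]          # t_Z itself never appears
    denom = 1.0 + zc_beta
    return np.column_stack((x_c / denom, y_c / denom))

def project_explicit_tz(points, t_x, t_y, t_z, beta, R):
    """The same model written with an explicit t_Z, for comparison only."""
    cam = np.asarray(points, dtype=float) @ R.T + np.array([t_x, t_y, t_z])
    return cam[:, :2] / (1.0 + beta * cam[:, 2])[:, None]

points = np.array([[0.1, -0.2, 0.3], [0.5, 0.4, -0.1], [-0.3, 0.2, 0.0]])
R = rot_y(0.2)
beta, t_z = 0.5, 2.0

uv_product = project_tz_beta(points, 0.05, -0.02, t_z * beta, beta, R)
uv_explicit = project_explicit_tz(points, 0.05, -0.02, t_z, beta, R)
print(np.allclose(uv_product, uv_explicit))  # prints True: both forms agree

# Orthographic limit: beta = 0 forces t_Z*beta = 0, and no value of t_Z is needed.
uv_ortho = project_tz_beta(points, 0.05, -0.02, 0.0, 0.0, np.eye(3))
print(uv_ortho)
```

With $\beta = 0$ and ${\bf R}$ the identity, the projection reduces to $(X + t_X, Y + t_Y)$, which is the orthographic case described above.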

Furthermore, unlike the sensitivity to $t_Z$, the sensitivity of the measurements to $t_Z\beta$ does not degenerate at long focal lengths. For example, the sensitivities of the $u$ image coordinate to $t_Z$ and to $t_Z\beta$ are

$\begin{displaymath} \frac{\partial u}{\partial t_Z} = \frac{-X_C\beta}{(1+Z_C\beta)^2}, \qquad \frac{\partial u}{\partial(t_Z\beta)} = \frac{-X_C}{(1+Z_C\beta)^2} \end{displaymath}$

demonstrating that $t_Z\beta$ remains observable from the measurements, and is therefore estimable, at long focal lengths, whereas $t_Z$ is not: $\beta$ approaches zero for long focal lengths, so $\partial u/\partial t_Z$ vanishes while $\partial u/\partial(t_Z\beta)$ does not.
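The two derivatives above can be checked against central finite differences, and their behaviour traced as $\beta$ shrinks. This sketch again assumes the projection $u = X_C/(1+Z_C\beta)$; the functions `u_image` and `u_sensitivities`, the rotated point `rp` $= {\bf R}(X,Y,Z)^T$, and the parameter values are hypothetical. It holds $t_Z\beta$ fixed while $\beta$ decreases, an illustrative scenario in which the object recedes as the focal length grows so that its image size stays roughly constant.

```python
import numpy as np

def u_image(t_x, tz_beta, beta, rp):
    """u = X_C / (1 + Z_C*beta) with rp = R @ (X, Y, Z) (projection form ASSUMED)."""
    return (t_x + rp[0]) / (1.0 + tz_beta + beta * rp[2])

def u_sensitivities(t_x, tz_beta, beta, rp):
    """Analytic du/dt_Z and du/d(t_Z*beta), each paired with a central difference."""
    x_c = t_x + rp[0]
    denom = 1.0 + tz_beta + beta * rp[2]              # 1 + Z_C*beta
    du_dtz = -x_c * beta / denom**2
    du_dtzb = -x_c / denom**2
    # Perturbing t_Z by h changes the product t_Z*beta by h*beta.
    t_z = tz_beta / beta
    h = 1e-6 * max(1.0, abs(t_z))
    fd_tz = (u_image(t_x, (t_z + h) * beta, beta, rp)
             - u_image(t_x, (t_z - h) * beta, beta, rp)) / (2 * h)
    k = 1e-6
    fd_tzb = (u_image(t_x, tz_beta + k, beta, rp)
              - u_image(t_x, tz_beta - k, beta, rp)) / (2 * k)
    return du_dtz, fd_tz, du_dtzb, fd_tzb

rp = np.array([0.2, -0.1, 0.3])   # hypothetical rotated structure point R @ (X, Y, Z)
t_x, tz_beta = 0.1, 1.0           # hypothetical; t_Z*beta held fixed as beta shrinks
results = {}
for beta in (1.0, 0.1, 0.01, 0.001):
    results[beta] = u_sensitivities(t_x, tz_beta, beta, rp)
    d_tz, _, d_tzb, _ = results[beta]
    print(f"beta={beta:<6} du/dt_Z={d_tz:+.3e}  du/d(t_Z beta)={d_tzb:+.3e}")
```

In this scenario $\partial u/\partial t_Z$ falls roughly in proportion to $\beta$, while $\partial u/\partial(t_Z\beta)$ stays of the same order, matching the argument in the text.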

Thus we parameterize translation with the vector

$\begin{displaymath}\mbox{(translation)} = \left(t_X,t_Y,t_Z\beta\right) \end{displaymath}$

True translation ${\bf t}$ can be recovered after estimation simply by dividing $t_Z\beta$ by the focal parameter $\beta$. This is valid only if $\beta$ is non-zero (non-orthographic). The restriction is appropriate, because $t_Z$ is not geometrically recoverable in the orthographic case. Mathematically, treating $\beta$ as known, the error variance of $t_Z$ is the error variance of $t_Z\beta$ scaled by $1/\beta^2$, which grows large for narrow fields of view (small $\beta$).
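The post-estimation recovery step and its first-order variance propagation can be sketched as follows. The function `recover_translation` and the numbers used are hypothetical; like the paragraph above, the sketch treats $\beta$ as exactly known. If $\beta$ is itself estimated, its variance and its covariance with $t_Z\beta$ would contribute additional terms that are not modelled here.

```python
import numpy as np

def recover_translation(t_x, t_y, tz_beta, beta, var_tz_beta):
    """Recover (t_X, t_Y, t_Z) and a first-order variance for t_Z.

    Treats beta as exactly known (ASSUMPTION); uncertainty in beta is ignored.
    """
    if beta == 0.0:
        raise ValueError("orthographic case (beta = 0): t_Z is not recoverable")
    t_z = tz_beta / beta
    var_tz = var_tz_beta / beta**2      # variance of t_Z*beta scaled by 1/beta^2
    return np.array([t_x, t_y, t_z]), var_tz

sigma_tzb = 0.01                          # hypothetical std. dev. of the t_Z*beta estimate
for beta in (1.0, 0.1, 0.01):
    t, var_tz = recover_translation(0.1, -0.2, 1.0, beta, sigma_tzb**2)
    print(f"beta={beta:<5} t_Z={t[2]:8.1f}  std(t_Z)={np.sqrt(var_tz):.2f}")
```

Each tenfold drop in $\beta$ multiplies the standard deviation of the recovered $t_Z$ by ten, even though the uncertainty of the estimated product $t_Z\beta$ is unchanged.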

1999-05-17