
3D Motion Model - Translation

The translational motion is represented as the 3-D location of the object reference frame relative to the current camera reference frame using the vector

\begin{displaymath}{\bf t} = \left(t_X,t_Y,t_Z\right) \end{displaymath}

The $t_X$ and $t_Y$ components correspond to directions parallel to the image plane, while the $t_Z$ component corresponds to the depth of the object along the optical axis. As a result, image plane motion is about equally sensitive to $t_X$ and $t_Y$ motion, while its sensitivity to $t_Z$ motion differs by an amount that depends on the focal length of the imaging geometry.

For typical video camera focal lengths, even with ``wide angle'' lenses, there is already much less sensitivity to $t_Z$ motion than there is to $(t_X, t_Y)$ motion. For longer focal lengths the sensitivity decreases until, in the limiting orthographic case, there is zero image plane sensitivity to $t_Z$ motion.

For this reason, $t_Z$ cannot be represented explicitly in our estimation process. Instead, the product $t_Z\beta$ is estimated. The coordinate frame transformation equation

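\begin{displaymath}
\left(\begin{array}{c} X_C\\ Y_C\\ Z_C\beta \end{array}\right) = \left(\begin{array}{c} t_X\\ t_Y\\ t_Z\beta \end{array}\right) + \left(\begin{array}{ccc} 1 & & \\ & 1 & \\ & & \beta \end{array}\right) {\bf R} \left(\begin{array}{c} X\\ Y\\ Z \end{array}\right)
\end{displaymath} (15)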

combined with Equation 12 demonstrates that only $t_Z\beta$ is actually required to generate an equation for the image plane measurements (u,v) as a function of the motion, structure, and camera parameters (rotation ${\bf R}$ is discussed below).
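As a minimal sketch of how Equation 15 feeds the measurement equation, the Python below assumes that Equation 12 is the projection $u = X_C/(1+Z_C\beta)$, $v = Y_C/(1+Z_C\beta)$. That form is inferred from the sensitivities given in this section, not quoted from Equation 12 itself. The function name `project` and its argument layout are hypothetical. Note that $t_Z$ never appears on its own: the third translation entry is the product $t_Z\beta$.

```python
def project(point, R, translation, beta):
    """Map an object-frame point (X, Y, Z) to image coordinates (u, v).

    translation is (t_X, t_Y, t_Z*beta); R is a 3x3 rotation given as nested lists.
    ASSUMPTION: Equation 12 has the form u = X_C / (1 + Z_C*beta),
    v = Y_C / (1 + Z_C*beta).
    """
    # Rotate the object-frame point: r = R (X, Y, Z).
    r = [sum(R[i][j] * point[j] for j in range(3)) for i in range(3)]
    # Equation 15: only the depth row is scaled by beta.
    x_c = translation[0] + r[0]
    y_c = translation[1] + r[1]
    zc_beta = translation[2] + beta * r[2]
    # Assumed projection (Equation 12).
    denom = 1.0 + zc_beta
    return x_c / denom, y_c / denom


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# X_C = Y_C = 1.5 and Z_C*beta = 0.5, so u and v are both approximately 1.0.
u, v = project((1.0, 2.0, 3.0), identity, (0.5, -0.5, 0.2), beta=0.1)
```

With `beta=0.0` the same function reduces to an orthographic projection, in which the depth entries have no effect on (u, v).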

Furthermore, the sensitivity of the image measurements to $t_Z\beta$ does not degenerate at long focal lengths, as the sensitivity to $t_Z$ does. For example, the sensitivities of the $u$ image coordinate to $t_Z$ and to $t_Z\beta$ are

\begin{displaymath}
\frac{\partial u}{\partial t_Z} = \frac{-X_C\beta}{(1+Z_C\beta)^2}
\qquad \mbox{and} \qquad
\frac{\partial u}{\partial(t_Z\beta)} = \frac{-X_C}{(1+Z_C\beta)^2}
\end{displaymath}

demonstrating that $t_Z\beta$ remains observable from the measurements and is therefore estimable for long focal lengths, while $t_Z$ is not ($\beta$ approaches zero for long focal lengths).
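The contrast can be checked numerically. The sketch below takes the two sensitivities as $-X_C\beta/(1+Z_C\beta)^2$ and $-X_C/(1+Z_C\beta)^2$ (the first follows from the second by the chain rule, since $\partial(t_Z\beta)/\partial t_Z = \beta$) and evaluates them for shrinking $\beta$. The function names and the sample values of $X_C$ and $Z_C$ are illustrative assumptions, not values from the text.

```python
def du_dtz(x_c, z_c, beta):
    """Sensitivity of u to t_Z; carries an extra factor of beta."""
    return -x_c * beta / (1.0 + z_c * beta) ** 2


def du_dtz_beta(x_c, z_c, beta):
    """Sensitivity of u to the product t_Z*beta."""
    return -x_c / (1.0 + z_c * beta) ** 2


# Illustrative camera-frame point; beta shrinks toward the orthographic limit.
x_c, z_c = 1.0, 2.0
for beta in (1.0, 0.1, 0.01, 0.0):
    print(beta, du_dtz(x_c, z_c, beta), du_dtz_beta(x_c, z_c, beta))
```

As $\beta$ goes to zero the first column of sensitivities vanishes, while the second tends to $-X_C$ and so stays non-zero whenever $X_C$ is non-zero.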

Thus we parameterize translation with the vector

\begin{displaymath}(translation) = \left(t_X,t_Y,t_Z\beta\right) \end{displaymath}

True translation ${\bf t}$ can be recovered after estimation simply by dividing $t_Z\beta$ by the focal parameter $\beta$. This is valid only if $\beta$ is non-zero (the non-orthographic case), a restriction that matches the geometry, because $t_Z$ is not geometrically recoverable in the orthographic case. Mathematically, the error variance on $t_Z$ is the error variance on $t_Z\beta$ scaled by $1/\beta^2$, which grows large for narrow fields of view (small $\beta$).
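The recovery step and its variance scaling can be sketched as follows. This is an illustration under the assumption that $\beta$ is treated as exactly known at recovery time, so that any uncertainty in $\beta$ itself is ignored. The function name `recover_tz` and the sample numbers are hypothetical.

```python
def recover_tz(tz_beta, var_tz_beta, beta):
    """Recover t_Z and its error variance from the estimated product t_Z*beta.

    ASSUMPTION: beta is treated as exactly known, so the variance of t_Z is
    the variance of t_Z*beta scaled by 1 / beta**2.
    """
    if beta == 0.0:
        # Orthographic case: t_Z is not geometrically recoverable.
        raise ValueError("t_Z cannot be recovered when beta is zero")
    t_z = tz_beta / beta
    var_t_z = var_tz_beta / beta ** 2
    return t_z, var_t_z


# The same estimate of t_Z*beta gives a far larger variance on t_Z
# when beta is small (narrow field of view).
wide = recover_tz(0.5, 1e-4, beta=0.1)
narrow = recover_tz(0.5, 1e-4, beta=0.001)
```

Here `narrow[1]` is larger than `wide[1]` by a factor of about $10^4$, the square of the ratio of the two $\beta$ values.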
