next up previous
Next: Results and Discussions Up: Problem Formulation Previous: Problem Formulation


Affine Homography

We first look at the case when the transformation between two views is affine. The third row of the matrix $M$ in Equation 1, has the special form $[0\,\, 0 \,\,1]$ for affine transformations. We can write Equation 1 in this case as

\begin{displaymath}{\bf x}^l[i] = {\bf A} {\bf x}^0[i] + {\bf b} \end{displaymath}

where A is a $2\!\times\!2$ matrix, ${\bf b}$ is a translation vector and $l$ is the view index, with $l = 0$ being considered to be the reference view. We can discard the effect of vector ${\bf b}$ by discarding the DC component - the Fourier coefficient corresponding to $k = 0$ (or by shifting the origin to the centroid of the shape). The affine transformation can now be written as
$\displaystyle {\bf x}^l[i]$ $\textstyle =$ $\displaystyle {\bf A} {\bf x}^0[i] \mbox{ in the spatial
domain, and}$  
$\displaystyle {\bf\bar{X}}^l[k]$ $\textstyle =$ $\displaystyle {\bf A} {\bf\bar{X}}^0[k] \mbox{ in
the Fourier domain}$ (3)

without any loss in generality. In general, correspondence between images, i.e., information as to which image points in different views are projections of the same 3D point, is not available. This implies that when the boundary is seen in two views, the description may not start from the same point. In other words, there is an unknown shift between the sequences of boundary points in different views. This shift in the spatial domain translates into a rotation in the Fourier domain. Equation 3 can now be written as
\begin{displaymath}
{\bf\bar{X}}^l[k] = {\bf A} {\bf\bar{X}}^0[k]\,
e^{\omega_k\lambda_l}
\end{displaymath} (4)

where, $\lambda_l$ is the unknown shift in view $l$, $N$ is the number of points on the boundary of the shape, and ${\omega_k = -j2\pi k /N.}$ Let us define a measure called the cross-conjugate product (CCP) on the Fourier representations of two views as
$\displaystyle \psi(0, l)$ $\textstyle =$ $\displaystyle ({\bf\bar{X}}^0[k])^{*T}{\bf\bar{X}}^l[k]$  
  $\textstyle =$ $\displaystyle ({\bf\bar{X}}^0[k])^{*T} {\bf A} {\bf\bar{X}}^0[k]\, e^{\omega_k\lambda_l}.$ (5)

The matrix ${\bf A}$ can be expressed as a sum of a symmetric matrix and a skew symmetric matrix as ${\bf A} =
{\bf A}_s + {\bf A}_{sk}$ where ${\bf A}_s = \frac{1}{2}
({\bf A} + {\bf A}^T)$ and ${\bf A}_{sk} = \frac{1}{2} ({\bf
A} - {\bf A}^T)$. The skew symmetric matrix reduces to

\begin{displaymath}c \left[ \begin{array}{cr} 0 & 1 \\ -1\;\; & 0 \end{array} \right], \end{displaymath}

where $c = a_{12} - a_{21}$ is the difference of the off-diagonal elements of ${\bf A}$. We now have
$\displaystyle \psi(0, l)$ $\textstyle =$ $\displaystyle \gamma_1 + \gamma_2$  
  $\textstyle =$ $\displaystyle {\bf\bar{X}}^0[k]^{*T}\left( {\bf A}_s +
{\bf A}_{sk} \right)
{\bf\bar{X}}^0[k]\;e^{\omega_k\lambda_l} .$ (6)

The term ${\bf\bar{X}}^0[k]^{*T} {\bf A}_s {\bf\bar{X}}^0[k]$ of the above equation is purely real and the term ${\bf
\bar{X}}^0[k]^{*T} c\left[ \begin{array}{cr} 0 & 1 \\ -1\;\; & 0
\end{array} \right] {\bf\bar{X}}^0[k]$ is purely imaginary. The phases of $\gamma_1$ and $\gamma_2$ depend only on the shift $\lambda_l$. Thus, $\lambda_l$ can be recovered from the inverse Fourier transform of $\gamma_1$ or $\gamma_2$, if known. However, we can only compute $\psi(0, l)$, a combination of $\gamma_1$ and $\gamma_2$, which is not directly useful to recover the shift. We observe that the effect of the transformation matrix $A$ on $\gamma_2$ is restricted to a scaling by a factor $c$. We can define a new measure $\kappa $, ignoring scale, for the sequence ${\bf\bar{X}}^l$ as
\begin{displaymath}
\kappa(l) = {\bf\bar{X}}^l[k]^{*T} \left[ \begin{array}{cr} 0 & 1 \\
-1\;\; & 0 \end{array} \right] {\bf\bar{X}}^l[k].
\end{displaymath} (7)

It can be shown that
$\displaystyle \kappa (l)$ $\textstyle =$ $\displaystyle ({\bf\bar{X}}^l[k])^{*T}
\left[ \begin{array}{cl} 0 & 1 \\  -1\;\; & 0 \end{array} \right] {\bf\bar{X}}^l[k]$  
  $\textstyle =$ $\displaystyle ({\bf A \bar{X}}^0[k]\, e^{\omega_k\lambda_l})^{*T}
\left[ \begin...
...& 1 \\  -1\;\; & 0 \end{array}\right] {\bf A \bar{X}}^0[k]e^{\omega_k\lambda_l}$  
  $\textstyle =$ $\displaystyle ({\bf\bar{X}}^0[k])^{*T} \;{\bf A}^T
\left[ \begin{array}{cl} 0 & 1 \\  -1\;\; & 0 \end{array}\right] {\bf A}\;{\bf\bar{X}}^0[k]$  
  $\textstyle =$ $\displaystyle \vert{\bf A}\vert\; \kappa(0)$ (8)

Equation 8 gives a necessary condition for the sequences ${\bf\bar{X}}^l$ and ${\bf\bar{X}}^0$ to be two different views of the same planar shape, or in other words, the values of the measure $\kappa(\cdot)$ in the two views should be scaled versions of each other. This extends to multiple views also. We can express it differently in multiple views. Consider the $M\!\times\!(N-1)$ matrix formed by the coefficients of the $\kappa(\cdot)$ measures for M different views.

\begin{displaymath}\Theta =
\left[ \begin{array}{ccc}
\kappa(0)[1] & \cdots ...
...ppa(M-1)[1] & \cdots & \kappa(M-1)[N-1]
\end{array} \right]
\end{displaymath}

The necessary condition for matching of the planar shape in $M$ views then reduces to
\begin{displaymath}
\mbox{rank}(\Theta) = 1.
\end{displaymath} (9)

It should be noted that this recognition condition does not require correspondence between views and is valid for any number of views. Can we also estimate the shift $\lambda_l$ that would align corresponding points in two views? The answer is yes, using a measure $\kappa _{mod}$ for a fixed $p$ given by
\begin{displaymath}
\kappa_{mod}(l, p) = (X^l[k])^{*T} \left[ \begin{array}{cl} 0 &
1 \\ -1\;\; & 0 \end{array} \right] X^l[p]
\end{displaymath} (10)

$\kappa _{mod}$ correlates each Fourier descriptor coefficient with a fixed one within each view. Following a chain of reasoning similar to the one above, we can show that
$\displaystyle \kappa_{mod} (l, p)$ $\textstyle =$ $\displaystyle \vert A\vert\; \kappa_{mod}(0, p)\;
e^{j2\pi\lambda_l (k - p) / N}.$ (11)

Equation 11 states that the phases of $\kappa_{mod}(l, p)$ and $\kappa_{mod}(0, p)$ differ by a value proportional to the shift $\lambda_l$ and the differential frequency $k - p$. Therefore, the ratio $\frac{\kappa_{mod}(l,
p)}{\kappa_{mod}(0,p)}$ will be a complex sinusoid. The value of $\lambda_l$ can be computed from the inverse Fourier transform of the quotient series. Thus, we can compute the correspondence between points on the shape boundary across views by determining $\lambda_l$ as above, starting with no prior knowledge. The measure $\kappa _{mod}$ can also be adopted for the purpose of recognition. However, this approach would not be discussed in this paper for want of space. The general projective homography relating two different views of the same planar shape can be reasonably approximated by an affine homography. This approximation seems to be a practical one as most real life configurations of imaging a scene from multiple view points, possess structure that are very close that of affine homographies. This assumption is also validated by the results for general homographies, which are presented in the next section.
next up previous
Next: Results and Discussions Up: Problem Formulation Previous: Problem Formulation
2002-10-09