
CEM for Gaussian Mixture Models


  
Figure 7.2: Mixture of 3 Gaussians

Consider the mixture of Gaussians shown in Figure 7.2. For the joint density case, Equation 7.13 depicts the mixture model. Here, ${\cal N}$ denotes a multivariate normal (i.e. Gaussian) distribution. We also consider an unnormalized Gaussian distribution ${\hat {\cal N}}$, defined in Equation 7.14. Equation 7.15 depicts the conditioned mixture model, which is of particular interest for our estimation [63]. There, the conditioned mixture of Gaussians is written in an experts and gates notation with unnormalized Gaussian gates; the per-component normalization constants are simply absorbed into the mixing weights $\alpha_m$.


 
$\displaystyle \begin{array}{lll}
p({\bf x},{\bf y} \vert \Theta) & = & \sum_{m=1}^M \alpha_m \,
{\cal N} \left( \left[ \begin{array}{c} {\bf x} \\ {\bf y} \end{array} \right] ;
\left[ \begin{array}{c} \mu_x^m \\ \mu_y^m \end{array} \right] ,
\left[ \begin{array}{cc} \Sigma_{xx}^m & \Sigma_{xy}^m \\ \Sigma_{yx}^m & \Sigma_{yy}^m \end{array} \right] \right) \\
 & = & \sum_{m=1}^M \alpha_m \left\vert 2 \pi \Sigma^m \right\vert^{-\frac{1}{2}}
e^{ -\frac{1}{2}
\left[ \begin{array}{c} {\bf x} - \mu_x^m \\ {\bf y} - \mu_y^m \end{array} \right]^T
(\Sigma^m)^{-1}
\left[ \begin{array}{c} {\bf x} - \mu_x^m \\ {\bf y} - \mu_y^m \end{array} \right]}
\end{array}$     (7.13)


 
$\displaystyle \begin{array}{lll}
{\hat {\cal N}}({\bf x};\mu_x,\Sigma_{xx}) & := &
e^{ -\frac{1}{2} ({\bf x} - \mu_x)^T \Sigma_{xx}^{-1} ({\bf x}-\mu_x)}
\end{array}$     (7.14)


 
$\displaystyle \begin{array}{lll}
p({\bf y}\vert{\bf x} , \Theta) & = & \frac{ p({\bf x},{\bf y} \vert \Theta)}{p({\bf x} \vert \Theta)} \\
 & = & \sum_{m=1}^M
{\cal N} \left( {\bf y} ; \mu_y^m + \Sigma_{yx}^m (\Sigma_{xx}^m)^{-1} ({\bf x} - \mu_x^m) ,
\; \Sigma_{yy}^m - \Sigma_{yx}^m (\Sigma_{xx}^m)^{-1} \Sigma_{xy}^m \right)
\frac{ \alpha_m {\hat {\cal N}} ({\bf x};\mu_x^m,\Sigma_{xx}^m)}{ \sum_{n=1}^M \alpha_n {\hat {\cal N}} ({\bf x};\mu_x^n,\Sigma_{xx}^n)}
\end{array}$     (7.15)
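To make Equations 7.13 through 7.15 concrete, the following sketch (Python with NumPy; all function and variable names are illustrative and not part of the thesis) evaluates $p({\bf y}\vert{\bf x},\Theta)$ by conditioning each joint Gaussian of Equation 7.13 via the standard identities (conditional mean $\mu_y^m + \Sigma_{yx}^m (\Sigma_{xx}^m)^{-1}({\bf x}-\mu_x^m)$, conditional covariance $\Sigma_{yy}^m - \Sigma_{yx}^m (\Sigma_{xx}^m)^{-1}\Sigma_{xy}^m$) and gating the resulting experts with the unnormalized ${\hat {\cal N}}$ of Equation 7.14.

\begin{verbatim}
import numpy as np

def unnormalized_gaussian(x, mu, sigma):
    # N-hat of Equation 7.14: a Gaussian kernel without its normalizer.
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(sigma, d))

def gaussian_density(y, mu, sigma):
    # Normalized multivariate Gaussian density N(y; mu, sigma).
    d = y - mu
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * sigma))
    return np.exp(-0.5 * d @ np.linalg.solve(sigma, d)) / norm

def conditional_mixture(y, x, alphas, mus_x, mus_y, S_xx, S_xy, S_yx, S_yy):
    # p(y | x, Theta) of Equation 7.15.  The alphas are the gate mixing
    # weights (they absorb the gate Gaussians' normalization constants,
    # which is why the gates may use the unnormalized N-hat).
    gates = np.array([a * unnormalized_gaussian(x, mx, sxx)
                      for a, mx, sxx in zip(alphas, mus_x, S_xx)])
    gates = gates / gates.sum()                        # denominator of Equation 7.15
    total = 0.0
    for m, g in enumerate(gates):
        gain = S_yx[m] @ np.linalg.inv(S_xx[m])        # Sigma_yx Sigma_xx^{-1}
        mean = mus_y[m] + gain @ (x - mus_x[m])        # expert (conditional) mean
        cov = S_yy[m] - gain @ S_xy[m]                 # expert (conditional) covariance
        total += g * gaussian_density(y, mean, cov)    # expert weighted by its gate
    return total
\end{verbatim}

Each expert is a Gaussian in ${\bf y}$ whose mean is an affine function of ${\bf x}$, which is precisely what motivates the $(\nu,\Gamma,\Omega)$ parametrization used next.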

The $Q(\Theta^t,\Theta^{(t-1)})$ function thus evolves into Equation 7.16. Note the use of a different parametrization for the experts: $\nu$ is the conditional mean, $\Gamma$ is a regressor matrix and $\Omega$ is the conditional covariance. We immediately note that the experts and gates can be separated and treated independently since they are parametrized by disjoint sets of variables ($\nu^m,\Gamma^m,\Omega^m$ versus $\alpha_m,\mu_x^m,\Sigma_{xx}^m$): both can be varied freely and have no variables in common. In fact, we shall optimize these independently to maximize conditional likelihood, performing an iteration over the experts and then an iteration over the gates. Provided each of these partial maximizations increases the conditional log-likelihood, the procedure converges to a local maximum of conditional log-likelihood. This is similar in spirit to the ECM (Expectation Conditional Maximization) algorithm proposed in [41], in which some variables are held constant while others are maximized, and vice-versa.


 
$\displaystyle \begin{array}{lll}
Q(\Theta^t,\Theta^{(t-1)}) & = &
\sum_{i=1}^N \sum_{m=1}^M {\hat h}_{im} \log {\cal N}({\bf y}_i ; \Gamma^m {\bf x}_i + \nu^m , \Omega^m) \\
 & & + \sum_{i=1}^N \left( \sum_{m=1}^M {\hat h}_{im} \log \left( \alpha_m {\hat {\cal N}} ({\bf x}_i;\mu_x^m,\Sigma_{xx}^m) \right)
 - \frac{ \sum_{m=1}^M \alpha_m {\hat {\cal N}} ({\bf x}_i;\mu_x^m,\Sigma_{xx}^m)}{ \sum_{n=1}^M \alpha_n^{(t-1)} {\hat {\cal N}} ({\bf x}_i;\mu_x^{n(t-1)},\Sigma_{xx}^{n(t-1)})} + 1 \right)
\end{array}$     (7.16)
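The alternating optimization described above can be sketched as follows (Python with NumPy; names are illustrative and not from the thesis). The responsibilities ${\hat h}_{im}$ of Equation 7.16 are taken, as in the preceding derivation, to be the component posteriors given $({\bf x}_i,{\bf y}_i)$ under $\Theta^{(t-1)}$. Only the expert half of an iteration is carried out here: each expert's $(\nu^m,\Gamma^m,\Omega^m)$ is re-fit by responsibility-weighted linear regression, which maximizes the expert term of Equation 7.16 while the gates are held fixed. The gate update for $(\alpha_m,\mu_x^m,\Sigma_{xx}^m)$ is the subject of the bound maximization developed in the following sections and is deliberately omitted.

\begin{verbatim}
import numpy as np

def nhat(x, mu, sxx):
    # Unnormalized Gaussian gate of Equation 7.14.
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(sxx, d))

def gauss_logpdf(y, mean, cov):
    # Log of a normalized multivariate Gaussian density.
    d = y - mean
    return -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(2.0 * np.pi * cov)))

def gate_weights(x, th):
    # Normalized gate activations (the ratio appearing in Equation 7.15).
    g = np.array([a * nhat(x, mx, s)
                  for a, mx, s in zip(th['alpha'], th['mu_x'], th['S_xx'])])
    return g / g.sum()

def conditional_loglik(X, Y, th):
    # sum_i log p(y_i | x_i, Theta), experts parametrized by (nu, Gamma, Omega).
    ll = 0.0
    for x, y in zip(X, Y):
        g = gate_weights(x, th)
        ll += np.log(sum(gm * np.exp(gauss_logpdf(y, G @ x + nu, Om))
                         for gm, G, nu, Om in zip(g, th['Gamma'], th['nu'], th['Omega'])))
    return ll

def responsibilities(X, Y, th):
    # h_im: posterior over components given (x_i, y_i) at the previous parameters.
    H = np.zeros((len(X), len(th['alpha'])))
    for i, (x, y) in enumerate(zip(X, Y)):
        g = gate_weights(x, th)
        H[i] = [gm * np.exp(gauss_logpdf(y, G @ x + nu, Om))
                for gm, G, nu, Om in zip(g, th['Gamma'], th['nu'], th['Omega'])]
        H[i] /= H[i].sum()
    return H

def update_experts(X, Y, H, th):
    # Responsibility-weighted linear regression per expert: maximizes the expert
    # term of Equation 7.16 over (nu, Gamma, Omega) with the gates held fixed.
    N, dx = X.shape
    Xa = np.hstack([X, np.ones((N, 1))])               # bias column carries nu
    for m in range(len(th['alpha'])):
        W = np.diag(H[:, m])
        beta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ Y)
        th['Gamma'][m], th['nu'][m] = beta[:dx].T, beta[dx]
        R = Y - Xa @ beta                               # residuals under the new regressor
        th['Omega'][m] = (R.T @ W @ R) / H[:, m].sum()  # weighted residual covariance
    return th

def cem_expert_step(X, Y, th):
    # One expert half-iteration: responsibilities, then expert maximization.
    before = conditional_loglik(X, Y, th)
    H = responsibilities(X, Y, th)
    th = update_experts(X, Y, H, th)
    assert conditional_loglik(X, Y, th) >= before - 1e-6   # conditional likelihood never decreases
    return th
\end{verbatim}

Because the bound touched by the expert step is tight at $\Theta^{(t-1)}$, this half-iteration cannot decrease the conditional log-likelihood; alternating it with the gate update of the next sections gives the ECM-like behaviour noted above.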



 