We now briefly discuss how the constraints on a conditional
model differ from those on a joint model. First and foremost, the conditional model
(in theory) provides no information about the density of the covariate
(input) variables. Thus, it allocates no resources to modeling the input
domain unless doing so indirectly helps model the output
domain. Consequently, the *M* Gaussians (i.e. the model's finite
resources) do not cluster around the input density unnecessarily.
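This follows from the decomposition log p(x, y) = log p(y | x) + log p(x): the conditional objective simply drops the log p(x) term, so no parameter of the conditional model is penalized or rewarded for how the inputs are distributed. A minimal numeric sketch (1-D, with made-up illustrative parameters) is:

```python
import numpy as np

def log_gauss(z, mu, var):
    """Log-density of a 1-D Gaussian N(z; mu, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

# Illustrative (hypothetical) parameters:
mx, vx = 0.0, 2.0          # input marginal p(x) = N(mx, vx)
w, b, vy = 1.5, -0.3, 0.4  # conditional p(y|x) = N(w*x + b, vy)

x, y = 0.7, 1.1
log_joint = log_gauss(x, mx, vx) + log_gauss(y, w * x + b, vy)
log_cond = log_gauss(y, w * x + b, vy)

# The gap between the two objectives is exactly log p(x); it does not
# depend on the conditional parameters (w, b, vy) at all, so maximizing
# the conditional likelihood spends nothing on the input density.
print(log_joint - log_cond)  # equals log_gauss(x, mx, vx)
```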

In addition, note that no constraint forces the gates to integrate to 1. The mixing proportions need not be normalized, and the individual gate models are unnormalized Gaussians. Thus, the gates together form an unnormalized marginal density over the inputs, which need not integrate to 1. In a joint model, on the other hand, the marginal must integrate to 1.
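To make this concrete, here is a small sketch (1-D, with hypothetical gate parameters) in which the mixing scales do not sum to 1, so the resulting gate "marginal" integrates to their sum rather than to 1:

```python
import numpy as np

def gaussian(x, mu, var):
    """Normalized 1-D Gaussian density; scaled below to make gates unnormalized."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical gate parameters: the scales a_m are NOT on the simplex,
# so g(x) = sum_m a_m N(x; mu_m, var_m) is an unnormalized marginal.
a = np.array([0.9, 0.7])
mu = np.array([-1.0, 2.0])
var = np.array([0.5, 1.5])

xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
g = sum(a_m * gaussian(xs, m_m, v_m) for a_m, m_m, v_m in zip(a, mu, var))

# Riemann-sum approximation of the integral of g over the real line
# (the tails beyond [-10, 10] are negligible here):
integral = g.sum() * dx
print(integral)  # approximately a.sum() = 1.6, not 1
```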

Finally, we note that the covariance matrix in the gates is
independent of the covariance matrix and the regressor matrix in the
expert. In a full joint Gaussian, these three matrices combine into one
large covariance matrix, which as a whole must be symmetric and positive
semi-definite. Here, however, the gate covariance need not be
positive semi-definite, the expert covariance must be symmetric positive
semi-definite *only on its own*, and the regressor matrix is
arbitrary. The constraints on the total parameters are therefore fewer
than in the joint case, and each gate-expert combination can model a
larger space than a conditioned Gaussian. Training a
conditional model directly will thus yield solutions that lie outside the
space of the conditioned joint models. This is depicted in
Figure 7.11. Note how the additional constraints on
the joint density limit the realizable conditional models. This limit
is not present if the conditional models can be varied directly in their final
parametric form.
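The coupling in the joint case can be seen from the standard Gaussian conditioning formulas: with the joint covariance partitioned into blocks Sxx, Sxy, Syy, the regressor is W = Syx Sxx⁻¹ and the expert covariance is the Schur complement Syy − Syx Sxx⁻¹ Sxy, so all three matrices are tied to one positive semi-definite whole. A minimal sketch (block names and numbers are illustrative) is:

```python
import numpy as np

# Partitioned joint covariance over (x, y); must be symmetric PSD as a whole:
#   S = [[Sxx, Sxy],
#        [Syx, Syy]]
Sxx = np.array([[2.0, 0.3], [0.3, 1.0]])  # gate covariance block
Sxy = np.array([[0.5], [0.2]])
Syy = np.array([[1.5]])
S = np.block([[Sxx, Sxy], [Sxy.T, Syy]])

# Conditioning on x ties the conditional parameters to the blocks of S:
W = Sxy.T @ np.linalg.inv(Sxx)                  # regressor
Sy_x = Syy - Sxy.T @ np.linalg.inv(Sxx) @ Sxy   # expert covariance (Schur complement)

print(np.linalg.eigvalsh(S).min() > 0)  # True: the joint is positive definite
print(Sy_x)                             # PSD automatically, by the Schur complement

# A directly trained conditional model instead treats the gate covariance,
# the regressor W, and the expert covariance Sy_x as free parameters with
# no single joint matrix S tying them together.
```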