Recently, Hinton [23] and others have proposed strong arguments
for using model-based approaches in classification problems. However,
in certain situations, the advantages of model-based approaches pale in
comparison with the performance of discriminative models optimized for a
given task. The following list summarizes some advantages of
generative models and joint density estimation for the purposes of both
classification *and* regression problems.

- Better Inference Algorithms
Generative models and joint densities can be estimated using reliable techniques for maximum likelihood and maximum a posteriori estimation. These joint density estimation techniques include the popular EM algorithm and typically outperform gradient ascent algorithms, which are the workhorses of conditional density problems (e.g., in many neural networks). The EM algorithm provably converges monotonically to a local maximum of the likelihood and is often more efficient than gradient ascent.

- Modular Learning
In a generative model, each class is learned individually and only considers the data whose labels correspond to it. The model does not focus upon inter-model discrimination and avoids considering the data as a whole. Thus the learning is simplified and the algorithms proceed faster.

- New Classes
It is possible to learn a new class (or retrain an old class) without updating the models of previously learned classes in a generative model, since each model is trained in isolation. In discriminative models, the whole system must be retrained since inter-model dynamics are significant.

- Missing Data
Unlike conditional densities (discriminative models), a joint density or generative model is optimized over all dimensions of the data and thus models all the relationships between the variables in a more equal manner. Thus, if some of the data that was expected to be observed for a given task is missing, a joint model's performance will degrade gracefully. Conditional models (or discriminative models) are trained for a particular task, and thus a different model must be trained for each missing-data task. Ghahramani et al. [21] point out the exponential growth in the number of models needed if one requires an optimal discriminative system for each possible task.

- Rejection of Poor or Corrupt Data
Sometimes, very poor data may be fed into the learning system; a generative model has the ability to detect this corrupt input (for example, by its low likelihood under the model) and possibly signal the user to take some alternate measure.
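The first advantage above, the reliability of EM, can be illustrated concretely. The following is a minimal sketch of EM for a one-dimensional, two-component Gaussian mixture (the function name and initialization scheme are illustrative choices, not prescribed by the text); the log-likelihood trace it returns is guaranteed to be non-decreasing, which is the monotone convergence property cited above.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture.

    E-step: compute posterior responsibilities of each component.
    M-step: closed-form weighted maximum-likelihood updates.
    Each iteration can only increase the data log-likelihood.
    """
    # deterministic initialization: means at the data extremes
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var())
    pi = np.full(2, 0.5)
    ll_trace = []
    for _ in range(iters):
        # E-step: per-point, per-component weighted densities
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        ll_trace.append(np.log(dens.sum(axis=1)).sum())
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates from the responsibilities
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi, ll_trace

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])
mu, var, pi, ll = em_gmm_1d(x)
# the log-likelihood trace is monotonically non-decreasing
monotone = all(b >= a - 1e-7 for a, b in zip(ll, ll[1:]))
```

Note that no step size or learning rate appears anywhere: each M-step is a closed-form maximization, which is precisely why such estimators require no re-initialization tuning.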

It is important to note that the situations underlying the last two advantages occur infrequently in many applications, and this is the expected situation for the ARL framework. Typically, the system is called upon to perform the task it was trained for, so the benefits of its superior handling of occasional missing or poor data might rarely be noticed. In fact, on most standardized databases, performance on the desired task will often be orders of magnitude more critical because missing or corrupt data is so infrequent.

The second and third advantages involve computational efficiency since discriminative or conditional models need to observe all data to be optimally trained. However, the need to observe all the data is not a disadvantage but truly an advantage. It allows a discriminative model to better learn the interactions between classes and their relative distributions for discrimination. Thus, as long as the discriminative model is not too computationally intensive and the volume of data is tractable, training on all the data is not a problem.

The most critical motivation for generative models in regression problems is actually the first advantage: the availability of superior inference algorithms (such as EM [15]). Typically, the training process for discriminative models (i.e., conditional densities) is cumbersome (e.g., neural network backpropagation and gradient ascent) and somewhat ad hoc, requiring many re-initializations to converge to a good solution. However, tried-and-true algorithms for generative models (joint density estimation) avoid this and consistently yield good joint models.

In fact, *nothing* prevents us from using both a generative model and a
discriminative model. Whenever the regular task is required and data
is complete and not corrupt, one uses a superior discriminative
model. Whenever missing data is observed, a joint model can be used to
"fill it in" for the discriminative model. In addition, whenever
corrupt data is possible, a *marginal* model should be used to
filter it (which is better suited to this task than either a joint or
a conditional model). However, in this case, the bulk of the learning
system's work will still be performed by the conditional model.
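The fill-in step above can be sketched as follows. This is a minimal illustration assuming the joint model is a single Gaussian over the features (the helper `fill_in` is a hypothetical name, not from the text): the conditional mean E[x_mis | x_obs] under that Gaussian is used to impute the missing feature before it is handed to the discriminative model.

```python
import numpy as np

rng = np.random.default_rng(0)
# training data: two correlated features (x2 tracks 0.8 * x1)
n = 1000
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)
X = np.column_stack([x1, x2])

# joint (generative) model: a single Gaussian over all features
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

def fill_in(x_obs, obs_idx, mis_idx):
    """Impute missing features with the conditional mean
    E[x_mis | x_obs] of the joint Gaussian:
    mu_m + S_mo S_oo^{-1} (x_obs - mu_o)."""
    s_oo = cov[np.ix_(obs_idx, obs_idx)]
    s_mo = cov[np.ix_(mis_idx, obs_idx)]
    return mu[mis_idx] + s_mo @ np.linalg.solve(s_oo, x_obs - mu[obs_idx])

# at test time x1 = 1.0 is observed and x2 is missing;
# the completed vector would then be passed to the conditional model
x2_hat = fill_in(np.array([1.0]), [0], [1])
```

The discriminative model itself is untouched: the joint model acts only as a front end that completes the input vector, exactly the division of labor described above.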

We now outline specific advantages of conditional models and discuss our approach to correct one of their major disadvantages: poor inference algorithms.

- Management of Limited Resources
Conditional or discriminative models utilize resources exclusively for accomplishing the task of estimating output from input observations. Thus, the limited resources (complexity, structures, etc.) will be devoted entirely to this purpose and not squandered on irrelevant features that give no discrimination power.

- Simple Discrimination of Complex Generative Models
It is often the case that complex joint (generative) models can be easily separated by simple decision boundaries as in Figure 5.1. There is no need here to model the intrinsically complex phenomena themselves when it is so simple to discriminate the two different classes with two linear boundaries.

- Feature Selection
Conditional models by default do not need features that are as well chosen as joint models do. Since spurious features will not help the discriminative model compute its output, they are effectively ignored. A generative model might waste modeling power on these features even though they ultimately offer no discrimination power.

- Better Conditional Likelihood on Test Data
Typically, a learning system is trained on training data and tested on test data. Since in testing (for either joint or conditional models) we are always evaluating conditional likelihood (i.e., the probability of guessing the correct class or the right output), it is only natural that a model which optimizes this ability on training data will do better when tested (unless overfitting occurs).
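The second advantage in the list above can be made concrete with a toy version of the situation in Figure 5.1. In the following sketch (the data layout and the nearest-mean classifier are illustrative assumptions, not taken from the figure), each class is a two-component mixture, so a faithful generative model would need four Gaussians; yet a single linear boundary separates the classes almost perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# each class is a two-component mixture: a "complex" generative model
c0 = np.vstack([rng.normal([-3, -2], 0.5, (n, 2)),
                rng.normal([-3,  2], 0.5, (n, 2))])
c1 = np.vstack([rng.normal([ 3, -2], 0.5, (n, 2)),
                rng.normal([ 3,  2], 0.5, (n, 2))])
X = np.vstack([c0, c1])
y = np.r_[np.zeros(2 * n), np.ones(2 * n)]

# ...yet a single linear boundary (here, nearest class mean, which
# induces a hyperplane through the midpoint of the means) suffices
m0, m1 = c0.mean(axis=0), c1.mean(axis=0)
w = m1 - m0                      # normal vector of the hyperplane
b = 0.5 * (m0 + m1) @ w          # boundary passes through the midpoint
pred = (X @ w > b).astype(float)
acc = (pred == y).mean()
```

The discriminative rule needs only two parameters' worth of structure (a direction and a threshold), while a generative description of the same data would have to model all four mixture components.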