There is a common folk theorem that says just about any machine learning algorithm one may use for some task will perform as well as any other. The differences lie in computational performance and the representations used in alternative algorithms when constructing the resultant models or classifiers. However, when choosing some learning program for some particular task it is not immediately apparent which one will perform the best unless extensive empirical tests are performed. To choose "the best", comparisons of various cross validation evaluations are common, but even here there is some controversy in the ML community on how best to compare algorithms. (Oh well....what's a data miner to do?)
There is an alternative strategy that we propose. Use all available learning algorithms to generate a number of classifiers (or alternatively, use one algorithm to generate several classifiers over different partitions of the training data), and treat their integration and combination as a learning task. This means, each classifier is used to scan and label a set of data it is trained to classify to generate a meta-level training set. The records of this new training set contain the predictions generated by each participating classifier that is to be combined in a final meta-classifier. If one then applies a learning algorithm to this training data, the resultant meta-classifier will have learned the biases and correlations of the underlying classifiers, and will have also learned how to classify data with higher accuracy than any of the underlying base classifiers alone.
This approach has been studied by my group extensively and several reports at various conferences are available that show conclusively that this approach has remarkable value. We first published this idea in the area of speech recognition (where multiple acoustic- and word-models could be easily mapped to the DADO machine.) Recent work by us and others show that there is much value to this approach, but much more work is needed to better understand how to apply this technique in realistic large scale problems. David Wolpert for example has an excellent paper on "Stacked Generalizations" where this technique is evaluated in the context of neural net classifiers.
Check out our most recent paper on the topic for more details. BACK