
Standardized Database Performance

To test the CEM algorithm in a typical application, we used a standard regression benchmark from the UCI Machine Learning Repository, maintained by the Information and Computer Science Department at the University of California, Irvine. The data set used was the Abalone Data, courtesy of the Marine Research Laboratories - Taroona.

The task is to predict the age of abalone from physical measurements. Scientists determine the age of an abalone by a painstaking process: cutting the shell through the cone, staining it, and counting the number of rings under a microscope. For each abalone, the data set contains the age and 8 easily measured physical attributes: sex, length, diameter, height, whole weight, shucked weight, viscera weight and shell weight. The database is claimed not to be linearly separable, and samples from the different classes overlap heavily.
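For concreteness, a record with these 8 attributes can be turned into a feature vector as sketched below. This is a minimal illustration, not part of the original experiments: the comma-separated UCI file layout and the one-hot encoding of the categorical sex attribute are assumptions.

```python
# Minimal sketch of parsing one Abalone record (assumed UCI layout:
# sex, then 7 continuous measurements, then the ring count).
def parse_row(line):
    fields = line.strip().split(",")
    sex = fields[0]
    # One-hot encode the categorical sex attribute (M, F, I = infant).
    x = [float(sex == "M"), float(sex == "F"), float(sex == "I")]
    x += [float(v) for v in fields[1:8]]  # 7 continuous measurements
    rings = float(fields[8])              # target ring count (a proxy for age)
    return x, rings

x, y = parse_row("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15")
```

The one-hot encoding keeps the single categorical attribute from being treated as an ordered quantity alongside the continuous measurements.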


  
Figure 7.14: Conditional Log-Likelihood on Training Data
[Plot omitted: iter100.ps]

The data set is split into 3133 training examples and 1044 test examples. The CEM algorithm (mixture of Gaussians, annealed at 0.4) was run exclusively on the training data and then tested once on the test data. Figure 7.14 displays the conditional log-likelihood during training of the 100-Gaussian model. Results for age prediction (regression) are shown in Table 7.2. In addition, the age was split arbitrarily into 3 classes to simulate a classification task: Young, Middle-Aged and Old. The CEM algorithm was not retrained for this task; its classification performance was measured directly from its numerically regressed estimates of age. The table also compares these results with other methods [68] [13]. Approximate training times for the CEM examples are listed as well and compare favorably with the other techniques. Each of the 3 CEM examples was run only once on the test data, and no cross-validation or complexity optimization (against underfitting and overfitting) was performed, so the CEM results would probably improve significantly with such standard techniques. Despite this, the CEM algorithm (in the complex 100-Gaussian case) outperformed all the other methods on the regression task (the task it was trained for) and fared favorably on the classification task.
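The regressed age estimate comes from conditioning a joint mixture of Gaussians on the inputs: each component contributes its conditional mean, weighted by its responsibility for the observed input. The sketch below illustrates only this conditional-expectation step with hand-set parameters; it is not the CEM training procedure itself, and the two-component one-dimensional example is invented for illustration.

```python
import math

def gmm_conditional_mean(x, weights, means, covs):
    """E[y | x] under a joint Gaussian mixture over (x, y).

    means: list of (mean_x, mean_y) per component;
    covs:  list of 2x2 covariance matrices (nested lists) per component.
    """
    resp, cond = [], []
    for pi, (mx, my), S in zip(weights, means, covs):
        sxx, sxy = S[0][0], S[1][0]
        # Marginal likelihood of x under this component, times its prior weight.
        px = pi * math.exp(-0.5 * (x - mx) ** 2 / sxx) / math.sqrt(2 * math.pi * sxx)
        resp.append(px)
        # Conditional mean of y given x for a single joint Gaussian component.
        cond.append(my + sxy / sxx * (x - mx))
    z = sum(resp)  # normalize responsibilities so they sum to one
    return sum(r / z * c for r, c in zip(resp, cond))

# Two symmetric components: the prediction blends their conditional means
# according to how responsible each component is for the observed x.
weights = [0.5, 0.5]
means = [(-1.0, -1.0), (1.0, 1.0)]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
print(gmm_conditional_mean(0.0, weights, means, covs))  # halfway between: 0.0
```

Far from the midpoint, one component takes almost all the responsibility and the prediction approaches that component's own conditional mean.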


 
Table 7.2: Testing Results
Algorithm                              | 3-Class Accuracy | Regression Accuracy | Training Time
Cascade-Correlation (no hidden nodes)  | 61.40 %          | 24.86 %             |
Cascade-Correlation (5 hidden nodes)   | 65.61 %          | 26.25 %             |
C4.5                                   | 59.2 %           | 21.5 %              |
Linear Discriminant                    | 32.57 %          | 0.0 %               |
k=5 Nearest Neighbour                  | 62.46 %          | 3.57 %              |
Backprop                               | 64 %             |                     |
Dystal                                 | 55 %             |                     |
CEM 1 Gaussian                         | 60.06 %          | 20.79 %             | <1 minute
CEM 2 Gaussians                        | 62.26 %          | 26.63 %             | <1 minute
CEM 100 Gaussians                      | 64.46 %          | 27.39 %             | <20 minutes
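The CEM rows' 3-class figures come from mapping each regressed age onto the Young / Middle-Aged / Old labels. A minimal sketch of such a thresholding step follows; the boundary values are hypothetical, since the text does not state where the age range was split.

```python
def classify_age(age, young_max=8.0, old_min=11.0):
    """Map a regressed age to one of 3 classes.

    young_max and old_min are hypothetical boundaries; the original
    experiment's split points are not given in the text.
    """
    if age <= young_max:
        return "Young"
    if age < old_min:
        return "Middle-Aged"
    return "Old"

def three_class_accuracy(predicted_ages, true_labels):
    """Fraction of regressed ages whose class matches the true label."""
    hits = sum(classify_age(a) == t for a, t in zip(predicted_ages, true_labels))
    return hits / len(true_labels)
```

Because the classes are derived from the regressed output, no retraining is needed: the same regression model is scored under both metrics.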
 


Tony Jebara
1999-09-15