Description: Perform a kernel k-nearest neighbor classification.
Usage: kknn [options] -examples <filename> -class <filename>
Input:
- -examples <filename> - an RDB file of examples. The first column contains labels, and the remaining columns contain real-valued features.
- -class <filename> - a two-column RDB file of class labels. This file must contain exactly the same number of lines as the example data file. The first column contains labels, which must appear in the same order as in the examples file. The second column contains an integer-valued classification. A classification of '0' indicates an unclassified example, which will be classified by kknn, but which will not affect the classification of other examples.
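The line-for-line correspondence between the two input files can be checked before running kknn. The sketch below is illustrative only: it assumes RDB files are tab-separated, with a header line of column names followed by a format line (absent under -noformatline). The helper names read_rdb and check_alignment, and the sample format-line contents, are hypothetical and not part of kknn.

```python
import io


def read_rdb(stream, has_format_line=True):
    """Parse a tab-separated RDB table into (header, rows).

    Assumption: the first line holds column names; unless has_format_line
    is False, the second line holds column-format information and is skipped.
    """
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    header = lines[0].split("\t")
    body = lines[2:] if has_format_line else lines[1:]
    return header, [line.split("\t") for line in body]


def check_alignment(example_rows, class_rows):
    """Check that both files list the same labels in the same order.

    Returns the integer classifications; 0 marks an unclassified example.
    """
    if len(example_rows) != len(class_rows):
        raise ValueError("examples and class files differ in length")
    for i, (ex, cl) in enumerate(zip(example_rows, class_rows)):
        if ex[0] != cl[0]:
            raise ValueError(f"row {i}: label {ex[0]!r} != {cl[0]!r}")
    return [int(cl[1]) for cl in class_rows]


# Hypothetical two-example input with format lines.
examples = io.StringIO("id\tf1\tf2\n10S\t5N\t5N\nA\t0.1\t0.2\nB\t0.3\t0.4\n")
classes = io.StringIO("id\tclass\n10S\t5N\nA\t1\nB\t0\n")
_, ex_rows = read_rdb(examples)
_, cl_rows = read_rdb(classes)
print(check_alignment(ex_rows, cl_rows))
```

Raising on the first mismatch keeps the check cheap; a mislabeled or reordered class file would otherwise silently pair features with the wrong classifications.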
Output: A four-column RDB file. The first two columns are identical to the classification file that was provided as input. Columns three and four contain the predicted classification and the corresponding discriminant value. The classification is the integer of greatest magnitude at the boundary of the unit-length interval that contains the discriminant. For example, a discriminant in [-1, 0) implies a classification of -1, while a discriminant in (0, 1] implies a classification of 1. The closer the discriminant is to the classification, the stronger the supporting evidence.
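The rule for turning a discriminant into a classification amounts to rounding away from zero. A minimal Python sketch follows; the function name is hypothetical, and because the text does not say what happens for a discriminant of exactly 0, returning 0 in that case is an arbitrary assumption.

```python
import math


def classification_from_discriminant(d):
    """Map a discriminant to its classification by rounding away from zero.

    [-1, 0) -> -1, (0, 1] -> 1, (1, 2] -> 2, and so on. The case d == 0 is
    not defined in the documentation; returning 0 is an arbitrary choice.
    """
    if d > 0:
        return math.ceil(d)
    if d < 0:
        return math.floor(d)
    return 0


for d in (-1.0, -0.3, 0.7, 1.0, 1.4):
    print(d, classification_from_discriminant(d))
```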
Options:
- -verbose 1|2|3|4|5 - Set the verbosity level of the output to stderr. The default level is 2.
- -K <value> - The number of nearest neighbors used to classify a given point. By default, K=1, which is simple nearest-neighbor classification. For K > 1, the classes of the K nearest points are tallied, and the majority class is the prediction. In the event of a tie, K is iteratively reduced until the tie is broken.
- -matrix - The -examples file contains a kernel matrix rather than training set examples. The matrix is an (n+1) by (n+1) RDB matrix, where n is the number of examples. The first row and column contain data labels, and the entry in row x, column y contains the kernel value K(x,y).
The following eight options modify the base kernel function. The operations occur in the order listed below.
- -zeromean - Subtract from each element in the input data the mean of the elements in that row, giving the row a mean of zero.
- -adddiag <value> - Add the given value to the diagonal of the kernel matrix.
- -normalize - Normalize the kernel matrix by dividing K(x,y) by sqrt(K(x,x) * K(y,y)).
- -constant <value> - Add a given constant to the kernel. The default constant is 1.
- -coefficient <value> - Multiply the kernel by a given coefficient. The default coefficient is 1.
- -power <value> - Raise the kernel to a given power. The default power is 1.
- -radial - Convert the kernel to a radial basis function. If K is the base kernel, this option creates a kernel of the form exp(-D(x,y)^2 / (2 w^2)), where w is the width of the kernel (see below) and D(x,y) is the distance between x and y in the feature space of K, defined as D(x,y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y)).
- -widthfactor <value> - The width w of the radial basis kernel is set by a heuristic: the median, over all positive training points, of the distance from each such point to its nearest negative training point. This option specifies a multiplicative factor applied to that width.
- -noformatline - RDB-formatted files usually contain column-width information on their second line. With this option, the program does not expect a format line in the input files and does not produce one in the output file.
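The kernel modifiers can be sketched as a pipeline applied in the listed order. This is an illustration, not kknn's implementation: it assumes the base kernel is the dot product of the (optionally zero-meaned) feature vectors, which the text does not state, and all function names are hypothetical. Defaults mirror the documented ones, and -adddiag defaults to 0 here so that it is a no-op unless given.

```python
import math


def zero_mean_rows(data):
    """-zeromean: shift each row (example) of the feature data to mean zero."""
    out = []
    for row in data:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out


def linear_kernel(data):
    """Base kernel, assumed here to be the dot product of feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in data] for x in data]


def transform_kernel(k, adddiag=0.0, normalize=False,
                     constant=1.0, coefficient=1.0, power=1.0):
    """Apply -adddiag, -normalize, -constant, -coefficient, -power in order."""
    n = len(k)
    k = [row[:] for row in k]  # work on a copy of the matrix
    for i in range(n):
        k[i][i] += adddiag  # -adddiag
    if normalize:  # -normalize: K(x,y) / sqrt(K(x,x) * K(y,y))
        diag = [k[i][i] for i in range(n)]
        k = [[k[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
             for i in range(n)]
    # -constant, then -coefficient, then -power. Non-integer powers of
    # negative values yield complex numbers in Python; not guarded here.
    return [[(coefficient * (v + constant)) ** power for v in row] for row in k]


data = [[1.0, 2.0, 3.0], [0.0, 4.0, 2.0]]
print(transform_kernel(linear_kernel(zero_mean_rows(data)), normalize=True))
```

With the default constant of 1, the normalized matrix [[1.0, 0.5], [0.5, 1.0]] becomes [[2.0, 1.5], [1.5, 2.0]], which shows that the order of operations matters: normalizing after adding the constant would give a different result.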
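The radial basis conversion and its width heuristic need only kernel values, because D(x,y) is computed from K(x,x), K(x,y), and K(y,y). The sketch below treats positive class labels as "positive" and negative labels as "negative" and ignores unclassified (0) examples; how kknn chooses the width with more than two classes is not described, so this is an assumption. Function names are hypothetical.

```python
import math
from statistics import median


def kernel_distance(k, i, j):
    """Feature-space distance D(x,y) = sqrt(K(x,x) - 2 K(x,y) + K(y,y))."""
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(k[i][i] - 2 * k[i][j] + k[j][j], 0.0))


def median_width(k, labels, widthfactor=1.0):
    """Median distance from each positive point to its nearest negative point."""
    pos = [i for i, y in enumerate(labels) if y > 0]
    neg = [i for i, y in enumerate(labels) if y < 0]
    nearest = [min(kernel_distance(k, i, j) for j in neg) for i in pos]
    return widthfactor * median(nearest)


def radial_kernel(k, width):
    """-radial: exp(-D(x,y)^2 / (2 w^2)) built from a base kernel matrix."""
    n = len(k)
    return [[math.exp(-kernel_distance(k, i, j) ** 2 / (2 * width ** 2))
             for j in range(n)] for i in range(n)]


# 1-D points with a dot-product base kernel (an illustrative assumption).
xs = [0.0, 1.0, 3.0, 4.0]
labels = [1, 1, -1, -1]
base = [[a * b for b in xs] for a in xs]
w = median_width(base, labels)
rbf = radial_kernel(base, w)
print(w)
```

Here the nearest negative point is at distance 3.0 from the first positive point and 2.0 from the second, so the heuristic width is their median, 2.5; -widthfactor simply scales that value.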
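The -K voting rule with tie-breaking can be sketched as follows, given a precomputed distance matrix (for example, the kernel distance D(x,y) used by -radial). Two assumptions are not stated in the text: a point is never counted as its own neighbor (leave-one-out), and neighbors at equal distance are ordered by their position in the input. Examples with classification 0 are skipped as voters, matching the -class description. The function name is hypothetical.

```python
from collections import Counter


def knn_predict(dist, labels, query, k=1):
    """Majority vote among the k nearest labeled points, shrinking k on ties.

    dist[i][j] is the distance between points i and j. Points with label 0
    are unclassified and never vote; the query is never its own neighbor.
    """
    candidates = [j for j in range(len(labels)) if j != query and labels[j] != 0]
    # sort() is stable, so equal distances keep their input order.
    candidates.sort(key=lambda j: dist[query][j])
    k = min(k, len(candidates))
    while k > 0:
        counts = Counter(labels[j] for j in candidates[:k]).most_common()
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        k -= 1  # tie: drop the farthest neighbor and vote again
    raise ValueError("no labeled neighbors to vote")


xs = [0.0, 1.0, 2.0, 10.0, 11.0]
labels = [1, 1, -1, -1, 0]
dist = [[abs(a - b) for b in xs] for a in xs]
print(knn_predict(dist, labels, query=4, k=4))
```

For the unclassified point at 11.0 with K=4, the vote is a 2-2 tie, so K drops to 3 and the two nearby -1 points win. The loop always terminates, because K=1 can never tie.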