Build a knncat classifier, which performs nearest-neighbor classification with categorical variables; continuous variables are permitted too.


| Argument | Description |
|---|---|
| `train` | Data frame of training data, with the correct classification in the `classcol` column. |
| `test` | Data frame of test data (can be omitted). This should also have the correct classification in the `classcol` column. |
| `k` | Vector of choices for the number of nearest neighbors. Default: c(1, 3, 5, 7, 9). |
| `xvals` | Number of cross-validations to use to find the best model size and number of nearest neighbors. Default: 10. |
| `xval.ceil` | Maximum number of variables to add. -1 = use the smallest number from any cross-validation; 0 = use the smallest number from the first cross-validation; > 0 = use that number. |
| `knots` | Vector of numbers of knots for numeric variables. Reused if necessary. Default: 10 for each. |
| `prior.ind` | Integer telling how to compute priors. 1 = estimated from training set; 2 = all equal; 3 = supplied in `prior`; 4 = ignored. Default: 4. |
| `prior` | Numeric vector, one entry per unique element in the training set's `classcol` column, giving prior probabilities. Ignored unless `prior.ind` = 3; in that case the entries are normalized to sum to 1, and each entry must be strictly > 0. |
| `permute` | Number of permutations for variable selection. Default: 10. |
| `permute.tail` | A variable fails the permutation test if `permute.tail` or more permutations do better than the original. Default: 1. |
| `improvement` | Minimum improvement for variable selection. Ignored unless it is supplied and `permute` is missing, or `permute` = 0; the default is then .01. |
| `ridge` | Amount by which to "ridge" the W matrix for numerical stability. Default: .003. |
| `once.out.always.out` | If TRUE, a variable that fails a permutation test or does not improve by enough is excluded from further consideration during that cross-validation run. Default: FALSE. |
| `classcol` | Column with classification in it. Default: 1. |
| `verbose` | Controls level of diagnostic output. Higher numbers produce more output, sometimes far too much. 0 produces no output; 1 gives a progress report for the cross-validations. Default: 1. |
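
Two of the argument rules above are easy to misread, so here is a minimal base-R sketch of them. The helpers `normalize_prior` and `fails_permutation_test` are hypothetical names made up for this illustration and are not part of knncat; the sketch also assumes that "do better" means a lower error rate, which the text above does not spell out.

```r
# Hypothetical helper: mimic the handling described for prior.ind = 3.
normalize_prior <- function(prior) {
  # Each entry must be numeric and strictly greater than 0.
  stopifnot(is.numeric(prior), all(prior > 0))
  # Rescale so that the entries sum to 1.
  prior / sum(prior)
}

# Hypothetical helper: the permute.tail rule. A variable fails when
# permute.tail or more permuted runs do better than the original.
# Assumption: "better" means a lower error rate.
fails_permutation_test <- function(original_error, permuted_errors,
                                   permute.tail = 1) {
  sum(permuted_errors < original_error) >= permute.tail
}

normalize_prior(c(2, 1, 1))

# Exactly one permuted run (0.19) beats the original (0.20).
permuted <- c(0.25, 0.22, 0.19, 0.30)
fails_permutation_test(0.20, permuted, permute.tail = 1)
fails_permutation_test(0.20, permuted, permute.tail = 2)
```

With the default `permute.tail` of 1, a single better permutation is enough to reject the variable; raising it makes the test more lenient.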

A knncat classifier converts categorical labels into real numbers (phi) so as to produce a good k-nearest neighbor classifier. Continuous variables are handled by means of knots, in a manner similar to the linear spline representation. Variable selection is done by a permutation test, or by setting an "improvement" cutoff; error rate estimation is done by cross-validation. After the cross-validations are done, we choose the best value of k from among those proposed and the "best" number of variables, then make one more pass through all the data to estimate the phis.
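
To make the idea of replacing categorical labels with real numbers concrete, the following toy sketch maps the levels of one categorical predictor to made-up phi values and then runs a plain nearest-neighbor vote on that numeric scale. Everything here is an assumption for illustration: the data, the phi values, and the helpers `to_phi` and `knn_predict` are hypothetical, and this is not how knncat estimates the phis.

```r
# Toy training data: one categorical predictor and a two-class label.
train <- data.frame(
  class  = c("a", "a", "b", "b", "a", "b"),
  colour = factor(c("red", "red", "blue", "green", "green", "blue"))
)

# Hypothetical phi values, one real number per level. knncat would
# estimate these from the data; here they are simply made up.
phi <- c(blue = -1.0, green = 0.1, red = 0.9)

# Replace each categorical label by its phi value.
to_phi <- function(x, phi) unname(phi[as.character(x)])

# Plain k-nearest-neighbor majority vote on the phi scale (base R only).
knn_predict <- function(train_x, train_y, new_x, k = 1) {
  vapply(new_x, function(x0) {
    nearest <- order(abs(train_x - x0))[seq_len(k)]
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  }, character(1))
}

train_phi <- to_phi(train$colour, phi)
pred <- knn_predict(train_phi, train$class,
                    to_phi(c("red", "blue"), phi), k = 1)
pred
```

Once the labels live on a numeric scale, distances between observations are well defined, which is what lets an ordinary nearest-neighbor rule work on categorical data.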

A list of S3 class knncat, containing the following entries:

| Entry | Description |
|---|---|
| `cdata` | A vector with one entry for each of the columns of `train`, except the classification column, with value 1 if that column was used in the final classifier, and 0 otherwise. |
| `phi` | A list with the phis. Each element of the list has, as its name, the name of a column of `train`; the values of the element are the phis, and the names of that element are the levels of the variable. For numeric variables, these names are "knot.1", "knot.2", etc. |
| `k` | The vector of k's to be tried, as passed in. |
| `best.k` | The best k selected. |
| `misclass.mat` | A matrix, number of classes by number of classes, whose columns give the correct classifications and whose rows give the estimates. |
| `prior.ind` | Method used to compute the prior, as passed in. |
| `prior` | A numeric vector, one entry per class, giving the prior probabilities, as computed by the program according to `prior.ind`. |
| `status` | Return value from the program. 0 = no error. |
| `misclass.type` | Type of `misclass.mat`. "train" means the matrix came from the training set; "test" means it came from the test set. |
| `train` | Name of training set at build time. |
| `vars` | Vector of names of columns actually used in the model. |
| `knots.vec` | Vector of numbers of knots, as passed in. |
| `build` | Named vector holding five of the arguments used at build time: `permute`, `improvement`, `ridge`, `once.out.always.out`, and `xvals`. |
| `missing` | Vector of values with which to replace missing values. These are the most common values for categorical variables, and the means for continuous ones. |
| `knot.values` | List of knot locations, one element for each continuous variable. |

Samuel E. Buttrey

Buttrey, S.E., Nearest-neighbor classification with categorical variables, Comp. Stat. Data Analysis 28 (1998), 157-169.

```
## Not run:
data("synth.tr", package = "MASS")
data("synth.te", package = "MASS")

# Build the classifier; the class label is in column 3 (yc)
syncat <- knncat(synth.tr, classcol = 3)
syncat
# Train set misclass rate: 12.8

# Predict on the test set and compare with the true classes
synpred <- predict(syncat, synth.tr, synth.te, train.classcol = 3,
                   newdata.classcol = 3)
table(synpred, synth.te$yc)
# synpred   0   1
#       0 460  91
#       1  40 409

#
# Or do the whole thing in one pass:
#
knncat(synth.tr, synth.te, classcol = 3)
# Test set misclass rate: 13.1
## End(Not run)
```
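
As a small check on how the reported rate relates to the confusion table, the sketch below recomputes the test-set misclassification rate in base R. The counts are copied by hand from the table printed in the example above; the variable names `conf` and `misclass_rate` are made up for this illustration, and the sketch does not need knncat to be installed.

```r
# Confusion counts copied from the example output above:
# rows are the predictions, columns are the true classes.
conf <- matrix(c(460, 40, 91, 409), nrow = 2,
               dimnames = list(predicted = c("0", "1"),
                               actual    = c("0", "1")))

# Misclassification rate = off-diagonal count / total count, in percent.
misclass_rate <- 100 * (sum(conf) - sum(diag(conf))) / sum(conf)
misclass_rate
```

The off-diagonal counts (91 + 40 = 131 out of 1000) give 13.1, which agrees with the test-set rate printed by the one-pass call.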

