Build a knncat classifier, which is used for nearest-neighbor classification with categorical variables; continuous variables are permitted too.
train |
data frame of training data, with the correct classification in the classcol column |
test |
data frame of test data (can be omitted). This should have the correct classification in the classcol column, too. |
k |
Vector of choices for the number of nearest neighbors. Default: c(1, 3, 5, 7, 9). |
xvals |
Number of cross-validations to use to find the best model size and number of nearest neighbors. Default: 10. |
xval.ceil |
Maximum number of variables to add. -1 = use the smallest number from any xval; 0 = use the smallest number from the first xval; > 0 = use that number. |
knots |
Vector giving the number of knots for each numeric variable. Reused (recycled) if necessary. Default: 10 for each. |
prior.ind |
Integer telling how to compute priors. 1 = estimated from training set; 2 = all equal; 3 = supplied in "prior"; 4 = ignored. Default: 4. |
prior |
Numeric vector, one entry per unique element in the training set's classcol column, giving prior probabilities. Ignored unless prior.ind = 3; then they're normalized to sum to 1 and each entry must be strictly > 0. |
permute |
Number of permutations for variable selection. Default: 10. |
permute.tail |
A variable fails the permutation test if permute.tail or more permutations do better than the original. Default: 1. |
improvement |
Minimum improvement for variable selection. Used only when it is supplied and permute is missing, or when permute = 0; in the latter case the default is .01. |
ridge |
Amount by which to "ridge" the W matrix for numerical stability. Default: .003. |
once.out.always.out |
If TRUE, a variable that fails a permutation test or doesn't improve by enough is excluded from further consideration during that cross-validation run. Default: FALSE. |
classcol |
Column with classification in it. Default: 1. |
verbose |
Controls level of diagnostic output. Higher numbers produce more output, sometimes far too much. 0 produces no output; 1 gives a progress report for xvals. Default: 1. |
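The four prior.ind settings described above can be sketched in base R. This is only an illustration, not the package's internal code: the helper name compute_prior is hypothetical, the classes are assumed to be ordered by sorted factor level, and returning NULL for prior.ind = 4 ("ignored") is an assumption.

```r
# Hypothetical helper illustrating the prior.ind options (not part of knncat).
compute_prior <- function(classes, prior.ind = 4, prior = NULL) {
  classes <- factor(classes)
  n.class <- nlevels(classes)
  switch(as.character(prior.ind),
    # 1: estimate priors from the class proportions in the training set
    "1" = as.vector(table(classes)) / length(classes),
    # 2: all classes equally likely
    "2" = rep(1 / n.class, n.class),
    # 3: user-supplied; must be strictly positive, normalized to sum to 1
    "3" = {
      stopifnot(length(prior) == n.class, all(prior > 0))
      prior / sum(prior)
    },
    # 4: priors ignored (assumption: represented here as NULL)
    "4" = NULL,
    stop("prior.ind must be 1, 2, 3 or 4")
  )
}

y <- c("a", "a", "b", "c")
p.train  <- compute_prior(y, prior.ind = 1)
p.equal  <- compute_prior(y, prior.ind = 2)
p.custom <- compute_prior(y, prior.ind = 3, prior = c(2, 1, 1))
```

Note that supplied priors need not sum to 1 on input; only strict positivity is checked before normalizing.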
A knncat classifier converts categorical labels into real numbers (phi) so as to produce a good k-nearest neighbor classifier. Continuous variables are handled by means of knots, in a manner similar to the linear spline representation. Variable selection is done by a permutation test, or by setting an "improvement" cutoff; error rate estimation is done by cross-validation. After the cross-validations are done, we choose the best value of k from among those proposed and the "best" number of variables, then make one more pass through all the data to estimate the phis.
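To make the idea concrete, the following base-R sketch replaces each categorical level by a number and then runs an ordinary k-nearest-neighbor vote on the resulting values. The phi values are made up for illustration (knncat estimates them from the data), the helper names encode and knn_predict are hypothetical, and the use of Euclidean distance is an assumption.

```r
# Made-up phi values: one named numeric vector per categorical column.
phi <- list(
  colour = c(red = -1.0, green = 0.2, blue = 0.9),
  size   = c(small = -0.5, large = 0.5)
)

# Replace each level by its phi value, giving a numeric matrix (rows = cases).
encode <- function(df, phi) {
  cols <- lapply(names(phi), function(v) unname(phi[[v]][as.character(df[[v]])]))
  m <- do.call(cbind, cols)
  colnames(m) <- names(phi)
  m
}

# Plain k-NN with majority vote on the encoded values (Euclidean distance assumed).
knn_predict <- function(train_x, train_y, new_x, k) {
  apply(new_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))   # distance to every training case
    nearest <- order(d)[seq_len(k)]
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  })
}

train <- data.frame(
  colour = c("red", "red", "green", "blue", "blue", "green"),
  size   = c("small", "large", "small", "large", "small", "large"),
  class  = c("a", "a", "a", "b", "b", "b")
)
new <- data.frame(colour = c("red", "blue"), size = c("small", "large"))

pred <- knn_predict(encode(train, phi), train$class, encode(new, phi), k = 3)
pred  # "a" "b"
```

Once the levels are numeric, any standard nearest-neighbor routine applies; the quality of the classifier then depends on how well the phi values separate the classes.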
A list of S3 class knncat, containing the following entries:
cdata |
A vector with one entry for each of the columns of train, except the classification column, with value 1 if that column was used in the final classifier, and 0 otherwise. |
phi |
A list with the phi's. Each element of the list has, as its name, the name of a column of train; the values of the element are the phi's, and the names of that element are the levels of the variable. For numeric variables, these names are "knot.1", "knot.2" etc. |
k |
The vector of k's to be tried, as passed in. |
best.k |
The best k selected. |
misclass.mat |
A matrix, number of classes * number of classes, whose columns give the correct classifications and rows, the estimates. |
prior.ind |
Method used to compute the prior, as passed in. |
prior |
A numeric vector, one per class, giving the prior probabilities, as computed by the program according to prior.ind. |
status |
Return value from the program. 0 = no error. |
misclass.type |
Type of misclass.mat. "train" means misclass.mat came from the training set; "test" means it came from the test set. |
train |
Name of training set at build time. |
vars |
Vector of names of columns actually used in model. |
knots.vec |
Vector of numbers of knots, as passed in. |
build |
Named vector holding five of the arguments used at build time: permute, improvement, ridge, once.out.always.out, and xvals. |
missing |
Vector of values with which to replace missing values. These are the most common values for categorical variables, and the means for continuous ones. |
knot.values |
List of knot locations, one element for each continuous variable. |
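The replacement rule described for the missing entry (most common value for categorical variables, mean for continuous ones) might look like the base-R sketch below. The helper names fill_values and impute are hypothetical, a list is used instead of a vector so that mixed types survive, and breaking ties by taking the first most-common level is an assumption.

```r
# Hypothetical helper: one fill value per column.
fill_values <- function(df) {
  lapply(df, function(col) {
    if (is.numeric(col)) {
      mean(col, na.rm = TRUE)            # continuous: mean of observed values
    } else {
      counts <- table(col)               # table() drops NA by default
      names(counts)[which.max(counts)]   # categorical: most common level
    }
  })
}

# Hypothetical helper: replace NA in each column by its fill value.
impute <- function(df, fill) {
  for (v in names(fill)) {
    df[[v]][is.na(df[[v]])] <- fill[[v]]
  }
  df
}

df <- data.frame(colour = c("red", NA, "red", "blue"), x = c(1, NA, 3, 5))
fill   <- fill_values(df)
filled <- impute(df, fill)
```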
Samuel E. Buttrey, buttrey@nps.edu
Buttrey, S.E., Nearest-neighbor classification with categorical variables, Comp. Stat. Data Analysis 28 (1998), 157-169.
## Not run:
data ("synth.tr", package="MASS")
data ("synth.te", package="MASS")
syncat <- knncat (synth.tr, classcol=3)
syncat
# Train set misclass rate: 12.8
synpred <- predict (syncat, synth.tr, synth.te, train.classcol=3,
newdata.classcol=3)
table (synpred, synth.te$yc)
# synpred   0   1
#       0 460  91
#       1  40 409
#
# Or do the whole thing in one pass:
#
knncat (synth.tr, synth.te, classcol=3)
# Test set misclass rate: 13.1
## End(Not run)
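As a check on how the reported rate relates to the cross-tabulation, the test-set misclassification rate can be recomputed by hand from the counts printed in the example. This sketch uses only base R; the variable names confusion and misclass.rate are chosen here for illustration.

```r
# Counts copied from the table printed in the example:
# rows are the predictions (synpred), columns the true classes (synth.te$yc).
confusion <- matrix(c(460, 40, 91, 409), nrow = 2,
                    dimnames = list(predicted = c("0", "1"),
                                    actual = c("0", "1")))

# Off-diagonal cells are the misclassified cases.
misclass.rate <- 100 * (sum(confusion) - sum(diag(confusion))) / sum(confusion)
round(misclass.rate, 1)  # 13.1
```

The 91 + 40 = 131 off-diagonal cases out of 1000 give 13.1 percent, which agrees with the rate reported by the one-pass call.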