kfcv-kfcv2: k-fold cross-validation


Description

Runs k-fold cross-validation to validate a classification model.

Usage

kfcv(X, Y, method = c("lda", "plsda", "tree"), k = 5, threshold, ncomp)

kfcv2(X, Y, method = c("lda", "plsda", "tree"), k = 5, threshold, ncomp)

Arguments

X

Matrix or data frame of predictors, e.g. EFA coefficients or PC scores selected using selectdim.

Y

Vector giving the class of each specimen, e.g. the value obtained from getclass or the sp value from a routine1 object.

method

Classification method: "lda" for linear discriminant analysis, "plsda" for partial least squares discriminant analysis, or "tree" for classification tree.

k

Number of folds used in the cross-validation.

threshold

Optional. A numeric value between 0 and 1 setting the threshold of posterior probability. Any class prediction with a posterior probability lower than this value is set to NA and not reported. See threcv.
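The thresholding described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's own code: the posterior matrix `post` and the names `best`, `maxpost` and `pred` are made up for the example.

```r
# Hypothetical posterior probabilities for 4 specimens and 3 classes
# (each row sums to 1); the numbers are invented for illustration.
post <- matrix(c(0.90, 0.05, 0.05,
                 0.40, 0.35, 0.25,
                 0.10, 0.80, 0.10,
                 0.34, 0.33, 0.33),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))

threshold <- 0.5

# Most probable class and its posterior probability for each specimen
best <- max.col(post, ties.method = "first")
pred <- colnames(post)[best]
maxpost <- post[cbind(seq_len(nrow(post)), best)]

# Predictions whose posterior falls below the threshold become NA
pred[maxpost < threshold] <- NA

# Number of predictions that remain after thresholding
total <- sum(!is.na(pred))
```

With a threshold of 0.5, the second and fourth specimens are dropped because their highest posterior probability is below the cut-off, so only two predictions are counted.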

Details

Both versions compute k-fold cross-validation, but they differ in features:

kfcv gives the result of prediction on each specimen.

kfcv2 can also calculate by-class statistics (recall, precision and specificity). It also calculates a confusion matrix for each fold.
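As a rough sketch of what k-fold cross-validation involves, the loop below splits the data into k folds, fits a model on k - 1 of them and predicts the remaining one. It is not the implementation of kfcv: it assumes the MASS package for `lda()`, uses the built-in iris data as a stand-in for shape descriptors, and the fold assignment scheme is an assumption. The names `misclass` and `ind.prediction` simply mirror the returned values documented on this page.

```r
library(MASS)  # provides lda(); assumed to be installed

set.seed(1)
X <- as.matrix(iris[, 1:4])  # stand-in predictors
Y <- iris$Species            # stand-in classes
k <- 5

# Randomly assign each specimen to one of k folds of near-equal size
fold <- sample(rep(seq_len(k), length.out = nrow(X)))

misclass <- numeric(k)              # misclassification rate (%) per fold
ind.prediction <- logical(nrow(X))  # TRUE if the specimen was predicted correctly

for (i in seq_len(k)) {
  test <- fold == i
  # Train on the other k - 1 folds, predict the held-out fold
  fit <- lda(X[!test, , drop = FALSE], grouping = Y[!test])
  pred <- predict(fit, X[test, , drop = FALSE])$class
  ind.prediction[test] <- pred == Y[test]
  misclass[i] <- 100 * mean(pred != Y[test])
}
```

Every specimen falls in exactly one test fold, so `ind.prediction` ends up with one entry per specimen while `misclass` has one entry per fold.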

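What the stat values mean:

precision = tp / (tp + fp)

recall (sensitivity) = tp / (tp + fn)

specificity = tn / (tn + fp)

where tp = true positive, tn = true negative, fp = false positive and fn = false negative. Please refer to the reference for a detailed explanation.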
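The by-class statistics can be derived from a confusion matrix in a one-vs-rest fashion. The sketch below uses the standard definitions of precision, recall and specificity on invented data; the vectors `actual` and `predicted` are hypothetical, and the layout of `stat` is an assumption rather than the exact structure kfcv2 returns.

```r
# Hypothetical true and predicted classes for one test fold
actual <- factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C", "B"),
                 levels = c("A", "B", "C"))
predicted <- factor(c("A", "A", "B", "B", "B", "C", "C", "A", "C", "C"),
                    levels = c("A", "B", "C"))

# Rows are true classes, columns are predicted classes
cm <- table(actual, predicted)

# One-vs-rest counts for each class
tp <- diag(cm)               # correctly predicted as the class
fn <- rowSums(cm) - tp       # members of the class predicted as something else
fp <- colSums(cm) - tp       # non-members predicted as the class
tn <- sum(cm) - tp - fn - fp # non-members predicted as something else

stat <- cbind(precision   = tp / (tp + fp),
              recall      = tp / (tp + fn),
              specificity = tn / (tn + fp))
```

If a class has no specimens in a fold, some of these ratios become 0/0, which R reports as NaN and `is.na()` treats as missing.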

Value

misclass

Vector of k misclassification rates, in percent, one from each test fold.

total

Total number of predictions after excluding those below the threshold, if a threshold value is given.

ind.prediction

[kfcv only] Logical. Prediction result for each specimen (every specimen is used exactly once in a validation set in kfcv), with TRUE = correctly predicted and FALSE = wrongly predicted.

stat

[kfcv2 only] k matrices containing the calculated precision, sensitivity (recall) and specificity for each class, one matrix per fold. May contain NA values if a class is not present in the fold.

conmat

[kfcv2 only] k confusion matrices, shown as proportions rather than counts. Proportion = number correctly or incorrectly predicted divided by the total number of that class in the training set.
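A confusion matrix of proportions can be obtained by dividing each row of a count table by that class's total. The sketch below shows this for a single fold with invented data. It assumes rows hold the true class and takes the class totals from the same vector of true classes, which may differ from how kfcv2 builds its denominators.

```r
# Hypothetical true and predicted classes for one fold
actual <- factor(c("A", "A", "A", "A", "B", "B"), levels = c("A", "B"))
predicted <- factor(c("A", "A", "A", "B", "B", "A"), levels = c("A", "B"))

# Counts: rows are true classes, columns are predicted classes
counts <- table(actual, predicted)

# Divide each row by the number of specimens of that class
conmat <- prop.table(counts, margin = 1)
```

Each row of `conmat` then sums to 1, and the diagonal gives the proportion of each class that was predicted correctly.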

References

Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437.

See Also

Functions that wrap these functions: mrkfcv, mrkfcv2


jinyung/otolith documentation built on May 19, 2019, 10:36 a.m.