kfcv-kfcv2: k-fold cross-validation
In jinyung/otolith: Identification of fish otolith using digital images


Run k-fold cross-validation to validate a classification model.

kfcv(X, Y, method = c("lda", "plsda", "tree"), k = 5, threshold, ncomp)

kfcv2(X, Y, method = c("lda", "plsda", "tree"), k = 5, threshold, ncomp)

`X`	matrix or data frame of predictors, e.g. EFA coefficients or PC scores selected using `selectdim`
`Y`	vector giving the class, e.g. the value obtained from `getclass` or the `sp` value from a `routine1` object
`method`	classification method: `"lda"` for linear discriminant analysis, `"plsda"` for partial least squares discriminant analysis, `"tree"` for classification tree
`k`	number of folds used in the cross-validation
`threshold`	optional. A numeric value between 0 and 1 that sets the threshold of posterior probability. Any class prediction with a posterior probability lower than this value is set to `NA` and not reported. See `threcv`
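
The effect of `threshold` can be illustrated with a minimal sketch in base R. The posterior matrix `post` and the value `0.5` are made up for illustration, and this is not the package's internal code; it only shows the idea of discarding predictions whose highest posterior probability falls below the threshold.

```r
# Hypothetical posterior probabilities for 4 specimens over 3 classes
post <- matrix(c(0.90, 0.05, 0.05,
                 0.40, 0.35, 0.25,
                 0.10, 0.80, 0.10,
                 0.34, 0.33, 0.33),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))
threshold <- 0.5  # assumed value, for illustration only

# Predicted class = class with the highest posterior probability
pred <- colnames(post)[max.col(post, ties.method = "first")]

# Predictions whose highest posterior is below the threshold become NA
pred[apply(post, 1, max) < threshold] <- NA

pred                # "A" NA "B" NA
sum(!is.na(pred))   # number of predictions still reported: 2
```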

Both versions compute k-fold cross-validation, but they differ in what they report:

kfcv gives the result of prediction on each specimen.

kfcv2 can calculate the by-class statistics (recall, precision and specificity). It also calculates a confusion matrix for each fold.
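
The general k-fold procedure can be sketched as follows. This is an illustrative sketch and not the implementation used by `kfcv` or `kfcv2`: it assumes the `MASS` package is available, uses the built-in `iris` data in place of otolith descriptors, and the variable names `fold`, `correct` and `misclass` are chosen here only to mirror the values described on this page.

```r
library(MASS)  # provides lda()

set.seed(1)
X <- as.matrix(iris[, 1:4])  # stand-in predictors
Y <- iris$Species            # stand-in class labels
k <- 5

# Randomly assign each specimen to one of k folds of (near) equal size
fold <- sample(rep(seq_len(k), length.out = nrow(X)))

misclass <- numeric(k)        # misclassification rate (%) per fold
correct  <- logical(nrow(X))  # per-specimen result, TRUE = correctly predicted

for (i in seq_len(k)) {
  test <- fold == i
  # Train on the other k - 1 folds, predict the held-out fold
  fit  <- lda(X[!test, ], grouping = Y[!test])
  pred <- predict(fit, X[test, ])$class
  correct[test] <- pred == Y[test]
  misclass[i]   <- 100 * mean(pred != Y[test])
}

misclass       # one rate per fold
mean(correct)  # overall proportion correctly predicted
```

Each specimen falls in exactly one held-out fold, so `correct` ends up with one entry per specimen.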

What the stat values mean:

Recall = Sensitivity = tp / (tp + fn)
Precision = tp / (tp + fp)
Specificity = tn / (tn + fp)

where tp = true positive, tn = true negative, fp = false positive and fn = false negative. Please refer to the reference for a detailed explanation.
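
As a worked example of these formulas, the sketch below computes the three statistics per class from a confusion matrix in base R. The matrix `cm` and the helper `class_stats` are hypothetical and not part of the package; rows are assumed to hold the true class and columns the predicted class.

```r
# Hypothetical 3-class confusion matrix (counts):
# rows = true class, columns = predicted class
cm <- matrix(c(8, 1, 1,
               2, 6, 2,
               0, 1, 9),
             nrow = 3, byrow = TRUE,
             dimnames = list(true = c("A", "B", "C"),
                             predicted = c("A", "B", "C")))

# Hypothetical helper: one-vs-rest statistics for every class
class_stats <- function(cm) {
  tp <- diag(cm)                # correctly predicted as the class
  fn <- rowSums(cm) - tp        # members of the class predicted as another
  fp <- colSums(cm) - tp        # other classes predicted as the class
  tn <- sum(cm) - tp - fn - fp  # everything else
  cbind(recall      = tp / (tp + fn),
        precision   = tp / (tp + fp),
        specificity = tn / (tn + fp))
}

stats <- class_stats(cm)
stats
# Class A: recall 0.8, precision 0.8,  specificity 0.9
# Class B: recall 0.6, precision 0.75, specificity 0.9
# Class C: recall 0.9, precision 0.75, specificity 0.85
```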

`misclass`	vector of `k` misclassification rates, in percent, one from each fold of testing
`total`	total number of predictions after excluding those with a posterior probability lower than the threshold, if a `threshold` value is given
`ind.prediction`	[`kfcv` only] logical. Prediction result for each specimen (every specimen is used once in a validation set in `kfcv`), with `TRUE` = correctly predicted, `FALSE` = wrongly predicted
`stat`	[`kfcv2` only] `k` matrices containing the calculated precision, sensitivity (recall) and specificity for each class, one matrix per fold. May contain `NA` values if a class is not present in the fold
`conmat`	[`kfcv2` only] `k` confusion matrices, shown as proportions rather than counts. Proportion = number correctly or incorrectly predicted divided by the total number of that class in training set

Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437.

Functions that wrap these functions: `mrkfcv`, `mrkfcv2`

jinyung/otolith documentation built on May 19, 2019, 10:36 a.m.