mrkfcv-mrkfcv2: Multiple-run k-fold cross-validation


Description

Performs multiple runs of k-fold cross-validation; see References. The "use all data" variant is implemented here.

Usage

mrkfcv(X, Y, method = c("lda", "tree", "plsda"), k = 5, run = 100,
  threshold, ncomp, suppress = c(FALSE, TRUE, "text"))
mrkfcv2(X, Y, method = c("lda", "tree", "plsda"), k = 5, run = 100,
  threshold, ncomp, suppress = FALSE)

Arguments

X

matrix or data frame of predictors, e.g. EFA coefficients or PC scores selected using selectdim

Y

vector giving the class, e.g. the value obtained from getclass or the sp value from a routine1 object

method

classification method: "lda" for linear discriminant analysis, "tree" for classification tree, "plsda" for partial least squares-linear discriminant analysis

k

number of folds used in cross-validation

run

number of runs (repetitions) of k-fold cross-validation

threshold

optional. A numeric value between 0 and 1 that sets the posterior probability threshold. Any class prediction with a posterior probability lower than this value is set to NA and not reported. See threcv

suppress

when TRUE, suppresses the running status in the R console. [mrkfcv only] the option "text" is used by the pccv/harcv wrappers
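As an illustration of the threshold argument described above, here is a minimal base-R sketch of setting low-confidence predictions to NA. It is not the package's code: the helper name apply_threshold and the toy posterior matrix are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): return the most probable
# class per row, or NA when its posterior probability is below `threshold`.
apply_threshold <- function(posterior, threshold) {
  posterior <- as.matrix(posterior)
  best <- max.col(posterior, ties.method = "first")        # column of the top class
  top <- posterior[cbind(seq_len(nrow(posterior)), best)]   # its probability
  pred <- colnames(posterior)[best]
  pred[top < threshold] <- NA                               # drop weak predictions
  pred
}

# Toy posterior probabilities for three specimens and two classes (made up).
post <- rbind(c(A = 0.95, B = 0.05),
              c(A = 0.55, B = 0.45),
              c(A = 0.10, B = 0.90))
pred <- apply_threshold(post, threshold = 0.8)
pred                        # second specimen is NA: its top posterior is 0.55
mean(!is.na(pred)) * 100    # percent of specimens that keep a prediction
```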

Details

mrkfcv is a wrapper for kfcv and mrkfcv2 is a wrapper for kfcv2; each repeats the wrapped function the number of times specified by the run argument.

For mrkfcv, the ind.prediction value is a percentage calculated from the results of all run repetitions. It helps the user find problematic specimen(s) while building a classification model.

For mrkfcv2, the by-class statistics (stat.sum) are averages over all k x run submodels (NA values are excluded).
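The idea of repeating k-fold cross-validation and summarising over all k x run submodels can be sketched as follows. This is only an illustration under stated assumptions, not the code of mrkfcv or kfcv: the helper repeated_kfold_lda and its output names are hypothetical, the classifier is MASS::lda, and the built-in iris data stand in for otolith predictors.

```r
library(MASS)  # provides lda(); MASS is a recommended package bundled with R

# Hypothetical helper, not the package's code: repeat k-fold CV `run` times.
repeated_kfold_lda <- function(X, Y, k = 5, run = 10) {
  X <- as.matrix(X)
  Y <- as.factor(Y)
  n <- nrow(X)
  fold_accuracy <- numeric(0)                    # one entry per submodel
  correct <- matrix(NA, nrow = n, ncol = run)    # hit/miss per specimen per run
  for (r in seq_len(run)) {
    # Shuffle fold labels so every specimen is tested exactly once per run.
    fold <- sample(rep(seq_len(k), length.out = n))
    for (i in seq_len(k)) {
      test <- fold == i
      fit <- lda(X[!test, , drop = FALSE], grouping = Y[!test])
      pred <- predict(fit, X[test, , drop = FALSE])$class
      hit <- as.character(pred) == as.character(Y[test])
      correct[test, r] <- hit
      fold_accuracy <- c(fold_accuracy, mean(hit))
    }
  }
  list(
    fold_accuracy = fold_accuracy,              # k x run accuracy values
    accuracy = mean(fold_accuracy),             # mean over all submodels
    accu_sd = sd(fold_accuracy),                # spread over all submodels
    percent_correct = rowMeans(correct) * 100   # per specimen, across runs
  )
}

set.seed(1)
res <- repeated_kfold_lda(iris[, 1:4], iris$Species, k = 5, run = 3)
length(res$fold_accuracy)           # 15 submodels (k x run)
which(res$percent_correct < 100)    # specimens misclassified in at least one run
```

Specimens whose per-specimen percentage stays well below 100 across runs are the kind of "problematic specimens" the ind.prediction value is meant to reveal.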

Value

accuracy

cross-validated accuracy of the tested classifier, calculated as the average of the k x run accuracy values generated by the function

accu.sd

standard deviation of the accuracy, calculated from the k x run results

total

mean total successful prediction in percent; not returned if no threshold value is given

total.sd

standard deviation of the total successful prediction in percent; not returned if no threshold value is given

misclass

[mrkfcv only] vector of run x k misclassification rates

ind.prediction

[mrkfcv only] vector of percentages of correctly predicted specimens

stat.sum

[mrkfcv2 only] cross-validated by-class precision, recall and specificity

conmat

[mrkfcv2 only] confusion matrix expressed in proportions, averaged across the confusion matrices of all k x run submodels. Proportion = number correctly or incorrectly predicted divided by the total number of that class in the training set.
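To make the averaging of proportional confusion matrices concrete, here is a small base-R sketch. It is an illustration only: the helper to_proportion and the two toy matrices are made up, and dividing each row by the count of its true class within the same matrix is an assumption of this sketch, not a statement of how mrkfcv2 computes conmat.

```r
# Hypothetical helper: convert counts to proportions within each true class
# (rows = true class, columns = predicted class).
to_proportion <- function(cm) cm / rowSums(cm)

classes <- c("A", "B")
# Two toy confusion matrices, standing in for two submodels (made-up counts).
cm1 <- matrix(c(8, 2,
                1, 9), nrow = 2, byrow = TRUE,
              dimnames = list(true = classes, predicted = classes))
cm2 <- matrix(c(10, 0,
                 3, 7), nrow = 2, byrow = TRUE,
              dimnames = list(true = classes, predicted = classes))

props <- lapply(list(cm1, cm2), to_proportion)
avg <- Reduce(`+`, props) / length(props)   # element-wise mean of the matrices
avg
diag(avg)   # with row-wise proportions, the diagonal is the mean by-class recall
```

Each row of the averaged matrix still sums to 1, so the off-diagonal entries show how each true class is spread across the wrong predicted classes.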

References

Bouckaert, R. R. (2003). Choosing between two learning algorithms based on calibrated tests. In: Fawcett, T., Mishra, N. (Eds.), Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), August 21-24, 2003. AAAI Press, Washington.

Beleites, C., Baumgartner, R., Bowman, C., Somorjai, R., Steiner, G., Salzer, R., & Sowa, M. G. (2005). Variance reduction in estimating classification error using sparse datasets. Chemometrics and Intelligent Laboratory Systems, 79(1), 91-100.

See Also

Functions wrapped by mrkfcv and mrkfcv2: kfcv, kfcv2


jinyung/otolith documentation built on May 19, 2019, 10:36 a.m.