mrkfcv-mrkfcv2: Multiple-run k-fold cross-validation
In jinyung/otolith: Identification of fish otolith using digital images


Performs multiple runs of k-fold cross-validation (see References). The "use all data" variant is implemented here.

mrkfcv(X, Y, method = c("lda", "tree", "plsda"), k = 5, run = 100,
  threshold, ncomp, suppress = c(FALSE, TRUE, "text"))
mrkfcv2(X, Y, method = c("lda", "tree", "plsda"), k = 5, run = 100,
  threshold, ncomp, suppress = FALSE)

`X`	matrix or data frame of predictors, e.g. EFA coefficients or PC scores selected using `selectdim`
`Y`	vector giving the class, e.g. the value obtained from `getclass` or the `sp` value from a `routine1` object
`method`	classification method: `"lda"` for linear discriminant analysis, `"tree"` for classification tree, `"plsda"` for partial least squares-linear discriminant analysis
`k`	number of folds in the cross-validation
`run`	number of runs to be used in multiple runs of k-fold cross-validation
`threshold`	optional. A numeric value between 0 and 1 that sets the threshold of posterior probability. Any class prediction with a posterior probability lower than this value will be set to `NA` and not reported. See `threcv`
`suppress`	suppress the running status in the R console when `TRUE`. [for `mrkfcv`] the option `"text"` is used by the `pccv`/`harcv` wrappers
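The effect of `threshold` can be illustrated with a minimal base R sketch. The posterior probabilities, class labels and the names `posterior`, `truth` and `reported_pct` below are made up for illustration and are not part of the package; the sketch only shows the general idea of discarding low-confidence predictions, not the package's internal code.

```r
# Hypothetical posterior probabilities for four specimens and two classes
posterior <- rbind(
  c(cod = 0.95, haddock = 0.05),
  c(cod = 0.55, haddock = 0.45),
  c(cod = 0.10, haddock = 0.90),
  c(cod = 0.48, haddock = 0.52)
)
truth <- c("cod", "cod", "haddock", "cod")  # hypothetical true classes
threshold <- 0.8

# Predicted class = class with the highest posterior probability
pred <- colnames(posterior)[max.col(posterior, ties.method = "first")]

# Predictions whose highest posterior falls below the threshold become NA
pred[apply(posterior, 1, max) < threshold] <- NA

# Percentage of specimens that still receive a prediction
reported_pct <- mean(!is.na(pred)) * 100

# Accuracy among the reported predictions only
accuracy <- mean(pred == truth, na.rm = TRUE)
```

Here two of the four specimens fall below the threshold, so only half of the predictions are reported, and accuracy is computed on those alone.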

mrkfcv is a wrapper for kfcv and mrkfcv2 is a wrapper for kfcv2; each repeats the wrapped function the number of times given by the run argument.
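The repetition scheme can be sketched in base R as follows. This is only an illustration under stated assumptions: `one_kfold` is a hypothetical stand-in for a single k-fold run (it is not `kfcv`), and it uses a simple nearest-centroid classifier on the built-in `iris` data so that no extra package is needed.

```r
# Hypothetical stand-in for one k-fold run; NOT the package's kfcv().
# Returns the misclassification rate of each of the k folds.
one_kfold <- function(X, Y, k = 5) {
  X <- as.matrix(X)
  Y <- factor(Y)
  # Randomly assign every specimen to one of k folds
  fold <- sample(rep(seq_len(k), length.out = nrow(X)))
  sapply(seq_len(k), function(i) {
    test <- fold == i
    # Class centroids estimated from the training folds only
    centroids <- t(sapply(levels(Y), function(cl)
      colMeans(X[!test & Y == cl, , drop = FALSE])))
    # Squared Euclidean distance from each test specimen to each centroid
    d <- sapply(seq_len(nrow(centroids)), function(j)
      rowSums(sweep(X[test, , drop = FALSE], 2, centroids[j, ])^2))
    d <- matrix(d, nrow = sum(test))
    # Predict the class of the nearest centroid
    pred <- levels(Y)[max.col(-d, ties.method = "first")]
    mean(pred != as.character(Y[test]))
  })
}

set.seed(1)
k <- 5
run <- 10

# Repeat the k-fold procedure 'run' times, reshuffling the folds each time
misclass <- as.vector(replicate(run, one_kfold(iris[, 1:4], iris$Species, k)))

# Summaries over all k x run submodels
accuracy <- mean(1 - misclass)
accu_sd <- sd(1 - misclass)
```

Because the folds are reshuffled in every run, the mean and standard deviation are based on `k x run` accuracy values rather than on a single partition of the data.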

For mrkfcv, the ind.prediction value is the percentage calculated over the results of all runs. It is useful when the user wishes to find problematic specimens while building a classification model.
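One way to read this, assuming the percentage is taken per specimen across runs, is sketched below. The matrix `correct`, the specimen names and the 50% cut-off are all hypothetical and serve only to illustrate the idea.

```r
# Hypothetical results: rows are runs, columns are specimens;
# TRUE means the specimen was classified correctly in that run.
correct <- rbind(
  c(TRUE, TRUE,  FALSE, TRUE),
  c(TRUE, FALSE, FALSE, TRUE),
  c(TRUE, TRUE,  FALSE, TRUE),
  c(TRUE, TRUE,  TRUE,  TRUE)
)
colnames(correct) <- paste0("specimen", 1:4)

# Percentage of runs in which each specimen was predicted correctly
ind_prediction <- colMeans(correct) * 100

# Specimens that are rarely predicted correctly deserve a closer look
# (the 50% cut-off is an arbitrary choice for this example)
suspect <- names(ind_prediction)[ind_prediction < 50]
```

A specimen that is misclassified in most runs, whichever folds it lands in, may be mislabelled or atypical for its class.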

For mrkfcv2, the by-class statistics (stat.sum) are averages over all k x run submodels (NA values are excluded).
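The NA-excluding average can be sketched in base R as below. The precision values and the class names `A`, `B`, `C` are invented for illustration; an `NA` here stands for a submodel in which the statistic could not be computed for that class.

```r
# Hypothetical precision values for three classes from four submodels;
# NA marks a submodel in which the value is undefined for that class.
precision <- rbind(
  c(A = 1.0, B = 0.8, C = NA),
  c(A = 0.9, B = 0.6, C = 0.5),
  c(A = 1.0, B = NA,  C = 0.7),
  c(A = 0.9, B = 0.7, C = 0.6)
)

# Average each class over the submodels, skipping NA entries
mean_precision <- colMeans(precision, na.rm = TRUE)
```

Note that classes with `NA` entries are averaged over fewer submodels than the others, so their averages rest on less information.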

`accuracy`	cross-validated accuracy of the tested classifier, the average of the `k x run` accuracy values generated by the function
`accu.sd`	standard deviation of the accuracy, calculated from the `k x run` results
`total`	mean total successful prediction in percent, not returned if no `threshold` value is given
`total.sd`	standard deviation of the total successful prediction in percent, not returned if no `threshold` value is given
`misclass`	[`mrkfcv` only] vector of `run x k` misclassification rates
`ind.prediction`	[`mrkfcv` only] vector of percentages of correctly predicted specimens
`stat.sum`	[`mrkfcv2` only] cross-validated by-class precision, recall and specificity
`conmat`	[`mrkfcv2` only] confusion matrix expressed as proportions, averaged across the confusion matrices of all `k x run` submodels. Proportion = number correctly or incorrectly predicted divided by the total number of that class in training set.
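The proportion-and-average step described for `conmat` can be sketched in base R. The two count matrices below are hypothetical, and the sketch assumes rows hold the actual class and columns the predicted class, so that each row is divided by the number of specimens of that class.

```r
# Hypothetical confusion matrices (counts) from two submodels;
# rows = actual class, columns = predicted class.
labels <- list(actual = c("A", "B"), predicted = c("A", "B"))
cm1 <- matrix(c(8, 2,
                1, 9), nrow = 2, byrow = TRUE, dimnames = labels)
cm2 <- matrix(c(6, 4,
                3, 7), nrow = 2, byrow = TRUE, dimnames = labels)

# Convert each matrix to row-wise proportions (each row sums to 1)
props <- lapply(list(cm1, cm2), prop.table, margin = 1)

# Element-wise average across submodels
conmat <- Reduce(`+`, props) / length(props)
```

Averaging proportions rather than raw counts keeps classes of different sizes comparable across submodels.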

Bouckaert, R.R., (2003). Choosing between two learning algorithms based on calibrated tests. In: Fawcett, T., Mishra, N. (Eds.), Proceedings of the Twentieth International Conference (ICML 2003) on Machine Learning. August 21-24, 2003, AAAI Press, Washington.

Beleites, C., Baumgartner, R., Bowman, C., Somorjai, R., Steiner, G., Salzer, R., & Sowa, M. G. (2005). Variance reduction in estimating classification error using sparse datasets. Chemometrics and Intelligent Laboratory Systems, 79(1), 91-100.

Functions that these functions wrap: kfcv, kfcv2

jinyung/otolith documentation built on May 19, 2019, 10:36 a.m.