Conduct a cross-validation for a given classification/regression model and output the prediction results collected over the cross-validation loop. The cross-validation can be done in two ways: normal k-fold cross-validation (batch=NULL), or batch-wise cross-validation (batch!=NULL). The latter is particularly useful in the presence of significant intra-group heterogeneity.
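The difference between the two modes lies in how the folds are formed. The following is a minimal base-R sketch of a batch-wise split, in which all samples sharing a batch ID land in the same fold. The helper batchFolds and the toy batch vector are hypothetical and only illustrate the idea; they are not the package's own splitting code.

```r
# Hypothetical helper, not part of the package: assign whole batches to folds.
batchFolds <- function(batch, nFold, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ids <- as.character(unique(batch))
  # Shuffle the batch IDs, then deal them out to folds in round-robin order
  foldOfBatch <- setNames(rep_len(seq_len(nFold), length(ids)), sample(ids))
  # Map every sample to the fold of its batch and collect the sample indices
  split(seq_along(batch), foldOfBatch[as.character(batch)])
}

# Toy example: 6 patients with 4 spectra each (made-up IDs)
batch <- rep(c("P1", "P2", "P3", "P4", "P5", "P6"), each = 4)
folds <- batchFolds(batch, nFold = 3, seed = 1)

# Batch IDs contained in each fold; no ID appears in more than one fold
sapply(folds, function(idx) unique(batch[idx]))
```

Because a batch is never split across training and testing data, the prediction of a test fold cannot benefit from having seen other samples of the same batch.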
data |
a data matrix, with samples saved in rows and features in columns. |
label |
a vector of response variables (i.e., group/concentration info), must be the same length as the number of samples. |
batch |
a vector of sample identifications (e.g., batch/patient ID), must be the same length as the number of samples. Ideally, this should be the identification of the samples at the highest level of the hierarchy (e.g., the patient ID rather than the spectral ID). If missing, a normal k-fold cross-validation will be performed (i.e., the data is split randomly into k folds). Ignored if |
method |
the name of the function to be performed on training data (can be any model-based procedure, such as classification/regression or even pre-processing). A user-defined function is possible, see |
pred |
the name of the function to be performed on testing data (e.g., new substances) based on the model built by |
classify |
a boolean value, |
folds |
a list of indices specifying the sample index to be used in each fold, can be the output of function |
nBatch |
an integer, the number of data folds in case of batch-wise cross-validation (if |
nFold |
an integer, the value of k in case of normal k-fold cross-validation. Ignored if |
verbose |
a boolean value, whether to print out the logging info |
seed |
an integer, if given, will be used as the random seed to split the data in case of k-fold cross-validation. Ignored if |
... |
parameters to be passed to the |
The cross-validation is conducted based on the data partitions given in folds: each fold is predicted once using the model built on the remaining folds. If folds is missing, a data split is done first (see dataSplit for more details).
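The loop described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: trainFn and predFn are hypothetical stand-ins for method and pred (a simple nearest-class-mean classifier), and the data are simulated.

```r
# Toy data: 30 samples, 2 features, 2 well-separated classes (illustrative only)
set.seed(42)
x <- rbind(matrix(rnorm(30, mean = 0), ncol = 2),
           matrix(rnorm(30, mean = 5), ncol = 2))
y <- rep(c("A", "B"), each = 15)

# Random 3-fold partition of the sample indices
folds <- split(sample(nrow(x)), rep_len(1:3, nrow(x)))

# Hypothetical stand-in for 'method': returns one column of feature means per class
trainFn <- function(data, label) {
  sapply(split(as.data.frame(data), label), colMeans)
}
# Hypothetical stand-in for 'pred': assigns each sample to the nearest class mean
predFn <- function(model, data) {
  apply(data, 1, function(s) colnames(model)[which.min(colSums((model - s)^2))])
}

truth <- character(0)
predicted <- character(0)
for (test in folds) {
  # Build the model on all folds except the current one
  model <- trainFn(x[-test, , drop = FALSE], y[-test])
  # Predict the held-out fold and collect the results
  predicted <- c(predicted, predFn(model, x[test, , drop = FALSE]))
  truth <- c(truth, y[test])
}

# Fraction of correctly predicted samples over the whole loop
mean(truth == predicted)
```

Every sample is used as testing data exactly once, so the collected vectors have one entry per sample.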
The procedures to be performed within the cross-validation are given in the function method, for example, fnPcaLda. A user-defined function is possible, as long as it follows the same structure as fnPcaLda. A two-layer cross-validation (see reference) can be done by using a tuning function as method, such as tunePcaLda (see examples). In this case, the parameters of a classifier are optimized using the training data within tunePcaLda and the optimal model is tested on the testing data. The parameters of pre-processing can be optimized in a similar way by including the pre-processing steps in the function method.
NOTE: It is recommended to specify the seed for a normal k-fold cross-validation in order to get the same results from repeated runs.
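The effect of a fixed seed on a random split can be illustrated with a small base-R sketch. The helper randomFolds is hypothetical and is not the package's dataSplit; it only shows why fixing the seed makes the partition, and hence the results, repeatable.

```r
# Hypothetical helper: random k-fold split of n sample indices
randomFolds <- function(n, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  split(sample(n), rep_len(seq_len(k), n))
}

a <- randomFolds(12, 3, seed = 7)
b <- randomFolds(12, 3, seed = 7)
identical(a, b)  # TRUE: the same seed gives the same partition
```

Without a seed, the partition depends on the current state of the random number generator and generally differs between runs.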
A list with elements
Fold |
a list, each giving the sample indices of a fold |
True |
a vector of characters, the ground-truth response variables, collected for each fold when it is used as testing data |
Pred |
a vector of characters, the results from prediction, collected for each fold when it is used as testing data |
Summ |
a list, the output of function |
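For a classification task, the True and Pred elements can be compared directly. The sketch below uses a mock list with made-up values in place of a real result; only the element names True and Pred are taken from the description above.

```r
# Mock result with the documented element names (values are made up)
res <- list(True = c("A", "A", "B", "B", "A", "B"),
            Pred = c("A", "B", "B", "B", "A", "A"))

# Use common factor levels so that the confusion table is square
lev <- sort(unique(c(res$True, res$Pred)))
conf <- table(True = factor(res$True, levels = lev),
              Pred = factor(res$Pred, levels = lev))

# Overall accuracy: correctly predicted samples divided by all samples
accuracy <- sum(diag(conf)) / sum(conf)
conf
accuracy
```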
Shuxia Guo, Thomas Bocklitz, Juergen Popp
S. Guo, T. Bocklitz, et al., Common mistakes in cross-validating classification models. Analytical Methods 2017, 9 (30): 4410-4417.
data(DATA)
### perform batch-wise cross-validation using the function fnPcaLda
RES3 <- crossValidation(data=DATA$spec
,label=DATA$labels
,batch=DATA$batch
,method=fnPcaLda
,pred=predPcaLda
,folds=NULL
,nBatch=0
,nFold=3
,verbose=TRUE
,seed=NULL
### parameters to be passed to fnPcaLda
,center=TRUE
,scale=FALSE
)
### perform a two-layer cross-validation using the function tunePcaLda,
### where the number of principal components used for LDA is optimized
### (i.e., internal cross-validation).
RES4 <- crossValidation(data=DATA$spec
,label=DATA$labels
,batch=DATA$batch
,method=tunePcaLda
,pred=predPcaLda
,folds=NULL
,nBatch=0
,nFold=3
,verbose=TRUE
,seed=NULL
### parameters to be passed to tunePcaLda
,nPC=2:4
,cv=c('CV', 'BV')[2]
,nPart=0
,optMerit=c('Accuracy', 'Sensitivity')[2]
,center=TRUE
,scale=FALSE
)