| MVA.cmv | R Documentation | 
Performs cross model validation (2CV) with different PLS analyses.
MVA.cmv(X, Y, repet = 10, kout = 7, kinn = 6, ncomp = 8, scale = TRUE,
  model = c("PLSR", "CPPLS", "PLS-DA", "PPLS-DA", "PLS-DA/LDA", "PLS-DA/QDA",
  "PPLS-DA/LDA", "PPLS-DA/QDA"), crit.inn = c("RMSEP", "Q2", "NMC"),
  Q2diff = 0.05, lower = 0.5, upper = 0.5, Y.add = NULL, weights = rep(1, nrow(X)),
  set.prior = FALSE, crit.DA = c("plug-in", "predictive", "debiased"), ...)
| X | a data frame of independent variables. | 
| Y | the dependent variable(s): numeric vector, data frame of quantitative variables or factor. | 
| repet | an integer giving the number of times the whole 2CV procedure has to be repeated. | 
| kout | an integer giving the number of folds in the outer loop (can be re-set internally if needed). | 
| kinn | an integer giving the number of folds in the inner loop (can be re-set internally if needed). Cannot be  | 
| ncomp | an integer giving the maximal number of components to be tested in the inner loop (can be re-set depending on the size of the train sets). | 
| scale | logical indicating if data should be scaled (see Details). | 
| model | the model to be fitted (see Details). | 
| crit.inn | the criterion to be used to choose the number of components in the inner loop. Root Mean Square Error of Prediction ( | 
| Q2diff | the threshold to be used if the number of components is chosen according to Q2. The next component is added only if it makes the Q2 increase more than  | 
| lower | a vector of lower limits for power optimisation in CPPLS or PPLS-DA (see  | 
| upper | a vector of upper limits for power optimisation in CPPLS or PPLS-DA (see  | 
| Y.add | a vector or matrix of additional responses containing relevant information about the observations, in CPPLS or PPLS-DA (see  | 
| weights | a vector of individual weights for the observations, in CPPLS or PPLS-DA (see  | 
| set.prior | only used when a second analysis (LDA or QDA) is performed. If  | 
| crit.DA | criterion used to predict class membership when a second analysis (LDA or QDA) is used. See  | 
| ... | other arguments to pass to  | 
Cross model validation is detailed in Szymanska et al. (2012). Some more details about how this function works:
- when a discriminant analysis is used ("PLS-DA", "PPLS-DA", "PLS-DA/LDA", "PLS-DA/QDA", "PPLS-DA/LDA" or "PPLS-DA/QDA"), the training sets (test set itself in the inner loop, test+validation sets in the outer loop) are generated so as to preserve the relative proportions of the levels of Y in the original data set (see splitf).
- "PLS-DA" is considered as PLS2 on a dummy-coded response. For a PLS-DA based on the CPPLS algorithm, use "PPLS-DA" with lower and upper limits of the power parameters set to 0.5.
- if a second analysis is used ("PLS-DA/LDA", "PLS-DA/QDA", "PPLS-DA/LDA" or "PPLS-DA/QDA"), an LDA or QDA is also built on the scores of the first analysis (PLS-DA or PPLS-DA) in the inner loop. The classification error rate based on this second analysis is then used to choose the number of components.
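The stratified generation of folds mentioned in the first point can be illustrated with a minimal base R sketch. The helper `stratified_folds()` is hypothetical and written only for this illustration; it is not the package's splitf function and may differ from what MVA.cmv does internally.

```r
# Hypothetical helper (not part of the package): assign each observation to
# one of k folds so that every fold roughly preserves the class proportions
# of the factor y.
stratified_folds <- function(y, k) {
  y <- as.factor(y)
  folds <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # Shuffle the indices of this class, then deal them out to folds 1..k in turn
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(1)
y <- factor(rep(c("a", "b", "c"), times = c(14, 21, 7)))
folds <- stratified_folds(y, k = 7)

# Cross-tabulate class by fold to check that proportions are preserved
table(y, folds)
```

In this toy data set each class size is a multiple of 7, so every fold receives the same number of observations per class. When class sizes are not multiples of k, this simple scheme gives the first folds slightly more observations.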
If scale = TRUE, scaling is done as follows:
- for each step of the outer loop (i.e. kout steps), the rest set is pre-processed by centering and unit-variance scaling. Means and standard deviations of variables in the rest set are then used to scale the test set.
- for each step of the inner loop (i.e. kinn steps), the training set is pre-processed by centering and unit-variance scaling. Means and standard deviations of variables in the training set are then used to scale the validation set.
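The principle behind both steps (estimate means and standard deviations on one set, then reuse them on the held-out set) can be sketched with base R's `scale()`. The data and the split below are made up for illustration, and the sketch does not claim to reproduce the internal code of MVA.cmv.

```r
set.seed(42)
# Toy data: 20 observations, 3 variables (illustrative only)
X <- matrix(rnorm(60, mean = 5, sd = 2), nrow = 20, ncol = 3)
train_idx <- 1:15
X_train <- X[train_idx, , drop = FALSE]
X_valid <- X[-train_idx, , drop = FALSE]

# Centre and unit-variance scale the training set with its own statistics
X_train_sc <- scale(X_train, center = TRUE, scale = TRUE)
mu  <- attr(X_train_sc, "scaled:center")  # training means
sdv <- attr(X_train_sc, "scaled:scale")   # training standard deviations

# Reuse the training means and SDs to scale the held-out set
X_valid_sc <- scale(X_valid, center = mu, scale = sdv)

colMeans(X_train_sc)      # numerically zero for every variable
apply(X_train_sc, 2, sd)  # one for every variable
colMeans(X_valid_sc)      # generally not zero: statistics come from the training set
```

Scaling the held-out set with its own statistics would leak information from it into the preprocessing, which is why the training statistics are reused.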
| model | model used. | 
| type | type of model used. | 
| repet | number of times the whole 2CV procedure was repeated. | 
| kout | number of folds in the outer loop. | 
| kinn | number of folds in the inner loop. | 
| crit.inn | criterion used to choose the number of components in the inner loop. | 
| crit.DA | criterion used to classify individuals of the test and validation sets. | 
| Q2diff | threshold used if the number of components is chosen according to Q2. | 
| groups | levels of  | 
| models.list | list of models generated ( | 
| models1.list | list of (P)PLS-DA models generated ( | 
| models2.list | list of LDA/QDA models generated ( | 
| RMSEP | RMSEP computed from the models used in the outer loops ( | 
| Q2 | Q2 computed from the models used in the outer loops ( | 
| NMC | Classification error rate computed from the models used in the outer loops ( | 
| confusion | Confusion matrices computed from the models used in the outer loops ( | 
| pred.prob | Probability of each individual of being of each level of  | 
Maxime HERVE <maxime.herve@univ-rennes1.fr>
Szymanska E, Saccenti E, Smilde AK and Westerhuis J (2012) Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics 8:S3-S16.
predict.MVA.cmv, mvr, lda, qda
require(pls)
require(MASS)
# PLSR
data(yarn)
## Not run: MVA.cmv(yarn$NIR, yarn$density, model = "PLSR")
# PPLS-DA coupled to LDA
data(mayonnaise)
## Not run: MVA.cmv(mayonnaise$NIR, factor(mayonnaise$oil.type), model = "PPLS-DA/LDA", crit.inn = "NMC")