MVA.cmv | R Documentation |
Performs cross model validation (2CV) with different PLS analyses.
MVA.cmv(X, Y, repet = 10, kout = 7, kinn = 6, ncomp = 8, scale = TRUE,
model = c("PLSR", "CPPLS", "PLS-DA", "PPLS-DA", "PLS-DA/LDA", "PLS-DA/QDA",
"PPLS-DA/LDA", "PPLS-DA/QDA"), crit.inn = c("RMSEP", "Q2", "NMC"),
Q2diff = 0.05, lower = 0.5, upper = 0.5, Y.add = NULL, weights = rep(1, nrow(X)),
set.prior = FALSE, crit.DA = c("plug-in", "predictive", "debiased"), ...)
X |
a data frame of independent variables. |
Y |
the dependent variable(s): numeric vector, data frame of quantitative variables or factor. |
repet |
an integer giving the number of times the whole 2CV procedure has to be repeated. |
kout |
an integer giving the number of folds in the outer loop (can be re-set internally if needed). |
kinn |
an integer giving the number of folds in the inner loop (can be re-set internally if needed). Cannot be |
ncomp |
an integer giving the maximal number of components to be tested in the inner loop (can be re-set depending on the size of the train sets). |
scale |
logical indicating if data should be scaled (see Details). |
model |
the model to be fitted (see Details). |
crit.inn |
the criterion to be used to choose the number of components in the inner loop. Root Mean Square Error of Prediction ( |
Q2diff |
the threshold to be used if the number of components is chosen according to Q2. The next component is added only if it makes the Q2 increase more than |
lower |
a vector of lower limits for power optimisation in CPPLS or PPLS-DA (see |
upper |
a vector of upper limits for power optimisation in CPPLS or PPLS-DA (see |
Y.add |
a vector or matrix of additional responses containing relevant information about the observations, in CPPLS or PPLS-DA (see |
weights |
a vector of individual weights for the observations, in CPPLS or PPLS-DA (see |
set.prior |
only used when a second analysis (LDA or QDA) is performed. If |
crit.DA |
criterion used to predict class membership when a second analysis (LDA or QDA) is used. See |
... |
other arguments to pass to |
Cross model validation is detailed in Szymanska et al. (2012). Some more details about how this function works:
- when a discriminant analysis is used ("PLS-DA", "PPLS-DA", "PLS-DA/LDA", "PLS-DA/QDA", "PPLS-DA/LDA" or "PPLS-DA/QDA"), the training sets (test set itself in the inner loop, test+validation sets in the outer loop) are generated with respect to the relative proportions of the levels of Y in the original data set (see splitf).
- "PLS-DA" is considered as PLS2 on a dummy-coded response. For a PLS-DA based on the CPPLS algorithm, use "PPLS-DA" with the lower and upper limits of the power parameters set to 0.5.
- if a second analysis is used ("PLS-DA/LDA", "PLS-DA/QDA", "PPLS-DA/LDA" or "PPLS-DA/QDA"), an LDA or QDA is built on the scores of the first analysis (PLS-DA or PPLS-DA), also in the inner loop. The classification error rate based on this second analysis is used to choose the number of components.
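The stratified splitting mentioned in the first item can be illustrated with a minimal base R sketch. The helper stratified_folds() is hypothetical and written only for illustration; it is not the package's splitf and may differ from it in detail. It assumes that keeping the relative proportions of the levels of Y means dealing the observations of each level evenly across the folds.

```r
# Hypothetical helper (not part of the package): assign each observation to
# one of k folds so that every fold keeps roughly the class proportions of Y.
stratified_folds <- function(Y, k) {
  Y <- as.factor(Y)
  folds <- integer(length(Y))
  for (lev in levels(Y)) {
    idx <- which(Y == lev)
    # Shuffle the indices of this level, then deal them out fold by fold
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(1)
Y <- factor(rep(c("a", "b"), times = c(14, 7)))
folds <- stratified_folds(Y, k = 7)
table(Y, folds)  # each fold holds 2 "a" and 1 "b"
```

With 14 observations of one level and 7 of the other, each of the 7 folds receives the same 2:1 ratio as the full data set.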
If scale = TRUE, the scaling is done as follows:
- for each step of the outer loop (i.e. kout steps), the rest set is pre-processed by centering and unit-variance scaling. Means and standard deviations of variables in the rest set are then used to scale the test set.
- for each step of the inner loop (i.e. kinn steps), the training set is pre-processed by centering and unit-variance scaling. Means and standard deviations of variables in the training set are then used to scale the validation set.
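The scaling scheme described above can be sketched in base R as follows. This is an illustration under assumptions, not the function's internal code: the objects train and test are hypothetical toy matrices, and scale() stands in for whatever pre-processing routine is used internally. The point is that the held-out set is scaled with the means and standard deviations of the set used for fitting, never with its own.

```r
# Hypothetical toy data: 8 training rows, 3 held-out rows, 2 variables
set.seed(1)
train <- matrix(rnorm(16, mean = 5, sd = 2), nrow = 8)
test  <- matrix(rnorm(6,  mean = 5, sd = 2), nrow = 3)

# Centre and scale the training set to unit variance
train_sc <- scale(train, center = TRUE, scale = TRUE)

# Reuse the training means and standard deviations on the held-out set
mu    <- attr(train_sc, "scaled:center")
sigma <- attr(train_sc, "scaled:scale")
test_sc <- scale(test, center = mu, scale = sigma)

round(colMeans(train_sc), 10)  # 0 for every column
apply(train_sc, 2, sd)         # 1 for every column
```

The same pattern applies at both levels: rest set and test set in the outer loop, training set and validation set in the inner loop.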
model |
model used. |
type |
type of model used. |
repet |
number of times the whole 2CV procedure was repeated. |
kout |
number of folds in the outer loop. |
kinn |
number of folds in the inner loop. |
crit.inn |
criterion used to choose the number of components in the inner loop. |
crit.DA |
criterion used to classify individuals of the test and validation sets. |
Q2diff |
threshold used if the number of components is chosen according to Q2. |
groups |
levels of |
models.list |
list of models generated (
models1.list |
list of (P)PLS-DA models generated (
models2.list |
list of LDA/QDA models generated (
RMSEP |
RMSEP computed from the models used in the outer loops ( |
Q2 |
Q2 computed from the models used in the outer loops ( |
NMC |
Classification error rate computed from the models used in the outer loops ( |
confusion |
Confusion matrices computed from the models used in the outer loops ( |
pred.prob |
Probability of each individual of being of each level of |
Maxime HERVE <maxime.herve@univ-rennes1.fr>
Szymanska E, Saccenti E, Smilde AK and Westerhuis J (2012) Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics 8:S3-S16.
predict.MVA.cmv, mvr, lda, qda
require(pls)
require(MASS)

# PLSR
data(yarn)
## Not run: MVA.cmv(yarn$NIR, yarn$density, model = "PLSR")

# PPLS-DA coupled to LDA
data(mayonnaise)
## Not run: MVA.cmv(mayonnaise$NIR, factor(mayonnaise$oil.type), model = "PPLS-DA/LDA", crit.inn = "NMC")