# Cross model validation

### Description

Performs cross model validation (2CV) with different PLS analyses.

### Usage

```
MVA.cmv(X, Y, repet = 10, kout = 7, kinn = 6, ncomp = 8, scale = TRUE,
  model = c("PLSR", "CPPLS", "PLS-DA", "PPLS-DA", "PLS-DA/LDA", "PLS-DA/QDA",
    "PPLS-DA/LDA", "PPLS-DA/QDA"), crit.inn = c("RMSEP", "Q2", "NMC"),
  Q2diff = 0.05, lower = 0.5, upper = 0.5, Y.add = NULL, weights = rep(1, nrow(X)),
  set.prior = FALSE, crit.DA = c("plug-in", "predictive", "debiased"), ...)
```

### Arguments

| Argument | Description |
|---|---|
| `X` | a data frame of independent variables. |
| `Y` | the dependent variable(s): numeric vector, data frame of quantitative variables or factor. |
| `repet` | an integer giving the number of times the whole 2CV procedure has to be repeated. |
| `kout` | an integer giving the number of folds in the outer loop (can be re-set internally if needed). |
| `kinn` | an integer giving the number of folds in the inner loop (can be re-set internally if needed). Cannot be |
| `ncomp` | an integer giving the maximal number of components to be tested in the inner loop (can be re-set depending on the size of the train sets). |
| `scale` | logical indicating if data should be scaled (see Details). |
| `model` | the model to be fitted (see Details). |
| `crit.inn` | the criterion to be used to choose the number of components in the inner loop. Root Mean Square Error of Prediction ( |
| `Q2diff` | the threshold to be used if the number of components is chosen according to Q2. The next component is added only if it makes the Q2 increase more than |
| `lower` | a vector of lower limits for power optimisation in CPPLS or PPLS-DA (see |
| `upper` | a vector of upper limits for power optimisation in CPPLS or PPLS-DA (see |
| `Y.add` | a vector or matrix of additional responses containing relevant information about the observations, in CPPLS or PPLS-DA (see |
| `weights` | a vector of individual weights for the observations, in CPPLS or PPLS-DA (see |
| `set.prior` | only used when a second analysis (LDA or QDA) is performed. If |
| `crit.DA` | criterion used to predict class membership when a second analysis (LDA or QDA) is used. See |
| `...` | other arguments to pass to |

### Details

Cross model validation is detailed in Szymanska et al. (2012). Some more details about how this function works:

- when a discriminant analysis is used (`"PLS-DA"`, `"PPLS-DA"`, `"PLS-DA/LDA"`, `"PLS-DA/QDA"`, `"PPLS-DA/LDA"` or `"PPLS-DA/QDA"`), the training sets (test set itself in the inner loop, test+validation sets in the outer loop) are generated with respect to the relative proportions of the levels of `Y` in the original data set (see `splitf`).

- `"PLS-DA"` is treated as PLS2 on a dummy-coded response. For a PLS-DA based on the CPPLS algorithm, use `"PPLS-DA"` with the `lower` and `upper` limits of the power parameters set to `0.5`.

- if a second analysis is used (`"PLS-DA/LDA"`, `"PLS-DA/QDA"`, `"PPLS-DA/LDA"` or `"PPLS-DA/QDA"`), an LDA or QDA is built on the scores of the first analysis (PLS-DA or PPLS-DA), also in the inner loop. The number of misclassifications from this second analysis is used to choose the number of components.
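The class-proportional splitting mentioned in the first point can be illustrated with a minimal base-R sketch. The helper `stratified_folds` is hypothetical and written only for illustration; it is not the package's `splitf`, whose actual implementation may differ.

```r
# Hypothetical helper (not part of the package): assign each observation to
# one of k folds so that every fold roughly preserves the class proportions.
stratified_folds <- function(y, k) {
  y <- as.factor(y)
  folds <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # Shuffle the observations of this class, then deal fold labels 1..k in turn.
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(1)
y <- factor(rep(c("a", "b", "c"), times = c(30, 20, 10)))
folds <- stratified_folds(y, k = 5)

# Cross-tabulate classes against folds to check the balance.
fold_table <- table(y, folds)
print(fold_table)
```

With class sizes of 30, 20 and 10 and five folds, each fold receives 6, 4 and 2 observations of classes `a`, `b` and `c`, so the 3:2:1 ratio of the full data set is kept in every fold.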

If `scale = TRUE`, scaling is done as follows:

- for each step of the outer loop (i.e. `kout` steps), the rest set is pre-processed by centering and unit-variance scaling. The means and standard deviations of the variables in the rest set are then used to scale the test set.

- for each step of the inner loop (i.e. `kinn` steps), the training set is pre-processed by centering and unit-variance scaling. The means and standard deviations of the variables in the training set are then used to scale the validation set.
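The scaling scheme can be sketched in base R as follows. This is an illustration only, not the function's internal code: `scale_with_train` is a hypothetical helper, the data are simulated, the folds are plain random folds, and the inner training set is assumed to be taken from the unscaled rest set.

```r
# Hypothetical helper: centre and unit-variance scale `train`, then apply the
# same means and standard deviations to `new`.
scale_with_train <- function(train, new) {
  train <- as.matrix(train)
  new <- as.matrix(new)
  mu <- colMeans(train)
  sdev <- apply(train, 2, sd)
  list(
    train = scale(train, center = mu, scale = sdev),
    new = scale(new, center = mu, scale = sdev)
  )
}

set.seed(42)
X <- matrix(rnorm(60 * 4, mean = 10, sd = 3), nrow = 60)  # simulated data
kout <- 3
kinn <- 2
outer_fold <- sample(rep_len(seq_len(kout), nrow(X)))

for (i in seq_len(kout)) {
  rest <- X[outer_fold != i, , drop = FALSE]
  test <- X[outer_fold == i, , drop = FALSE]
  # Outer loop: the test set is scaled with the rest-set statistics.
  outer_scaled <- scale_with_train(rest, test)

  inner_fold <- sample(rep_len(seq_len(kinn), nrow(rest)))
  for (j in seq_len(kinn)) {
    train <- rest[inner_fold != j, , drop = FALSE]
    valid <- rest[inner_fold == j, , drop = FALSE]
    # Inner loop: the validation set is scaled with the training-set statistics.
    inner_scaled <- scale_with_train(train, valid)
    # Candidate models would be fitted on inner_scaled$train and
    # evaluated on inner_scaled$new here.
  }
}
```

Because the held-out set never contributes to the means and standard deviations, its scaled columns will generally not have exactly zero mean or unit variance, which is the intended behaviour.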

### Value

| Component | Description |
|---|---|
| `model` | model used. |
| `type` | type of model used. |
| `repet` | number of times the whole 2CV procedure was repeated. |
| `kout` | number of folds in the outer loop. |
| `kinn` | number of folds in the inner loop. |
| `crit.inn` | criterion used to choose the number of components in the inner loop. |
| `crit.DA` | criterion used to classify individuals of the test and validation sets. |
| `Q2diff` | threshold used if the number of components is chosen according to Q2. |
| `groups` | levels of |
| `models.list` | list of models generated ( |
| `models1.list` | list of (P)PLS-DA models generated ( |
| `models2.list` | list of LDA/QDA models generated ( |
| `RMSEP` | RMSEP computed from the models used in the outer loops ( |
| `Q2` | Q2 computed from the models used in the outer loops ( |
| `NMC` | NMC computed from the models used in the outer loops ( |

### Author(s)

Maxime Hervé <mx.herve@gmail.com>

### References

Szymanska E, Saccenti E, Smilde AK and Westerhuis J (2012) Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics 8:S3-S16.

### See Also

`predict.MVA.cmv`, `mvr`, `lda`, `qda`

### Examples
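
A minimal sketch of a typical PLSR call, assuming the `RVAideMemoire` package (which provides `MVA.cmv`) and the `pls` package (which provides the `yarn` data set) are both installed; the guard skips the call otherwise. All arguments other than `model` are left at the defaults shown in Usage, so the run may take a little while.

```r
# Assumption: RVAideMemoire and pls are installed. If either is missing,
# nothing is run and `res` is not created.
if (requireNamespace("RVAideMemoire", quietly = TRUE) &&
    requireNamespace("pls", quietly = TRUE)) {
  library(pls)            # provides the yarn data set
  library(RVAideMemoire)  # provides MVA.cmv

  data(yarn)

  # PLSR of yarn density on the NIR spectra, with the default 2CV settings.
  res <- MVA.cmv(yarn$NIR, yarn$density, model = "PLSR")
}
```

The components of the returned object are described under Value.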
