Computes M-fold or Leave-One-Out Cross-Validation scores on a user-input grid to determine optimal values for the sparsity parameters in spls.
X 
numeric matrix of predictors. 
Y 
numeric matrix of response variables. 
ncomp 
the number of components to include in the model. 
test.keepX 
numeric vector of the different numbers of variables to test from the X data set. 
already.tested.X 
optional. If ncomp > 1, a numeric vector giving the number of variables to select from the X data set on the first components; these components are not re-tuned. 
validation 
character. What kind of (internal) validation to use, matching one of "Mfold" or "loo". See Details. 
folds 
the number of folds in the M-fold cross-validation. See Details. 
measure 
one of "MAE", "MSE", "Bias" or "R2". See Details. 
scale 
Boolean. If scale = TRUE, each block is standardized to zero mean and unit variance (default: TRUE). 
progressBar 
Boolean. If TRUE, a progress bar is displayed during the computation. 
tol 
convergence stopping value. 
max.iter 
integer, the maximum number of iterations. 
near.zero.var 
Boolean, see the internal nearZeroVar function. 
nrepeat 
number of times the cross-validation process is repeated. 
multilevel 
design matrix for multilevel analysis (for repeated measurements) that indicates the repeated measures on each individual, i.e. the individuals' IDs. See Details. 
light.output 
if set to FALSE, the prediction of each sample for each of test.keepX and each component is returned. 
cpus 
number of CPUs to use. If greater than 1, the code is run in parallel. 
This tuning function should be used to tune the parameters in the spls function: the number of components and the number of variables in keepX to select.
If validation = "loo", leave-one-out cross-validation is performed; by default, folds is then set to the number of unique individuals. If validation = "Mfold", M-fold cross-validation is performed, with the number of folds specified by the folds argument.
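As a sketch of the two validation schemes (argument names as documented above; liver.toxicity ships with mixOmics, and the ncomp/test.keepX values here are illustrative only):

```r
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene   # predictors
Y <- liver.toxicity$clinic # responses

# M-fold: 10 folds, repeated 3 times to stabilise the error estimate
tune.mfold <- tune.spls(X, Y, ncomp = 2, test.keepX = c(5, 10),
                        validation = "Mfold", folds = 10, nrepeat = 3)

# Leave-one-out: one fold per sample, so no repetition is needed
tune.loo <- tune.spls(X, Y, ncomp = 2, test.keepX = c(5, 10),
                      validation = "loo")
```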
Four measures of accuracy are available: Mean Absolute Error (MAE), Mean Square Error (MSE), Bias and R2. Both MAE and MSE average the model prediction error. MAE measures the average magnitude of the errors without considering their direction: it is the average, over the fold test samples, of the absolute differences between the Y predictions and the actual Y observations. The MSE also measures the average magnitude of the error; since the errors are squared before they are averaged, the MSE tends to give a relatively high weight to large errors. The Bias is the average of the differences between the Y predictions and the actual Y observations, and the R2 is the correlation between the predictions and the observations. All these measures are averaged across all Y variables in the PLS2 case. We are still improving the function to tune an sPLS2 model; contact us for more details and examples.
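The four measures can be written out in a few lines of base R (toy vectors, for illustration only):

```r
# Held-out observations and their predictions from one CV fold (toy values)
y.obs  <- c(2.1, 3.4, 1.8, 4.0, 2.9)
y.pred <- c(2.3, 3.1, 2.0, 3.6, 3.2)

mae  <- mean(abs(y.pred - y.obs))  # average error magnitude, direction ignored
mse  <- mean((y.pred - y.obs)^2)   # squaring gives large errors more weight
bias <- mean(y.pred - y.obs)       # signed average error
r2   <- cor(y.pred, y.obs)         # correlation between predictions and observations
```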
The function outputs the optimal number of components that achieves the best performance based on the chosen measure of accuracy. The assessment is data-driven and similar to the process detailed in Rohart et al. (2016), where one-sided t-tests assess whether there is a gain in performance when adding a component to the model.
See also ?perf for more details.
A list that contains:
error.rate 
returns the prediction error for each test.keepX on each component, averaged across all folds and repeats. 
choice.keepX 
returns the number of variables selected (optimal keepX) on each component. 
choice.ncomp 
returns the optimal number of components for the model fitted with $choice.keepX. 
measure 
recalls which criterion was used. 
predict 
prediction values for each sample, each test.keepX and each component (only returned if light.output = FALSE). 
Kim-Anh Lê Cao, Benoit Gautier, Francois Bartolo, Florian Rohart, Al J Abadi
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752.
PLS and PLS criteria for PLS regression: Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.
Chavent, Marie and Patouille, Brigitte (2003). Calcul des coefficients de regression et du PRESS en regression PLS1. Modulad 30, 1-11. (This is the formula we use to calculate the Q2 in perf.pls and perf.spls.)
Mevik, B.-H., Cederkvist, H. R. (2004). Mean Squared Error of Prediction (MSEP) Estimates for Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.
sparse PLS regression mode:
Lê Cao, K. A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
One-sided t-tests (suppl. material):
Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S, Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao KA, Wells CA (2016). A Molecular Classification of Human Mesenchymal Stromal Cells. PeerJ 4:e1845.
splsda, predict.splsda and http://www.mixOmics.org for more details.
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
## Not run:
tune <- tune.spls(X, Y, ncomp = 4, test.keepX = c(5, 10, 15), measure = "MSE",
                  nrepeat = 3, progressBar = TRUE)
tune$choice.ncomp
tune$choice.keepX
# plot the results
plot(tune)
## End(Not run)
