Computes Leave-One-Group-Out-Cross-Validation (LOGOCV) scores on a user-input grid to determine optimal values for the sparsity parameters in mint.splsda.
tune.mint.splsda(
X,
Y,
ncomp = 1,
study,
test.keepX = c(5, 10, 15),
already.tested.X,
dist = c("max.dist", "centroids.dist", "mahalanobis.dist"),
measure = c("BER", "overall"),
auc = FALSE,
progressBar = FALSE,
scale = TRUE,
tol = 1e-06,
max.iter = 100,
near.zero.var = FALSE,
light.output = TRUE,
signif.threshold = 0.01
)

X 
numeric matrix of predictors. 
Y 
Outcome. Numeric vector or matrix of responses (for multi-response models) 
ncomp 
Number of components to include in the model (see Details). Default to 1 
study 
grouping factor indicating which samples are from the same study 
test.keepX 
numeric vector for the different number of variables to test from the X data set 
already.tested.X 
if 
dist 
only applies to an object inheriting from 
measure 
Two misclassification measures are available: overall misclassification error (overall) or the Balanced Error Rate (BER) 
auc 
if 
progressBar 
by default set to 
scale 
boolean. If scale = TRUE, each block is standardized to zero mean and unit variance (default: TRUE) 
tol 
Convergence stopping value. 
max.iter 
integer, the maximum number of iterations. 
near.zero.var 
boolean, see the internal 
light.output 
if set to FALSE, the prediction/classification of each
sample for each of 
signif.threshold 
numeric between 0 and 1 indicating the significance threshold required for improvement in error rate of the components. Default to 0.01. 
This function performs a Leave-One-Group-Out-Cross-Validation (LOGOCV), where each level of study is left out once. It returns a list of variables of X that were selected on each of the ncomp components. Then, a mint.splsda can be performed with keepX set as the output choice.keepX.
All components 1:ncomp are tuned, except the first ones for which an already.tested.X is provided. See examples below.
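The leave-one-group-out scheme described above can be sketched in base R. This is only an illustration of how the folds are formed, not the package's internal code; the toy `study` factor and the `logo_folds` name are made up for the example.

```r
# Toy study labels: 6 samples from 3 hypothetical studies.
study <- factor(c("s1", "s1", "s2", "s2", "s3", "s3"))

# One fold per study level: the held-out study is the test set,
# all remaining studies form the training set.
logo_folds <- lapply(levels(study), function(s) {
  list(
    held_out = s,
    train = which(study != s),
    test = which(study == s)
  )
})

# Print the sample indices used for training and testing in each fold.
for (f in logo_folds) {
  cat("held out:", f$held_out,
      "| train:", f$train,
      "| test:", f$test, "\n")
}
```

With three studies there are three folds, so each candidate keepX value is evaluated three times, once per held-out study.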
The function outputs the optimal number of components that achieve the best performance based on the overall error rate or BER. The assessment is data-driven and similar to the process detailed in (Rohart et al., 2016), where one-sided t-tests assess whether there is a gain in performance when adding a component to the model. Our experience has shown that in most cases, the optimal number of components is the number of categories in Y - 1, but it is worth tuning a few extra components to check (see our website and case studies for more details).
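The idea of using one-sided t-tests to decide whether an extra component helps can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure implemented in the package: the error rates are invented toy numbers, `choose_ncomp` is a hypothetical helper, and an unpaired Welch test is assumed.

```r
# Toy error rates (rows: repeated evaluations, columns: components).
# These values are invented purely for illustration.
err <- cbind(
  comp1 = c(0.40, 0.45, 0.38, 0.42),
  comp2 = c(0.20, 0.22, 0.18, 0.25),
  comp3 = c(0.21, 0.20, 0.19, 0.24)
)
signif_threshold <- 0.01

# Hypothetical helper: keep adding components while the error of the
# current best model is significantly greater than that of the candidate.
choose_ncomp <- function(err, threshold) {
  best <- 1
  for (k in seq_len(ncol(err))[-1]) {
    p <- t.test(err[, best], err[, k], alternative = "greater")$p.value
    if (p < threshold) best <- k
  }
  best
}

ncomp_opt <- choose_ncomp(err, signif_threshold)
print(ncomp_opt)
```

In this toy input the second component roughly halves the error while the third brings no further gain, so the sketch settles on two components.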
BER is appropriate in case of an unbalanced number of samples per class as it calculates the average proportion of wrongly classified samples in each class, weighted by the number of samples in each class. BER is less biased towards majority classes during the performance assessment.
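A small base R example may help show why BER and the overall error rate differ on unbalanced data. It assumes BER is the mean of the per-class error rates (each class error being the proportion of misclassified samples within that class); the toy labels and variable names are invented for the illustration.

```r
# Toy unbalanced outcome: 8 samples of class A, 2 samples of class B.
truth <- factor(c(rep("A", 8), rep("B", 2)))

# Hypothetical predictions: every A is correct, one of the two B is wrong.
pred <- factor(c(rep("A", 8), "A", "B"), levels = levels(truth))

# Overall error: proportion of misclassified samples across all classes.
overall_error <- mean(pred != truth)

# Per-class error: proportion of misclassified samples within each class.
per_class_error <- tapply(pred != truth, truth, mean)

# BER (assumed definition): average of the per-class error rates.
ber <- mean(per_class_error)

print(overall_error)    # 0.1
print(per_class_error)  # A: 0, B: 0.5
print(ber)              # 0.25
```

The overall error looks small because the majority class dominates it, whereas BER exposes the poor performance on the minority class.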
More details about the prediction distances can be found in ?predict and in the supplemental material of the mixOmics article (Rohart et al. 2017).
The returned value is a list with components:
error.rate 
returns the prediction error for each 
choice.keepX 
returns the number of variables selected (optimal keepX) on each component. 
choice.ncomp 
returns the optimal number of
components for the model fitted with 
error.rate.class 
returns the error rate for each level of 
predict 
Prediction values for each sample, each 
class 
Predicted class for each sample, each

Florian Rohart, Al J Abadi
Rohart F, Eslami A, Matigian N, Bougeard S, Lê Cao KA (2017). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms. BMC Bioinformatics 18:128.
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao KA (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752.
mint.splsda and http://www.mixOmics.org for more details.
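library(mixOmics)
data(stemcells)
data = stemcells$gene          # predictors: gene expression matrix
type.id = stemcells$celltype   # outcome: cell type of each sample
exp = stemcells$study          # study membership of each sample

# fit a MINT sPLS-DA model with a fixed keepX on each of the 3 components
res = mint.splsda(X = data, Y = type.id, ncomp = 3, keepX = c(10, 5, 15), study = exp)

# tune keepX over the grid 1, 2, ..., 10 for components 1 and 2
out = tune.mint.splsda(X = data, Y = type.id, ncomp = 2, near.zero.var = FALSE,
    study = exp, test.keepX = seq(1, 10, 1))
out$choice.ncomp   # optimal number of components
out$choice.keepX   # optimal keepX on each component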
## Not run:
out = tune.mint.splsda(X=data,Y=type.id,ncomp=2,near.zero.var=FALSE,
study=exp,test.keepX=seq(1,10,1))
out$choice.keepX
## only tune component 2 and keeping 10 genes on comp1
out = tune.mint.splsda(X=data,Y=type.id,ncomp=2, study=exp,
already.tested.X = c(10),
test.keepX=seq(1,10,1))
out$choice.keepX
## End(Not run)
