Description Usage Arguments Details Value Author(s) References See Also Examples
Computes M-fold or leave-one-out cross-validation scores on a user-input grid to determine optimal values for the sparsity parameters in splsda.
tune.splsda(X, Y, ncomp = 1,
  test.keepX = c(5, 10, 15), already.tested.X, validation = "Mfold",
  folds = 10, dist = "max.dist", measure = "BER", scale = TRUE, auc = FALSE,
  progressBar = TRUE, tol = 1e-06, max.iter = 100, near.zero.var = FALSE,
  nrepeat = 1, logratio = c('none','CLR'), multilevel = NULL,
  light.output = TRUE, cpus)

X 
numeric matrix of predictors. 
Y 
a factor or a class vector indicating the class membership of each sample. 
ncomp 
the number of components to include in the model. 
test.keepX 
numeric vector for the different number of variables to test from the X data set 
already.tested.X 
Optional, if ncomp > 1: a numeric vector indicating the number of variables already selected from the X data set on the first components. 
validation 
character. What kind of (internal) validation to use, matching one of "Mfold" or "loo" (see Details). Default is "Mfold". 
folds 
the folds in the M-fold cross-validation. See Details. 
dist 
distance metric used to estimate the classification error rate, one of "max.dist", "centroids.dist" or "mahalanobis.dist" (see Details). 
measure 
Two misclassification measures are available: overall misclassification error "overall" or the Balanced Error Rate "BER". See Details. 
scale 
boolean. If scale = TRUE, each block is standardized to zero means and unit variances (default: TRUE). 
auc 
if TRUE, calculates the Area Under the Curve (AUC) performance of the model. See Details. 
progressBar 
by default set to TRUE to output the progress bar of the computation. 
tol 
Convergence stopping value. 
max.iter 
integer, the maximum number of iterations. 
near.zero.var 
boolean, see the internal nearZeroVar function (should be set to TRUE in particular for data with many zero values). Default is FALSE. 
nrepeat 
Number of times the cross-validation process is repeated. 
logratio 
one of ('none','CLR'). Defaults to 'none'. 
multilevel 
Design matrix for multilevel analysis (for repeated measurements) that indicates the repeated measures on each individual, i.e. the individual IDs. See Details. 
light.output 
if set to FALSE, the prediction/classification of each sample for each of the test.keepX values and each comp is returned. 
cpus 
Number of CPUs to use when running the code in parallel. 
This tuning function should be used to tune the parameters in the splsda function: the number of components (ncomp) and the number of variables to select on each component (keepX).
For an sPLS-DA, M-fold or LOO cross-validation is performed with stratified subsampling where all classes are represented in each fold.
If validation = "loo", leave-one-out cross-validation is performed. By default folds is set to the number of unique individuals.
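As a minimal sketch of the leave-one-out option, the call below reuses the breast.tumors data from the Examples section; the test.keepX values are illustrative, not recommendations. Since LOO is deterministic, nrepeat > 1 is unnecessary, and folds is ignored.

```r
library(mixOmics)
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- as.factor(breast.tumors$sample$treatment)

# With validation = "loo", each sample is left out in turn.
tune.loo <- tune.splsda(X, Y, ncomp = 2,
                        test.keepX = c(5, 10, 15),
                        validation = "loo", dist = "max.dist")
tune.loo$choice.keepX  # optimal keepX per component
```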
The function outputs the optimal number of components that achieve the best performance based on the overall error rate or BER. The assessment is data-driven and similar to the process detailed in (Rohart et al., 2016), where one-sided t-tests assess whether there is a gain in performance when adding a component to the model. Our experience has shown that in most cases the optimal number of components is the number of categories in Y minus 1, but it is worth tuning a few extra components to check (see our website and case studies for more details).
For an sPLS-DA multilevel one-factor analysis, M-fold or LOO cross-validation is performed where all repeated measurements of one sample are in the same fold. Note that the log-ratio transform and the multilevel analysis are performed internally and independently on the training and test set.
For an sPLS-DA multilevel two-factor analysis, the correlation between components from the within-subject variation of X and the cond matrix is computed on the whole data set. The reason why we cannot obtain a cross-validation error rate as for the sPLS-DA one-factor analysis is the difficulty of decomposing and predicting the within matrices within each fold.
For an sPLS two-factor analysis, a sPLS canonical mode is run, and the correlation between components from the within-subject variation of X and Y is computed on the whole data set.
If validation = "Mfold", M-fold cross-validation is performed. The number of folds to generate is selected by specifying the number of folds in folds.
If auc = TRUE and there are more than 2 categories in Y, the Area Under the Curve is averaged using one-vs-all comparison. Note however that the AUC criteria may not be particularly insightful, as the prediction threshold we use in sPLS-DA differs from an AUC threshold (sPLS-DA relies on prediction distances for predictions, see ?predict.splsda for more details and the supplemental material of the mixOmics article (Rohart et al. 2017)).
BER is appropriate in the case of an unbalanced number of samples per class, as it calculates the average proportion of wrongly classified samples in each class: each class contributes equally regardless of its size, so BER is less biased towards majority classes during the performance assessment.
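To make the distinction concrete, here is a small base-R sketch with made-up labels showing how BER (the mean of per-class error rates) can differ from the overall error rate when classes are unbalanced:

```r
# Toy data: 4 samples of class A, 1 of class B (made-up labels).
truth <- factor(c("A", "A", "A", "A", "B"))
pred  <- factor(c("A", "A", "A", "B", "A"), levels = levels(truth))

# Per-class error rates: A -> 1/4, B -> 1/1
per.class.err <- sapply(levels(truth),
                        function(k) mean(pred[truth == k] != k))

BER     <- mean(per.class.err)  # (0.25 + 1) / 2 = 0.625
overall <- mean(pred != truth)  # 2 / 5 = 0.4
```

The minority class B is entirely misclassified, which BER penalizes heavily while the overall error rate barely registers it.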
More details about the prediction distances are in ?predict and the supplemental material of the mixOmics article (Rohart et al. 2017).
Depending on the type of analysis performed, a list that contains:
error.rate 
returns the prediction error for each test.keepX on each component, averaged across all repeats and subsampling folds. 
choice.keepX 
returns the number of variables selected (optimal keepX) on each component. 
choice.ncomp 
returns the optimal number of components for the model fitted with $choice.keepX. 
error.rate.class 
returns the error rate for each level of Y and for each component, computed with the optimal keepX. 
predict 
Prediction values for each sample, each test.keepX, each comp and each repeat. Only if light.output = FALSE. 
class 
Predicted class for each sample, each test.keepX, each comp and each repeat. Only if light.output = FALSE. 
auc 
AUC mean and standard deviation if the number of categories in Y is greater than 2, see Details. Only if auc = TRUE. 
cor.value 
only if multilevel analysis with 2 factors: correlation between latent variables. 
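A hedged sketch of how these returned components might be inspected, following the first example in the Examples section (exact values depend on your run and the random fold assignment):

```r
library(mixOmics)
data(breast.tumors)
tune <- tune.splsda(breast.tumors$gene.exp,
                    as.factor(breast.tumors$sample$treatment),
                    ncomp = 2, test.keepX = c(5, 10, 15),
                    folds = 10, nrepeat = 5)

tune$error.rate        # mean error for each test.keepX x component
tune$choice.keepX      # optimal keepX per component
tune$choice.ncomp      # optimal number of components
tune$error.rate.class  # per-class error at the optimal keepX
plot(tune)             # error curves across the test.keepX grid
```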
Kim-Anh Lê Cao, Benoit Gautier, Francois Bartolo, Florian Rohart.
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao K-A (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752.
splsda
, predict.splsda
and http://www.mixOmics.org for more details.
## First example: analysis with sPLS-DA
## Not run:
data(breast.tumors)
X = breast.tumors$gene.exp
Y = as.factor(breast.tumors$sample$treatment)
tune = tune.splsda(X, Y, ncomp = 1, nrepeat = 10, logratio = "none",
test.keepX = c(5, 10, 15), folds = 10, dist = "max.dist",
progressBar = TRUE)
# 5 components, optimising 'keepX' and 'ncomp'
tune = tune.splsda(X, Y, ncomp = 5, test.keepX = c(5, 10, 15),
folds = 10, dist = "max.dist", nrepeat = 5, progressBar = TRUE)
tune$choice.ncomp
tune$choice.keepX
plot(tune)
## End(Not run)
## only tune components 3 and 4
# keeping 5 and 10 variables on the first two components respectively
## Not run:
tune = tune.splsda(X = X,Y = Y, ncomp = 4,
already.tested.X = c(5,10),
test.keepX = seq(1,10,2), progressBar = TRUE)
## End(Not run)
## Second example: multilevel one-factor analysis with sPLS-DA
## Not run:
data(vac18)
X = vac18$genes
Y = vac18$stimulation
# sample indicates the repeated measurements
design = data.frame(sample = vac18$sample)
tune = tune.splsda(X, Y = Y, ncomp = 3, nrepeat = 10, logratio = "none",
test.keepX = c(5,50,100),folds = 10, dist = "max.dist", multilevel = design)
## End(Not run)
