tune.mint.splsda | R Documentation |
Computes Leave-One-Group-Out-Cross-Validation (LOGOCV) scores on a
user-input grid to determine optimal values for the parameters in
mint.splsda.
tune.mint.splsda(
X,
Y,
ncomp = 1,
study,
test.keepX = NULL,
already.tested.X,
scale = TRUE,
tol = 1e-06,
max.iter = 100,
near.zero.var = FALSE,
signif.threshold = 0.01,
dist = c("max.dist", "centroids.dist", "mahalanobis.dist"),
measure = c("BER", "overall"),
auc = FALSE,
progressBar = FALSE,
light.output = TRUE
)
X |
numeric matrix of predictors. |
Y |
Outcome. Numeric vector or matrix of responses (for multi-response models) |
ncomp |
Number of components to include in the model (see Details). Default to 1 |
study |
grouping factor indicating which samples are from the same study |
test.keepX |
numeric vector for the different number of variables to
test from the |
already.tested.X |
if |
scale |
Logical. If scale = TRUE, each block is standardized to zero means and unit variances (default: TRUE) |
tol |
Convergence stopping value. |
max.iter |
integer, the maximum number of iterations. |
near.zero.var |
Logical, see the internal |
signif.threshold |
numeric between 0 and 1 indicating the significance threshold required for improvement in error rate of the components. Default to 0.01. |
dist |
only applies to an object inheriting from |
measure |
Two misclassification measures are available: overall
misclassification error |
auc |
if |
progressBar |
by default set to |
light.output |
if set to FALSE, the prediction/classification of each
sample for each of |
This function performs a Leave-One-Group-Out-Cross-Validation (LOGOCV),
where each level of study is left out once.
When test.keepX is not NULL, all components 1:ncomp
are tuned to identify the number of variables to keep,
except the first ones for which an already.tested.X
is provided. See examples below.
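The leave-one-group-out scheme described above can be illustrated with a minimal base R sketch. The toy study factor and the folds list are hypothetical and only show how the splits are formed; this is not the package's internal code.

```r
# Hypothetical toy grouping: 3 studies with 4 samples each
study <- factor(rep(c("A", "B", "C"), each = 4))

# One fold per study: the held-out study forms the test set,
# all remaining studies form the training set
folds <- lapply(levels(study), function(s) {
  list(left.out = s,
       train = which(study != s),
       test = which(study == s))
})
names(folds) <- levels(study)

# Each study is left out exactly once
for (f in folds) {
  cat("left out:", f$left.out,
      "| train n =", length(f$train),
      "| test n =", length(f$test), "\n")
}
```

The number of folds therefore equals the number of studies, so there is no fold count or repeat count to choose.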
The function outputs the optimal number of components that achieves the best
performance based on the overall error rate or BER. The assessment is
data-driven and similar to the process detailed in (Rohart et al., 2016),
where one-sided t-tests assess whether there is a gain in performance when
adding a component to the model. Our experience has shown that in most cases,
the optimal number of components is the number of categories in Y - 1,
but it is worth tuning a few extra components to check (see our website
and case studies for more details).
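As a rough illustration of the idea of a one-sided t-test on error rates, the following base R sketch compares hypothetical error rates obtained with one and with two components. The numbers are made up, and the way the package builds its test inputs is not stated here, so this is only a sketch of the general principle under those assumptions.

```r
# Hypothetical error rates (e.g. one value per left-out study)
err.comp1 <- c(0.30, 0.28, 0.35, 0.32)  # model with 1 component
err.comp2 <- c(0.15, 0.18, 0.20, 0.16)  # model with 2 components

# One-sided test: is the error with 1 component greater than with 2?
p.value <- t.test(err.comp1, err.comp2, alternative = "greater")$p.value

# Keep the extra component only if the gain is significant at the
# chosen threshold (signif.threshold defaults to 0.01)
signif.threshold <- 0.01
add.component <- p.value < signif.threshold
add.component
```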
BER is appropriate when the number of samples per class is unbalanced: it averages the per-class error rates, each computed as the proportion of wrongly classified samples within that class, so every class contributes equally regardless of its size. BER is therefore less biased towards majority classes during the performance assessment.
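The difference between the two measures can be seen on a small base R example. The labels below are hypothetical: an unbalanced two-class problem and a classifier that always predicts the majority class.

```r
# Hypothetical unbalanced data: 90 samples of class "a", 10 of class "b"
truth <- factor(rep(c("a", "b"), times = c(90, 10)))
# A classifier that always predicts the majority class
pred <- factor(rep("a", 100), levels = levels(truth))

# Overall error rate: proportion of all samples that are misclassified
overall <- mean(pred != truth)

# Per-class error rate: proportion misclassified within each true class
per.class <- tapply(pred != truth, truth, mean)

# BER: unweighted mean of the per-class error rates
ber <- mean(per.class)

overall  # 0.1
ber      # 0.5
```

The overall error rate looks good because the minority class is small, while BER reveals that one of the two classes is never predicted correctly.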
More details about the prediction distances can be found in ?predict and in the
supplemental material of the mixOmics article (Rohart et al. 2017).
The returned value is a list with components:
error.rate |
returns the prediction error for each |
choice.keepX |
returns the number of variables selected (optimal keepX) on each component. |
choice.ncomp |
returns the optimal number of
components for the model fitted with |
error.rate.class |
returns the error rate for each level of |
predict |
Prediction values for each sample, each |
class |
Predicted class for each sample, each
|
If test.keepX = NULL, returns:
study.specific.error |
A list that gives BER, overall error rate and error rate per class, for each study |
global.error |
A list that gives BER, overall error rate and error rate per class for all samples |
predict |
A list of length |
class |
A list which gives the
predicted class of each sample for each |
auc |
AUC values |
auc.study |
AUC values for each study in mint models |
Florian Rohart, Al J Abadi
Rohart F, Eslami A, Matigian N, Bougeard S, Lê Cao K-A (2017). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms. BMC Bioinformatics 18:128.
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao K-A (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752.
mint.splsda and http://www.mixOmics.org for more details.
# set up data
library(mixOmics)
data(stemcells)
data <- stemcells$gene
type.id <- stemcells$celltype
exp <- stemcells$study
# tune number of components
tune_res <- tune.mint.splsda(X = data, Y = type.id, ncomp = 5,
                             near.zero.var = FALSE,
                             study = exp,
                             test.keepX = NULL)
plot(tune_res)
tune_res$choice.ncomp # 1 component
## tune number of variables to keep
tune_res <- tune.mint.splsda(X = data, Y = type.id, ncomp = 1,
                             near.zero.var = FALSE,
                             study = exp,
                             test.keepX = seq(1, 10, 1))
plot(tune_res)
tune_res$choice.keepX # 9 variables to keep on component 1
## only tune component 2, keeping 9 genes on comp1
tune_res <- tune.mint.splsda(X = data, Y = type.id, ncomp = 2, study = exp,
                             already.tested.X = c(9),
                             test.keepX = seq(1, 10, 1))
plot(tune_res)
tune_res$choice.keepX # 10 variables to keep on comp2