tune | R Documentation |
This function uses repeated cross-validation to tune hyperparameters such as the number of features to select and possibly the number of components to extract.
tune(
method = c("pls", "spls", "plsda", "splsda", "block.plsda", "block.splsda",
"mint.plsda", "mint.splsda", "rcc", "pca", "spca"),
X,
Y,
test.keepX = c(5, 10, 15),
test.keepY = NULL,
already.tested.X,
already.tested.Y,
ncomp,
V,
center = TRUE,
grid1 = seq(0.001, 1, length = 5),
grid2 = seq(0.001, 1, length = 5),
mode = c("regression", "canonical", "invariant", "classic"),
indY,
weighted = TRUE,
design,
study,
tol = 1e-09,
scale = TRUE,
logratio = c("none", "CLR"),
near.zero.var = FALSE,
max.iter = 100,
multilevel = NULL,
validation = "Mfold",
nrepeat = 1,
folds = 10,
signif.threshold = 0.01,
dist = "max.dist",
measure = ifelse(method == "spls", "cor", "BER"),
auc = FALSE,
seed = NULL,
BPPARAM = SerialParam(),
progressBar = FALSE,
light.output = TRUE
)
method |
This parameter is used to pass all other arguments to the
suitable function. |
X |
numeric matrix of predictors. |
Y |
Either a factor or a class vector for the discrete outcome, or a numeric vector or matrix of continuous responses (for multi-response models). |
test.keepX |
numeric vector for the different number of variables to
test from the |
test.keepY |
If |
already.tested.X |
Optional, if |
already.tested.Y |
if |
ncomp |
the number of components to include in the model. |
V |
Matrix used in the logratio transformation, if provided (for tune.pca) |
center |
a logical value indicating whether the variables should be
shifted to be zero centered. Alternatively, a vector of length equal to the
number of columns of |
grid1 , grid2 |
numeric vector defining the values of |
mode |
character string. What type of algorithm to use, (partially)
matching one of |
indY |
To supply if |
weighted |
tune using either the performance of the Majority vote or the Weighted vote. |
design |
numeric matrix of size (number of blocks in X) x (number of
blocks in X) with values between 0 and 1. Each value indicates the strength
of the relationship to be modelled between two blocks; a value of 0
indicates no relationship, 1 is the maximum value. Alternatively, one of
c('null', 'full') indicating a disconnected or fully connected design,
respectively, or a numeric between 0 and 1 which will designate all
off-diagonal elements of a fully connected design (see examples in
|
study |
grouping factor indicating which samples are from the same study |
tol |
Numeric, convergence tolerance criteria. |
scale |
a logical value indicating whether the variables should be
scaled to have unit variance before the analysis takes place. The default is
|
logratio |
one of ('none', 'CLR'). Defaults to 'none'. |
near.zero.var |
Logical, see the internal |
max.iter |
Integer, the maximum number of iterations. |
multilevel |
Design matrix for multilevel analysis (for repeated measurements) that indicates the repeated measures on each individual, i.e. the individuals ID. See Details. |
validation |
character. What kind of (internal) validation to use,
matching one of |
nrepeat |
Number of times the Cross-Validation process is repeated. |
folds |
the folds in the Mfold cross-validation. See Details. |
signif.threshold |
numeric between 0 and 1 indicating the significance threshold required for improvement in error rate of the components. Defaults to 0.01. |
dist |
distance metric to estimate the
classification error rate, should be a subset of |
measure |
The tuning measure used for different methods. See details. |
auc |
if |
seed |
set a number here if you want the function to give reproducible outputs. Not recommended during exploratory analysis. Note that if RNGseed is set in 'BPPARAM', it will be overridden by 'seed'. |
BPPARAM |
A BiocParallelParam object indicating the type of parallelisation. See examples. |
progressBar |
by default set to |
light.output |
if set to FALSE, the prediction/classification of each
sample for each of |
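The 'CLR' value of logratio refers to the centered log-ratio transformation commonly applied to compositional data. The following base R lines are a minimal sketch of that transformation, not the package's own code; the function name clr_rows and its offset argument are hypothetical and exist only for this illustration.

```r
# Centered log-ratio (CLR) transform, applied to each sample (row).
# `clr_rows` and `offset` are hypothetical names, not part of mixOmics.
clr_rows <- function(X, offset = 0) {
  logX <- log(X + offset)          # a small offset avoids log(0) when zeros are present
  sweep(logX, 1, rowMeans(logX))   # subtract each sample's mean log value
}

X <- matrix(c(1, 2, 4,
              10, 10, 10), nrow = 2, byrow = TRUE)
X.clr <- clr_rows(X)
rowSums(X.clr)  # each transformed sample sums to zero, up to rounding
```

After the transformation every sample is expressed relative to its own geometric mean, which is why the rows sum to zero.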
See the help file for the chosen method, e.g. tune.splsda, for further details. Note that only the arguments used in the tune function corresponding to method are passed on.
More details about the prediction distances are given in ?predict and in the supplemental material of the mixOmics article (Rohart et al. 2017). More details about the PLS modes are in ?pls.
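To make the roles of folds, nrepeat, test.keepX and the "BER" measure more concrete, here is a minimal base R sketch of repeated M-fold cross-validation over a grid of candidate values. It does not reproduce the internals of any tune.* function: the data are simulated, and make_folds, fit_predict and ber are hypothetical helpers, with a simple two-class nearest-centroid classifier standing in for the real model.

```r
set.seed(1)  # fixed only so that this sketch is reproducible

# Simulated two-class data: 40 samples, 50 variables, the first 5 informative.
n <- 40; p <- 50
Y <- factor(rep(c("A", "B"), times = c(28, 12)))  # unbalanced on purpose
X <- matrix(rnorm(n * p), n, p)
X[Y == "B", 1:5] <- X[Y == "B", 1:5] + 2

# Balanced error rate: the mean of the per-class error rates.
ber <- function(truth, pred) {
  mean(sapply(levels(truth), function(k) mean(pred[truth == k] != k)))
}

# Stratified fold labels, so that every fold contains samples of each class.
make_folds <- function(Y, M) {
  fold <- integer(length(Y))
  for (k in levels(Y)) {
    idx <- which(Y == k)
    fold[idx] <- sample(rep_len(seq_len(M), length(idx)))
  }
  fold
}

# Hypothetical stand-in model (two classes only): keep the `keep` variables
# with the largest between-class mean difference on the training set, then
# assign each test sample to the nearest class centroid.
fit_predict <- function(Xtr, Ytr, Xte, keep) {
  centroids <- sapply(levels(Ytr), function(k) colMeans(Xtr[Ytr == k, , drop = FALSE]))
  sel <- order(abs(centroids[, 1] - centroids[, 2]), decreasing = TRUE)[seq_len(keep)]
  d <- sapply(levels(Ytr), function(k) {
    rowSums(sweep(Xte[, sel, drop = FALSE], 2, centroids[sel, k])^2)
  })
  d <- matrix(d, nrow = nrow(Xte))
  factor(levels(Ytr)[max.col(-d, ties.method = "first")], levels = levels(Ytr))
}

test.keep <- c(2, 5, 20)  # candidate numbers of variables (role of test.keepX)
M <- 5; nrepeat <- 3      # role of folds and nrepeat

err <- matrix(NA_real_, nrepeat, length(test.keep),
              dimnames = list(NULL, paste0("keep", test.keep)))
for (r in seq_len(nrepeat)) {
  fold <- make_folds(Y, M)  # new random partition for each repeat
  for (j in seq_along(test.keep)) {
    pred <- factor(rep(NA, n), levels = levels(Y))
    for (m in seq_len(M)) {
      te <- fold == m  # samples held out in this fold
      pred[te] <- fit_predict(X[!te, , drop = FALSE], Y[!te],
                              X[te, , drop = FALSE], test.keep[j])
    }
    err[r, j] <- ber(Y, pred)
  }
}
colMeans(err)                        # mean BER per candidate value
test.keep[which.min(colMeans(err))]  # candidate with the lowest mean BER
```

Variable selection is repeated inside every training fold so that the held-out samples never influence which variables are kept; selecting variables on the full data first would give optimistic error estimates.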
Depending on the type of analysis performed and the input arguments, a list that may contain:
error.rate |
returns the prediction error for each |
choice.keepX |
returns the number of variables selected (optimal keepX) on each component. |
choice.ncomp |
For supervised models; returns the optimal number of components for the model for each prediction distance using one-sided t-tests that test for a significant difference in the mean error rate (gain in prediction) when components are added to the model. See more details in the supplemental material of Rohart et al. (2017). For more than one block, an optimal ncomp is returned for each prediction framework. |
error.rate.class |
returns the error rate for each level of |
predict |
Prediction values for each sample, each |
class |
Predicted class for each sample, each |
auc |
AUC mean and standard deviation if the number of categories in
|
cor.value |
only if multilevel analysis with 2 factors: correlation between latent variables. |
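The choice.ncomp entry above relies on one-sided t-tests across cross-validation repeats. The base R sketch below illustrates the general idea under simplifying assumptions; it is not the package's exact procedure. The error rates are made up, and choose_ncomp is a hypothetical helper written for this illustration.

```r
# Rows = cross-validation repeats, columns = number of components.
# These error rates are invented for illustration only.
err <- cbind(comp1 = c(0.40, 0.38, 0.42, 0.41, 0.39),
             comp2 = c(0.22, 0.25, 0.21, 0.24, 0.23),
             comp3 = c(0.22, 0.24, 0.22, 0.23, 0.24))
signif.threshold <- 0.01

# Start from one component and move to a larger model only if its mean
# error rate is significantly lower (one-sided t-test across repeats).
choose_ncomp <- function(err, alpha) {
  opt <- 1
  for (j in seq_len(ncol(err))[-1]) {
    p <- t.test(err[, opt], err[, j], alternative = "greater")$p.value
    if (p < alpha) opt <- j
  }
  opt
}
choose_ncomp(err, signif.threshold)  # 2: a third component brings no significant gain
```

A test of this kind needs several repeats per component to estimate the variance, so a meaningful choice of ncomp requires nrepeat to be larger than its default of 1.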
Florian Rohart, Francois Bartolo, Kim-Anh Lê Cao, Al J Abadi
Singh A., Shannon C., Gautier B., Rohart F., Vacher M., Tebbutt S. and Lê Cao K.A. (2019), DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays, Bioinformatics, Volume 35, Issue 17, 1 September 2019, Pages 3055–3062.
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao K-A (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752.
MINT:
Rohart F, Eslami A, Matigian N, Bougeard S, Lê Cao K-A (2017). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms. BMC Bioinformatics 18:128.
PLS and PLS criteria for PLS regression: Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technip.
Chavent, Marie and Patouille, Brigitte (2003). Calcul des coefficients de regression et du PRESS en regression PLS1. Modulad n, 30 1-11. (this is the formula we use to calculate the Q2 in perf.pls and perf.spls)
Mevik, B.-H., Cederkvist, H. R. (2004). Mean Squared Error of Prediction (MSEP) Estimates for Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.
sparse PLS regression mode:
Lê Cao, K. A., Rossouw D., Robert-Granie, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
One-sided t-tests (suppl material):
Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S, Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao K-A&, Wells CA& (2016). A Molecular Classification of Human Mesenchymal Stromal Cells. PeerJ 4:e1845.
tune.rcc, tune.mint.splsda, tune.pca, tune.splsda, tune.splslevel and http://www.mixOmics.org for more details.
## sPLS-DA
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- as.factor(breast.tumors$sample$treatment)
# store the result under a name that does not mask the tune() function
tune.res <- tune(method = "splsda", X, Y, ncomp = 1, nrepeat = 10, logratio = "none",
                 test.keepX = c(5, 10, 15), folds = 10, dist = "max.dist", progressBar = TRUE)
plot(tune.res)
## Not run:
## mint.splsda
data(stemcells)
data = stemcells$gene
type.id = stemcells$celltype
exp = stemcells$study
out = tune(method="mint.splsda", X=data,Y=type.id, ncomp=2, study=exp, test.keepX=seq(1,10,1))
out$choice.keepX
plot(out)
## End(Not run)