tune.spls: Tuning functions for sPLS and PLS functions
In mixOmics: Omics Data Integration Project


This function uses repeated cross-validation to tune hyperparameters such as the number of features to select and possibly the number of components to extract.

tune.spls(
  X,
  Y,
  test.keepX = NULL,
  test.keepY = NULL,
  ncomp,
  validation = c("Mfold", "loo"),
  nrepeat = 1,
  folds,
  mode = c("regression", "canonical", "classic"),
  measure = c("cor", "RSS"),
  BPPARAM = SerialParam(),
  progressBar = FALSE,
  limQ2 = 0.0975,
  ...
)

`X`	Numeric matrix of predictors with the rows as individual observations. Missing values (`NA`s) are allowed.
`Y`	Numeric matrix of response(s) with the rows as individual observations matching `X`. Missing values (`NA`s) are allowed.
`test.keepX`	Numeric vector of the different numbers of variables to test from the X data set.
`test.keepY`	Numeric vector of the different numbers of variables to test from the Y data set. Defaults to `ncol(Y)`.
`ncomp`	Positive integer. The number of components to include in the model. Default to 2.
`validation`	Character. What kind of (internal) validation to use, matching one of `"Mfold"` or `"loo"` (leave-one-out). Default is `"Mfold"`.
`nrepeat`	Positive integer. Number of times the cross-validation process should be repeated. `nrepeat > 2` is required for robust tuning. See Details.
`folds`	Positive integer. The number of folds in the Mfold cross-validation.
`mode`	Character string indicating the type of PLS algorithm to use. One of `"regression"`, `"canonical"`, `"invariant"` or `"classic"`. See Details.
`measure`	One of `c('cor', 'RSS')` indicating the tuning measure. See Details.
`BPPARAM`	A BiocParallelParam object indicating the type of parallelisation. See examples in `?tune.spca`.
`progressBar`	Logical. If `TRUE`, a progress bar is shown as the computation progresses. Defaults to `FALSE`.
`limQ2`	Q2 threshold for recommending the optimal `ncomp`.
`...`	Optional parameters passed to `spls`.

A list that contains:

`cor.pred`	The correlation of predicted vs actual components from X (t) and Y (u) for each component
`RSS.pred`	The Residual Sum of Squares of predicted vs actual components from X (t) and Y (u) for each component
`choice.keepX`	returns the number of variables selected for X (optimal keepX) on each component.
`choice.keepY`	returns the number of variables selected for Y (optimal keepY) on each component.
`choice.ncomp`	returns the optimal number of components for the model fitted with `$choice.keepX` and `$choice.keepY`
`call`	The function call, including the parameters used.

During cross-validation (CV), the data are randomly split into M subgroups (folds). M-1 subgroups are used to train a submodel, which is then used to compute prediction accuracy statistics on the held-out (test) subgroup. Each subgroup is used as the test data exactly once. If validation = "loo", leave-one-out CV is used, where each group consists of exactly one sample and hence M == N, where N is the number of samples.
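The splitting described above can be sketched in base R. This is only an illustration of M-fold assignment, not the code mixOmics runs internally; the sample count `N`, the fold count `M` and the names `fold_id` and `splits` are made up for the example.

```r
# Minimal sketch of M-fold splitting (illustrative; not mixOmics internals).
set.seed(1)
N <- 20   # number of samples (hypothetical)
M <- 5    # number of folds (hypothetical)

# Randomly assign each sample to one of M folds of (near-)equal size.
fold_id <- sample(rep(seq_len(M), length.out = N))

# Each fold serves as the test set exactly once; the rest is training data.
splits <- lapply(seq_len(M), function(m) {
  list(train = which(fold_id != m), test = which(fold_id == m))
})

# Leave-one-out is the special case M == N: every fold holds one sample,
# so there is only one possible assignment.
loo_id <- seq_len(N)
```

Taken together, the `test` indices in `splits` cover every sample exactly once, which is the property the paragraph above relies on.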

The cross-validation process is repeated nrepeat times and the accuracy measures are averaged across repeats. If validation = "loo", the process does not need to be repeated as there is only one way to split N samples into N groups and hence nrepeat is forced to be 1.
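As a rough sketch of repeating CV and averaging the accuracy measure, the base R code below reruns a 5-fold CV three times with fresh random splits. It assumes a simple linear model (`lm`) as a stand-in for sPLS and uses simulated data; `cv_cor` is a hypothetical helper, not a mixOmics function.

```r
# Repeated M-fold CV with averaging (illustrative; lm stands in for sPLS).
set.seed(2)
N <- 40; M <- 5; nrepeat <- 3
x <- rnorm(N)
y <- 2 * x + rnorm(N)   # simulated response

# One CV run: returns the correlation between held-out predictions and y.
cv_cor <- function(x, y, M) {
  fold_id <- sample(rep(seq_len(M), length.out = length(y)))
  pred <- numeric(length(y))
  for (m in seq_len(M)) {
    test <- fold_id == m
    fit <- lm(y ~ x, data = data.frame(x = x[!test], y = y[!test]))
    pred[test] <- predict(fit, newdata = data.frame(x = x[test]))
  }
  cor(pred, y)
}

# Each repeat draws a new random split; the measure is averaged across repeats.
per_repeat <- replicate(nrepeat, cv_cor(x, y, M))
mean_cor <- mean(per_repeat)
```

With leave-one-out the split is deterministic, so every repeat would return the same value and repeating adds nothing.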

Two measures of accuracy are available: correlation (cor) and the Residual Sum of Squares (RSS). For cor, the parameters that maximise the correlation between the predicted and the actual components are chosen. The RSS measure tries to predict the held-out data by matrix reconstruction and seeks to minimise the error between actual and predicted values. For mode='canonical', the X matrix is used to calculate the RSS, while for other modes the Y matrix is used. This measure gives more weight to any large errors and is thus sensitive to outliers. It also intrinsically selects fewer features on the Y block compared to measure='cor'.
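The difference in outlier sensitivity between the two measures can be seen on a toy vector. The numbers below are invented for illustration and the helper `rss` is hypothetical; in tune.spls the measures are computed on components or reconstructed matrices, not on a single vector like this.

```r
# Toy comparison of the two measures (illustrative values only).
actual    <- c(1, 2, 3, 4, 5)
predicted <- c(1.1, 1.9, 3.2, 3.8, 5.1)

# Residual sum of squares between actual and predicted values.
rss <- function(a, p) sum((a - p)^2)

cor_clean <- cor(actual, predicted)
rss_clean <- rss(actual, predicted)

# Introduce a single large error in one prediction.
predicted_out <- predicted
predicted_out[5] <- 9
cor_out <- cor(actual, predicted_out)
rss_out <- rss(actual, predicted_out)

# The squared error makes RSS react far more strongly than the correlation.
c(cor_ratio = cor_out / cor_clean, rss_ratio = rss_out / rss_clean)
```

One bad prediction multiplies the RSS by more than a hundred while the correlation drops only modestly, which is why RSS-based tuning is more exposed to outlying samples.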

The optimisation process is data-driven and similar to the process detailed in (Rohart et al., 2016), where one-sided t-tests assess whether there is a gain in performance when incrementing the number of features or components in the model. However, it assesses the whole provided grid through pairwise comparisons, as the performance criteria do not always change linearly with respect to the added number of features or components.
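A single pairwise comparison of this kind might look like the sketch below, which uses base R's `t.test` with `alternative = "greater"`. The per-repeat correlations are made-up numbers, and the 0.05 threshold is an assumption for the example; the exact test setup and significance level used inside mixOmics are not stated here.

```r
# One-sided t-test between two grid points (illustrative, hypothetical data).
# Per-repeat correlations for two candidate keepX values.
cor_keep5  <- c(0.71, 0.69, 0.73, 0.70, 0.72)
cor_keep10 <- c(0.78, 0.77, 0.80, 0.76, 0.79)

# H1: the larger model achieves a higher mean correlation.
tt <- t.test(cor_keep10, cor_keep5, alternative = "greater")

# Assumed decision rule: accept the larger model only if the gain is significant.
alpha <- 0.05
gain <- tt$p.value < alpha
```

In a full grid search, a comparison like this would be repeated for every pair of candidate values rather than only for neighbouring ones.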

See also ?perf for more details.

Kim-Anh Lê Cao, Al J Abadi, Benoit Gautier, Francois Bartolo, Florian Rohart

mixOmics article:

Rohart F, Gautier B, Singh A, Lê Cao K-A. mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752

PLS and PLS criteria for PLS regression: Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

Chavent, Marie and Patouille, Brigitte (2003). Calcul des coefficients de regression et du PRESS en regression PLS1. Modulad n, 30 1-11. (this is the formula we use to calculate the Q2 in perf.pls and perf.spls)

Mevik, B.-H., Cederkvist, H. R. (2004). Mean Squared Error of Prediction (MSEP) Estimates for Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.

sparse PLS regression mode:

Lê Cao, K. A., Rossouw D., Robert-Granie, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.

One-sided t-tests (suppl material):

Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S, Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao K-A&, Wells CA& (2016). A Molecular Classification of Human Mesenchymal Stromal Cells. PeerJ 4:e1845.

splsda, predict.splsda and http://www.mixOmics.org for more details.

## Not run: 
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
set.seed(42)
tune.res = tune.spls( X, Y, ncomp = 3,
                  test.keepX = c(5, 10, 15),
                  test.keepY = c(3, 6, 8), measure = "cor",
                  folds = 5, nrepeat = 3, progressBar = TRUE)
tune.res$choice.ncomp
tune.res$choice.keepX
tune.res$choice.keepY
# plot the results
plot(tune.res)

## End(Not run)

mixOmics documentation built on April 15, 2021, 6:01 p.m.
