plsda: Partial Least Squares Discriminant Analysis (PLS-DA).

View source: R/plsda.R

plsdaR Documentation

Partial Least Squares Discriminant Analysis (PLS-DA).

Description

Function to perform standard Partial Least Squares regression to classify samples.

Usage

plsda(
  X,
  Y,
  ncomp = 2,
  scale = TRUE,
  tol = 1e-06,
  max.iter = 100,
  near.zero.var = FALSE,
  logratio = c("none", "CLR"),
  multilevel = NULL,
  all.outputs = TRUE
)

Arguments

X

numeric matrix of predictors with the rows as individual observations. missing values (NAs) are allowed.

Y

a factor or a class vector for the discrete outcome.

ncomp

Positive Integer. The number of components to include in the model. Default to 2.

scale

Logical. If scale = TRUE, each block is standardized to zero means and unit variances (default: TRUE)

tol

Positive numeric used as convergence criteria/tolerance during the iterative process. Default to 1e-06.

max.iter

Integer, the maximum number of iterations. Default to 100.

near.zero.var

Logical, see the internal nearZeroVar function (should be set to TRUE in particular for data with many zero values). Setting this argument to FALSE (when appropriate) will speed up the computations. Default value is FALSE.

logratio

Character, one of ('none','CLR') specifies the log ratio transformation to deal with compositional values that may arise from specific normalisation in sequencing data. Default to 'none'. See ?logratio.transfo for details.

multilevel

sample information for multilevel decomposition for repeated measurements. A numeric matrix or data frame indicating the repeated measures on each individual, i.e. the individuals ID. See examples in ?splsda.

all.outputs

Logical. Computation can be faster when some specific (and non-essential) outputs are not calculated. Default = TRUE.

Details

plsda function fit PLS models with 1,...,ncomp components to the factor or class vector Y. The appropriate indicator matrix is created.

Logratio transformation and multilevel analysis are performed sequentially as internal pre-processing step, through logratio.transfo and withinVariation respectively. Logratio can only be applied if the data do not contain any 0 value (for count data, we thus advise the normalise raw data with a 1 offset).

The type of deflation used is 'regression' for discriminant algorithms. i.e. no deflation is performed on Y.

Value

plsda returns an object of class "plsda", a list that contains the following components:

X

the centered and standardized original predictor matrix.

Y

the centered and standardized indicator response vector or matrix.

ind.mat

the indicator matrix.

ncomp

the number of components included in the model.

variates

list containing the X and Y variates.

loadings

list containing the estimated loadings associated to each component/variate. The loading weights multiplied with the deflated (residual) matrix gives the variate.

loadings.stars

list containing the estimated loadings associated to each component/variate. The loading weights are projected so that when multiplied with the original matrix we obtain the variate.

names

list containing the names to be used for individuals and variables.

nzv

list containing the zero- or near-zero predictors information.

tol

the tolerance used in the iterative algorithm, used for subsequent S3 methods

max.iter

the maximum number of iterations, used for subsequent S3 methods

iter

Number of iterations of the algorithm for each component

prop_expl_var

The proportion of the variance explained by each variate / component divided by the total variance in the data (after removing the possible missing values) using the definition of 'redundancy'. Note that contrary to PCA, this amount may not decrease in the following components as the aim of the method is not to maximise the variance, but the covariance between data sets (including the dummy matrix representation of the outcome variable in case of the supervised approaches).

mat.c

matrix of coefficients from the regression of X / residual matrices X on the X-variates, to be used internally by predict.

defl.matrix

residual matrices X for each dimension.

Author(s)

Ignacio González, Kim-Anh Lê Cao, Florian Rohart, Al J Abadi

References

On PLSDA: Barker M and Rayens W (2003). Partial least squares for discrimination. Journal of Chemometrics 17(3), 166-173. Perez-Enciso, M. and Tenenhaus, M. (2003). Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Human Genetics 112, 581-592. Nguyen, D. V. and Rocke, D. M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18, 39-50. On log ratio transformation: Filzmoser, P., Hron, K., Reimann, C.: Principal component analysis for compositional data with outliers. Environmetrics 20(6), 621-632 (2009) Lê Cao K.-A., Costello ME, Lakis VA, Bartolo, F,Chua XY, Brazeilles R, Rondeau P. MixMC: Multivariate insights into Microbial Communities. PLoS ONE, 11(8): e0160169 (2016). On multilevel decomposition: Westerhuis, J.A., van Velzen, E.J., Hoefsloot, H.C., Smilde, A.K.: Multivariate paired data analysis: multilevel plsda versus oplsda. Metabolomics 6(1), 119-128 (2010) Liquet, B., Lê Cao K.-A., Hocini, H., Thiebaut, R.: A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC bioinformatics 13(1), 325 (2012)

See Also

splsda, summary, plotIndiv, plotVar, predict, perf, mint.block.plsda, block.plsda and http://mixOmics.org for more details.

Examples

## First example
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment

plsda.breast <- plsda(X, Y, ncomp = 2)
plotIndiv(plsda.breast, ind.names = TRUE, ellipse = TRUE, legend = TRUE)


## Not run: 
## Second example
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$treatment[, 4]

plsda.liver <- plsda(X, Y, ncomp = 2)
plotIndiv(plsda.liver, ind.names = Y, ellipse = TRUE, legend =TRUE)

## End(Not run)

mixOmicsTeam/mixOmics documentation built on Dec. 3, 2024, 11:15 p.m.