PLSDA_batch: Partial Least Squares Discriminant Analysis for Batch Effect...

View source: R/plsda_batch.R

PLSDA_batch    R Documentation

Partial Least Squares Discriminant Analysis for Batch Effect Correction

Description

This function removes batch variation from the input data with PLSDA-batch, given the batch grouping information and the number of components associated with the treatment and batch effects. Sparse PLSDA-batch additionally requires the number of variables to keep for each treatment-related component (keepX.trt). For weighted PLSDA-batch, set balance to FALSE; the weighted version cannot handle a nested batch x treatment design.

Usage

PLSDA_batch(
    X,
    Y.trt = NULL,
    Y.bat,
    ncomp.trt = 2,
    ncomp.bat = 2,
    keepX.trt = rep(ncol(X), ncomp.trt),
    keepX.bat = rep(ncol(X), ncomp.bat),
    max.iter = 500,
    tol = 1e-06,
    near.zero.var = TRUE,
    balance = TRUE
)

Arguments

X

A numeric matrix as an explanatory matrix. NAs are not allowed.

Y.trt

A factor or a class vector for the treatment grouping information (categorical outcome variable). If Y.trt is not supplied, treatment variation cannot be preserved before the batch effects are corrected.

Y.bat

A factor or a class vector for the batch grouping information (categorical outcome variable).

ncomp.trt

Integer, the number of treatment associated dimensions to include in the model.

ncomp.bat

Integer, the number of batch associated dimensions to include in the model.

keepX.trt

A numeric vector of length ncomp.trt, the number of variables to keep in X-loadings. By default all variables are kept in the model. A valid input of keepX.trt extends PLSDA-batch to a sparse version.

keepX.bat

A numeric vector of length ncomp.bat, the number of variables to keep in X-loadings. By default all variables are kept in the model. We usually use the default setting.

max.iter

Integer, the maximum number of iterations.

tol

Numeric, convergence stopping value.

near.zero.var

Logical, should be set to TRUE in particular for data with many zero values. Setting this argument to FALSE (when appropriate) will speed up the computations. Default value is TRUE.

balance

Logical, should be set to TRUE if the batch x treatment design is balanced (or complete). Setting this argument to FALSE extends PLSDA-batch to weighted PLSDA-batch (wPLSDA-batch), which can deal with highly unbalanced designs but not the nested design. Default value is TRUE.
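
The choice of balance depends on the batch x treatment design, which can be inspected by cross-tabulating the two grouping factors. The following is a minimal base R sketch, not part of PLSDAbatch: describe_design() is a hypothetical helper, the toy factors are made up, and the working definitions (balanced = equal counts in every batch x treatment cell, nested = each batch contains a single treatment) are assumptions made for illustration.

```r
# Toy grouping factors (made-up data, for illustration only)
Y.bat <- factor(rep(c("b1", "b2", "b3"), each = 4))
Y.trt <- factor(rep(c("ctrl", "treat"), times = 6))

# describe_design() is a hypothetical helper, not a PLSDAbatch function.
# Assumed definitions:
#   nested     = every batch contains samples from only one treatment
#   balanced   = every batch x treatment cell has the same sample count
#   unbalanced = anything else
describe_design <- function(Y.bat, Y.trt) {
  counts <- table(Y.bat, Y.trt)  # batch x treatment sample counts
  if (all(rowSums(counts > 0) == 1)) {
    "nested"
  } else if (length(unique(as.vector(counts))) == 1) {
    "balanced"
  } else {
    "unbalanced"
  }
}

# Alternative treatment assignments for the same 12 samples
Y.trt.unbal <- factor(c("ctrl", "ctrl", "ctrl", "treat",
                        "ctrl", "treat", "treat", "treat",
                        "ctrl", "ctrl", "treat", "treat"))
Y.trt.nested <- factor(c(rep("ctrl", 4), rep("treat", 8)))

design.balanced <- describe_design(Y.bat, Y.trt)
design.unbalanced <- describe_design(Y.bat, Y.trt.unbal)
design.nested <- describe_design(Y.bat, Y.trt.nested)
```

Under these assumptions, a "balanced" result corresponds to the default balance = TRUE, an "unbalanced" result suggests balance = FALSE, and a "nested" result indicates a design that wPLSDA-batch is documented as unable to handle.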

Value

PLSDA_batch returns a list that contains the following components:

X

The original explanatory matrix X.

X.nobatch

The batch corrected matrix with the same dimension as the input matrix.

X.notrt

The matrix from which treatment variation is removed.

Y

The original outcome variables Y.trt and Y.bat.

latent_var.trt

The treatment associated latent components calculated with corresponding latent dimensions.

latent_var.bat

The batch associated latent components calculated with corresponding latent dimensions.

loadings.trt

The estimated treatment associated latent dimensions.

loadings.bat

The estimated batch associated latent dimensions.

tol

The tolerance used in the iterative algorithm, convergence stopping value.

max.iter

The maximum number of iterations.

iter.trt

Number of iterations of the algorithm for each treatment associated component.

iter.bat

Number of iterations of the algorithm for each batch associated component.

explained_variance.trt

The amount of data variance explained per treatment associated component.

explained_variance.bat

The amount of data variance explained per batch associated component.

weight

The sample weights, all 1 for a balanced batch x treatment design.

Author(s)

Yiwen Wang, Kim-Anh Lê Cao

References

\insertRef{wang2020managing}{PLSDAbatch}

\insertRef{wang2020multivariate}{PLSDAbatch}

See Also

linear_regres and percentile_norm for other batch effect management methods.

Examples

## First example
## PLSDA-batch
library(PLSDAbatch) # for PLSDA_batch() and the AD_data example data
library(TreeSummarizedExperiment) # for functions assays(), rowData()
data('AD_data')
X <- assays(AD_data$EgData)$Clr_value # centered log ratio transformed data
Y.trt <- rowData(AD_data$EgData)$Y.trt # treatment information
Y.bat <- rowData(AD_data$EgData)$Y.bat # batch information
names(Y.bat) <- names(Y.trt) <- rownames(AD_data$EgData)
ad_plsda_batch <- PLSDA_batch(X, Y.trt, Y.bat, ncomp.trt = 1, ncomp.bat = 5)
ad_X.corrected <- ad_plsda_batch$X.nobatch # batch corrected data

## Second example
## sparse PLSDA-batch
ad_splsda_batch <- PLSDA_batch(X, Y.trt, Y.bat, ncomp.trt = 1,
                                keepX.trt = 30, ncomp.bat = 5)
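
The weighted variant described under the balance argument can be sketched as a third example. This is a hedged illustration rather than part of the original examples: it reuses the AD data only for convenience, assumes that design is not nested, and skips the computation when the required packages are not installed. Only arguments and return components listed in this help page are used.

```r
## Third example (sketch)
## weighted PLSDA-batch, intended for unbalanced but non-nested designs
ad_wplsda_batch <- NULL  # stays NULL if the packages are unavailable

if (requireNamespace("PLSDAbatch", quietly = TRUE) &&
    requireNamespace("TreeSummarizedExperiment", quietly = TRUE)) {
    library(PLSDAbatch)
    library(TreeSummarizedExperiment) # for functions assays(), rowData()
    data('AD_data')
    X <- assays(AD_data$EgData)$Clr_value # centered log ratio transformed data
    Y.trt <- rowData(AD_data$EgData)$Y.trt # treatment information
    Y.bat <- rowData(AD_data$EgData)$Y.bat # batch information
    names(Y.bat) <- names(Y.trt) <- rownames(AD_data$EgData)

    # balance = FALSE switches to the weighted version (wPLSDA-batch)
    ad_wplsda_batch <- PLSDA_batch(X, Y.trt, Y.bat, ncomp.trt = 1,
                                    ncomp.bat = 5, balance = FALSE)
    ad_X.wcorrected <- ad_wplsda_batch$X.nobatch # batch corrected data
    ad_weights <- ad_wplsda_batch$weight # sample weights used by the model
}
```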


EvaYiwenWang/PLSDAbatch documentation built on Sept. 25, 2024, 8:54 p.m.