PLSDA_batch

R Documentation

Partial Least Squares Discriminant Analysis for Batch Effect Correction

Description

This function removes batch variation from the input data with PLSDA-batch, given the batch grouping information and the numbers of treatment and batch associated components. For sparse PLSDA-batch, the number of variables to keep for each treatment associated component (keepX.trt) must also be supplied. For weighted PLSDA-batch, balance should be set to FALSE; the weighted version cannot deal with a nested batch x treatment design.

Usage

PLSDA_batch(
    X,
    Y.trt = NULL,
    Y.bat,
    ncomp.trt = 2,
    ncomp.bat = 2,
    keepX.trt = rep(ncol(X), ncomp.trt),
    keepX.bat = rep(ncol(X), ncomp.bat),
    max.iter = 500,
    tol = 1e-06,
    near.zero.var = TRUE,
    balance = TRUE
)

Arguments

`X`	A numeric matrix as an explanatory matrix. `NA`s are not allowed.
`Y.trt`	A factor or a class vector for the treatment grouping information (categorical outcome variable). If `Y.trt` is not supplied, treatment variation cannot be preserved when correcting for batch effects.
`Y.bat`	A factor or a class vector for the batch grouping information (categorical outcome variable).
`ncomp.trt`	Integer, the number of treatment associated dimensions to include in the model.
`ncomp.bat`	Integer, the number of batch associated dimensions to include in the model.
`keepX.trt`	A numeric vector of length `ncomp.trt`, the number of variables to keep in the `X`-loadings of each treatment associated component. By default all variables are kept in the model. A valid input of `keepX.trt` extends `PLSDA-batch` to a sparse version.
`keepX.bat`	A numeric vector of length `ncomp.bat`, the number of variables to keep in the `X`-loadings of each batch associated component. By default all variables are kept in the model. The default setting is usually appropriate.
`max.iter`	Integer, the maximum number of iterations.
`tol`	Numeric, convergence stopping value.
`near.zero.var`	Logical, should be set to `TRUE` in particular for data with many zero values. Setting this argument to `FALSE` (when appropriate) will speed up the computations. Default value is `TRUE`.
`balance`	Logical, should be set to `TRUE` if the `batch x treatment design` is balanced (or complete). Setting this argument to `FALSE` extends `PLSDA-batch` to weighted `PLSDA-batch` (`wPLSDA-batch`), which can deal with highly unbalanced designs but not with a nested design. Default value is `TRUE`.
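
The choice of `balance` depends on how samples are distributed across batches and treatments. The base R sketch below shows one way to inspect a design before calling `PLSDA_batch`. The helper `design_type` is hypothetical and not part of PLSDAbatch. It assumes a strict reading of "balanced" (every batch x treatment cell holds the same number of samples) and treats a design as nested when each batch contains samples from a single treatment only. Other definitions are possible.

```r
# Hypothetical helper (not part of PLSDAbatch): classify a batch x treatment
# design from the two grouping vectors.
design_type <- function(Y.bat, Y.trt) {
    # Cross-tabulate samples per batch (rows) and treatment (columns);
    # factor() drops unused levels so empty groups do not create empty rows.
    counts <- table(factor(Y.bat), factor(Y.trt))

    # Assumed definition of nested: each batch holds a single treatment only.
    if (all(rowSums(counts > 0) == 1)) {
        return("nested")
    }
    # Assumed definition of balanced: all cells have the same sample count.
    if (length(unique(as.vector(counts))) == 1) {
        return("balanced")
    }
    "unbalanced"
}

# Toy grouping vectors, for illustration only
bat <- c("b1", "b1", "b1", "b2", "b2")
trt <- c("t1", "t1", "t2", "t1", "t2")

design <- design_type(bat, trt)
# Under the assumptions above, balance = TRUE suits a balanced design and
# balance = FALSE an unbalanced one; a nested design is not supported by
# the weighted version.
use_balance <- design == "balanced"
```

Printing `table(Y.bat, Y.trt)` directly gives the same information at a glance and is often enough for small studies.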

Value

PLSDA_batch returns a list that contains the following components:

`X`	The original explanatory matrix `X`.
`X.nobatch`	The batch corrected matrix with the same dimension as the input matrix.
`X.notrt`	The matrix from which treatment variation is removed.
`Y`	The original outcome variables `Y.trt` and `Y.bat`.
`latent_var.trt`	The treatment associated latent components calculated with corresponding latent dimensions.
`latent_var.bat`	The batch associated latent components calculated with corresponding latent dimensions.
`loadings.trt`	The estimated treatment associated latent dimensions.
`loadings.bat`	The estimated batch associated latent dimensions.
`tol`	The tolerance used in the iterative algorithm, convergence stopping value.
`max.iter`	The maximum number of iterations.
`iter.trt`	Number of iterations of the algorithm for each treatment associated component.
`iter.bat`	Number of iterations of the algorithm for each batch associated component.
`explained_variance.trt`	The amount of data variance explained per treatment associated component.
`explained_variance.bat`	The amount of data variance explained per batch associated component.
`weight`	The sample weights, all `1` for a balanced `batch x treatment design`.
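
As a rough illustration of what removing the variation of a latent component from a matrix can look like, the base R sketch below regresses every column of a toy matrix on a single component and keeps the residuals. This is an assumption made for illustration: this page does not describe the internal steps of `PLSDA_batch`, and the helper `deflate_once`, the toy data and the toy component are all hypothetical.

```r
# Illustrative only (not the package implementation): remove the variation
# carried by one component t from X by regressing each column of X on t
# and keeping the residuals, i.e. X - t (t't)^{-1} t'X.
deflate_once <- function(X, t) {
    t <- as.matrix(t)
    X - t %*% solve(crossprod(t), crossprod(t, X))
}

set.seed(1)
# Toy data: 8 samples x 5 variables, column-centred
X_toy <- scale(matrix(rnorm(40), nrow = 8), scale = FALSE)
# Toy component: a simple contrast between two groups of 4 samples
batch <- rep(c(-1, 1), each = 4)

X_res <- deflate_once(X_toy, batch)

# The residual matrix keeps the input dimensions and is numerically
# orthogonal to the removed component.
dim(X_res)
max(abs(crossprod(batch, X_res)))
```

In the same spirit, `X.nobatch` has the same dimension as the input `X`, so it can be passed to downstream analyses in place of the original matrix.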

Author(s)

Yiwen Wang, Kim-Anh Lê Cao

References

\insertRef{wang2020managingPLSDAbatch}{PLSDAbatch}

\insertRef{wang2020multivariatePLSDAbatch}{PLSDAbatch}

Examples

## First example
## PLSDA-batch
library(PLSDAbatch) # for PLSDA_batch() and the AD_data example data
library(TreeSummarizedExperiment) # for functions assays(), rowData()
data('AD_data')
X <- assays(AD_data$EgData)$Clr_value # centered log ratio transformed data
Y.trt <- rowData(AD_data$EgData)$Y.trt # treatment information
Y.bat <- rowData(AD_data$EgData)$Y.bat # batch information
names(Y.bat) <- names(Y.trt) <- rownames(AD_data$EgData)
ad_plsda_batch <- PLSDA_batch(X, Y.trt, Y.bat, ncomp.trt = 1, ncomp.bat = 5)
ad_X.corrected <- ad_plsda_batch$X.nobatch # batch corrected data

## Second example
## sparse PLSDA-batch
ad_splsda_batch <- PLSDA_batch(X, Y.trt, Y.bat, ncomp.trt = 1,
                                keepX.trt = 30, ncomp.bat = 5)

EvaYiwenWang/PLSDAbatch documentation built on Sept. 25, 2024, 8:54 p.m.