remove_batcheffect: Removing Batch Effect from Expression Sets

View source: R/remove_batcheffect.R

Description

Removes batch effects from expression datasets using sva::ComBat (for microarray/TPM data) or sva::ComBat_seq (for RNA-seq count data). Generates PCA plots to visualize data before and after correction.
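For orientation, the two sva functions named above can also be called directly. The following is a minimal sketch under stated assumptions, not a reproduction of this function's internals: it assumes both matrices already share the same gene row names, labels each sample with the dataset it came from, and chooses between ComBat and ComBat_seq from an illustrative data_type variable. The object names (eset_a, eset_b, combined, batch) are hypothetical.

```r
# Two simulated expression matrices with identical gene rows (illustrative data).
set.seed(1)
genes <- paste0("Gene", 1:50)
eset_a <- matrix(rnorm(50 * 4, mean = 8), 50, 4,
                 dimnames = list(genes, paste0("A", 1:4)))
eset_b <- matrix(rnorm(50 * 4, mean = 10), 50, 4,
                 dimnames = list(genes, paste0("B", 1:4)))

# Samples become columns of one matrix; the batch label records their origin.
combined <- cbind(eset_a, eset_b)
batch <- rep(c("batch1", "batch2"), times = c(ncol(eset_a), ncol(eset_b)))

# Hypothetical switch mirroring the data_type argument.
data_type <- "tpm"

if (requireNamespace("sva", quietly = TRUE)) {
  corrected <- if (data_type == "count") {
    # ComBat_seq expects a matrix of non-negative integer counts.
    sva::ComBat_seq(counts = combined, batch = batch)
  } else {
    # ComBat expects continuous, typically log-scale, expression values.
    sva::ComBat(dat = combined, batch = batch)
  }
  dim(corrected)
}
```

The corrected matrix keeps the dimensions of the combined input, so downstream code can index it by the same gene and sample names.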

Usage

remove_batcheffect(
  eset1,
  eset2,
  eset3 = NULL,
  id_type = "ensembl",
  data_type = c("array", "count", "tpm"),
  cols = "normal",
  palette = "jama",
  log2 = TRUE,
  check_eset = TRUE,
  adjust_eset = TRUE,
  repel = FALSE,
  path = NULL
)

Arguments

eset1

First expression set (matrix or data frame with genes as rows).

eset2

Second expression set.

eset3

Optional third expression set. Default is 'NULL' (no third set).
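Because the expression sets have genes as rows, they can only be combined on genes present in every set. The sketch below shows one way to align two matrices on their shared row names before correction. Whether remove_batcheffect performs this exact step internally is an assumption, and the toy matrices are illustrative.

```r
# Toy matrices with partly overlapping genes (illustrative values).
eset_a <- matrix(1:12, nrow = 4,
                 dimnames = list(c("G1", "G2", "G3", "G4"), c("A1", "A2", "A3")))
eset_b <- matrix(101:109, nrow = 3,
                 dimnames = list(c("G4", "G2", "G5"), c("B1", "B2", "B3")))

# Keep only genes found in both sets, in the order they appear in eset_a.
shared <- intersect(rownames(eset_a), rownames(eset_b))

# Index both matrices by name so rows line up regardless of original order.
combined <- cbind(eset_a[shared, , drop = FALSE],
                  eset_b[shared, , drop = FALSE])
combined
```

Indexing by row name rather than position matters here: G4 is the last row of the first matrix but the first row of the second.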

id_type

Type of gene ID in expression sets (e.g., '"ensembl"', '"symbol"'). Default is '"ensembl"'. Required for count data normalization.

data_type

Type of data: '"array"', '"count"', or '"tpm"'. Default is '"array"'.
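The default of '"array"' follows from the usual R idiom for a vector-valued default, where the first element is selected when the argument is omitted. The sketch below assumes the function resolves data_type with match.arg, which is not confirmed by this page; pick_data_type is a hypothetical helper used only for illustration.

```r
# Hypothetical helper with the same default vector as the Usage section.
pick_data_type <- function(data_type = c("array", "count", "tpm")) {
  match.arg(data_type)
}

pick_data_type()         # argument omitted: the first choice is used
pick_data_type("count")  # an explicit valid choice is returned as-is

# An unsupported value raises an error instead of being silently accepted.
bad <- try(pick_data_type("rnaseq"), silent = TRUE)
inherits(bad, "try-error")
```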

cols

Color scale for PCA plot. Default is '"normal"'.

palette

Color palette for PCA plot. Default is '"jama"'.

log2

Whether to perform log2 transformation. Default is 'TRUE'. Ignored for count data.
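As an illustration of what a log2 transformation does to TPM-like values, the sketch below applies log2(x + 1). The pseudocount of 1 is an assumption chosen so that zeros stay finite; this page does not state the exact transform the function applies.

```r
# Toy TPM-like matrix, including a zero (illustrative values).
tpm <- matrix(c(0, 1, 3, 7, 15, 31), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"), c("S1", "S2", "S3")))

# Adding 1 before taking log2 maps zero expression to zero.
log_tpm <- log2(tpm + 1)
log_tpm
```

Count data are left on their original scale because ComBat_seq models raw counts directly.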

check_eset

Whether to check expression sets for errors. Default is 'TRUE'.

adjust_eset

Whether to adjust expression sets by removing problematic features. Default is 'TRUE'.

repel

Whether to add repelling labels to PCA plot. Default is 'FALSE'.

path

Directory where results should be saved. If 'NULL' (default), results are only displayed and nothing is written to disk.

Value

Expression matrix after batch correction.

Author(s)

Dongqiang Zeng

References

Zhang Y, et al. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics and Bioinformatics. 2020;2(3):lqaa078. doi:10.1093/nargab/lqaa078

Leek JT, et al. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882-883.

Examples

# Simulate data
set.seed(123)
sim_eset1 <- matrix(rnorm(100 * 5, mean = 10, sd = 2), 100, 5)
sim_eset2 <- matrix(rnorm(100 * 5, mean = 12, sd = 2), 100, 5)
rownames(sim_eset1) <- rownames(sim_eset2) <- paste0("Gene", 1:100)
colnames(sim_eset1) <- paste0("S1_", 1:5)
colnames(sim_eset2) <- paste0("S2_", 1:5)

# Run batch correction
if (requireNamespace("sva", quietly = TRUE) && requireNamespace("BiocParallel", quietly = TRUE)) {
  eset_corrected <- remove_batcheffect(sim_eset1, sim_eset2, data_type = "tpm")
  if (!is.null(eset_corrected)) head(eset_corrected)
}
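To see why a PCA plot is a useful check, the sketch below runs a plain PCA with base R's prcomp on uncorrected simulated data. It is an independent illustration, not the plotting code used by this function, and it uses a larger mean shift between batches than the example above so the effect is unambiguous. All object names are illustrative.

```r
# Two simulated batches that differ by a constant shift (illustrative data).
set.seed(42)
b1 <- matrix(rnorm(100 * 5, mean = 10, sd = 1), 100, 5)
b2 <- matrix(rnorm(100 * 5, mean = 14, sd = 1), 100, 5)
combined <- cbind(b1, b2)
rownames(combined) <- paste0("Gene", 1:100)
colnames(combined) <- c(paste0("S1_", 1:5), paste0("S2_", 1:5))
batch <- rep(c("batch1", "batch2"), each = 5)

# prcomp expects samples as rows, so the matrix is transposed.
pca <- prcomp(t(combined), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, "PC1"]

# Share of total variance carried by the first component.
pc1_share <- pca$sdev[1]^2 / sum(pca$sdev^2)
pc1_share

# Mean PC1 score per batch: opposite signs indicate batch-driven separation.
tapply(pc1, batch, mean)
```

When the first component mostly separates samples by dataset of origin, as it does here, the dominant source of variation is the batch. After a successful correction, the same summary should no longer split cleanly by batch.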

IOBR documentation built on May 30, 2026, 5:07 p.m.