sva: Surrogate Variable Analysis

The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Specifically, it contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (such as gene expression, RNA sequencing, methylation, or brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics; 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics), and (3) removing batch effects with known control probes (Leek 2014 bioRxiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence between tests, stabilize error rate estimates, and improve reproducibility (see Leek and Storey 2007 PLoS Genetics; 2008 PNAS; Leek et al. 2011 Nat. Reviews Genetics).
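The first two approaches can be sketched in R as follows. This is a minimal example on simulated data, not a recommended analysis pipeline: the matrix `edata`, the `pheno` table, the effect sizes, and the group/batch layout are all hypothetical. `num.sv()`, `sva()`, `f.pvalue()`, and `ComBat()` are sva functions, called here as we understand them from the package documentation; check the reference manual of your installed version before relying on specific arguments.

```r
library(sva)

set.seed(1)
n_genes <- 1000
n_samples <- 20

# Hypothetical design: two biological groups crossed with two batches
pheno <- data.frame(
  group = factor(rep(c("control", "case"), each = 10)),
  batch = factor(rep(c("1", "2"), times = 10))
)

# Simulated expression matrix (genes x samples): the first 100 genes are
# shifted in cases (signal), and every gene is shifted in batch 2 (artifact)
edata <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
is_case <- pheno$group == "case"
in_b2 <- pheno$batch == "2"
edata[1:100, is_case] <- edata[1:100, is_case] + 2
edata[, in_b2] <- edata[, in_b2] + 1

# Full model (variable of interest) and null model (intercept only)
mod  <- model.matrix(~ group, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)

# (1) Batch treated as unknown: estimate how many surrogate variables
# to build, then estimate them from the data
n_sv  <- num.sv(edata, mod, method = "be")
svobj <- sva(edata, mod, mod0, n.sv = n_sv)

# Include the surrogate variables as covariates in the F-test
if (svobj$n.sv > 0) {
  p_adj <- f.pvalue(edata, cbind(mod, svobj$sv), cbind(mod0, svobj$sv))
}

# (2) Batch known: remove it directly with ComBat, passing the group
# model so that the biological effect is protected
combat_edata <- ComBat(dat = edata, batch = pheno$batch, mod = mod)
```

In this simulation the batches are balanced across the biological groups, which is what allows the group effect to be kept while the batch effect is removed; if batch and group were completely confounded, no adjustment could separate them.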

Install the latest version of this package by entering the following in R:
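A minimal sketch, assuming sva is installed from Bioconductor through the BiocManager package from CRAN (the standard route for Bioconductor packages) and that the R session has internet access:

```r
# Install BiocManager from CRAN if it is not already available,
# then use it to install sva from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("sva")
```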
Author: Jeffrey T. Leek, W. Evan Johnson, Hilary S. Parker, Elana J. Fertig, Andrew E. Jaffe, John D. Storey
Bioconductor views: BatchEffect, Microarray, MultipleComparison, Normalization, Preprocessing, RNASeq, Sequencing, StatisticalMethod
Maintainer: Jeffrey T. Leek, John D. Storey, W. Evan Johnson
