sva has functionality to estimate and remove artifacts from high-dimensional data. The sva function can be used to estimate artifacts from microarray data, and the svaseq function can be used to estimate artifacts from count-based RNA-sequencing (and other sequencing) data. The ComBat function can be used to remove known batch effects from microarray data. The fsva function can be used to remove batch effects for prediction problems.
This function implements the iteratively re-weighted least squares approach for estimating surrogate variables. As a by-product, it produces estimates of the probability that each gene is an empirical control. See the function empirical.controls for a direct estimate of the empirical controls.
dat | The transformed data matrix with the variables in rows and samples in columns
mod | The model matrix being used to fit the data
mod0 | The null model being compared when fitting the data
n.sv | The number of surrogate variables to estimate
controls | A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control.
method | For empirical estimation of control probes use "irw". If control probes are known use "supervised".
vfilter | You may choose to filter to the vfilter most variable rows before performing the analysis. vfilter must be NULL if method is "supervised".
B | The number of iterations of the irwsva algorithm to perform
numSVmethod | If n.sv is NULL, sva will attempt to estimate the number of needed surrogate variables. This argument should not be changed by the user unless they are an expert.
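The controls and vfilter arguments are easier to picture with concrete objects. The following is a minimal base R sketch on simulated data, not the package's internal code: the matrix size, the choice of the first 50 rows as known controls, and the value 200 for vfilter are all hypothetical.

```r
set.seed(1)
# Simulated stand-in for `dat`: 500 genes (rows) by 10 samples (columns)
dat <- matrix(rnorm(500 * 10), nrow = 500, ncol = 10)

# controls: one probability per row of dat.
# Hypothetical assumption: the first 50 genes are known controls.
controls <- rep(0, nrow(dat))
controls[1:50] <- 1

# vfilter: keep only the most variable rows (200 is an arbitrary choice here).
vfilter <- 200
row_var <- apply(dat, 1, var)   # variance of each row across samples
keep <- order(row_var, decreasing = TRUE)[seq_len(vfilter)]
dat_filtered <- dat[keep, , drop = FALSE]

dim(dat_filtered)
```

In practice the filtering is requested through the vfilter argument rather than done by hand, and a controls vector like the one above would be supplied together with method = "supervised" (in which case vfilter must stay NULL).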
A vignette is available by typing browseVignettes("sva") at the R prompt.
sv | The estimated surrogate variables, one in each column
pprob.gam | A vector of the posterior probabilities that each gene is affected by heterogeneity
pprob.b | A vector of the posterior probabilities that each gene is affected by mod
n.sv | The number of significant surrogate variables
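Because sv holds one surrogate variable per column, with one row per sample, a common next step is to append those columns to the model matrices so that later fits adjust for them. The sketch below shows only the shape of that step in base R. The sv matrix is a random stand-in for the sv component of a real result, and the group labels and response y are simulated; none of these values come from the package.

```r
set.seed(2)
n <- 20
group <- factor(rep(c("a", "b"), each = n / 2))
mod  <- model.matrix(~ group)                          # full model
mod0 <- model.matrix(~ 1, data = data.frame(group))    # null model

# Stand-in for the `sv` component: n samples by 2 surrogate variables.
# Random values are used purely to illustrate the dimensions.
sv <- matrix(rnorm(n * 2), nrow = n, ncol = 2)

# Append the surrogate variables as extra covariates in both models
modSv  <- cbind(mod, sv)
mod0Sv <- cbind(mod0, sv)

# Fit one simulated gene against the adjusted full model
y <- rnorm(n)
fit <- lm(y ~ modSv - 1)
length(coef(fit))
```

The adjusted full and null matrices keep the same nesting as mod and mod0, so they can be compared in the same way as the unadjusted models.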
Jeffrey T. Leek, W. Evan Johnson, Hilary S. Parker, Andrew E. Jaffe, John D. Storey, Yuqing Zhang
For the package: Leek JT, Johnson WE, Parker HS, Jaffe AE, and Storey JD. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics DOI:10.1093/bioinformatics/bts034
For sva: Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105: 18718-18723.
For sva: Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by ‘Surrogate Variable Analysis’. PLoS Genetics, 3: e161.
For ComBat: Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8 (1), 118-127
For svaseq: Leek JT (2014) svaseq: removing batch and other artifacts from count-based sequencing data. bioRxiv doi: TBD
For fsva: Parker HS, Bravo HC, Leek JT (2013) Removing batch effects for prediction problems with frozen surrogate variable analysis arXiv:1301.3947
For psva: Parker HS, Leek JT, Favorov AV, Considine M, Xia X, Chavan S, Chung CH, Fertig EJ (2014) Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction Bioinformatics doi: 10.1093/bioinformatics/btu375
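library(sva)           # provides num.sv() and sva()
library(bladderbatch)  # example bladder cancer expression data
data(bladderdata)
dat <- bladderEset[1:5000,]   # keep the first 5000 rows
pheno = pData(dat)     # sample annotations
edata = exprs(dat)     # expression matrix, variables in rows
# Full model (cancer status) and intercept-only null model
mod = model.matrix(~as.factor(cancer), data=pheno)
mod0 = model.matrix(~1,data=pheno)
# Estimate the number of surrogate variables, then the variables themselves
n.sv = num.sv(edata,mod,method="leek")
svobj = sva(edata,mod,mod0,n.sv=n.sv)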