# svaseq: A function for estimating surrogate variables for count based... In sva: Surrogate Variable Analysis

## Description

This function implements the iteratively re-weighted least squares approach for estimating surrogate variables. As a by-product, it produces estimates of the probability that each gene is an empirical control. The function first applies a moderated log transform, as described in Leek 2014, before calculating the surrogate variables. See the function `empirical.controls` for a direct estimate of the empirical controls.

## Usage

```r
svaseq(
  dat,
  mod,
  mod0 = NULL,
  n.sv = NULL,
  controls = NULL,
  method = c("irw", "two-step", "supervised"),
  vfilter = NULL,
  B = 5,
  numSVmethod = "be",
  constant = 1
)
```

## Arguments

| Argument | Description |
|---|---|
| `dat` | The transformed data matrix with the variables in rows and samples in columns. |
| `mod` | The model matrix being used to fit the data. |
| `mod0` | The null model being compared when fitting the data. |
| `n.sv` | The number of surrogate variables to estimate. |
| `controls` | A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control. |
| `method` | For empirical estimation of control probes use "irw". If control probes are known use "supervised". |
| `vfilter` | You may choose to filter to the `vfilter` most variable rows before performing the analysis. `vfilter` must be NULL if `method` is "supervised". |
| `B` | The number of iterations of the irwsva algorithm to perform. |
| `numSVmethod` | If `n.sv` is NULL, sva will attempt to estimate the number of needed surrogate variables. This should not be adapted by the user unless they are an expert. |
| `constant` | The function takes `log(dat + constant)` before performing sva. By default `constant = 1`; all values of `dat + constant` should be positive. |
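To make the role of `constant` concrete, the following minimal sketch reproduces the documented `log(dat + constant)` step on a small made-up count matrix. The matrix `toy_counts` and its values are illustrative assumptions, not package data, and the sketch only mirrors the documented transform rather than the package's internal code.

```r
# Toy count matrix (hypothetical values): 3 genes in rows, 4 samples in columns
toy_counts <- matrix(
  c(0, 3, 10, 25,
    5, 0, 7, 1,
    100, 80, 0, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

constant <- 1  # the documented default

# dat + constant should be strictly positive so that the log is finite
stopifnot(all(toy_counts + constant > 0))

# The transform described for the `constant` argument
log_counts <- log(toy_counts + constant)

# With constant = 1, zero counts map to log(1) = 0 rather than -Inf
log_counts["gene1", "sample1"]
```

With `constant = 0`, any zero count would give `-Inf`, which is why the positivity requirement matters for sparse count data.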

## Value

`sv` The estimated surrogate variables, one in each column

`pprob.gam` A vector of the posterior probabilities each gene is affected by heterogeneity

`pprob.b` A vector of the posterior probabilities each gene is affected by mod

`n.sv` The number of significant surrogate variables
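Because `sv` holds one surrogate variable per column, a common next step is to append it to the model matrices as extra covariates. The sketch below shows only that bookkeeping in base R. Here `sv_standin` is a hypothetical placeholder filled with random numbers, standing in for the `sv` element of a real `svaseq` result, and the other object names are likewise illustrative.

```r
set.seed(1)

# Two-group design with three samples per group
group <- factor(rep(c("Ctl", "Trt"), each = 3))
mod <- model.matrix(~group)       # full model: intercept + group
mod0 <- mod[, 1, drop = FALSE]    # null model: intercept only

# Hypothetical stand-in for svaseq(...)$sv: 6 samples, 2 surrogate variables
sv_standin <- matrix(rnorm(6 * 2), nrow = 6, ncol = 2)
colnames(sv_standin) <- paste0("sv", 1:2)

# Append the surrogate variables to both models as additional covariates
mod_sv <- cbind(mod, sv_standin)
mod0_sv <- cbind(mod0, sv_standin)

dim(mod_sv)   # 6 rows, 4 columns
```

The augmented matrices can then be passed to whatever downstream model-fitting routine is in use, so that the surrogate variables are adjusted for alongside the variable of interest.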

## Examples

```r
library(sva)
library(zebrafishRNASeq)
data(zfGenes)

# Keep genes with more than 5 reads in at least 2 samples
filter = apply(zfGenes, 1, function(x) length(x[x > 5]) >= 2)
filtered = zfGenes[filter, ]
genes = rownames(filtered)[grep("^ENS", rownames(filtered))]
controls = grepl("^ERCC", rownames(filtered))  # ERCC spike-in rows
group = as.factor(rep(c("Ctl", "Trt"), each = 3))
dat0 = as.matrix(filtered)

# Full model (intercept + group) and null model (intercept only)
mod1 = model.matrix(~group)
mod0 = cbind(mod1[, 1])
svseq = svaseq(dat0, mod1, mod0, n.sv = 1)$sv
plot(svseq, pch = 19, col = "blue")
```
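The example above computes `controls` but does not use it. As a sketch of the "supervised" method described under Arguments, the call below passes those ERCC indicators as control probabilities (1 for spike-in rows, 0 otherwise). It assumes the `sva` and `zebrafishRNASeq` packages are installed and does nothing if they are not. The object names `keep` and `sup` are illustrative.

```r
if (requireNamespace("sva", quietly = TRUE) &&
    requireNamespace("zebrafishRNASeq", quietly = TRUE)) {
  data(zfGenes, package = "zebrafishRNASeq")

  # Same filter as the main example: more than 5 reads in at least 2 samples
  keep <- apply(zfGenes, 1, function(x) sum(x > 5) >= 2)
  dat0 <- as.matrix(zfGenes[keep, ])

  # Treat ERCC spike-ins as known controls (probability 1), all else as 0
  controls <- as.numeric(grepl("^ERCC", rownames(dat0)))

  group <- factor(rep(c("Ctl", "Trt"), each = 3))
  mod1 <- model.matrix(~group)
  mod0 <- mod1[, 1, drop = FALSE]

  # n.sv is given explicitly; vfilter is left NULL, as required here
  sup <- sva::svaseq(dat0, mod1, mod0, n.sv = 1,
                     controls = controls, method = "supervised")
  plot(sup$sv, pch = 19, col = "blue")
}
```

Comparing this plot with the one from the default "irw" run is a quick way to see whether the empirically estimated and control-based surrogate variables agree.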

sva documentation built on Nov. 8, 2020, 8:16 p.m.