svaseq: A function for estimating surrogate variables for count based...

View source: R/svaseq.R

Description

This function implements the iteratively re-weighted least squares approach for estimating surrogate variables. As a by-product, it produces estimates of the probability that each gene is an empirical control. The function first applies a moderated log transform, as described in Leek 2014, before calculating the surrogate variables. See the function empirical.controls for a direct estimate of the empirical controls.

Usage

svaseq(dat, mod, mod0 = NULL, n.sv = NULL, controls = NULL,
  method = c("irw", "two-step", "supervised"), vfilter = NULL, B = 5,
  numSVmethod = "be", constant = 1)

Arguments

dat

The count data matrix, with the variables in rows and samples in columns. Do not log-transform it beforehand: the function applies log(dat + constant) itself.

mod

The model matrix being used to fit the data

mod0

The null model being compared when fitting the data

n.sv

The number of surrogate variables to estimate

controls

A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control.

method

For empirical estimation of control probes, use "irw". If control probes are known, use "supervised".

vfilter

You may choose to filter to the vfilter most variable rows before performing the analysis. vfilter must be NULL if method is "supervised"

B

The number of iterations of the irwsva algorithm to perform

numSVmethod

If n.sv is NULL, sva will attempt to estimate the number of surrogate variables needed. Users should not change this setting unless they are experts.

constant

The function takes log(dat + constant) before performing sva. By default, constant = 1. All values of dat + constant must be positive.
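The arguments above can be prepared with base R alone. The following is a minimal sketch on a made-up count matrix; the object names (counts, logged) and the gene and sample labels are illustrative assumptions, not part of the package. It builds mod and mod0, derives a controls vector from spike-in row names, and checks the positivity requirement on dat + constant.

```r
# Toy count matrix: 4 genes (rows) x 6 samples (columns). All names are made up.
counts <- matrix(c(10, 0, 3, 25,
                   12, 1, 4, 30,
                    9, 0, 2, 28,
                   40, 5, 3, 27,
                   38, 4, 5, 31,
                   45, 6, 2, 26),
                 nrow = 4,
                 dimnames = list(c("ENSG_a", "ENSG_b", "ERCC-001", "ERCC-002"),
                                 paste0("s", 1:6)))

group <- factor(rep(c("Ctl", "Trt"), each = 3))

# mod: full model with the variable of interest; mod0: intercept-only null model.
mod  <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# controls: one probability per row of dat. Here known spike-ins get 1, all others 0.
controls <- as.numeric(grepl("^ERCC", rownames(counts)))

# The documented transform is log(dat + constant), so dat + constant must be positive.
constant <- 1
stopifnot(all(counts + constant > 0))
logged <- log(counts + constant)
```

With these objects in hand, counts, mod, mod0 and controls would be passed as dat, mod, mod0 and controls. The logged matrix is shown only to illustrate the transform and should not itself be passed as dat.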

Value

sv The estimated surrogate variables, one in each column

pprob.gam A vector of the posterior probabilities that each gene is affected by heterogeneity

pprob.b A vector of the posterior probabilities that each gene is affected by mod

n.sv The number of significant surrogate variables
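A common way to use the returned sv is to append it as extra columns to the model matrices before a downstream analysis. The sketch below assumes that sv has one row per sample; the fit object is a hypothetical stand-in built from random numbers, not real svaseq output.

```r
# Hypothetical stand-in for the list returned by svaseq(): an `sv` matrix
# (assumed one row per sample, one column per surrogate variable) and `n.sv`.
set.seed(1)
fit <- list(sv = matrix(rnorm(6), ncol = 1), n.sv = 1)

group <- factor(rep(c("Ctl", "Trt"), each = 3))
mod  <- model.matrix(~ group)
mod0 <- mod[, 1, drop = FALSE]

# Append the surrogate variables to both the full and the null model.
modSv  <- cbind(mod,  fit$sv)
mod0Sv <- cbind(mod0, fit$sv)
```

The augmented matrices modSv and mod0Sv can then take the place of mod and mod0 in whatever differential expression model follows.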

Examples

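library(sva)
library(zebrafishRNASeq)
data(zfGenes)

# Keep genes with more than 5 reads in at least 2 samples
filter = apply(zfGenes, 1, function(x) length(x[x>5])>=2)
filtered = zfGenes[filter,]
# Endogenous genes (ENS*) and a logical flag for ERCC spike-in controls
genes = rownames(filtered)[grep("^ENS", rownames(filtered))]
controls = grepl("^ERCC", rownames(filtered))
group = as.factor(rep(c("Ctl", "Trt"), each=3))
dat0 = as.matrix(filtered)

# Full model (intercept + group) and intercept-only null model
mod1 = model.matrix(~group)
mod0 = cbind(mod1[,1])
# Estimate one surrogate variable and plot it across samples
svseq = svaseq(dat0,mod1,mod0,n.sv=1)$sv
plot(svseq,pch=19,col="blue")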

sva documentation built on May 2, 2018, 2:54 a.m.