sva: Estimate surrogate variables with an iterative algorithm from...


Description

Surrogate variables are estimated using either the iteratively re-weighted surrogate variable analysis algorithm of Leek and Storey (2008) or the two-step algorithm of Leek and Storey (2007).

Usage

sva(dat, bio.var, adj.var=NULL, n.sv=NULL, num.iter=NULL, diagnose=TRUE, verbose=TRUE) 

Arguments

dat

Either an m genes by n arrays matrix of expression data or an object of class edge obtained from a previous sva function call.

bio.var

A model matrix (see model.matrix) or data frame with n rows of the biological variables. If NULL, then all probes are treated as "null" in the algorithm.

adj.var

A model matrix (see model.matrix) or data frame with n rows of the probe-specific adjustment variables. If NULL, a model with an intercept term is used.
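
As a minimal sketch of how the bio.var and adj.var inputs might be built with base R's model.matrix, assume a hypothetical study with eight arrays, a two-level treatment factor, and a continuous age covariate. The pheno data frame and its column names are illustrative and are not part of snm.

```r
# Hypothetical sample annotation for n = 8 arrays (illustrative only).
pheno <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 4)),
  age       = c(34, 41, 29, 52, 38, 45, 31, 60)
)

# Biological variable of interest: one indicator column for "treated".
# The intercept column is dropped here because it is kept in adj.var below.
bio.var <- model.matrix(~ treatment, data = pheno)[, -1, drop = FALSE]

# Adjustment variables: an intercept plus the age covariate.
adj.var <- model.matrix(~ age, data = pheno)

# Both matrices have one row per array.
dim(bio.var)
dim(adj.var)
```

Either matrix has n rows, matching the n columns (arrays) of dat.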

n.sv

Rank of dependence kernel. If equal to NULL (default) this value is estimated from the data.

num.iter

The number of iterations of the algorithm to perform.

diagnose

A flag telling the software whether or not to produce diagnostic output in the form of consecutive plots. TRUE produces the plots.

verbose

A flag telling the software whether or not to display a report after each iteration. TRUE produces the output.

Details

Surrogate variable estimates are formed based on unpublished modifications of the algorithms originally published in Leek and Storey (2007,2008). Surrogate variables can be included in a significance analysis to reduce dependence and confounding.
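
The general idea can be illustrated in base R. The sketch below is not the snm implementation and omits the re-weighting and iteration steps. It only shows, on simulated data, the simplified notion of removing the fitted biological signal, taking the singular value decomposition of the residuals, and using a leading right singular vector as a candidate surrogate variable. All object names (group, hidden, res) are assumptions made for this illustration.

```r
set.seed(1)
m <- 200                                  # genes
n <- 12                                   # arrays
group  <- rep(c(0, 1), each = n / 2)      # biological variable
hidden <- rep(c(-1, 1), times = n / 2)    # simulated unmeasured factor

# Simulated expression: noise, a hidden effect on every gene,
# and a group effect on the first 20 genes.
dat <- matrix(rnorm(m * n), m, n)
dat <- dat + outer(rnorm(m, sd = 2), hidden)
dat[1:20, ] <- dat[1:20, ] + outer(rep(1.5, 20), group)

# Residuals after regressing each gene (row) on the biological design.
X   <- model.matrix(~ group)
H   <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
res <- dat %*% (diag(n) - H)

# Leading right singular vector of the residuals as a candidate surrogate variable.
n.sv <- 1
sv <- svd(res)$v[, seq_len(n.sv), drop = FALSE]

# How well does it track the simulated hidden factor?
abs(cor(sv[, 1], hidden))

# Include the surrogate variable as a covariate in a per-gene linear model.
fit <- lm(dat[1, ] ~ group + sv)
coef(fit)
```

In a real analysis the surrogate variables returned by sva would take the place of sv in the model for the significance test.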

Value

An object of class edge with the following values. Note: this list is currently inaccurate, and the output needs to be cleaned up.

sv

An n by n.sv matrix where each column is a distinct surrogate variable (the main quantity of interest).

pprob.gam

A vector with the posterior probability estimates that each row is affected by dependence.

pprob.b

A vector with the posterior probability estimates that each row is affected by the variables in mod, but not in mod0.

n.sv

The number of surrogate variables estimated.

Author(s)

Brig Mecham brig.mecham@sagebase.org, John Storey jstorey@princeton.edu

References

Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105: 18718-18723. http://www.biostat.jhsph.edu/~jleek/publications.html

Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3: e161. http://www.biostat.jhsph.edu/~jleek/publications.html

See Also

snm, edge, qvalue

Examples

  ## Not run: 

seed <- 1234
sim.d1 <- sim.preProcessed(seed=seed, 0.5, 0.3, 0.1)

# Update and fit model
sva.obj <- sva(sim.d1$raw.data, sim.d1$bio.var, NULL, n.sv=5, num.iter=5, diagnose=TRUE)
ps <- f.pvalue(sim.d1$raw.data, model.matrix(~-1+sim.d1$bio.var+sva.obj$svd[[5]]$v), model.matrix(~sva.obj$svd[[5]]$v))
# p-values for the true nulls should be approximately uniform
ks.test(ps[sim.d1$true.nulls], "punif")$p.value

# Update model and fit again
sva.obj2 <- sva(sva.obj, num.iter=5)
ps <- f.pvalue(sim.d1$raw.data, model.matrix(~-1+sim.d1$bio.var+sva.obj2$svd[[10]]$v), model.matrix(~sva.obj2$svd[[10]]$v))
ks.test(ps[sim.d1$true.nulls], "punif")$p.value

# Now refit and include one of the adjustment variables in the significance test
sva.obj <- sva(sim.d1$raw.data, sim.d1$bio.var, NULL, n.sv=5, num.iter=5, diagnose=TRUE)
ps <- f.pvalue(sim.d1$raw.data, model.matrix(~-1+sim.d1$bio.var+sim.d1$adj.var[,6] + sva.obj$svd[[5]]$v), model.matrix(~sim.d1$adj.var[,6] + sva.obj$svd[[5]]$v))
ks.test(ps[sim.d1$true.nulls], "punif")$p.value


 
## End(Not run)
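
The examples above use a Kolmogorov-Smirnov test to check whether the p-values of the true null probes look uniform. The self-contained sketch below shows the same check with base R only. It uses simulated t-tests on pure noise rather than snm output, and the object names (null.p, ks) are assumptions made for this illustration.

```r
set.seed(42)

# p-values from 500 two-sample t-tests where no true difference exists.
null.p <- replicate(500, t.test(rnorm(10), rnorm(10))$p.value)

# Compare the null p-values with the Uniform(0, 1) distribution.
ks <- ks.test(null.p, "punif")
ks$p.value
```

A small value of ks$p.value would suggest that the null p-values deviate from uniformity, for example because of unmodeled dependence.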

Sage-Bionetworks/snm documentation built on May 9, 2019, 12:14 p.m.