sva: Estimate surrogate variables with an iterative algorithm from...
In Sage-Bionetworks/snm: Supervised Normalization of Microarrays


Surrogate variables are estimated using either the iteratively re-weighted surrogate variable analysis algorithm of Leek and Storey (2008) or the two-step algorithm of Leek and Storey (2007).

sva(dat, bio.var, adj.var=NULL, n.sv=NULL, num.iter=NULL, diagnose=TRUE, verbose=TRUE)

`dat`	Either an m genes by n arrays matrix of expression data or an object of class edge obtained from a previous sva function call.
`bio.var`	A model matrix (see `model.matrix`) or data frame with n rows of the biological variables. If NULL, then all probes are treated as "null" in the algorithm.
`adj.var`	A model matrix (see `model.matrix`) or data frame with n rows of the probe-specific adjustment variables. If NULL, a model with an intercept term is used.
`n.sv`	Rank of the dependence kernel, i.e. the number of surrogate variables to estimate. If NULL (the default), this value is estimated from the data.
`num.iter`	The number of iterations of the algorithm to perform.
`diagnose`	A flag telling the software whether or not to produce diagnostic output in the form of consecutive plots. TRUE produces the plot.
`verbose`	A flag telling the software whether or not to display a report after each iteration. TRUE produces the output.
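As an illustration of the shape `bio.var` and `adj.var` are expected to take (n rows, one per array), the following minimal sketch builds both with `model.matrix` from base R. The sample annotations `treatment` and `age` are hypothetical and not part of the package, and the no-intercept coding of `bio.var` is only one possible choice.

```r
# Hypothetical annotations for n = 6 arrays (not part of the snm package).
treatment <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
age <- c(34, 51, 47, 38, 55, 42)

# Biological variable of interest: one indicator column per treatment level.
bio.var <- model.matrix(~ -1 + treatment)

# Adjustment variable: an intercept column plus age.
adj.var <- model.matrix(~ age)

# Both matrices have one row per array.
dim(bio.var)  # 6 2
dim(adj.var)  # 6 2
```

Matrices built this way could then be passed as the `bio.var` and `adj.var` arguments, provided their rows are in the same order as the columns of `dat`.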

Surrogate variable estimates are formed using unpublished modifications of the algorithms originally published in Leek and Storey (2007, 2008). Surrogate variables can be included as covariates in a significance analysis to reduce dependence and confounding.
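To make the idea of including a surrogate variable in a significance analysis concrete, here is a simplified sketch in base R. It is not the algorithm implemented by `sva`: it only takes the leading right singular vector of the residual matrix (loosely, the starting point of the two-step approach) and uses it as a covariate in a nested-model F-test. The simulated data and all object names (`group`, `hidden`, `sv.hat`) are assumptions made for illustration.

```r
set.seed(1)
m <- 200                                   # genes
n <- 20                                    # arrays
group  <- rep(c(0, 1), each = n / 2)       # biological variable
hidden <- rep(c(0, 1), times = n / 2)      # unmeasured factor, e.g. batch

# Simulated expression: every gene is affected by the hidden factor,
# and the first 20 genes are also affected by the biological variable.
dat <- matrix(rnorm(m * n), m, n)
dat <- dat + outer(rnorm(m, sd = 2), hidden)
dat[1:20, ] <- dat[1:20, ] + outer(rep(2, 20), group)

# Residuals after regressing each gene (row) on the biological variable.
X <- model.matrix(~ group)
H <- X %*% solve(crossprod(X)) %*% t(X)    # projection ("hat") matrix
res <- dat - dat %*% H

# Crude surrogate variable estimate: leading right singular vector.
sv.hat <- svd(res)$v[, 1]

# How well does the estimate track the hidden factor (up to sign)?
abs(cor(sv.hat, hidden))

# Nested-model F-test for one gene, adjusting for the estimate.
y <- dat[1, ]
full <- lm(y ~ group + sv.hat)
null <- lm(y ~ sv.hat)
anova(null, full)
```

The null model keeps the estimated surrogate variable and drops only the biological variable, so the F-test asks whether `group` explains variation beyond what the hidden factor already accounts for.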

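An object of class edge with the following values. Note from the package authors: this list is currently wrong, and the output needs to be cleaned up a bit. The Examples access the estimates through `svd[[i]]$v` rather than `sv`.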

`sv`	An n by n.sv matrix where each column is a distinct surrogate variable (the main quantity of interest).
`pprob.gam`	A vector with the posterior probability estimates that each row is affected by dependence.
`pprob.b`	A vector with the posterior probability estimates that each row is affected by the variables in mod, but not in mod0.
`n.sv`	The number of surrogate variables estimated.

Brig Mecham brig.mecham@sagebase.org, John Storey jstorey@princeton.edu

Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105: 18718-18723. http://www.biostat.jhsph.edu/~jleek/publications.html

Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3: e161. http://www.biostat.jhsph.edu/~jleek/publications.html

snm, edge, qvalue

  ## Not run: 

seed <- 1234 
sim.d1 <- sim.preProcessed(seed=seed,0.5,0.3,0.1)

# Update and fit model 
sva.obj <- sva(sim.d1$raw.data, sim.d1$bio.var, NULL, n.sv=5, num.iter=5, diagnose=TRUE)
ps <- f.pvalue(sim.d1$raw.data, model.matrix(~-1+sim.d1$bio.var+sva.obj$svd[[5]]$v), model.matrix(~sva.obj$svd[[5]]$v))
ks.test(ps[sim.d1$true.nulls], "punif")$p.value

# Update model and fit again
sva.obj2 <- sva(sva.obj, num.iter=5)
ps <- f.pvalue(sim.d1$raw.data, model.matrix(~-1+sim.d1$bio.var+sva.obj2$svd[[10]]$v), model.matrix(~sva.obj2$svd[[10]]$v))
ks.test(ps[sim.d1$true.nulls], "punif")$p.value

# Now include one of the adjustment variables in the tested models and fit
sva.obj <- sva(sim.d1$raw.data, sim.d1$bio.var, NULL, n.sv=5, num.iter=5, diagnose=TRUE)
ps <- f.pvalue(sim.d1$raw.data, model.matrix(~-1+sim.d1$bio.var+sim.d1$adj.var[,6] + sva.obj$svd[[5]]$v), model.matrix(~sim.d1$adj.var[,6] + sva.obj$svd[[5]]$v))
ks.test(ps[sim.d1$true.nulls], "punif")$p.value


 
## End(Not run)